= Pipeline todos =

This page tracks planned modifications to the pipeline for the full run.

== Timeline ==

- '''October 1st is the target date to start implementing these features in the final pipeline. Issues should be filed before this date.'''

The full run will start when:

- Issues on this page are implemented
- Metadatabase issues are resolved
- All FQ files are merged and available

- '''Aim to start running once the final plans for the second paper are clear'''

== Full run implementation list ==

- Two alignments, so that both downstream analyses (QTL, ASE) can be used to their full potential:
1. Unmasked for QTL (and expression quantification)
2. Masked for ASE (check with Dasha about the masked index)
- Mask SNPs from GoNL, 1KG and the UMCG ASE study.
- Keep separate mapping statistics for each alignment in the analysis database
- Change the STAR settings to the ENCODE settings (below)
- Variant calling on the unmasked BAM files (mpileup) for ASE
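
A common way to build a masked reference is to replace known SNP positions with N before generating the STAR index, which reduces bias towards the reference allele in allele-specific counts. The sketch below assumes the SNP lists (GoNL, 1KG, UMCG ASE study) are available as VCF-formatted text and that the genome fits in memory as a dict of chromosome to sequence. The function names are hypothetical, and it is not stated here whether the existing masked index was built this way.

{{{#!python
from typing import Iterable, Iterator

BASES = {"A", "C", "G", "T"}


def snv_positions(vcf_lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Yield (chrom, 1-based pos) for single-nucleotide variants in VCF-formatted lines."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a VCF data line
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # keep only SNVs: single-base REF and every ALT a single base (drops indels and ".")
        if ref.upper() in BASES and all(alt.upper() in BASES for alt in alts.split(",")):
            yield chrom, pos


def mask_genome(genome: dict[str, str], positions: Iterable[tuple[str, int]]) -> dict[str, str]:
    """Return a copy of `genome` with every listed position replaced by N."""
    seqs = {chrom: bytearray(seq, "ascii") for chrom, seq in genome.items()}
    for chrom, pos in positions:
        seq = seqs.get(chrom)
        if seq is not None and 1 <= pos <= len(seq):  # ignore unknown contigs / out-of-range positions
            seq[pos - 1] = ord("N")
    return {chrom: seq.decode("ascii") for chrom, seq in seqs.items()}


# Hypothetical usage (file names are placeholders):
#   from itertools import chain
#   with open("gonl.vcf") as a, open("1kg.vcf") as b, open("umcg_ase.vcf") as c:
#       masked = mask_genome(genome, chain(snv_positions(a), snv_positions(b), snv_positions(c)))
}}}

The masked sequences would then be written back to FASTA and indexed with STAR as usual; the unmasked index stays unchanged for the QTL alignment.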

== Discussion points ==

- Use STAR 2-pass mode (re-align with the junctions discovered in a first pass)?
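
In a two-pass approach, splice junctions found in a first STAR pass (reported in `SJ.out.tab`) are added to the genome index and the reads are aligned again, so that novel junctions are treated like annotated ones in the second pass. With STAR 2.3.0e this would typically mean regenerating the index with `--sjdbFileChrStartEnd`; newer STAR releases also provide `--twopassMode Basic`. The sketch below only covers the step in between: pooling first-pass junctions from several samples and filtering them. The thresholds (at least 2 unique reads in some sample, canonical motifs only) and the function names are assumptions for illustration, not agreed pipeline settings.

{{{#!python
from typing import Iterable, NamedTuple

# SJ.out.tab column 4 strand codes: 0 = undefined, 1 = +, 2 = -
STRAND_CODES = {"0": ".", "1": "+", "2": "-"}


class Junction(NamedTuple):
    chrom: str
    start: int   # first intronic base (1-based)
    end: int     # last intronic base (1-based)
    strand: str  # "+", "-" or "."


def collect_junctions(sj_lines: Iterable[str], min_unique: int = 2,
                      keep_noncanonical: bool = False) -> list[Junction]:
    """Pool junctions from first-pass SJ.out.tab lines (possibly from many samples).

    A junction is kept if at least one input line supports it with
    `min_unique` uniquely mapping reads; duplicates across samples are merged.
    """
    junctions = set()
    for line in sj_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not an SJ.out.tab record
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        strand, motif, unique_reads = fields[3], fields[4], int(fields[6])
        if motif == "0" and not keep_noncanonical:
            continue  # column 5 = 0: non-canonical intron motif
        if unique_reads < min_unique:
            continue  # column 7: uniquely mapping reads crossing the junction
        junctions.add(Junction(chrom, start, end, STRAND_CODES[strand]))
    return sorted(junctions)


def to_sjdb_lines(junctions: Iterable[Junction]) -> list[str]:
    """Format junctions as chr, start, end, strand lines for --sjdbFileChrStartEnd."""
    return [f"{j.chrom}\t{j.start}\t{j.end}\t{j.strand}" for j in junctions]
}}}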

=== Suggested STAR ENCODE settings ===

{{{#!sh
# ENCODE settings (sent to me by Alexander Dobin, who did the alignment for some of the ENCODE samples).
# Options are collected in a bash array so that each one can carry a comment:
# a comment placed after a trailing "\" would break the line continuation.
STAR_OPTS=(
  --runThreadN 8
  --genomeDir /home/dzhernakova/resources/STARindex_GoNL/
  --genomeLoad NoSharedMemory
  --readFilesIn /home/dzhernakova/data/rawData/LL-557-130804_R1.fq.gz ~/data/rawData/LL-557-130804_R2.fq.gz
  --readFilesCommand zcat
  --outFileNamePrefix ~/data/mappedData/LL-557-130804.encode/LL-557-130804.encode.
  --outSAMstrandField intronMotif    # add the XS strand attribute, derived from the intron motif
  --outSAMunmapped Within            # keep unmapped reads in the main SAM output
  --outFilterType BySJout            # reduces the number of "spurious" junctions
  --outFilterMultimapNmax 20         # max multiple alignments per read; if exceeded, the read is considered unmapped
  --outFilterMismatchNmax 999        # max mismatches per pair (absolute); 999 effectively disables this limit
  --outFilterMismatchNoverLmax 0.04  # max mismatches per pair relative to mapped length (0.04*(2*50)=4)
  --alignIntronMin 20                # min intron size (default: 21)
  --alignIntronMax 1000000           # max intron size (default: determined by the bin size)
  --alignMatesGapMax 1000000         # max genomic distance between mates (default: determined by the bin size)
  --alignSJoverhangMin 8             # min overhang for unannotated junctions (default: 5)
  --alignSJDBoverhangMin 1           # min overhang for annotated junctions (default: 3)
)
/home/dzhernakova/tools/STAR_2.3.0e.Linux_x86_64/STAR "${STAR_OPTS[@]}"
}}}
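
The comment on `--outFilterMismatchNoverLmax` works out the mismatch allowance for 2x50 bp reads. The same arithmetic for other read lengths, combined with the absolute `--outFilterMismatchNmax` cap, is sketched below. It assumes both mates align over their full length, so the mapped length equals the summed read length; soft-clipped alignments have a shorter mapped length and therefore allow fewer mismatches. The function name is hypothetical.

{{{#!python
import math
from fractions import Fraction


def max_mismatches(mapped_length: int, nover_lmax: str = "0.04", nmax: int = 999) -> int:
    """Largest number of mismatches an alignment may have and still pass both filters.

    STAR outputs an alignment only if
      mismatches <= outFilterMismatchNmax, and
      mismatches / mapped_length <= outFilterMismatchNoverLmax.
    Fraction avoids floating-point rounding in ratio * length.
    """
    relative_limit = math.floor(Fraction(nover_lmax) * mapped_length)
    return min(nmax, relative_limit)


if __name__ == "__main__":
    # Assumes both mates map over their full length (no soft-clipping).
    for read_length in (50, 75, 100):
        print(f"2x{read_length} bp: up to {max_mismatches(2 * read_length)} mismatches per pair")
    # prints 4, 6 and 8 mismatches respectively
}}}

With `--outFilterMismatchNmax 999`, the relative limit is the one that matters for any realistic read length.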