Pipeline todos
This page tracks planned modifications to the pipeline for the full run.
Timeline
- October 1st is the target date to start implementing these features in the final pipeline. Issues should be filed before this date.
The full run will start when:
- Issues on this page are implemented
- Metadatabase issues are resolved
- All FQ files are merged and available
- Final plans for the second paper are clear (the aim is to start running after this)
Full run implementation list
- Two alignments, so that each downstream analysis (QTL, ASE) can be used to its full potential
  - Unmasked for QTL (and expression quantification)
  - Masked for ASE (check with Dasha for the masked index)
    - Mask with GoNL, 1KG and UMCG ASE study SNPs.
- Separate map statistics in analysis database
- Modify STAR settings to Encode (below)
- Variant calling on unmasked BAM/mpileup (ASE)
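The masking step for the ASE index could look roughly like the sketch below. It assumes that "masked" means replacing each known SNP position in the reference with N before the STAR index is built, which this page does not state explicitly. The function name mask_fasta and the two-column SNP file (chromosome, 1-based position) are hypothetical. The awk loop is only meant to show the idea on small inputs; for a whole genome a dedicated tool such as bedtools maskfasta would likely be the practical choice.

```bash
#!/usr/bin/env bash
# mask_fasta SNPS FASTA
# Prints FASTA to stdout with every position listed in SNPS replaced by N.
# SNPS is a hypothetical whitespace-separated file: chromosome, 1-based position.
# Assumes SNPS is not empty (awk's NR == FNR idiom needs a first file with content).
mask_fasta() {
  awk '
    NR == FNR { snp[$1, $2] = 1; next }   # first file: remember the SNP sites
    /^>/ {                                # FASTA header: new sequence, reset offset
      chrom = substr($1, 2); offset = 0
      print; next
    }
    {                                     # sequence line: mask listed positions
      line = $0; n = length(line)
      for (i = 1; i <= n; i++)
        if ((chrom, offset + i) in snp)
          line = substr(line, 1, i - 1) "N" substr(line, i + 1)
      offset += n                         # positions continue on the next line
      print line
    }
  ' "$1" "$2"
}
```

Usage, with hypothetical file names: mask_fasta ase_snps.tsv reference.fa > reference.masked.fa. The masked FASTA would then be used to generate the second STAR index, while the unmasked reference stays in use for QTL and expression quantification.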
Discussion points
- STAR 2-pass?
Suggested STAR Encode settings
Encode settings (sent to me by Alexander Dobin, who did the alignment for some of the Encode samples):
/home/dzhernakova/tools/STAR_2.3.0e.Linux_x86_64/STAR \
  --runThreadN 8 \
  --genomeDir /home/dzhernakova/resources/STARindex_GoNL/ \
  --genomeLoad NoSharedMemory \
  --readFilesIn /home/dzhernakova/data/rawData/LL-557-130804_R1.fq.gz ~/data/rawData/LL-557-130804_R2.fq.gz \
  --readFilesCommand zcat \
  --outFileNamePrefix ~/data/mappedData/LL-557-130804.encode/LL-557-130804.encode. \
  --outSAMstrandField intronMotif \
  --outSAMunmapped Within \
  --outFilterType BySJout \
  --outFilterMultimapNmax 20 \
  --outFilterMismatchNmax 999 \
  --outFilterMismatchNoverLmax 0.04 \
  --alignIntronMin 20 \
  --alignIntronMax 1000000 \
  --alignMatesGapMax 1000000 \
  --alignSJoverhangMin 8 \
  --alignSJDBoverhangMin 1
Notes on the filter and alignment options:
- --outFilterType BySJout: reduces the number of "spurious" junctions
- --outFilterMultimapNmax 20: max multiple alignments per read; if exceeded, the read is considered unmapped
- --outFilterMismatchNmax 999: max number of mismatches per pair (absolute)
- --outFilterMismatchNoverLmax 0.04: max mismatches per pair relative to length (0.04*(2*50)=4)
- --alignIntronMin 20: min intron size (default: 21)
- --alignIntronMax 1000000: max intron size (default: specified by the size of bins)
- --alignMatesGapMax 1000000: max genomic distance between mates (default: specified by the size of bins)
- --alignSJoverhangMin 8: min overhang for unannotated junctions (default: 5)
- --alignSJDBoverhangMin 1: min overhang for annotated junctions (default: 3)
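The mismatch arithmetic for these settings, 0.04*(2*50)=4, can be reproduced for other read lengths with a small helper. This is a sketch under assumptions: the function name max_mismatches is hypothetical, the whole pair is assumed to align (no soft-clipping), and the effective limit is taken to be the smaller of the absolute cap (--outFilterMismatchNmax) and the relative cap (--outFilterMismatchNoverLmax times the combined mate length), rounded down.

```bash
#!/usr/bin/env bash
# max_mismatches READ_LEN MATES NOVERLMAX NMAX
# Prints the effective mismatch limit per read (pair), assuming the full
# length of every mate is aligned.
max_mismatches() {
  awk -v len="$1" -v mates="$2" -v ratio="$3" -v nmax="$4" 'BEGIN {
    # relative cap, rounded down; the small epsilon guards against
    # floating-point results such as 3.9999999
    rel = int(ratio * len * mates + 1e-9)
    # the absolute cap wins if it is the stricter of the two
    print (rel < nmax ? rel : nmax)
  }'
}

max_mismatches 50 2 0.04 999    # prints 4, matching the note for 2x50 bp reads
```

With --outFilterMismatchNmax set to 999, the relative cap is the one that actually limits mismatches in this configuration.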