Changes between Version 60 and Version 61 of SnpCallingPipeline


Timestamp:
Jan 24, 2011 5:51:39 PM
Author:
laurent
Comment:

--

  • SnpCallingPipeline

    v60 v61  
    110 110 The current important values discussed for the quality control along with their thresholds are the following:
    111 111 * RawData
    112 ** FastQC report (per mate of the pair)
    113 *** Manual look at files and check:
    114 **** Avg Quality per read > 30
    115 **** Num sequences ~60Mio
    116 **** Sequence quality should look OK
     112        * FastQC report (per mate of the pair)
     113                * Manual look at files and check:
     114                        * Avg Quality per read > 30
     115                        * Num sequences ~60Mio
     116                        * Sequence quality should look OK
    117 117 * Alignment (per lane)
    118 ** Picard Alignment Summary Metrics
    119 *** %Purified reads aligned > 90%
    120 *** Purified High Quality Error Rate < 1%
    121 *** Purified reads aligned > 150Mio
    122 ** Picard GC Bias Metrics
    123 *** GC Curve should look OK
    124 *** Median GC% windows between 30 and 40
    125 *** Avg Mean Base Quality should be OK
    126 ** Picard Insertsize Metrics
    127 *** Peak should be ~500
    128 *** Peak should be narrow
    129 *** Should have few outliers
    130 ** Picard BAM Index Stats
    131 *** Should be uniform by Chromosome
    132 ** GATK or Picard (currently testing) Coverage Metrics
    133 *** Should correspond to a Poisson curve with peak at 12x
    134 ** Picard Mark Duplicates
    135 *** %duplicates between 5% and 8%
     118        * Picard Alignment Summary Metrics
     119                * %Purified reads aligned > 90%
     120                * Purified High Quality Error Rate < 1%
     121                * Purified reads aligned > 150Mio
     122        * Picard GC Bias Metrics
     123                * GC Curve should look OK
     124                * Median GC% windows between 30 and 40
     125                * Avg Mean Base Quality should be OK
     126        * Picard Insertsize Metrics
     127                * Peak should be ~500
     128                * Peak should be narrow
     129                * Should have few outliers
     130        * Picard BAM Index Stats
     131                * Should be uniform by Chromosome
     132        * GATK or Picard (currently testing) Coverage Metrics
     133                * Should correspond to a Poisson curve with peak at 12x
     134        * Picard Mark Duplicates
     135                * %duplicates between 5% and 8%
    136 136 * Recalibration
    137 ** GATK Analyze Covariate
    138 *** No output currently; should revisit when working
    139 ** Picard Quality by Cycle
    140 *** To be determined once data is produced
    141 ** Picard Quality Distribution
    142 *** To be determined once data is produced
     137        * GATK Analyze Covariate
     138                * No output currently; should revisit when working
     139        * Picard Quality by Cycle
     140                * To be determined once data is produced
     141        * Picard Quality Distribution
     142                * To be determined once data is produced
    143 143 * Initial SNP Calling
    144 ** To be determined once data is produced and analyzed. A first basis for it should be derived from the difference between chipdata and sequence data and the %of SNPs found in dbSNP.
     144        * To be determined once data is produced and analyzed. A first basis for it should be derived from the difference between chipdata and sequence data and the %of SNPs found in dbSNP.
    145 145
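
The numeric thresholds listed above could be checked automatically once the metrics are collected. The following is only a minimal sketch under stated assumptions: the metric key names (`avg_quality_per_read`, `pct_reads_aligned`, and so on) are hypothetical and are not the column names of the FastQC or Picard reports, percentages are assumed to be given on a 0-100 scale, and "between" is assumed to include both bounds. The approximate criteria (Num sequences ~60Mio, insert size peak ~500, coverage peak at 12x) and the "should look OK" items are left out because the page gives no tolerance for them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Check:
    stage: str                        # pipeline stage, as named on this page
    metric: str                       # hypothetical key, not a real report column
    description: str                  # threshold text as written on this page
    passes: Callable[[float], bool]   # True when the value meets the threshold


# Only the thresholds that this page states as explicit numbers.
CHECKS: List[Check] = [
    Check("RawData", "avg_quality_per_read",
          "Avg Quality per read > 30", lambda v: v > 30),
    Check("Alignment", "pct_reads_aligned",
          "%Purified reads aligned > 90%", lambda v: v > 90),
    Check("Alignment", "hq_error_rate_pct",
          "Purified High Quality Error Rate < 1%", lambda v: v < 1),
    Check("Alignment", "reads_aligned",
          "Purified reads aligned > 150Mio", lambda v: v > 150e6),
    Check("Alignment", "median_gc_pct",
          "Median GC% windows between 30 and 40", lambda v: 30 <= v <= 40),
    Check("Alignment", "pct_duplicates",
          "%duplicates between 5% and 8%", lambda v: 5 <= v <= 8),
]


def evaluate(metrics: Dict[str, float]) -> List[str]:
    """Return one message per check that fails or has no value."""
    failures = []
    for check in CHECKS:
        value = metrics.get(check.metric)
        if value is None:
            failures.append(f"{check.stage}: {check.description} (missing)")
        elif not check.passes(value):
            failures.append(f"{check.stage}: {check.description} (got {value})")
    return failures


if __name__ == "__main__":
    # Made-up values for illustration only; not measured data.
    example = {
        "avg_quality_per_read": 34.0,
        "pct_reads_aligned": 93.5,
        "hq_error_rate_pct": 0.6,
        "reads_aligned": 160e6,
        "median_gc_pct": 36.0,
        "pct_duplicates": 12.0,
    }
    for message in evaluate(example):
        print(message)
```

With the made-up values in the `__main__` block, only the duplicate-rate check is reported, since 12.0 lies outside the 5% to 8% range. Keeping the thresholds in a single table makes it easy to adjust them as the discussion on this page evolves.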