
Changes between Version 1 and Version 2 of WorkPlan

Timestamp: Sep 26, 2010 10:15:04 PM (15 years ago)
Author: Yurii Aulchenko
Comment: --


WorkPlan

-                      v1
+                      v2
 [[TOC()]]
 = Work plan for quality control, call improvement and phasing (VCF to haplotypes) =
+= Analysis work plan =
+Compiled: Yurii Aulchenko, September 12, 2010
+We have to run our project in two phases: phase 1 (from now until the end of 2010), running with the minimal personnel available at present, and phase 2 (starting ~Jan 2011), based on a proper resource plan. It is assumed that pilot VCF data will be available at the end of September; we expect all data to be available by January 2011.
+As noted by Morris, we have to run our project in two phases: phase 1 (from now until the end of 2010), running with the minimal personnel available at present, and phase 2 (starting ~Jan 2011), based on a proper resource plan. It is assumed that pilot VCF data will be available at the end of September; we expect all data to be available by January 2011.
+This document aims to provide an overview of the “VCF to haplotypes” work package.
+This document aims to provide an overview of the “VCF to haplotypes” line, summarizing ChipBasedQcPipeline, MendelianQcPipeline, and TrioAwarePhasingPipeline.
 == Plan for phase 1. ==
 Starting at the end of September, when pilot data are available, a pipeline for basic post-VCF quality control has to be built. This will include two independent sub-projects which may be run in parallel:
+Starting at the end of September, when pilot data are available, a pipeline for basic post-VCF quality control has to be built. This will include a number of sub-projects which may be run in parallel in a semi-independent manner:
 '''Chip QC project:''' cross-check of the results obtained from BGI against the already available GWA scan data.
+'''Chip QC (ChipBasedQcPipeline):''' establish an infrastructure and cross-check the genotypic data generated by BGI against the data already available from GWA scans (DNA chips).
+This work package ''aims to'':
+'''Mendelian QC (MendelianQcPipeline)''': establish an infrastructure and perform QC of the genotypic data generated by BGI using a Mendelian error check.
+ * Establish a custom pipeline for Chip-based QC.
+ * Check quality of sequence data.
+ * Identify factors affecting quality of sequencing (e.g. batch effects).
+ * Establish (preliminary) thresholds of quality metrics maximizing sensitivity and specificity.
+ * Using the above thresholds, establish the false-positive and false-negative rates for variants discovered in our study (if we do not take trio structure into account).
+ * Check whether these rates agree with those theoretically expected (to make sure we do not miss any important experimental factor).
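As an illustration of the false-positive and false-negative rates named in the aims above, the following is a minimal sketch only, not the project's pipeline. It assumes that chip calls are treated as the truth set, that each call has already been reduced to "carries a non-reference allele: yes/no" per site, and that only sites assayed on both platforms are compared. The function name `chip_error_rates` and the `(chromosome, position)` key format are hypothetical.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, position); hypothetical key format


def chip_error_rates(seq_calls: Dict[Site, bool],
                     chip_calls: Dict[Site, bool]) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) of sequence calls.

    Assumption: the chip call is taken as truth. True means the sample
    carries a non-reference allele at the site, False means it does not.
    """
    fp = fn = tp = tn = 0
    for site, chip_variant in chip_calls.items():
        if site not in seq_calls:
            continue  # only sites present on both platforms are comparable
        seq_variant = seq_calls[site]
        if seq_variant and chip_variant:
            tp += 1
        elif seq_variant and not chip_variant:
            fp += 1
        elif not seq_variant and chip_variant:
            fn += 1
        else:
            tn += 1
    # Rates are undefined (NaN) when the corresponding denominator is zero.
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    fnr = fn / (fn + tp) if (fn + tp) else float("nan")
    return fpr, fnr
```

Computing these rates separately per batch or per quality-metric threshold would be one way to approach the batch-effect and threshold questions listed above; how sites without a sequence call are encoded is left to the caller.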
+'''New variants discovery (TrioAwareVariantDiscoveryPipeline) phase 1''': establish an infrastructure and provide a preliminary list of variants discovered by GvNL.
 ''Detailed workflow'' is summarized in a separate document.
+'''De-novo variants discovery (DeNovoVariationPipeline) phase 1''': establish an infrastructure and provide a preliminary list of 'de-novo' mutations.
+ * ''Estimated costs'' for pilot data check and establishing the pipeline:
+ * 3 months of BI/data manager/programmer at 1.0 fte + experienced supervisor at 0.1 fte.
+ * ''Suggested timeline:'' end of September – end of December
+ * ''Depends on:'' availability of VCF pilot data
+ * Other projects depending on this: MendelianQcPipeline (soft), QC’ed data (hard)
+...
+Major outcomes of phase 1:
+ * Established custom pipelines for Chip-based and Mendelian-check QC.
+ * The list of factors affecting quality of sequencing.
+ * Thresholds of quality metrics leading to the highest quality.
+ * False-positive and false-negative rates for variants discovered.
+ * Estimate of the potential for improving calls by exploiting information from the sequencing of relatives (see TrioAwareVariantDiscoveryPipeline, TrioAwarePhasingPipeline).
+ * Estimate of the potential for ''de-novo'' variant discovery based on phasing information (see DeNovoVariationPipeline), with a preliminary list of such mutations.
+ * QC'ed sequence data.
+== Plan for phase 2. ==
+'''Genotype improvement and phasing (TrioAwarePhasingPipeline)''': establish an infrastructure and perform phasing of the sequence data
+'''New variants discovery (TrioAwareVariantDiscoveryPipeline) phase 2''': provide the list of variants discovered by GvNL
+'''De-novo variants discovery (DeNovoVariationPipeline) phase 2''': provide the list of 'de-novo' mutations
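The Mendelian error check referred to under MendelianQcPipeline can be sketched as follows. This is a minimal sketch under stated assumptions, not the pipeline itself: sites are biallelic and autosomal, genotypes are coded as the number of non-reference alleles (0, 1 or 2), and `None` marks a missing genotype. The function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple

Genotype = Optional[int]  # 0, 1 or 2 non-reference alleles; None = missing
Trio = Tuple[Genotype, Genotype, Genotype]  # (father, mother, child)

# Non-reference allele counts a parent with a given genotype can transmit.
_TRANSMITTABLE = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_error(father: int, mother: int, child: int) -> bool:
    """True if the child genotype cannot arise from one allele per parent."""
    possible = {f + m
                for f in _TRANSMITTABLE[father]
                for m in _TRANSMITTABLE[mother]}
    return child not in possible


def mendelian_error_rate(trios: Iterable[Trio]) -> float:
    """Fraction of fully genotyped trios showing a Mendelian error."""
    errors = complete = 0
    for father, mother, child in trios:
        if father is None or mother is None or child is None:
            continue  # skip trios with any missing genotype
        complete += 1
        if is_mendelian_error(father, mother, child):
            errors += 1
    return errors / complete if complete else float("nan")
```

An apparent Mendelian error may reflect either a genotyping error or a genuine ''de-novo'' mutation, so the same check can serve both as a QC metric and as a first filter for candidate sites.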