Changes between Version 1 and Version 2 of WorkPlan
- Timestamp: Sep 26, 2010 10:15:04 PM
WorkPlan
Lines marked "-" appear only in v1 (removed); lines marked "+" appear only in v2 (added); unmarked lines are unchanged.

  [[TOC()]]
- = Work plan for quality control, call improvement and phasing (VCF to haplotypes) =
+ = Analysis work plan =

- Compiled: Yurii Aulchenko, September 12, 2010
+ We have to run our project in two phases: phase 1 (from now until the end of 2010), running with the minimal personnel available at present, and phase 2 (starting ~Jan 2011), based on a proper resource plan. It is assumed that pilot VCF data will be available at the end of September; we expect all data to be available by January 2011.

- As noted by Morris, we have to run our project in two phases: phase 1 (from now until the end of 2010), running with the minimal personnel available at present, and phase 2 (starting ~Jan 2011), based on a proper resource plan. It is assumed that pilot VCF data will be available at the end of September; we expect all data to be available by January 2011.
-
- This document aims to provide an overview of the “VCF to haplotypes” work package.
+ This document aims to provide an overview of the “VCF to haplotypes” line, summarizing ChipBasedQcPipeline, MendelianQcPipeline and TrioAwarePhasingPipeline.

  == Plan for phase 1. ==

- Starting at the end of September, when pilot data are available, a pipeline for basic post-VCF quality control has to be built. This will include two independent sub-projects, which may be run in parallel:
+ Starting at the end of September, when pilot data are available, a pipeline for basic post-VCF quality control has to be built. This will include a number of semi-independent sub-projects, which may be run in parallel:

- '''Chip QC project:''' Cross-check of the results obtained from BGI against the already available GWA scan data.
+ '''Chip QC (ChipBasedQcPipeline):''' establish an infrastructure and cross-check the genotypic data generated by BGI against the already available GWA scan (DNA chip) data.

- This work package ''aims to'':
+ '''Mendelian QC (MendelianQcPipeline)''': establish an infrastructure and perform QC of the genotypic data generated by BGI using a Mendelian error check.

-  * Establish a custom pipeline for chip-based QC.
-  * Check the quality of the sequence data.
-  * Identify factors affecting the quality of sequencing (e.g. batch effects).
-  * Establish (preliminary) thresholds of quality metrics that maximize sensitivity and specificity.
-  * Using the above thresholds, establish the false-positive and false-negative rates for variants discovered in our study (if we do not take trio structure into account).
-  * Check whether these rates agree with those theoretically expected (to confirm that we are not missing any important experimental factor).
+ '''New variants discovery (TrioAwareVariantDiscoveryPipeline) phase 1''': establish an infrastructure and provide a preliminary list of variants discovered by GvNL.

- The ''detailed workflow'' is summarized in a separate document.
+ '''De-novo variants discovery (DeNovoVariationPipeline) phase 1''': establish an infrastructure and provide a preliminary list of 'de-novo' mutations.

-  * ''Estimated costs'' for the pilot data check and establishing the pipeline:
-   * 3 months of BI/data manager/programmer at 1.0 fte + experienced supervisor at 0.1 fte.
-  * ''Suggested timeline:'' end of September to end of December
-  * ''Depends on:'' availability of VCF pilot data
-  * Other projects depending on this: MendelianQcPipeline (soft), QC’ed data (hard)
+ ...
+
+ Major outcomes of phase 1:
+  * Established custom pipelines for chip-based and Mendelian-check QC.
+  * The list of factors affecting the quality of sequencing.
+  * Thresholds of quality metrics leading to the highest quality.
+  * False-positive and false-negative rates for the variants discovered.
+  * Estimate of the potential for improving calls by exploiting information from the sequencing of relatives (see TrioAwareVariantDiscoveryPipeline, TrioAwarePhasingPipeline).
+  * Estimate of the potential for ''de-novo'' variant discovery based on phasing information (see DeNovoVariationPipeline), and a preliminary list of such mutations.
+  * QC'ed sequence data.
+
+ == Plan for phase 2. ==
+
+ '''Genotype improvement and phasing (TrioAwarePhasingPipeline)''': establish an infrastructure and perform phasing of the sequence data.
+
+ '''New variants discovery (TrioAwareVariantDiscoveryPipeline) phase 2''': provide the list of variants discovered by GvNL.
+
+ '''De-novo variants discovery (DeNovoVariationPipeline) phase 2''': provide the list of 'de-novo' mutations.
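
The Mendelian error check named under MendelianQcPipeline is not specified further in this plan. As a rough illustration only, the sketch below shows what such a check could look like for a single autosomal, diploid site in a parent-offspring trio. The genotype encoding (unphased pairs of allele codes, `None` for a missing call) and the function names `is_mendelian_consistent` and `mendelian_error_rate` are assumptions made for this example, not part of the work plan or of any existing pipeline.

```python
from itertools import product


def is_mendelian_consistent(child, father, mother):
    """Return True if the child's genotype can be formed by taking
    one allele from the father and one from the mother.

    Genotypes are unphased 2-tuples of allele codes, e.g. (0, 1) for a
    heterozygous call (assumed encoding: 0 = reference, 1 = alternate).
    Only autosomal, diploid sites are handled.
    """
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((f_allele, m_allele))) == child_sorted
        for f_allele, m_allele in product(father, mother)
    )


def mendelian_error_rate(trio_calls):
    """Fraction of fully called trio sites that violate Mendelian inheritance.

    trio_calls is an iterable of (child, father, mother) genotypes;
    a genotype of None marks a missing call and the site is skipped.
    """
    checked = 0
    errors = 0
    for child, father, mother in trio_calls:
        if child is None or father is None or mother is None:
            continue
        checked += 1
        if not is_mendelian_consistent(child, father, mother):
            errors += 1
    return errors / checked if checked else float("nan")
```

A check of this kind cannot by itself distinguish a genotyping error from a genuine ''de-novo'' mutation, since both show up as an inconsistency between child and parents.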