| | 1 | [[TOC()]] |
| | 2 | = Work plan for quality control, call improvement and phasing (VCF to haplotypes) = |
| | 3 | |
| | 4 | Compiled: Yurii Aulchenko, September 12, 2010 |
| | 5 | |
| | 6 | As noted by Morris, we have to run our project in two phases: phase 1 (from now until the end of 2010), run with the minimal personnel available at present, and phase 2 (starting ~January 2011), based on a proper resource plan. It is assumed that pilot VCF data will be available at the end of September; we expect all data to be available by January 2011. |
| | 7 | |
| | 8 | This document aims to provide an overview of the “VCF to haplotypes” work package. |
| | 9 | |
| | 10 | == Plan for phase 1. == |
| | 11 | |
| | 12 | Starting at the end of September, when pilot data become available, a pipeline for basic post-VCF quality control must be built. This will comprise two independent sub-projects, which may be run in parallel: |
| | 13 | |
| | 14 | '''Chip QC project:''' Cross-check of the results obtained from BGI against the already available GWA scan data. |
| | 15 | |
| | 16 | This work package ''aims to'': |
| | 17 | |
| | 18 | * Establish a custom pipeline for chip-based QC. |
| | 19 | * Check the quality of the sequence data. |
| | 20 | * Identify factors affecting the quality of sequencing (e.g. batch effects). |
| | 21 | * Establish (preliminary) thresholds on quality metrics that maximize sensitivity and specificity. |
| | 22 | * Using the above thresholds, establish the false-positive and false-negative rates for variants discovered in our study (without taking the trio structure into account). |
| | 23 | * Check whether these rates agree with those theoretically expected (to make sure we are not missing any important experimental factor). |
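As an illustration of the threshold and error-rate aims listed above, here is a minimal Python sketch. It is not the project's pipeline. It assumes each record pairs the quality score of a sequence-based variant call with whether the chip genotype shows a variant at that site, and it picks the threshold that maximizes Youden's J (sensitivity + specificity - 1). Youden's J is only one possible way of "maximizing sensitivity and specificity"; this plan does not say which criterion is intended. The names `confusion`, `sens_spec`, `best_threshold`, and all data values are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical record: (quality score of the sequence call,
#                       True if the chip genotype shows a variant).
Record = Tuple[float, bool]


def confusion(records: Iterable[Record], threshold: float) -> Tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN), accepting calls with quality >= threshold."""
    tp = fp = tn = fn = 0
    for quality, chip_variant in records:
        called = quality >= threshold
        if called and chip_variant:
            tp += 1
        elif called:
            fp += 1
        elif chip_variant:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sens_spec(records: List[Record], threshold: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the given threshold."""
    tp, fp, tn, fn = confusion(records, threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both chip-variant and chip-reference sites")
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(records: List[Record], candidates: Iterable[float]) -> float:
    """Pick the candidate threshold with the largest Youden's J."""
    return max(candidates, key=lambda t: sum(sens_spec(records, t)) - 1.0)


# Toy data, invented for illustration only.
RECORDS: List[Record] = [
    (50, True), (40, True), (35, True), (25, True),
    (30, False), (12, False), (8, False), (5, False),
]
CANDIDATES = [0, 10, 20, 30, 40]

if __name__ == "__main__":
    t = best_threshold(RECORDS, CANDIDATES)
    sens, spec = sens_spec(RECORDS, t)
    # False-negative rate = 1 - sensitivity; false-positive rate = 1 - specificity.
    print(f"threshold={t} FN rate={1 - sens:.2f} FP rate={1 - spec:.2f}")
```

On real data the records would come from matching VCF calls to chip genotypes per sample and site. The resulting rates could then be compared with the theoretically expected ones, as the last aim describes.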
| | 24 | |
| | 25 | ''Detailed workflow'' is summarized in a separate document. |
| | 26 | |
| | 27 | * ''Estimated costs'' for pilot data check and establishing the pipeline: |
| | 28 |   * 3 months of BI/data manager/programmer at 1.0 FTE + experienced supervisor at 0.1 FTE. |
| | 29 | * ''Suggested timeline:'' end of September – end of December |
| | 30 | * ''Depends on:'' availability of VCF pilot data |
| | 31 | * ''Other projects depending on this:'' MendelianQcPipeline (soft), QC’ed data (hard) |