Work plan
SEEMS TO BE RATHER INCOMPLETE -- I HAVE INCLUDED OVERVIEW OF ONLY SOME PARTS OF THE PLAN, PLEASE REVISE
The project has to run in two phases: phase 1 (from now until the end of 2010), run with the minimal personnel available at present, and phase 2 (starting ~Jan 2011), based on a proper resource plan. It is assumed that pilot VCF data will be available at the end of September; we expect all data to be available by January 2011.
This document aims to provide an overview of the “VCF to haplotypes” line, summarizing ChipBasedQcPipeline, MendelianQcPipeline and TrioAwarePhasingPipeline.
Plan for phase 1.
Starting at the end of September, when the pilot data become available, a pipeline for basic post-VCF quality control has to be built. This will include a number of sub-projects which may be run in parallel in a semi-independent manner:
Chip QC (ChipBasedQcPipeline): establish an infrastructure and cross-check the genotypic data generated by BGI against the data already available from GWA scans (DNA chips).
Mendelian QC (MendelianQcPipeline): establish an infrastructure and perform QC of the genotypic data generated by BGI using Mendelian error checks.
New variants discovery (TrioAwareVariantDiscoveryPipeline?) phase 1: establish an infrastructure and provide a preliminary list of variants discovered by GvNL.
De-novo variants discovery (DeNovoVariationPipeline) phase 1: establish an infrastructure and provide a preliminary list of 'de-novo' mutations.
...
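The Mendelian error check named in the list above can be illustrated with a minimal sketch. It assumes autosomal, diploid, VCF-style genotype strings (such as `0/1` or `1|0`) for a father-mother-child trio; the function names are hypothetical and are not part of MendelianQcPipeline.

```python
from itertools import product


def parse_gt(gt):
    """Split a VCF-style diploid genotype ('0/1', '1|0') into its two alleles.

    Returns None for missing or non-diploid genotypes (e.g. './.').
    """
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None
    return tuple(alleles)


def is_mendelian_consistent(father, mother, child):
    """True if the child genotype can be formed from one paternal and one
    maternal allele, False if not, None if any genotype is missing.

    Assumption: autosomal site; sex chromosomes need separate rules.
    """
    f, m, c = parse_gt(father), parse_gt(mother), parse_gt(child)
    if f is None or m is None or c is None:
        return None
    # Every unordered pair made of one allele from each parent.
    possible = {tuple(sorted(pair)) for pair in product(f, m)}
    return tuple(sorted(c)) in possible


def mendelian_error_rate(trio_genotypes):
    """Fraction of assessable (father, mother, child) genotype triples
    that are Mendelian-inconsistent."""
    checked = errors = 0
    for father, mother, child in trio_genotypes:
        consistent = is_mendelian_consistent(father, mother, child)
        if consistent is None:
            continue  # skip sites with missing genotypes
        checked += 1
        if not consistent:
            errors += 1
    return errors / checked if checked else float("nan")
```

A per-site or per-family error rate computed this way could serve as one of the quality metrics for which thresholds are sought in phase 1.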
Major outcomes of phase 1:
- Established custom pipelines for Chip-based and Mendelian-check QC.
- A list of factors affecting sequencing quality.
- Thresholds on quality metrics that give the highest data quality.
- False-positive and false-negative rates for the variants discovered.
- Estimate of the potential for improving calls by exploiting information from the sequencing of relatives (see TrioAwareVariantDiscoveryPipeline?, TrioAwarePhasingPipeline).
- Estimate of the potential for de-novo variant discovery based on phasing information (see DeNovoVariationPipeline), and a preliminary list of such mutations.
- QC'ed sequence data.
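The chip-based cross-check and the false-positive and false-negative rates listed above could be computed roughly as follows. This is a sketch under stated assumptions: genotypes are coded as non-reference allele counts (0, 1, 2, or `None` when missing), the chip genotype is treated as the truth, and the rate definitions used here are one possible choice, not necessarily the one ChipBasedQcPipeline adopts. The function name is hypothetical.

```python
def compare_to_chip(pairs):
    """Compare sequencing genotypes with chip genotypes.

    pairs: iterable of (seq_count, chip_count), each a non-reference
    allele count (0, 1 or 2) or None when the genotype is missing.

    Assumed definitions (chip treated as truth):
      concordance         = identical genotypes / compared genotypes
      false_positive_rate = seq non-ref where chip is hom-ref / chip hom-ref
      false_negative_rate = seq hom-ref where chip is non-ref / chip non-ref
    """
    compared = concordant = 0
    chip_ref = chip_var = 0
    false_pos = false_neg = 0
    for seq, chip in pairs:
        if seq is None or chip is None:
            continue  # only sites typed by both platforms are compared
        compared += 1
        if seq == chip:
            concordant += 1
        if chip == 0:
            chip_ref += 1
            if seq > 0:
                false_pos += 1
        else:
            chip_var += 1
            if seq == 0:
                false_neg += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "compared": compared,
        "concordance": ratio(concordant, compared),
        "false_positive_rate": ratio(false_pos, chip_ref),
        "false_negative_rate": ratio(false_neg, chip_var),
    }
```

Note that such rates can only be estimated at sites present on the chip, so they say nothing directly about variants absent from the chip content.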
Plan for phase 2.
Genotype improvement and phasing (TrioAwarePhasingPipeline): establish an infrastructure and perform phasing of the sequence data.
New variants discovery (TrioAwareVariantDiscoveryPipeline?) phase 2: provide the list of variants discovered by GvNL.
De-novo variants discovery (DeNovoVariationPipeline) phase 2: provide the list of 'de-novo' mutations.
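As a naive starting point for the de-novo item above, candidate sites can be screened from trio genotypes alone: the child carries an allele that is seen in neither parent. This sketch does not use phasing information, which the plan intends to exploit, and the function name is hypothetical rather than part of DeNovoVariationPipeline. Candidates found this way would still need to be separated from genotyping errors.

```python
def is_de_novo_candidate(father, mother, child):
    """True if the child carries an allele absent from both parents.

    Genotypes are VCF-style diploid strings ('0/1', '1|0').
    Sites with any missing genotype are not reported as candidates.
    Assumption: autosomal site, genotype calls taken at face value.
    """
    def allele_set(gt):
        parts = gt.replace("|", "/").split("/")
        if len(parts) != 2 or "." in parts:
            return None
        return set(parts)

    f, m, c = allele_set(father), allele_set(mother), allele_set(child)
    if f is None or m is None or c is None:
        return False
    # Alleles in the child that neither parent carries.
    return bool(c - (f | m))
```

Not every Mendelian inconsistency is a candidate under this rule: parents `0/0` and `1/1` with a `0/0` child is inconsistent, but the child carries no novel allele.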