
Version 2 (modified by Yurii Aulchenko, 14 years ago)


TrioAwarePhasingPipeline

Call improvement and phasing

This is a phase 2 project.

Background. We plan to use phased genotypes from GvNL for further imputation. For this, we need both high-quality genotypes and high-quality phasing.

Problems. It is well recognized that at 12x coverage there is a substantial chance that a heterozygous genotype will not be called (roughly estimated at ~1%). Furthermore, for any given individual a certain proportion of the genome will not be covered well; genotypes in these regions cannot be called or will be called with low quality. The effects of such errors and missing data on further imputation may be large. Another factor affecting the quality of further imputation is the quality of phasing.
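To see why even 12x coverage loses heterozygous calls, a toy calculation helps. The sketch below is only illustrative and is not how the ~1% figure above was obtained. It assumes that per-site read depth is Poisson-distributed with mean 12 and that each read samples either allele with probability 0.5. It also uses a hypothetical caller rule: a heterozygote is missed when either allele has fewer than `min_reads` supporting reads. Sequencing errors, mapping errors and allelic bias are ignored, and all function names are made up for this example.

```python
import math


def poisson_pmf(d: int, mean: float) -> float:
    """P(depth = d) for Poisson-distributed read depth (computed in log space)."""
    return math.exp(-mean + d * math.log(mean) - math.lgamma(d + 1))


def p_het_missed_at_depth(depth: int, min_reads: int) -> float:
    """P(a heterozygous site is not called heterozygous at a fixed depth).

    Assumption: each read comes from either allele with probability 0.5,
    and the site counts as missed when either allele has < min_reads reads.
    """
    return sum(
        math.comb(depth, x) * 0.5 ** depth
        for x in range(depth + 1)
        if x < min_reads or depth - x < min_reads
    )


def het_miss_rate(mean_depth: float, min_reads: int, max_depth: int = 150) -> float:
    """Average the fixed-depth miss probability over Poisson read depth."""
    return sum(
        poisson_pmf(d, mean_depth) * p_het_missed_at_depth(d, min_reads)
        for d in range(max_depth + 1)
    )


if __name__ == "__main__":
    # Miss rate at mean depth 12 for increasingly strict (hypothetical) caller thresholds.
    for k in (1, 2, 3):
        print(k, het_miss_rate(12, k))
```

Under these assumptions, a stricter caller or lower local depth quickly raises the miss rate. This is why depth variation along the genome matters as much as the average coverage.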

Proposed solution. All of the above problems can be addressed within a single framework. Phasing information gives us the means to fill in missing genotypes and to correct erroneously called ones. For example, if a person has low coverage in a certain region, we can use information from a first-degree relative to infer the genotypes there. Sequencing errors can be detected in much the same way. Thus, phasing and imputation offer an attractive opportunity to minimize both sequencing errors and the proportion of missing data.
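The idea of using first-degree relatives can be sketched with hard genotype calls in a parent–child trio at a single biallelic site. This is a minimal sketch, not the pipeline's actual method. Genotypes are coded as alt-allele counts (0, 1, 2), and de novo mutations are ignored. The function names are hypothetical. The same transmission rules do three jobs: they restrict the genotypes a missing child can have, they flag Mendelian-inconsistent calls as likely errors, and they phase the child whenever the trio leaves only one possibility.

```python
from itertools import product
from typing import Optional, Set, Tuple

# Genotypes at a biallelic site are coded as alt-allele counts: 0, 1 or 2.


def transmissible(genotype: int) -> Set[int]:
    """Alleles (0 = ref, 1 = alt) that a parent with this genotype can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def compatible_child_genotypes(father: int, mother: int) -> Set[int]:
    """Child genotypes consistent with Mendelian inheritance (no de novo mutation)."""
    return {p + m for p, m in product(transmissible(father), transmissible(mother))}


def is_mendelian_consistent(father: int, mother: int, child: int) -> bool:
    """False means a likely genotyping error (or an unmodelled de novo variant)."""
    return child in compatible_child_genotypes(father, mother)


def phase_child(father: int, mother: int, child: int) -> Optional[Tuple[int, int]]:
    """Return (paternal allele, maternal allele) when the trio determines it.

    Returns None when the site is ambiguous (e.g. all three heterozygous)
    or Mendelian-inconsistent.
    """
    options = {
        (p, m)
        for p, m in product(transmissible(father), transmissible(mother))
        if p + m == child
    }
    return next(iter(options)) if len(options) == 1 else None
```

For example, `compatible_child_genotypes(0, 2)` is `{1}`, so a missing child genotype between a 0/0 father and a 1/1 mother is fully determined. `is_mendelian_consistent(0, 0, 1)` is `False`, which flags a suspicious heterozygous call. `phase_child(0, 1, 1)` returns `(0, 1)`, while `phase_child(1, 1, 1)` returns `None` because that configuration cannot be phased from the trio alone.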

This work package aims to:

· Improve the quality of sequence genotype data by fixing errors and filling in missing values

· Phase the genotypes
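The first aim can also be framed probabilistically. Hard calls are replaced by genotype likelihoods, and the child's genotype is re-estimated jointly with the parents. The sketch below is a textbook trio model, not a description of the planned software. It assumes a Hardy–Weinberg prior with a known alt-allele frequency and Mendelian transmission. It has no de novo mutation or error term, so it will always "correct" a genuine de novo variant. The likelihood values in the usage example are hypothetical.

```python
from typing import List, Sequence


def hwe_prior(alt_freq: float) -> List[float]:
    """Hardy-Weinberg genotype prior for alt-allele counts 0, 1, 2."""
    ref_freq = 1.0 - alt_freq
    return [ref_freq ** 2, 2 * alt_freq * ref_freq, alt_freq ** 2]


def transmission(child: int, father: int, mother: int) -> float:
    """P(child genotype | parental genotypes) under Mendelian inheritance."""
    pf, pm = father / 2, mother / 2  # P(each parent transmits the alt allele)
    probs = [(1 - pf) * (1 - pm), pf * (1 - pm) + (1 - pf) * pm, pf * pm]
    return probs[child]


def child_posterior(
    lik_father: Sequence[float],
    lik_mother: Sequence[float],
    lik_child: Sequence[float],
    alt_freq: float,
) -> List[float]:
    """Posterior over the child's genotype given all three individuals' reads.

    Each lik_* holds P(reads | genotype) for genotypes 0, 1, 2; a missing
    child genotype is represented by equal likelihoods, e.g. [1, 1, 1].
    """
    prior = hwe_prior(alt_freq)
    post = [0.0, 0.0, 0.0]
    for gf in range(3):
        for gm in range(3):
            parents = prior[gf] * lik_father[gf] * prior[gm] * lik_mother[gm]
            for gc in range(3):
                post[gc] += parents * transmission(gc, gf, gm) * lik_child[gc]
    total = sum(post)
    return [p / total for p in post]


if __name__ == "__main__":
    hom_ref = [1.0, 1e-6, 1e-9]  # hypothetical likelihoods of a confident 0/0 call
    hom_alt = [1e-9, 1e-6, 1.0]  # hypothetical likelihoods of a confident 1/1 call
    # Missing child of 0/0 and 1/1 parents: filled in as heterozygous.
    print(child_posterior(hom_ref, hom_alt, [1.0, 1.0, 1.0], alt_freq=0.3))
    # Weak heterozygous evidence in a child of two 0/0 parents: pulled back to 0/0.
    print(child_posterior(hom_ref, hom_ref, [0.1, 1.0, 0.01], alt_freq=0.3))
```

Working with likelihoods rather than hard calls lets well-covered relatives compensate for a poorly covered individual. Strong evidence in the individual's own reads is still able to outweigh the family information.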

A detailed workflow is summarized in a separate document.

Estimated costs: 12 months of effort from an experienced postdoc (1.0 FTE), a BI/programmer (0.5 FTE) and a supervisor (0.1 FTE).

Suggested timeline: January–April 2011, phased genotypes using existing solutions; by July 2011, phased genotypes using our own solution; by December 2011, release of the software.

Depends on: availability of QC’ed genotypes from phase 1

Other projects depending on this: imputation / ImputationPipeline (hard), population genetics (LD, hard), functional variants / SnpAnnotationPipeline (final catalogue, soft), novel variant discovery (final catalogue, soft).

Major deliverables

  • Novel methods and software
  • Improved genotypes
  • Phasing information