Protocol for comparison between sequencing (VCF) and chip data

In this project, we will establish an infrastructure for cross-checking the genotypic data generated by BGI against the data already available from GWA scans (DNA chips).


Status: under development

Contributors: Yurii, Lennart, Elisa

Timeline: end Sep 2010 - end Dec 2010

Resources: BI/data manager/programmer at 0.75 fte (the same as the one on MendelianQcPipeline) + experienced supervisor at 0.1 fte (the same as the one on MendelianQcPipeline)

Depends on: availability of VCF data

Other projects depending on this: MendelianQcPipeline (soft), all projects which start with QC'ed data (e.g. all WP2 projects)

Aims and Deliverables

  • Establish custom pipeline for Chip-based QC.
  • Verify the identity of the DNA samples.
  • Check quality of sequence data.
  • Identify factors affecting quality of sequencing (e.g. batch effects).
  • Establish (preliminary) thresholds of quality metrics maximizing sensitivity and specificity of genotype calling.
  • Using the above thresholds, establish the false-positive and false-negative rates for variants discovered in our study (if we do not take trio structure into account).
  • Check whether these rates agree with the theoretically expected ones (to make sure we are not missing any important experimental factor).
  • In accordance with MendelianQcPipeline, provide QC'ed data.
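
To make the aims above more concrete, here is a minimal Python sketch of the kind of per-sample comparison they imply. It is an illustration under stated assumptions, not the project's pipeline: the function name compare_calls, the encoding of genotypes as alternative-allele counts (0/1/2), and the decision to treat the chip call as the reference truth are all hypothetical choices made for this example. It also assumes that a site absent from the sequencing calls is a missing call rather than a homozygous-reference genotype, which may not hold for every VCF.

```python
from typing import Dict, Optional, Tuple

Site = Tuple[str, int]             # (chromosome, position)
Calls = Dict[Site, Optional[int]]  # alternative-allele count 0/1/2; None = no call


def _ratio(num: int, den: int) -> float:
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def compare_calls(seq: Calls, chip: Calls) -> Dict[str, float]:
    """Compare one sample's sequencing calls with its chip calls.

    Only sites called on both platforms are counted. The chip genotype
    is treated as truth (an assumption made for this sketch).
    """
    tp = fp = tn = fn = same = shared = 0
    for site, chip_gt in chip.items():
        seq_gt = seq.get(site)
        if chip_gt is None or seq_gt is None:
            continue  # skip sites missing on either platform
        shared += 1
        if seq_gt == chip_gt:
            same += 1
        chip_var, seq_var = chip_gt > 0, seq_gt > 0
        if chip_var and seq_var:
            tp += 1   # variant on chip, variant in sequence
        elif chip_var:
            fn += 1   # variant on chip, missed by sequencing
        elif seq_var:
            fp += 1   # reference on chip, variant in sequence
        else:
            tn += 1   # reference on both
    return {
        "shared_sites": shared,
        "concordance": _ratio(same, shared),    # exact genotype agreement
        "sensitivity": _ratio(tp, tp + fn),     # 1 - false-negative rate
        "specificity": _ratio(tn, tn + fp),     # 1 - false-positive rate
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    seq = {("1", 100): 0, ("1", 200): 1, ("1", 300): 2, ("1", 400): 0}
    chip = {("1", 100): 0, ("1", 200): 1, ("1", 300): 1, ("1", 400): 1}
    print(compare_calls(seq, chip))
```

A markedly low concordance for one sample relative to the others could point to a sample identity problem, while repeating the comparison over a range of quality-metric cut-offs would be one way to explore the thresholds mentioned above.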


The principal idea of which questions should be addressed (without saying how) is summarized in ChipBasedQcPipelineIdea.

A number of 'burning' questions need to be addressed before the idea can be considered finished. These include


An automated workflow (will be) provided on the ChipBasedQcPipelineWorkflow page.

