Protocol for comparison between sequencing (VCF) and chip data

In this project, we will establish an infrastructure to cross-check the genotypic data generated by BGI against the data already available from GWA scans (DNA chips).


Status: under development

Contributors: Yurii, Lennart, Elisa

Timeline: end Sep 2010 - end Dec 2010

Resources: BI/data manager/programmer at 0.75 FTE (the same person as on MendelianQcPipeline) + experienced supervisor at 0.1 FTE (the same person as on MendelianQcPipeline)

Depends on: availability of VCF data

Other projects depending on this: MendelianQcPipeline (soft), all projects which start with QC'ed data (e.g. all WP2 projects)

Aims and Deliverables

  • Establish a custom pipeline for chip-based QC.
  • Verify the identity of the DNA samples.
  • Check the quality of the sequence data.
  • Identify factors affecting sequencing quality (e.g. batch effects).
  • Establish (preliminary) thresholds on quality metrics that maximize the sensitivity and specificity of genotype calling.
  • Using these thresholds, establish the false-positive and false-negative rates for variants discovered in our study (without taking trio structure into account).
  • Check whether these rates agree with the theoretically expected ones (to make sure we do not miss any important experimental factor).
  • In accord with MendelianQcPipeline, provide QC'ed data.
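
The comparisons named in the aims above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's pipeline: the in-memory layout, the function names `concordance` and `error_rates`, the 0/1/2 genotype coding, the single per-call quality value, and the choice to treat the chip genotype as truth are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory representation (an assumption, not project code):
# keys are (sample_id, variant_id); genotypes are alternate-allele counts,
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate.
Key = Tuple[str, str]
ChipCalls = Dict[Key, int]
SeqCalls = Dict[Key, Tuple[int, float]]  # (genotype, call quality)


def concordance(seq: SeqCalls, chip: ChipCalls) -> Optional[float]:
    """Fraction of genotypes typed on both platforms that agree exactly."""
    shared = seq.keys() & chip.keys()
    if not shared:
        return None  # nothing to compare
    agree = sum(1 for key in shared if seq[key][0] == chip[key])
    return agree / len(shared)


def error_rates(
    seq: SeqCalls, chip: ChipCalls, min_qual: float
) -> Tuple[Optional[float], Optional[float]]:
    """Return (false-positive rate, false-negative rate), chip taken as truth.

    A sequencing call counts as a discovered variant only if it is
    non-reference and its quality is at least min_qual.
    """
    fp = fn = n_ref = n_var = 0
    for key in seq.keys() & chip.keys():
        genotype, qual = seq[key]
        called_variant = genotype > 0 and qual >= min_qual
        if chip[key] == 0:
            n_ref += 1
            fp += int(called_variant)      # variant called where chip sees none
        else:
            n_var += 1
            fn += int(not called_variant)  # chip variant missed or filtered out
    fpr = fp / n_ref if n_ref else None
    fnr = fn / n_var if n_var else None
    return fpr, fnr


# Toy data for one sample and four chip markers (made up for illustration).
chip_calls: ChipCalls = {
    ("s1", "rs1"): 0, ("s1", "rs2"): 1, ("s1", "rs3"): 2, ("s1", "rs4"): 0,
}
seq_calls: SeqCalls = {
    ("s1", "rs1"): (0, 50.0), ("s1", "rs2"): (1, 40.0),
    ("s1", "rs3"): (2, 10.0), ("s1", "rs4"): (1, 30.0),
}

if __name__ == "__main__":
    print("concordance:", concordance(seq_calls, chip_calls))
    for threshold in (5.0, 20.0, 35.0):
        print(threshold, error_rates(seq_calls, chip_calls, threshold))
```

In such a sketch, a sample whose concordance with its own chip data is far below that of the other samples would be a candidate for a sample swap, and scanning `min_qual` over a range shows how the two error rates trade off against each other.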


The principal idea of which questions should be addressed (without saying how) is summarized in ChipBasedQcPipelineIdea.

A number of 'burning' questions need to be addressed before the idea can be considered finished. These include


An automated workflow will be provided on the ChipBasedQcPipelineWorkflow page.

Last modified on Jun 22, 2011 2:05:32 PM