Protocol for comparison between sequencing (VCF) and chip data
In this project, we will establish an infrastructure for cross-checking the genotypic data generated by BGI against the data already available from GWA scans (DNA chips).
Summary
Status: under development
Contributors: Yurii, Lennart, Elisa
Timeline: end Sep 2010 - end Dec 2010
Resources: BI/data manager/programmer at 0.75 FTE (the same person as on MendelianQcPipeline) + experienced supervisor at 0.1 FTE (the same person as on MendelianQcPipeline)
Depends on: availability of VCF data
Other projects depending on this: MendelianQcPipeline (soft), all projects which start with QC'ed data (e.g. all WP2 projects)
Aims and Deliverables
- Establish a custom pipeline for chip-based QC.
- Verify the identity of the DNA samples.
- Check the quality of the sequence data.
- Identify factors affecting quality of sequencing (e.g. batch effects).
- Establish (preliminary) thresholds of quality metrics maximizing sensitivity and specificity of genotype calling.
- Using the above thresholds, establish the false-positive and false-negative rates for variants discovered in our study (without taking trio structure into account).
- Check whether these rates agree with the theoretically expected ones (to confirm that we are not missing any important experimental factor).
- In accordance with MendelianQcPipeline, provide QC'ed data.
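To make the concordance-related aims above more concrete, here is a minimal Python sketch of how sequence-derived genotypes could be compared with chip genotypes for one sample. It is an illustration under stated assumptions, not the project's pipeline: genotypes are assumed to be already parsed into dictionaries keyed by a "chrom:pos" string and coded as alternate-allele counts (0, 1, 2, or None for missing), the chip calls are treated as the reference, and the function name compare_calls is hypothetical. Note that only sites present on the chip can be assessed this way.

```python
from typing import Dict, Optional

Genotype = Optional[int]  # alternate-allele count (0, 1, 2); None = missing call


def compare_calls(seq: Dict[str, Genotype],
                  chip: Dict[str, Genotype]) -> Dict[str, Optional[float]]:
    """Compare one sample's sequence calls with chip calls (chip = reference).

    Assumption: both inputs are keyed by the same site identifiers and use
    the same allele coding. Sites missing in either source are skipped.
    """
    tp = fp = fn = tn = concordant = compared = 0
    for site, chip_gt in chip.items():
        seq_gt = seq.get(site)
        if chip_gt is None or seq_gt is None:
            continue  # cannot compare a missing genotype
        compared += 1
        if seq_gt == chip_gt:
            concordant += 1
        chip_variant = chip_gt > 0  # chip sees a non-reference allele
        seq_variant = seq_gt > 0    # sequencing sees a non-reference allele
        if chip_variant and seq_variant:
            tp += 1
        elif seq_variant:
            fp += 1
        elif chip_variant:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> Optional[float]:
        # Return None rather than dividing by zero when nothing was observed.
        return num / den if den else None

    return {
        "compared": float(compared),
        "concordance": ratio(concordant, compared),      # exact genotype match
        "sensitivity": ratio(tp, tp + fn),               # 1 - false-negative rate
        "specificity": ratio(tn, tn + fp),               # 1 - false-positive rate
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
    }


# Toy data for illustration only (not real genotypes).
chip_calls = {"1:1000": 0, "1:2000": 1, "1:3000": 2,
              "1:4000": 0, "1:5000": 1, "1:6000": None}
seq_calls = {"1:1000": 0, "1:2000": 1, "1:3000": 1,
             "1:4000": 1, "1:5000": 0}

result = compare_calls(seq_calls, chip_calls)
```

A very low genotype concordance for a sample against its own chip data, compared with other samples, would be one possible signal of a sample mix-up. The same counts, recomputed after filtering sequence calls at different quality thresholds, could be used to explore the sensitivity/specificity trade-off.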
Idea
The main idea of which questions should be addressed (without saying how) is summarized in ChipBasedQcPipelineIdea.
A number of 'burning' questions need to be answered before the idea can be considered finished. These include:
- What format will be used for/by Chip data (ChipGtDataFormat)?
- What factors, potentially affecting or indicating the quality of sequencing data (FactorsRelatedToSeqDataQuality), are to be addressed in the QC?
Workflow
An automated workflow will be provided on the ChipBasedQcPipelineWorkflow page.