wiki:ImputationPipeline

Version 38 (modified by a.kanterakis, 13 years ago)


This page describes the Imputation pipelines developed by the GoNL - Impute team. Please contribute. For help with trac wiki formatting: http://trac.edgewall.org/wiki/WikiFormatting
All scripts presented here are located in our SVN repository: http://www.bbmriwiki.nl/svn/imputation
Minutes of our Team Calls: http://www.bbmriwiki.nl/wiki/Imputations/Minutes

Contributors and Teams

  • UMC Groningen: Alexandros Kanterakis alexandros.kanterakis@…

Study data

Reference data

Pre processing

Normalize beagle datasets

  • Location: http://www.bbmriwiki.nl/svn/Imputation/alex/scripts/Normalize_beagle_datasets.ftl

The script takes a list of beagle and marker files and applies the following checks:

  • Checks whether the alleles of each SNP are compatible between study and reference. A SNP whose incompatibility cannot be corrected by SNP inversion is discarded.
  • Checks whether a SNP has null alleles; if so, the SNP is removed from the study data.
  • Checks whether two SNPs with the same reference code (rs) are in the same position.
  • Checks whether two SNPs in the same position have the same reference code (rs).
  • Checks whether a SNP in the study has MAF < MAF_minimum, HWE < HWE_minimum or CR < CR_minimum. If any of these criteria is met, the SNP is discarded. (MAF = Minor Allele Frequency, HWE = Hardy Weinberg Equilibrium, CR = Call Rate)
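
The allele compatibility check in the list above can be illustrated with a short Python sketch. This is not the code of Normalize_beagle_datasets.ftl; the function name reconcile_alleles, the status strings and the rule that anything other than A, C, G, T counts as a null allele are assumptions made for illustration. Strand-ambiguous SNPs (A/T and C/G) are not treated specially here.

```python
# Hypothetical sketch of an allele compatibility check with SNP inversion.
# Names and status strings are illustrative, not taken from the pipeline.

VALID_ALLELES = frozenset("ACGT")
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reconcile_alleles(study_alleles, reference_alleles):
    """Classify one SNP by comparing its study and reference alleles.

    Returns one of:
      "null_allele"   an allele outside A/C/G/T was found (assumed null coding)
      "ok"            both allele sets are identical
      "inverted"      the sets match after complementing the study alleles
      "incompatible"  the sets differ even after inversion
    """
    study = frozenset(study_alleles)
    reference = frozenset(reference_alleles)

    # Any symbol outside A/C/G/T is treated as a null allele (assumption).
    if not (study | reference) <= VALID_ALLELES:
        return "null_allele"

    if study == reference:
        return "ok"

    # Invert the study alleles (A<->T, C<->G) and compare again.
    inverted = frozenset(COMPLEMENT[allele] for allele in study)
    if inverted == reference:
        return "inverted"

    return "incompatible"


if __name__ == "__main__":
    # A/G in the reference and T/C in the study can be fixed by inversion.
    print(reconcile_alleles("TC", "AG"))
    # A/C against A/G cannot be fixed, so the SNP would be discarded.
    print(reconcile_alleles("AC", "AG"))
```

In this sketch a SNP classified as "inverted" would be kept after inverting the study alleles, while "incompatible" and "null_allele" SNPs would be dropped from the study data.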

It generates a log file with all inconsistencies found. At the end of this file there is a summary of the problems found:

  • SNPs inverted: SNPs whose alleles are inverted between reference and study, for example A/G in the reference and T/C in the study.
  • Allele problems: Number of SNPs with inconsistent alleles in study and reference that could not be fixed by flipping.
  • Position problems (different references, same loci): SNPs at the same locus that have different rs codes in study and reference. These SNPs are NOT removed. We keep the reference (rs number) of the reference panel.
  • Unresolved single alleles problems: SNPs in the study that have only one allele. These SNPs are filtered out.
  • Double rs codes problems: SNPs whose rs code occurs more than once. These SNPs are filtered out.
  • SNPs in study with MAF < MAF_minimum: SNPs with MAF below the MAF_minimum set.
  • SNPs in study with HWE < HWE_minimum: SNPs with HWE below the HWE_minimum set.
  • SNPs in study with CR < CR_minimum: SNPs with Call Rate below the CR_minimum set.
  • SNPs that differ in Allele Frequencies: SNPs whose allele frequency (AF) differs between reference and study by more than the CR_minimum set.
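
The three threshold entries in the summary (MAF, HWE, CR) can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: the function names and the genotype-count input are hypothetical, and the HWE value is assumed to be the p-value of a 1-degree-of-freedom chi-square test (the script may use a different test, for example an exact test).

```python
import math


def snp_qc_metrics(n_aa, n_ab, n_bb, n_missing):
    """Return (maf, call_rate, hwe_p) for one SNP from its genotype counts.

    n_aa, n_ab, n_bb are counts of the three called genotypes and
    n_missing is the number of samples without a call.
    """
    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    call_rate = n_called / n_total if n_total else 0.0
    if n_called == 0:
        return 0.0, call_rate, 1.0

    p = (2 * n_aa + n_ab) / (2 * n_called)  # frequency of allele A
    q = 1.0 - p
    maf = min(p, q)
    if maf == 0.0:
        # Monomorphic SNP: the HWE test is undefined, report p = 1.
        return maf, call_rate, 1.0

    # Chi-square goodness of fit against Hardy-Weinberg expectations.
    expected = (p * p * n_called, 2 * p * q * n_called, q * q * n_called)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    hwe_p = math.erfc(math.sqrt(chi2 / 2))
    return maf, call_rate, hwe_p


def fails_qc(maf, call_rate, hwe_p, maf_minimum, cr_minimum, hwe_minimum):
    """A SNP is discarded if ANY of the three criteria is met."""
    return maf < maf_minimum or call_rate < cr_minimum or hwe_p < hwe_minimum


if __name__ == "__main__":
    maf, cr, hwe = snp_qc_metrics(n_aa=25, n_ab=50, n_bb=25, n_missing=0)
    # Threshold values below are placeholders, not the pipeline defaults.
    print(maf, cr, hwe, fails_qc(maf, cr, hwe, 0.01, 0.95, 1e-4))
```

The threshold values in the usage lines are placeholders; the actual MAF_minimum, HWE_minimum and CR_minimum settings are whatever is configured for the script.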


Options:

  • input_beagle_study : The study in beagle format
  • input_beagle_reference : The reference in beagle format
  • input_markers_study : The study's markers in beagle format
  • input_markers_reference : The reference's markers in beagle format
  • output_beagle_study : The Normalized output of the study (Use this as "study" for imputation)
  • output_beagle_reference : The Normalized output of the reference (Normally you will not use this file)
  • output_markers_study : The Markers of the normalized study
  • output_markers_reference : The Markers of the normalized reference
  • output_log_filename : The log filename

Imputation software

  • Impute2
  • Beagle
  • Mach / Minimach

Quality metrics

Complete pipelines

Results

References

See also