This page describes the imputation pipelines developed by the GoNL Impute team. Please contribute. For help with Trac wiki formatting, see http://trac.edgewall.org/wiki/WikiFormatting [[BR]]
All scripts presented here are located in our SVN repository: http://www.bbmriwiki.nl/svn/imputation [[BR]]
Minutes of our team calls: [[http://www.bbmriwiki.nl/wiki/Imputations/Minutes]]

== Contributors and Teams ==
 * UMC Groningen: Alexandros Kanterakis alexandros.kanterakis@gmail.com

== Study data ==

== Reference data ==

== Pre-processing ==

=== Normalize beagle datasets ===
 * Location: http://www.bbmriwiki.nl/svn/Imputation/alex/scripts/Normalize_beagle_datasets.ftl [[BR]]

Takes a list of Beagle genotype and marker files and applies the following checks:
 * Checks whether the alleles of each SNP are compatible between study and reference. If an incompatibility cannot be corrected by inverting (strand-flipping) the SNP, the SNP is discarded.
 * Checks whether a SNP has null alleles. If so, the SNP is removed from the study data.
 * Checks whether two SNPs with the same reference code (rs) are at the same position.
 * Checks whether two SNPs at the same position have the same reference code (rs).
 * Checks whether a SNP in the study has MAF < MAF_minimum, HWE < HWE_minimum or CR < CR_minimum. If any of these criteria is met, the SNP is discarded. (MAF = Minor Allele Frequency, HWE = Hardy-Weinberg Equilibrium, CR = Call Rate)

It generates a log file with all inconsistencies found. At the end of this file there is a summary of the problems found:
 * '''SNPs inverted''': for example, A/G SNPs in the reference and T/C SNPs in the study.
 * '''Allele problems''': number of SNPs with inconsistent alleles in study and reference that could not be fixed by flipping.
 * '''Position problems (different references, same loci)''': as the name says. These SNPs are NOT removed; we keep the reference code (rs number) of the reference panel.
 * '''Unresolved single alleles problems''': SNPs in the study that have only one allele. These SNPs are filtered out.
 * '''Double rs codes problems''': as the name says. These SNPs are filtered out.
 * '''SNPs in study with MAF < MAF_minimum''': SNPs with MAF below the MAF_minimum set.
 * '''SNPs in study with HWE < HWE_minimum''': SNPs with HWE below the HWE_minimum set.
 * '''SNPs in study with CR < CR_minimum''': SNPs with call rate below the CR_minimum set.
 * '''SNPs that differ in Allele Frequencies''': SNPs whose allele-frequency difference between reference and study exceeds the CR_minimum set. [[BR]]

Options:
 * input_beagle_study : The study in Beagle format
 * input_beagle_reference : The reference in Beagle format
 * input_markers_study : The study's markers in Beagle format
 * input_markers_reference : The reference's markers in Beagle format
 * output_beagle_study : The normalized output of the study (use this as the "study" for imputation)
 * output_beagle_reference : The normalized output of the reference (normally you will not use this file)
 * output_markers_study : The markers of the normalized study
 * output_markers_reference : The markers of the normalized reference
 * output_log_filename : The log filename

== Imputation software ==
 * Impute2
 * Beagle
 * MACH / Minimac

== Quality metrics ==

=== Convert impute2 gprobs to TPED ===
 * Location: http://www.bbmriwiki.nl/svn/Imputation/alex/scripts/Convert_impute2_gprobs_to_PEDMAP_beagle.ftl [[BR]]

This method converts the results of an Impute2 imputation to TPED. You can define an R2 threshold; the R2 is the allelic R2 as defined in http://www.sciencedirect.com/science/article/pii/S0002929709000123#sec2.7.2 . You can copy the TFAM file from the original study to obtain a complete TPED / TFAM dataset. [[BR]]

Options:
 * input_impute2_gprobs_filename : The gprobs file generated by Impute2
 * output_TPED_filename : The output TPED filename
 * output_stats_filename : The file where the R2 estimates will be printed.
   It will contain ALL the R2 values, not only those surpassing the threshold.
 * chromosome : The chromosome of this study
 * r2_threshold : The R2 threshold

== Complete pipelines ==

== Results ==

== References ==

== See also ==
 * An older version of the imputation pipeline, developed mainly by Harm-Jan and Lude Franke: [[ImputationPipeline_old]]. It uses the [[ImputationTool]] for study / reference normalization.
 * SVN repository: http://www.bbmriwiki.nl/svn/imputation
 * ProbABEL, a package for genome-wide association analysis of imputed data: http://gettinggeneticsdone.blogspot.com/2010/04/probabel-r-package-for-gwas-data.html , http://www.biomedcentral.com/1471-2105/11/134
 * GenABEL, an R library for genome-wide association analysis: http://mga.bionet.nsc.ru/~yurii/ABEL/GenABEL/
 * MACH: http://www.sph.umich.edu/csg/abecasis/MACH/
 * Impute2: http://mathgen.stats.ox.ac.uk/impute/impute_v2.html
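The allele-compatibility check described under ''Normalize beagle datasets'' can be illustrated with a minimal Python sketch. It is not taken from Normalize_beagle_datasets.ftl: the function names, the return labels and the assumption that any allele outside A/C/G/T counts as a null allele are all hypothetical.

```python
# Hypothetical sketch of the allele-compatibility check from
# "Normalize beagle datasets"; not the code of Normalize_beagle_datasets.ftl.

# Base pairing on the opposite DNA strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(alleles):
    """Return the alleles as they would read on the opposite strand."""
    return [COMPLEMENT[a] for a in alleles]


def check_alleles(study, reference):
    """Classify a study SNP against the reference SNP at the same position.

    study, reference: the two alleles of each SNP, e.g. ("A", "G").
    Returns one of these illustrative labels (not the log's wording):
      "null":     a study allele is not A/C/G/T (assumed to mean null)
      "single":   the study SNP has only one distinct allele
      "ok":       same allele pair, in any order
      "inverted": allele pair matches after a strand flip
      "problem":  cannot be reconciled; the SNP would be discarded
    """
    if any(a not in COMPLEMENT for a in study):
        return "null"
    if len(set(study)) < 2:
        return "single"
    if set(study) == set(reference):
        return "ok"
    if set(complement(study)) == set(reference):
        return "inverted"
    return "problem"


# Examples:
print(check_alleles(("T", "C"), ("A", "G")))  # inverted
print(check_alleles(("A", "C"), ("A", "G")))  # problem
```

A/T and C/G SNPs are strand-ambiguous: their complement is the same allele pair, so this kind of check reports them as "ok" and cannot detect whether they were flipped.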