Changes between Version 2 and Version 3 of Impute2Pipeline


Timestamp: Apr 19, 2013 5:20:45 PM
Author: freerkvandijk
Comment: --

  • Impute2Pipeline

    v2  v3
    10  10  A list of used software and versions can be found at the bottom of this page.
    11  11
    12      All analysis jobs are generated using MOLGENIS compute; more information about MOLGENIS compute and the other analysis pipelines can be found here: https://github.com/molgenis/molgenis-pipelines
        12  All analysis jobs are generated using [http://www.molgenis.org/wiki/ComputeStart MOLGENIS Compute]; more information about MOLGENIS Compute and the other analysis pipelines can be found here: https://github.com/molgenis/molgenis-pipelines
    13  13  [[br]]
    14  14  [[br]]
     
    26  26  * Create a list of unmapped SNPs.
    27  27  * Create a plink mappings file.
    28      * Create new plink data without the unmapped SNPs using Plink.
        28  * Create new plink data without the unmapped SNPs using [http://pngu.mgh.harvard.edu/~purcell/plink/ Plink].
    29  29  [[br]]
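The three Plink preparation steps above can be sketched in Python. This is a minimal, hypothetical sketch and not the pipeline's protocol code. It assumes that the new position of every SNP is already known and held in a dictionary, with `None` marking an unmapped SNP. The function name `write_plink_mapping_files` and the output file names are illustrative. The unmapped SNP IDs are written one per line, which is the layout Plink's `--exclude` option reads. The mapped SNPs are written as `SNP position` pairs, which is the two-column layout read by Plink's `--update-map` option.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional


def write_plink_mapping_files(
    new_positions: Dict[str, Optional[int]],
    exclude_path: Path,
    update_map_path: Path,
) -> int:
    """Write Plink helper files from a SNP -> new position mapping (hypothetical).

    SNPs whose position is None are treated as unmapped and written to
    exclude_path, one ID per line. Mapped SNPs are written to update_map_path
    as "SNP position" lines. Returns the number of unmapped SNPs.
    """
    unmapped = [snp for snp, pos in new_positions.items() if pos is None]
    mapped = [(snp, pos) for snp, pos in new_positions.items() if pos is not None]
    exclude_path.write_text("".join(f"{snp}\n" for snp in unmapped))
    update_map_path.write_text("".join(f"{snp} {pos}\n" for snp, pos in mapped))
    return len(unmapped)


if __name__ == "__main__":
    # Toy input (illustrative values only): rs2 has no new position.
    positions = {"rs1": 1000, "rs2": None, "rs3": 3000}
    with tempfile.TemporaryDirectory() as tmp:
        excl = Path(tmp) / "unmapped_snps.txt"
        upd = Path(tmp) / "plink_mappings.txt"
        print(write_plink_mapping_files(positions, excl, upd))  # prints 1
```

The two files could then be handed to Plink through `--exclude` and `--update-map` to produce the new data set without the unmapped SNPs.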
    30  30  === Quality control of study data ===
    31  31  * Location: https://github.com/molgenis/molgenis-pipelines/blob/master/compute4/Imputation_imputationtool_studyQC/protocols/studyQC.ftl
    32      [[br]]This protocol applies QC to the study data using imputationTool. This tool employs a binary format called [http://genenetwork.nl/wordpress/trityper/ TriTyper] for rapid loading of large genotype datasets. ImputationTool performs the following checks:
        32  [[br]]This protocol applies QC to the study data using [http://genenetwork.nl/wordpress/imputationtool/ imputationTool]. This tool employs a binary format called [http://genenetwork.nl/wordpress/trityper/ TriTyper] for rapid loading of large genotype datasets. ImputationTool performs the following checks:
    33  33    1. Assesses the strand alignment of alleles and swaps SNPs if needed. For example, if a SNP that is in LD with multiple SNPs has a negative score, its alleles are swapped and LD is calculated again. If the score of the SNP is still negative, the SNP is removed from the study data.
    34  34    1. Performs the regular quality checks done during routine processing of GWAS data. A SNP must have a Hardy-Weinberg equilibrium p-value higher than 10^-4, a minor allele frequency higher than 0.01 and a call rate higher than 0.95. If any of these criteria fails, the SNP is removed from the study panel, because such SNPs are likely to contain erroneous genotypes.
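The thresholds in the second check can be expressed as a small per-SNP filter. This Python sketch is illustrative only and is not imputationTool's implementation. `GenotypeCounts`, `passes_qc` and the other names are hypothetical. The Hardy-Weinberg p-value uses a one-degree-of-freedom chi-square approximation for brevity, whereas QC tools often use an exact test instead. All three comparisons are strict, matching the "higher than" wording above.

```python
import math
from dataclasses import dataclass


@dataclass
class GenotypeCounts:
    """Genotype counts for one biallelic SNP (hypothetical container)."""
    hom_a: int    # AA genotypes
    het: int      # AB genotypes
    hom_b: int    # BB genotypes
    missing: int  # samples without a genotype call

    @property
    def called(self) -> int:
        return self.hom_a + self.het + self.hom_b


def allele_a_frequency(c: GenotypeCounts) -> float:
    return (2 * c.hom_a + c.het) / (2 * c.called)


def hwe_pvalue(c: GenotypeCounts) -> float:
    """Chi-square (1 df) Hardy-Weinberg p-value; an approximation, not an exact test."""
    n = c.called
    p = allele_a_frequency(c)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (c.hom_a, c.het, c.hom_b)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def minor_allele_frequency(c: GenotypeCounts) -> float:
    p = allele_a_frequency(c)
    return min(p, 1.0 - p)


def call_rate(c: GenotypeCounts) -> float:
    return c.called / (c.called + c.missing)


def passes_qc(
    c: GenotypeCounts,
    hwe_min: float = 1e-4,
    maf_min: float = 0.01,
    call_rate_min: float = 0.95,
) -> bool:
    """True if the SNP should stay in the study panel."""
    if c.called == 0:
        return False  # no called genotypes, nothing to test
    return (
        hwe_pvalue(c) > hwe_min
        and minor_allele_frequency(c) > maf_min
        and call_rate(c) > call_rate_min
    )


if __name__ == "__main__":
    # Toy counts, not real data.
    snps = {
        "snp_a": GenotypeCounts(360, 480, 160, 0),  # in HWE, common, fully called
        "snp_b": GenotypeCounts(500, 0, 500, 0),    # no heterozygotes: fails HWE
    }
    for name, counts in snps.items():
        print(name, passes_qc(counts))
```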
     
    41  41  === Phasing the study data ===
    42  42  * Location: https://github.com/molgenis/molgenis-pipelines/blob/master/compute4/Imputation_shapeit_phasing/protocols/shapeitPhasing.ftl
    43      [[br]]This protocol phases the study data using shapeit. Study data can be in the following formats: Ped/Map, Bed/Bim, Gen/Haps.
        43  [[br]]This protocol phases the study data using [http://shapeit.fr/ shapeit]. Study data can be in the following formats: Ped/Map, Bed/Bim, Gen/Haps.
    44  44  [[br]]This protocol applies the following steps:
    45  45  * Gather the map files provided in ALL_1000G_phase1integrated_v3_impute.tgz from the IMPUTE website.
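Gathering the map files amounts to pulling the per-chromosome genetic maps out of the reference archive. The sketch below uses Python's standard `tarfile` module. It assumes, without confirmation from this page, that the map files inside the archive are named like `genetic_map_chr1_combined_b37.txt`. The pattern `MAP_PATTERN` and the functions `gather_map_files` and `build_demo_archive` are hypothetical. Members are read with `extractfile` and written under their base name only, so paths stored in the archive cannot place files outside the destination directory. The demo archive holds placeholder bytes, not real map data.

```python
import io
import re
import tarfile
import tempfile
from pathlib import Path
from typing import Dict

# Assumed file naming inside the archive, e.g. genetic_map_chr1_combined_b37.txt.
MAP_PATTERN = re.compile(r"genetic_map_chr(.+)_combined")


def gather_map_files(archive: Path, dest: Path) -> Dict[str, Path]:
    """Copy genetic map files out of a .tgz archive into dest.

    Returns a dict from chromosome label (as found in the file name) to the
    written file. Only regular files whose base name matches MAP_PATTERN are kept.
    """
    dest.mkdir(parents=True, exist_ok=True)
    found: Dict[str, Path] = {}
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            name = Path(member.name).name
            match = MAP_PATTERN.match(name)
            if match is None:
                continue
            source = tar.extractfile(member)
            if source is None:
                continue
            target = dest / name
            target.write_bytes(source.read())
            found[match.group(1)] = target
    return found


def build_demo_archive(path: Path) -> None:
    """Create a tiny stand-in archive with placeholder contents, for testing only."""
    names = [
        "demo/genetic_map_chr1_combined_b37.txt",
        "demo/genetic_map_chrX_nonPAR_combined_b37.txt",
        "demo/notes.txt",  # not a map file, should be skipped
    ]
    with tarfile.open(path, "w:gz") as tar:
        for name in names:
            data = b"placeholder\n"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "demo_impute.tgz"
        build_demo_archive(archive)
        maps = gather_map_files(archive, Path(tmp) / "maps")
        print(sorted(maps))  # ['1', 'X_nonPAR']
```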
     
    48  48  === Imputing study data ===
    49  49  * Location: https://github.com/molgenis/molgenis-pipelines/blob/master/compute4/Imputation_impute2/protocols/impute2Imputation.ftl
    50      [[br]]This protocol imputes the study data in chromosome bins of 5 million bases using Impute. Afterwards, the results per chromosome bin are merged into full chromosomes.
        50  [[br]]This protocol imputes the study data in chromosome bins of 5 million bases using [http://mathgen.stats.ox.ac.uk/impute/impute_v2.html Impute2]. Afterwards, the results per chromosome bin are merged into full chromosomes.
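The binning and merging described above can be sketched as follows. The sketch assumes 1-based, inclusive coordinates and a known chromosome length. The helpers `make_bins` and `merge_chunks` are hypothetical and not part of the protocol. Each `(start, end)` pair is the kind of interval that IMPUTE2's `-int` option accepts. Merging simply concatenates the per-bin result files in bin order. This assumes the bins do not overlap and that each result file lists its SNPs in positional order.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Tuple

BIN_SIZE = 5_000_000  # 5 million bases, as described above


def make_bins(chrom_length: int, bin_size: int = BIN_SIZE) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) bins that tile 1..chrom_length."""
    if chrom_length < 1 or bin_size < 1:
        raise ValueError("chrom_length and bin_size must be positive")
    return [
        (start, min(start + bin_size - 1, chrom_length))
        for start in range(1, chrom_length + 1, bin_size)
    ]


def merge_chunks(chunk_files: Iterable[Path], out_file: Path) -> int:
    """Concatenate per-bin result files, in the given order, into one file.

    Returns the number of lines written. A missing trailing newline in a
    chunk is added so that lines from consecutive chunks never fuse.
    """
    written = 0
    with out_file.open("w") as out:
        for chunk in chunk_files:
            with chunk.open() as handle:
                for line in handle:
                    out.write(line if line.endswith("\n") else line + "\n")
                    written += 1
    return written


if __name__ == "__main__":
    print(make_bins(12_000_000))
    # [(1, 5000000), (5000001, 10000000), (10000001, 12000000)]
    with tempfile.TemporaryDirectory() as tmp:
        chunks = []
        for i, text in enumerate(["a\nb\n", "c"]):
            path = Path(tmp) / f"chunk{i}.txt"
            path.write_text(text)
            chunks.append(path)
        print(merge_chunks(chunks, Path(tmp) / "merged.txt"))  # prints 3
```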
    51  51  [[br]]This protocol applies the following steps:
    52  52  * Impute the phased study data with Impute2, using the 1000G map files and the following two parameters, over bins of 5 million bases per chromosome: