== Introduction ==

!ImputationTool is a collection of methods for pre- and post-processing in imputation-related tasks.

=== Implementation ===

!ImputationTool has been created by:
 * Dr. Lude Franke (Lude@ludesign.nl): Format design, initial methods.
 * Harm-Jan Westra (harm-jan@attix.nl): Extensions, format converters, SNP checks.

It is written in Java, using !NetBeans.

== Documentation ==

From the !ImputationTool help screen:

{{{
ImputationTool v0.2

------------------------
PreProcessing
------------------------
# Create random batches of cases and controls from a TriTyper dataset. Creates a file called batches.txt in outdir.
--mode batch --in TriTyperdir --out outdir --size batchsize

------------------------
Imputation
------------------------
# Convert Impute Imputed data into TriTyper
--mode itt --in ImputeDir --out TriTyperDir

------------------------
Beagle
------------------------
# Convert beagle files (one file/chromosome) to TriTyper. Filetemplate is a template for the batch filenames, The text CHROMOSOME will be replaced by the chromosome number.
--mode btt --in BeagleDir --tpl template --ext ext --out TriTyperDir [--fam famfile]

# Convert batches of beagle files (multiple files / chromosome) to trityper files. Filetemplate is a template for the batch filenames, The text CHROMOSOME will be replaced by the chromosome number, BATCH by the batchname.
--mode bttb --in BeagleDirdir --tpl template --out TriTyperDir --size numbatches

------------------------
Ped+Map (Plink files)
------------------------
# Converts Ped and Map files created by ttpmh to Beagle format
--mode pmbg --in indir --batch-file batches.txt

# Converts TriTyper file to Plink Dosage format. Filetemplate is a template for the batch filenames, The text CHROMOSOME will be replaced by the chromosome number, BATCH by the batchname.
--mode ttpd --in indir --beagle beagledir --tpl template --batchdesc batchdescriptor --out outdir --fam famfile

# Converts PED and MAP files to TriTyper.
--mode pmtt --in Ped+MapDir --out TriTyperDir

# Converts TriTyper file to PED and MAP files. The FAM file is optional. --split splits the ped and map files per chromosome
--mode ttpm --in indir --out outdir [--fam famfile] [--split]

# Converts TriTyper dataset to Ped+Map concordant to reference (hap) dataset. Supply a batchfile if you want to export in batches. Supply a chromosome if you want to export a certain chromosome.
--mode ttpmh --in TriTyperDir --hap TriTyperReferenceDir --out outdir [--fam famfile] [--batch-file batchfile] [--chr chromosome] [--exclude fileName]

---------------------
PostProcessing
---------------------
# Correlates genotypes of imputed vs non-imputed datasets. Saves a file called correlationOutput.txt in outdir, containing correlation per chromosome as well as correlation distribution.
--mode corr --in TriTyperDir --name datasetname --in2 TriTyperDir2 --name2 datasetname2 --out outdir [--snps snplist]

# Correlates genotypes of imputed vs non-imputed datasets. Also take Beagle imputation score (R2) into account. Saves a file called correlationOutput.txt in outdir, containing correlation per chromosome as well as correlation distribution.
--mode corrb --in TriTyperDir --name datasetname --in2 TriTyperDir2 --name2 datasetname2 --out outdir --beagle beagleDir --tpl template --size numBatches

# Gets all the excluded snps from chrx.excludedsnps.txt with a certain call-rate threshold (0 < threshold < 1.0)
--mode ecra --in TriTyperDir --threshold threshold

# Generates R2 distribution (beagle quality score) for each batch and chromosome, and tests each batch against chromosome R2 distribution, using WilcoxonMannWhitney test
--mode r2dist --in BeagleDir --template template --out outdir --size numbatches

# Merge two TriTyper datasets
--mode merge --in TriTyper1Dir --in2 TryTyper2Dir --out outdir
}}}

=== The !TriTyper Format ===

!TriTyper is a binary format for storing genotype information, including insertion, deletion and expression data. It provides very efficient read, write and seek methods. Its implementation is written in Java and managed as a !NetBeans project.
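
=== Example: expanding a filename template ===

Several modes in the help screen (btt, bttb, ttpd, corrb, r2dist) take a filename template in which the text CHROMOSOME is replaced by the chromosome number and BATCH by the batch name. The Python sketch below only illustrates that substitution, so that you can check which filenames a given template would produce before running a conversion. It is not the tool's own code: the function names and the example template `study.chrCHROMOSOME.BATCH` are hypothetical, and whether !ImputationTool appends an extension (for example the `--ext` value) after substitution is not covered here.

```python
from typing import Iterable, List, Optional


def expand_template(template: str, chromosome: int, batch: Optional[str] = None) -> str:
    """Replace the CHROMOSOME and (if given) BATCH placeholders in a template.

    Hypothetical helper: mirrors the substitution described in the help
    screen, not the actual ImputationTool implementation.
    """
    name = template.replace("CHROMOSOME", str(chromosome))
    if batch is not None:
        name = name.replace("BATCH", batch)
    return name


def expected_files(
    template: str,
    chromosomes: Iterable[int],
    batches: Optional[Iterable[str]] = None,
) -> List[str]:
    """List the filenames a template expands to, chromosome by chromosome."""
    batch_names = list(batches) if batches is not None else [None]
    return [
        expand_template(template, chromosome, batch)
        for chromosome in chromosomes
        for batch in batch_names
    ]


if __name__ == "__main__":
    # Assumed example: two chromosomes, two batches named "a" and "b".
    for filename in expected_files("study.chrCHROMOSOME.BATCH", [1, 2], ["a", "b"]):
        print(filename)
```

Note that the substitution is case-sensitive, so a template that contains the literal uppercase text CHROMOSOME or BATCH anywhere else in the filename would have that text replaced as well.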