=== Step 1 ===
* According to Harm-Jan: if the dataset is in binary PLINK format, use `plink --recode` to convert it back to ped+map.
* Transform the binary .bed, .bim and .fam files into text (ped+map) format:
{{{
# --ped/--map name *input* files in PLINK, so they are not passed here; with no --out,
# the recoded output is written to plink.ped and plink.map in the working directory
/Users/alexandroskanterakis/Tools/plink/plink-1.07-mac-intel/plink --bfile /Users/alexandroskanterakis/Data/Finnish_cohort/CD_Finnuncorr.maf05 --recode
}}}

* Execution time (real): 26m20.723s
* Creates rather large files: plink.ped is 4.6G.
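
A quick way to confirm the recode is complete is to compare line counts: PLINK writes one .ped line per individual (one per .fam line) and one .map line per SNP (one per .bim line). The function below is a minimal sketch of that check; `check_recode` and `count_lines` are hypothetical helper names, not part of PLINK or the ImputationTool. A mismatch would suggest that PLINK dropped SNPs or individuals, which its log should explain.

```shell
#!/bin/sh
# Hypothetical sanity check for Step 1 (not part of PLINK or ImputationTool).
# Arguments: binary fileset prefix (.bed/.bim/.fam), text fileset prefix (.ped/.map).

count_lines() {
    # Reading from stdin makes wc print only the number; tr strips the
    # padding that some wc implementations (e.g. macOS) add
    wc -l < "$1" | tr -d ' '
}

check_recode() {
    bprefix=$1
    tprefix=$2
    n_fam=$(count_lines "$bprefix.fam")   # one line per individual
    n_ped=$(count_lines "$tprefix.ped")   # one line per individual
    n_bim=$(count_lines "$bprefix.bim")   # one line per SNP
    n_map=$(count_lines "$tprefix.map")   # one line per SNP
    if [ "$n_fam" -ne "$n_ped" ]; then
        echo "individuals differ: $n_fam in .fam, $n_ped in .ped" >&2
        return 1
    fi
    if [ "$n_bim" -ne "$n_map" ]; then
        echo "SNPs differ: $n_bim in .bim, $n_map in .map" >&2
        return 1
    fi
    echo "OK: $n_ped individuals, $n_map SNPs"
}
```

For example, from the directory PLINK wrote its output to: `check_recode /Users/alexandroskanterakis/Data/Finnish_cohort/CD_Finnuncorr.maf05 plink`. Counting the lines of a 4.6G .ped file takes a little while.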


=== Step 2 ===

* Convert the dataset to TriTyper format (the input must be in ped+map format):
{{{
java -Xmx4g -jar /Users/alexandroskanterakis/Tools/imputation/ImputationTool/dist/ImputationTool.jar --mode pmtt --in /Users/alexandroskanterakis/Data/Finnish_cohort/ --out /Users/alexandroskanterakis/Data/Finnish_cohort/

real 74m52.547s
user 38m14.646s
sys 5m25.472s
}}}
* Files created:
{{{
-rw-r--r-- 1 alexandroskanterakis staff 2449071024 Oct 5 15:05 GenotypeMatrix.dat
-rw-r--r-- 1 alexandroskanterakis staff 46196 Oct 5 13:54 Individuals.txt
-rw-r--r-- 1 alexandroskanterakis staff 98791 Oct 5 13:54 PhenotypeInformation.txt
-rw-r--r-- 1 alexandroskanterakis staff 10771996 Oct 5 13:54 SNPMappings.txt
-rw-r--r-- 1 alexandroskanterakis staff 5006368 Oct 5 13:54 SNPs.txt
}}}
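
The same line-count idea can be applied to the TriTyper output of Step 2. The sketch below rests on assumptions not confirmed here: that SNPs.txt and SNPMappings.txt hold one line per SNP and Individuals.txt one line per individual, none of them with a header, and that `--mode pmtt` keeps every SNP and individual. `check_trityper` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical check after Step 2. ASSUMES one entry per line, no headers:
# SNPs.txt and SNPMappings.txt one line per SNP, Individuals.txt one per individual.
# Arguments: TriTyper directory, PLINK text fileset prefix (.ped/.map).

count_lines() {
    wc -l < "$1" | tr -d ' '
}

check_trityper() {
    dir=$1
    tprefix=$2
    n_snps=$(count_lines "$dir/SNPs.txt")
    n_mappings=$(count_lines "$dir/SNPMappings.txt")
    n_map=$(count_lines "$tprefix.map")
    n_ind=$(count_lines "$dir/Individuals.txt")
    n_ped=$(count_lines "$tprefix.ped")
    echo "SNPs.txt=$n_snps SNPMappings.txt=$n_mappings map=$n_map"
    echo "Individuals.txt=$n_ind ped=$n_ped"
    # Exit status 0 only if all SNP counts and both individual counts agree
    [ "$n_snps" -eq "$n_mappings" ] && [ "$n_snps" -eq "$n_map" ] && [ "$n_ind" -eq "$n_ped" ]
}
```

For example: `check_trityper /Users/alexandroskanterakis/Data/Finnish_cohort plink`.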

=== Step 3 ===
* Compare the dataset to be imputed with the reference dataset (for example HapMap2 release 24, also in TriTyper format) and remove any SNPs whose haplotypes differ from, or do not correlate with, the reference dataset. Also remove any SNP that is not present in the reference. Save the output as ped+map.
{{{
java -Xmx4g -jar /Users/alexandroskanterakis/Tools/imputation/ImputationTool/dist/ImputationTool.jar --mode ttpmh --in /Users/alexandroskanterakis/Data/Finnish_cohort/ --hap /Users/alexandroskanterakis/Data/HapMap2-r24-CEU/ --out /Users/alexandroskanterakis/Data/Finnish_cohort/referenceOutput/

real 60m53.623s
user 30m35.172s
sys 2m34.325s
}}}
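
The "not present in the reference" part of Step 3 amounts to a set difference on SNP IDs. The sketch below illustrates only that part, assuming SNPs.txt in each TriTyper directory holds one SNP ID per line. It does not reproduce the haplotype/correlation checks that `--mode ttpmh` performs, and `snps_missing_from_reference` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical sketch: list SNP IDs in the study dataset that are absent
# from the reference. ASSUMES SNPs.txt holds one SNP ID per line.
# Arguments: study TriTyper directory, reference TriTyper directory.

snps_missing_from_reference() {
    study_dir=$1
    ref_dir=$2
    tmp=$(mktemp -d)
    # comm needs both inputs sorted with the same collation, hence LC_ALL=C
    LC_ALL=C sort -u "$study_dir/SNPs.txt" > "$tmp/study"
    LC_ALL=C sort -u "$ref_dir/SNPs.txt" > "$tmp/ref"
    # -23 suppresses lines unique to the reference and lines common to both,
    # leaving only the IDs that occur in the study file alone
    LC_ALL=C comm -23 "$tmp/study" "$tmp/ref"
    rm -rf "$tmp"
}
```

For example, to count them: `snps_missing_from_reference /Users/alexandroskanterakis/Data/Finnish_cohort /Users/alexandroskanterakis/Data/HapMap2-r24-CEU | wc -l`.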

* Files created (one set per chromosome; chr1 shown):
{{{
-rw-r--r-- 1 alexandroskanterakis staff 221184 Oct 5 16:07 chr1.dat
-rw-r--r-- 1 alexandroskanterakis staff 1004358 Oct 5 16:06 chr1.excludedsnps.txt
-rw-r--r-- 1 alexandroskanterakis staff 442368 Oct 5 16:07 chr1.map
-rw-r--r-- 1 alexandroskanterakis staff 441350 Oct 5 16:07 chr1.markersBeagleFormat
-rw-r--r-- 1 alexandroskanterakis staff 5802372 Oct 5 16:07 chr1.ped
-rw-r--r-- 1 alexandroskanterakis staff 117708 Oct 5 16:06 chr1.warningsnps.txt
}}}
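
To get an overview of how many SNPs Step 3 removed or flagged on each chromosome, the sketch below counts the lines in every chrN.excludedsnps.txt and chrN.warningsnps.txt file. The layout of these files is not documented here, so it assumes one SNP per line; if they carry a header or several lines per SNP, the counts will be off. `summarize_step3` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical summary of the per-chromosome Step 3 output.
# ASSUMES one SNP per line in chrN.excludedsnps.txt and chrN.warningsnps.txt.
# Argument: the directory passed to --out in Step 3.

summarize_step3() {
    dir=$1
    for excluded in "$dir"/chr*.excludedsnps.txt; do
        [ -e "$excluded" ] || continue   # the glob matched nothing
        chr=$(basename "$excluded" .excludedsnps.txt)
        warnings="$dir/$chr.warningsnps.txt"
        n_excl=$(wc -l < "$excluded" | tr -d ' ')
        if [ -e "$warnings" ]; then
            n_warn=$(wc -l < "$warnings" | tr -d ' ')
        else
            n_warn=NA
        fi
        # One tab-separated line per chromosome, in glob (lexicographic) order
        printf '%s\texcluded=%s\twarnings=%s\n' "$chr" "$n_excl" "$n_warn"
    done
}
```

For example: `summarize_step3 /Users/alexandroskanterakis/Data/Finnish_cohort/referenceOutput`.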

=== Steps 4-9 ===
