| 54 | |
| 55 | === Statistics_of_imputation_results === |
| 56 | * Location: http://www.bbmriwiki.nl/svn/Imputation/alex/scripts/Statistics_of_imputation_results.ftl |
| 57 | Computes several statistics on imputation results. This is useful when "real" (measured) genotype data are available for benchmarking the imputation pipeline. The computed statistics are: |
| 58 | * Allelic R2 : according to http://www.sciencedirect.com/science/article/pii/S0002929709000123#sec2.7.2 |
| 59 | * Real_Allelic_R2 : The R2 (coefficient of determination) between the real and the imputed genotypes. |
| 60 | * Imputation_Allele_Frequency and Standardized_allele_frequency_error : (From: http://www.sciencedirect.com/science/article/pii/S0002929709000123) Allele-frequency error is the difference between the true allele frequency in the sample and the estimated allele frequency computed from the posterior genotype probabilities. If the three posterior genotype probabilities for an individual are denoted pAA, pAB, and pBB, then the estimated A allele frequency is found by summing (2pAA + pAB) over all individuals and dividing by twice the number of individuals. Because allele-frequency error is difficult to interpret unless the true allele frequency and sample size are known, a standardized error is also reported: abs(p - q) / sqrt((p * (1 - p)) / (2 * n)), where p is the allele frequency in the sample of n individuals from a population in Hardy-Weinberg equilibrium and q is the estimated allele frequency obtained from the imputed posterior genotype probabilities. |
| 61 | [[BR]] |
| 62 | Options: |
| 63 | * input_beagle_dosage_filename : The output of the Beagle imputation |
| 64 | * input_beagle_unimputed_filename : The Beagle file with the "real", un-imputed genotypes |
| 65 | * output_filename : Output filename for the computed statistics |
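The two allele-frequency statistics described above can be sketched in a few lines of Python. This is only an illustration of the formulas, not the code in Statistics_of_imputation_results.ftl: the function names, the in-memory list of (pAA, pAB, pBB) tuples, and the example values are assumptions made for this sketch, and no Beagle file parsing is attempted.

```python
import math


def estimated_allele_frequency(posteriors):
    """Estimate the A allele frequency from posterior genotype probabilities.

    `posteriors` is a hypothetical in-memory representation: one
    (pAA, pAB, pBB) tuple per individual for a single marker.
    """
    n = len(posteriors)
    if n == 0:
        raise ValueError("need at least one individual")
    # Sum (2*pAA + pAB) over all individuals, divide by twice their number.
    return sum(2 * p_aa + p_ab for p_aa, p_ab, _p_bb in posteriors) / (2 * n)


def standardized_allele_frequency_error(p, q, n):
    """abs(p - q) / sqrt(p * (1 - p) / (2 * n)).

    p: true allele frequency in the sample, q: estimated allele frequency,
    n: number of individuals. Undefined for monomorphic markers (p = 0 or 1).
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return abs(p - q) / math.sqrt(p * (1 - p) / (2 * n))


# Made-up example: four individuals at one marker.
example_posteriors = [
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
    (0.5, 0.5, 0.0),
]
q_hat = estimated_allele_frequency(example_posteriors)
error = standardized_allele_frequency_error(0.5, q_hat, len(example_posteriors))
```

The guard on p reflects that the denominator is zero for a monomorphic marker; how the actual script handles that case is not stated here.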