Changes between Version 10 and Version 11 of GwasQcPipeline
- Timestamp:
- Feb 10, 2011 5:37:18 PM (14 years ago)
Legend:
- Unmodified
- Added
- Removed
- Modified
-
GwasQcPipeline
v10 v11 4 4 5 5 {{{ 6 p-link --bfile GvNL_250111 --remove GvNL_bad_samples.txt --make-bed --out GvNL_good_samples}}} 6 p-link --bfile GvNL_250111 --remove GvNL_bad_samples.txt --make-bed --out GvNL_good_samples 7 }}} 7 8 8 9 Update family information 9 10 10 11 {{{ 11 p-link --bfile GvNL_samples --update-ids GvNL_family_update.txt --make-bed --out GvNL_good_fam}}} 12 p-link --bfile GvNL_samples --update-ids GvNL_family_update.txt --make-bed --out GvNL_good_fam 13 }}} 12 14 13 15 Set trio relationships 14 16 15 17 {{{ 16 p-link --bfile GvNL_good_fam --update-parents GvNL_parents_update.txt --make-bed --out GvNL_good_fam_par}}} 18 p-link --bfile GvNL_good_fam --update-parents GvNL_parents_update.txt --make-bed --out GvNL_good_fam_par 19 }}} 17 20 18 21 Manual check for suspicious individuals (not in trios) and remove them 19 22 20 23 {{{ 21 p-link --bfile GvNL_good_fam_par --remove GvNL_suspicious.txt --make-bed --out GvNL_good_fam_par_susp}}} 24 p-link --bfile GvNL_good_fam_par --remove GvNL_suspicious.txt --make-bed --out GvNL_good_fam_par_susp 25 }}} 22 26 23 27 NB: After checking suspicious individuals with Mathieu, turns out it was 1 typo and 1 filtered low call rate (93%). Therefore, only the low callrate individual was screened out after all. … … 26 30 27 31 {{{ 28 p-link --bfile GvNL_good_fam_par_susp --check-sex --make-bed}}} 32 p-link --bfile GvNL_good_fam_par_susp --check-sex --make-bed 33 }}} 29 34 30 35 Crosscheck with information from Genome Studio provided by Mathieu … … 36 41 37 42 {{{ 38 p-link --bfile GvNL_good_fam_par_susp --update-sex GvNL_sex_update.txt --make-bed --out GvNL_raw_final}}} 43 p-link --bfile GvNL_good_fam_par_susp --update-sex GvNL_sex_update.txt --make-bed --out GvNL_raw_final 44 }}} 39 45 40 46 Identification of individuals with elevated missing data rates or outlying heterozygosity rate (See Anderson, NP2010, p.1569) => How does this compare with the PLINK suggested approach [[http://pngu.mgh.harvard.edu/~purcell/plink/thresh.shtml here]] … … 43 49 44 50 {{{ 45 p-link --bfile GvNL_raw_final --missing --out GvNL_raw_final}}} 51 p-link --bfile GvNL_raw_final --missing --out GvNL_raw_final 52 }}} 46 53 47 54 Get heterozygocity information: 48 55 49 56 {{{ 50 p-link --bfile GvNL_raw_final --het --out GvNL_raw_final}}} 57 p-link --bfile GvNL_raw_final --het --out GvNL_raw_final 58 }}} 51 59 52 60 Calculate the observed heterozygosity rate per individual using the formula (N(NM) − O(Hom))/N(NM) and plot the missing SNPs vs heterozygocity rate for eyeball inspection. … … 56 64 het <- read.table("GvNL_raw_final.het", head=TRUE) 57 65 mis <- read.table("GvNL_raw_final.imiss", head=TRUE); 58 het$HET_RATE = (het$"N.NM." - het$"O.HOM.")/het$"N.NM." plot (mis$F_MISS,het$HET_RATE, xlab="Missing SNPs", ylab="Heterozygocity Rate")}}} 66 het$HET_RATE = (het$"N.NM." - het$"O.HOM.")/het$"N.NM." plot (mis$F_MISS,het$HET_RATE, xlab="Missing SNPs", ylab="Heterozygocity Rate") 67 }}} 59 68 60 69 No samples with callrate < 95% (Filtered in lab) … … 67 76 het_fail = subset(het, (het$HET_RATE < mean(het$HET_RATE)-3*sd(het$HET_RATE)) | (het$HET_RATE > mean(het$HET_RATE)+3*sd(het$HET_RATE))); 68 77 het_fail$HET_DST = (het_fail$HET_RATE-mean(het$HET_RATE))/sd(het$HET_RATE); 69 write.table(het_fail, "fail-het-qc.txt", row.names=FALSE)}}} 78 write.table(het_fail, "fail-het-qc.txt", row.names=FALSE) 79 }}} 70 80 71 81 Result: … … 76 86 77 87 {{{ 78 p-link --bfile GvNL_raw_final --indep-pairwise 50 5 0.2 --out GvNL_raw_final}}} 88 p-link --bfile GvNL_raw_final --indep-pairwise 50 5 0.2 --out GvNL_raw_final 89 }}} 79 90 80 91 * Generate pairwise Identity-By-State (IBS) metrics using pruned SNPs only 81 92 82 93 {{{ 83 p-link --bfile GvNL_raw_final --extract GvNL_raw_final.prune.in --genome --out GvNL_raw_final}}} 94 p-link --bfile GvNL_raw_final --extract GvNL_raw_final.prune.in --genome --out GvNL_raw_final 95 }}} 84 96 85 97 * Check trio inheritance based on IBS: Check the Pi_HAT value for the individuals (unrelated individuals =~ 0, parent <-> child =~ 0.5, siblings =~0.5, twins =~ 1.0). Done with homemade perl script. … … 90 102 91 103 {{{ 92 p-link --file GvNL_raw_final --me 0.05 0.1 --make-bed --out GvNL_inheritance}}} 104 p-link --file GvNL_raw_final --me 0.05 0.1 --make-bed --out GvNL_inheritance 105 }}} 93 106 94 107 A diff between GvNL_raw_final.fam and GvNL_inheritance.fam confirms IBS version: families G34 and A56 have been removed.