Changes between Version 10 and Version 11 of GwasQcPipeline


Ignore:
Timestamp:
Feb 10, 2011 5:37:18 PM (14 years ago)
Author:
Morris Swertz
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • GwasQcPipeline

    v10 v11  
    44
    55{{{
    6 p-link --bfile GvNL_250111 --remove GvNL_bad_samples.txt --make-bed --out GvNL_good_samples}}}
     6p-link --bfile GvNL_250111 --remove GvNL_bad_samples.txt --make-bed --out GvNL_good_samples
     7}}}
    78
    89Update family information
    910
    1011{{{
    11 p-link --bfile GvNL_samples --update-ids GvNL_family_update.txt  --make-bed --out GvNL_good_fam}}}
     12p-link --bfile GvNL_samples --update-ids GvNL_family_update.txt  --make-bed --out GvNL_good_fam
     13}}}
    1214
    1315Set trio relationships
    1416
    1517{{{
    16 p-link --bfile GvNL_good_fam --update-parents GvNL_parents_update.txt  --make-bed --out GvNL_good_fam_par}}}
     18p-link --bfile GvNL_good_fam --update-parents GvNL_parents_update.txt  --make-bed --out GvNL_good_fam_par
     19}}}
    1720
    1821Manual check for suspicious individuals (not in trios) and remove them
    1922
    2023{{{
    21 p-link --bfile GvNL_good_fam_par --remove GvNL_suspicious.txt  --make-bed --out GvNL_good_fam_par_susp}}}
     24p-link --bfile GvNL_good_fam_par --remove GvNL_suspicious.txt  --make-bed --out GvNL_good_fam_par_susp
     25}}}
    2226
    2327NB: After checking suspicious individuals with Mathieu, turns out it was 1 typo and 1 filtered low call rate (93%). Therefore, only the low callrate individual was screened out after all.
     
    2630
    2731{{{
    28 p-link --bfile GvNL_good_fam_par_susp --check-sex --make-bed}}}
     32p-link --bfile GvNL_good_fam_par_susp --check-sex --make-bed
     33}}}
    2934
    3035Crosscheck with information from Genome Studio provided by Mathieu
     
    3641
    3742{{{
    38 p-link --bfile GvNL_good_fam_par_susp --update-sex GvNL_sex_update.txt --make-bed --out GvNL_raw_final}}}
     43p-link --bfile GvNL_good_fam_par_susp --update-sex GvNL_sex_update.txt --make-bed --out GvNL_raw_final
     44}}}
    3945
    4046Identification of individuals with elevated missing data rates or outlying heterozygosity rate (See Anderson, NP2010, p.1569) => How does this compare with the PLINK suggested approach  [[http://pngu.mgh.harvard.edu/~purcell/plink/thresh.shtml here]]
     
    4349
    4450{{{
    45 p-link --bfile GvNL_raw_final --missing --out GvNL_raw_final}}}
     51p-link --bfile GvNL_raw_final --missing --out GvNL_raw_final
     52}}}
    4653 
    4754Get heterozygocity information:
    4855
    4956{{{
    50 p-link --bfile GvNL_raw_final --het --out GvNL_raw_final}}}
     57p-link --bfile GvNL_raw_final --het --out GvNL_raw_final
     58}}}
    5159
    5260Calculate the observed heterozygosity rate per individual using the formula (N(NM)  −  O(Hom))/N(NM) and plot the missing SNPs vs heterozygocity rate for eyeball inspection.
     
    5664het <- read.table("GvNL_raw_final.het", head=TRUE)
    5765mis <- read.table("GvNL_raw_final.imiss", head=TRUE);
    58 het$HET_RATE = (het$"N.NM." - het$"O.HOM.")/het$"N.NM."&#10;plot (mis$F_MISS,het$HET_RATE, xlab="Missing SNPs", ylab="Heterozygocity Rate")}}}
     66het$HET_RATE = (het$"N.NM." - het$"O.HOM.")/het$"N.NM."&#10;plot (mis$F_MISS,het$HET_RATE, xlab="Missing SNPs", ylab="Heterozygocity Rate")
     67}}}
    5968
    6069No samples with callrate < 95% (Filtered in lab)
     
    6776het_fail = subset(het, (het$HET_RATE < mean(het$HET_RATE)-3*sd(het$HET_RATE)) | (het$HET_RATE > mean(het$HET_RATE)+3*sd(het$HET_RATE)));
    6877het_fail$HET_DST = (het_fail$HET_RATE-mean(het$HET_RATE))/sd(het$HET_RATE);
    69 write.table(het_fail, "fail-het-qc.txt", row.names=FALSE)}}}
     78write.table(het_fail, "fail-het-qc.txt", row.names=FALSE)
     79}}}
    7080
    7181Result:
     
    7686
    7787{{{
    78 p-link --bfile GvNL_raw_final --indep-pairwise 50 5 0.2 --out GvNL_raw_final}}}
     88p-link --bfile GvNL_raw_final --indep-pairwise 50 5 0.2 --out GvNL_raw_final
     89}}}
    7990
    8091* Generate pairwise Identity-By-State (IBS) metrics using pruned SNPs only
    8192
    8293{{{
    83 p-link --bfile GvNL_raw_final --extract GvNL_raw_final.prune.in --genome --out GvNL_raw_final}}}
     94p-link --bfile GvNL_raw_final --extract GvNL_raw_final.prune.in --genome --out GvNL_raw_final
     95}}}
    8496
    8597* Check trio inheritance based on IBS: Check the Pi_HAT value for the individuals (unrelated individuals =~ 0, parent <-> child =~ 0.5, siblings =~0.5, twins =~ 1.0). Done with homemade perl script.
     
    90102
    91103{{{
    92 p-link --file GvNL_raw_final --me 0.05 0.1 --make-bed --out GvNL_inheritance}}}
     104p-link --file GvNL_raw_final --me 0.05 0.1 --make-bed --out GvNL_inheritance
     105}}}
    93106
    94107A diff between GvNL_raw_final.fam and GvNL_inheritance.fam confirms IBS version: families G34 and A56 have been removed.