Changes between Initial Version and Version 1 of BIOS_SampleBlacklist

Sep 19, 2016 5:15:17 PM (8 years ago)



= Data QC based on mix-up mapping and concordance of imputed genotypes with genotypes called from RNA-seq data =
We used three QC checks:

 1. mix-up mapper: matching genotypes with expression for each sample;
 1. genotype concordance: calculating the concordance of imputed genotypes with genotypes called from RNA-seq data;
 1. heterozygosity rate.

The blacklist of samples that do not pass these quality checks can be found in the attachment.
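
The second and third checks listed above can be illustrated with a minimal sketch. It is not the pipeline used for this page: the genotype encoding (0, 1 or 2 copies of the alternative allele, `None` for a missing call) and the function names `concordance` and `heterozygosity_rate` are assumptions made for the example, and the page does not state which sites were included in either statistic.

```python
def concordance(imputed, rnaseq):
    """Fraction of sites called in both sets where the two genotypes agree."""
    pairs = [(a, b) for a, b in zip(imputed, rnaseq)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no sites called in both genotype sets")
    return sum(a == b for a, b in pairs) / len(pairs)


def heterozygosity_rate(genotypes):
    """Fraction of called sites that are heterozygous (encoded as 1)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(g == 1 for g in called) / len(called)


# Toy genotypes for one sample (hypothetical values, not from the cohorts).
imputed = [0, 1, 2, 1, None, 0]
rnaseq = [0, 1, 1, 1, 2, None]

print(concordance(imputed, rnaseq))   # 0.75 (3 of 4 shared sites agree)
print(heterozygosity_rate(rnaseq))    # 0.6  (3 of 5 called sites are het)
```

A sample whose concordance is far below that of the other samples, or whose heterozygosity rate is far above it, would be a candidate for the blacklist.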
=== LLS ===
MixupMapper detected 5 swaps and 7 samples with a wrong genotype. The swaps will be performed and the 7 genotypes replaced. This leaves 7 samples (pers_id: 2014, 3142, 3144, 2634, 2890, 3126 and 3150) without a genotype; these should be removed.

||=geno_id =||=run_id =||=Best Match (geno_id) =||=Best Match (run_id) =||= Action =||
|| 561 ||BD2CPRACXX-1-12 || 563 ||BD2CPRACXX-1-21 || swap ||
|| 563 ||BD2CPRACXX-1-21 || 561 ||BD2CPRACXX-1-12 || swap ||
|| 974 ||BD24PGACXX-8-25 || 978 ||BD24PGACXX-7-8 || swap ||
|| 978 ||BD24PGACXX-7-8 || 974 ||BD24PGACXX-8-25 || swap ||
||1841 ||AD2CJPACXX-6-9 ||1842 ||AD2CJPACXX-5-1 || swap ||
||1842 ||AD2CJPACXX-5-1 ||1841 ||AD2CJPACXX-6-9 || swap ||
||2585 ||AD2DATACXX-3-21 ||3273 ||AD2DATACXX-3-22 || swap ||
||3273 ||AD2DATACXX-3-22 ||2585 ||AD2DATACXX-3-21 || swap ||
||3411 ||BD2D5MACXX-3-7 ||3413 ||BD2D5MACXX-4-15 || swap ||
||3413 ||BD2D5MACXX-4-15 ||3411 ||BD2D5MACXX-3-7 || swap ||
||2928 ||AD2DATACXX-8-1 ||2014 ||BD2CPRACXX-1-22 || replace genotype ||
||3126 ||AD1NFNACXX-8-25 ||3142 ||BD1NYRACXX-2-15 || replace genotype ||
||3142 ||BD1NYRACXX-2-15 ||3144 ||AD1NAMACXX-7-19 || replace genotype ||
||3194 ||AD2DATACXX-4-5 ||2634 ||AD2DATACXX-4-9 || replace genotype ||
|| 311 ||BD1NW4ACXX-7-13 ||2890 ||BD1NYRACXX-5-23 || replace genotype ||
|| 905 ||AD1NFNACXX-8-27 ||3126 ||AD1NFNACXX-8-25 || replace genotype ||
||6039 ||AD1NE2ACXX-5-22 ||3150 ||BD24PGACXX-5-5 || replace genotype ||
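
The bookkeeping behind the LLS table can be checked with a short sketch. The rows below are copied from the table (geno_id, Best Match geno_id, Action); the helper names `swap_pairs` and `replacement_sources` are hypothetical. The sketch rests on one observation rather than on a stated rule: the seven pers_ids listed for removal coincide with the Best Match (geno_id) values of the "replace genotype" rows, assuming pers_id and geno_id share one id space.

```python
# (geno_id, best-match geno_id, action), copied from the LLS table.
ROWS = [
    (561, 563, "swap"), (563, 561, "swap"),
    (974, 978, "swap"), (978, 974, "swap"),
    (1841, 1842, "swap"), (1842, 1841, "swap"),
    (2585, 3273, "swap"), (3273, 2585, "swap"),
    (3411, 3413, "swap"), (3413, 3411, "swap"),
    (2928, 2014, "replace genotype"),
    (3126, 3142, "replace genotype"),
    (3142, 3144, "replace genotype"),
    (3194, 2634, "replace genotype"),
    (311, 2890, "replace genotype"),
    (905, 3126, "replace genotype"),
    (6039, 3150, "replace genotype"),
]


def swap_pairs(rows):
    """Unordered pairs of geno_ids whose genotypes are exchanged."""
    return {frozenset((g, best)) for g, best, action in rows if action == "swap"}


def replacement_sources(rows):
    """Best-match geno_ids whose genotype is reassigned to another sample."""
    return sorted(best for _, best, action in rows if action == "replace genotype")


# Every swap row should have its mirror image in the table.
reciprocal = all((best, g, "swap") in ROWS
                 for g, best, action in ROWS if action == "swap")

print(len(swap_pairs(ROWS)), reciprocal)  # 5 True
print(replacement_sources(ROWS))
# [2014, 2634, 2890, 3126, 3142, 3144, 3150]
```

The ten swap rows collapse to five reciprocal pairs, and the seven replacement sources match the pers_ids named in the paragraph above the table.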
==== Possibly contaminated samples ====

Outliers that show a high heterozygosity rate in genotypes called from RNA-seq. [[BR]][[BR]]
Also present in gender-specific analysis (see below):[[BR]]

Possible gender-neutral contaminations:[[BR]]

=== LifeLines ===
The corresponding RNA-seq sample, AC1C40ACXX-4-4 (old id: 103001429206), has only 76% of reads aligned. Flagged by MixupMapper as a sample mix-up. Also shows many discordant genotypes when using SNVMix.

The corresponding RNA-seq sample, AD1GWFACXX-4-15 (old id: 103001383279), was not flagged by MixupMapper. However, it shows many discordant genotypes when using SNVMix.

Has both high XIST and high chromosome Y expression levels. Average heterozygosity for all samples = 49%, stdev = 1.9%. Sample LLDeep_0350 (103001383279) has a heterozygosity rate of 72%: a contaminated sample in which a male and a female sample have likely been mixed in very similar proportions, hence the high expression levels of both XIST and chromosome Y genes.[[BR]]
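
To put the 72% heterozygosity rate in perspective, the following sketch expresses it as a number of standard deviations from the cohort mean, using the figures quoted above (mean 49%, stdev 1.9%). The cut-off of 3 standard deviations and the function names are assumptions for illustration; the page does not state which threshold was applied.

```python
def z_score(value, mean, stdev):
    """Distance of value from mean, in units of stdev."""
    if stdev <= 0:
        raise ValueError("stdev must be positive")
    return (value - mean) / stdev


def is_outlier(value, mean, stdev, threshold=3.0):
    """True if value lies more than `threshold` stdevs from the mean.

    The default threshold is a hypothetical choice, not taken from the page.
    """
    return abs(z_score(value, mean, stdev)) > threshold


z = z_score(72.0, 49.0, 1.9)
print(round(z, 1))                   # 12.1
print(is_outlier(72.0, 49.0, 1.9))   # True
```

At roughly 12 standard deviations above the mean, the sample is an outlier under any reasonable cut-off.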

The file with genotype concordance and heterozygosity rates on imputed genotypes can be found [raw-attachment:genotype_concordance_heterozygosity_rate_imputed_RS_CODAM_LLS.xlsx here]. [[BR]]

=== CODAM ===
'''eQTL mapping (gene level) results:'''

6804 unique cis-regulated genes.[[BR]][[BR]]

'''Samples that failed the QC:'''

2345 (RNA-seq ids: AD10W1ACXX-8-11, CODAM-102-130804): mix-up mapper + genotype concordance;

2495 (RNA-seq ids: AD10W1ACXX-5-18, CODAM-156-130804): mix-up mapper + genotype concordance;

It looks like the RNA-seq sample ids were swapped for these two samples (see: of 12-December-2013)[[BR]]

The file with genotype concordance and heterozygosity rates on imputed genotypes can be found [raw-attachment:genotype_concordance_heterozygosity_rate_imputed_RS_CODAM_LLS.xlsx here]. [[BR]]

=== RS ===
'''eQTL mapping (gene level) results:'''

7708 unique cis-regulated genes.[[BR]][[BR]]

'''Samples that failed the QC:'''

8190002 (RNA-seq ids: AD1NNNACXX-4-18, RS-287-130804): mix-up mapper + genotype concordance;

9353 (AC1JV9ACXX-1-13, RS-761-130804): mix-up mapper + genotype concordance;

3520 (BC1JTJACXX-6-7, RS-442-130804): genotype concordance;

562 (BC1KAVACXX-8-13, RS-55-130804): genotype concordance + heterozygosity rate;

~~6734 (RS-502-130804): genotype concordance + heterozygosity rate;~~ (passed QC in the first run data)[[BR]]

The file with genotype concordance and heterozygosity rates on imputed genotypes can be found [raw-attachment:genotype_concordance_heterozygosity_rate_imputed_RS_CODAM_LLS.xlsx here]. [[BR]]

= Data QC based on median correlations of gene counts from each sample to all other samples =
[[BR]]
Samples with much lower median correlations to all other samples [[BR]]
For methods see: [[BR]]
AC1JV9ACXX.1.10: 0.0471[[BR]]
AD1NE2ACXX.5.22: 0.1174[[BR]]
AD2D8RACXX.3.3: 0.8028[[BR]]
AD2D8RACXX.6.3: 0.8093[[BR]]
AD2D8RACXX.1.3: 0.8257[[BR]]
[[BR]]
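
The statistic named in the heading can be sketched as follows: for each sample, correlate its gene counts with those of every other sample and take the median. This is only an illustration. The use of Pearson correlation on untransformed counts is an assumption (the methods link is not available on this page), and the sample names and counts are toy values.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


def median_correlations(counts):
    """Map each sample to the median of its correlations with all others.

    counts: dict of sample name -> list of gene counts, same gene order.
    """
    return {
        name: median(pearson(vec, other)
                     for other_name, other in counts.items()
                     if other_name != name)
        for name, vec in counts.items()
    }


# Toy counts for five genes (hypothetical values).
counts = {
    "s1": [10, 20, 30, 40, 50],
    "s2": [12, 19, 33, 41, 48],
    "s3": [9, 22, 29, 38, 52],
    "bad": [50, 10, 40, 20, 30],
}
result = median_correlations(counts)
worst = min(result, key=result.get)
print(worst)  # bad
```

Samples such as those listed above would surface at the bottom of `result` when it is sorted by value.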
= Outliers to be removed based on QC stats and PC analysis =
[[BR]]
Updated: 12-December-2013[[BR]]
Analysis by: Peter-Bram 't Hoen[[BR]]
Too few reads: [[BR]]
AC1JV9ACXX-1-10[[BR]]
AD1NE2ACXX-5-22[[BR]]
BD1NW4ACXX-3-27[[BR]]
Other reasons: See [[BR]]
BD1NYRACXX-6-10: too low percentage of mapped reads, outlier on principal components 1, 4, 5, 6[[BR]]
AD2CJPACXX-8-9: low exon correlation, outlier on principal components 1, 11, 14[[BR]]
BD1NR9ACXX-7-27: low percentage of mapped reads, outlier on principal component 4, likely degraded[[BR]]
= Outliers to be removed based on gender-specific expression analysis =
Updated: 12-December-2013[[BR]]
Analysis by: Peter-Bram 't Hoen[[BR]]
The normalized gene expression values (edgeR TMM method, expressed as cpm) for XIST and for the sum of all protein-coding Y-chromosomal genes were used to check for contamination between samples of different gender. The script can be found [raw-attachment:gender_analysis.r here]. In addition to LL sample AD1GWFACXX-4-15, the following samples (all from LLS) came up and appeared to be contaminated:[[BR]]
BC1KBKACXX-5-1[[BR]]
BC1KBKACXX-5-3[[BR]]
BC1KBKACXX-5-4[[BR]]
BC1KBKACXX-5-6[[BR]]
BC1KBKACXX-5-8[[BR]]
BD1NW4ACXX-8-5[[BR]]
BD1NYRACXX-2-16[[BR]]
BD1NYRACXX-2-27[[BR]]
BD1NYRACXX-4-19[[BR]]
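
The idea behind the gender-specific check can be sketched as below. This is not the attached `gender_analysis.r` script, whose contents are not shown here. It assumes that cpm values for XIST and for the summed protein-coding Y-chromosomal genes are already available per sample, and it flags samples in which both are high. The thresholds, sample names and cpm values are all hypothetical.

```python
def flag_mixed_gender(samples, xist_min, y_min):
    """Return names of samples expressing both XIST and Y genes highly.

    samples: dict of sample name -> (XIST cpm, summed Y-gene cpm).
    xist_min, y_min: hypothetical cut-offs above which expression counts
    as "high"; suitable values depend on the data set.
    """
    return sorted(
        name
        for name, (xist_cpm, y_cpm) in samples.items()
        if xist_cpm >= xist_min and y_cpm >= y_min
    )


# Toy cpm values (invented for illustration, not taken from the cohorts).
samples = {
    "female_like": (250.0, 0.4),   # high XIST, negligible Y expression
    "male_like": (0.3, 180.0),     # negligible XIST, high Y expression
    "mixed": (120.0, 95.0),        # both high: possible contamination
}

print(flag_mixed_gender(samples, xist_min=50.0, y_min=50.0))  # ['mixed']
```

A sample that expresses both XIST and Y-chromosomal genes at high levels is consistent with a mixture of male and female material, which is the pattern described for the samples listed above.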