
Changes between Initial Version and Version 1 of BIOS_SampleBlacklist

Timestamp:: Sep 19, 2016 5:15:17 PM (9 years ago)
Author:: jamverlouw
Comment:: --

BIOS_SampleBlacklist

                       v1
+= Data QC based on mix-up mapping and concordance of imputed genotypes with genotypes called from RNAseq data =
+We used three QC methods:
+. mix-up mapper: matching genotypes with expression for each sample;
+. genotype concordance: calculating the concordance of imputed genotypes with genotypes called from RNAseq data;
+. heterozygosity rate.
+The blacklist of samples that do not pass these quality checks can be found in the attachment.
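As an illustration of the second and third checks, the following is a minimal sketch of how a genotype concordance and a heterozygosity rate could be computed per sample. The function names, the dictionary layout (site id mapped to an allele pair) and the toy genotypes are assumptions made for this example; the page does not state the exact definitions or tools used for the attached figures.

```python
def concordance(imputed, rnaseq):
    """Fraction of sites called in both sets where the unordered genotypes agree."""
    shared = [site for site in imputed if site in rnaseq]
    if not shared:
        return None  # nothing to compare
    same = sum(1 for site in shared
               if sorted(imputed[site]) == sorted(rnaseq[site]))
    return same / len(shared)


def heterozygosity_rate(genotypes):
    """Fraction of called sites whose two alleles differ."""
    if not genotypes:
        return None
    het = sum(1 for a, b in genotypes.values() if a != b)
    return het / len(genotypes)


# Hypothetical toy data: site id -> (allele 1, allele 2)
imputed = {"rs1": ("A", "G"), "rs2": ("C", "C"), "rs3": ("T", "T"), "rs4": ("G", "A")}
rnaseq = {"rs1": ("G", "A"), "rs2": ("C", "T"), "rs3": ("T", "T")}

print("concordance:", concordance(imputed, rnaseq))
print("heterozygosity (RNA-seq calls):", heterozygosity_rate(rnaseq))
```

Allele order is ignored when comparing genotypes, and only sites called in both sets count towards the concordance.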
+=== LLS ===
+MixupMapper detected 5 swaps and 7 samples with a wrong genotype. The swaps will be performed and the 7 genotypes replaced. This leaves 7 samples (pers_id: 2014, 3142, 3144, 2634, 2890, 3126 and 3150) without a genotype; these should be removed.
+||=geno_id =||=run_id =||=Best Match (geno_id) =||=Best Match (run_id) =||= Action =||
+|| 561 ||BD2CPRACXX-1-12 || 563 ||BD2CPRACXX-1-21 || swap ||
+|| 563 ||BD2CPRACXX-1-21 || 561 ||BD2CPRACXX-1-12 || swap ||
+|| 974 ||BD24PGACXX-8-25 || 978 ||BD24PGACXX-7-8 || swap ||
+|| 978 ||BD24PGACXX-7-8 || 974 ||BD24PGACXX-8-25 || swap ||
+||1841 ||AD2CJPACXX-6-9 ||1842 ||AD2CJPACXX-5-1 || swap ||
+||1842 ||AD2CJPACXX-5-1 ||1841 ||AD2CJPACXX-6-9 || swap ||
+||2585 ||AD2DATACXX-3-21 ||3273 ||AD2DATACXX-3-22 || swap ||
+||3273 ||AD2DATACXX-3-22 ||2585 ||AD2DATACXX-3-21 || swap ||
+||3411 ||BD2D5MACXX-3-7 ||3413 ||BD2D5MACXX-4-15 || swap ||
+||3413 ||BD2D5MACXX-4-15 ||3411 ||BD2D5MACXX-3-7 || swap ||
+||2928 ||AD2DATACXX-8-1 ||2014 ||BD2CPRACXX-1-22 || replace genotype ||
+||3126 ||AD1NFNACXX-8-25 ||3142 ||BD1NYRACXX-2-15 || replace genotype ||
+||3142 ||BD1NYRACXX-2-15 ||3144 ||AD1NAMACXX-7-19 || replace genotype ||
+||3194 ||AD2DATACXX-4-5 ||2634 ||AD2DATACXX-4-9 || replace genotype ||
+|| 311 ||BD1NW4ACXX-7-13 ||2890 ||BD1NYRACXX-5-23 || replace genotype ||
+|| 905 ||AD1NFNACXX-8-27 ||3126 ||AD1NFNACXX-8-25 || replace genotype ||
+||6039 ||AD1NE2ACXX-5-22 ||3150 ||BD24PGACXX-5-5 || replace genotype ||
+==== Possibly contaminated samples ====
+These outliers show a high heterozygosity rate in genotypes called from RNA-seq data. [[BR]][[BR]]
+Also present in gender-specific analysis (see below):[[BR]]
+BC1KBKACXX-5-6[[BR]]
+BD1NW4ACXX-8-5[[BR]]
+BD1NYRACXX-2-16[[BR]]
+BC1KBKACXX-5-3[[BR]]
+BC1KBKACXX-5-1[[BR]]
+BD1NYRACXX-2-27[[BR]]
+BD1NYRACXX-4-19[[BR]]
+BC1KBKACXX-5-4[[BR]][[BR]]
+Possible gender-neutral contaminations:[[BR]]
+BC1KBKACXX-3-12[[BR]]
+BC1KBKACXX-5-7[[BR]]
+BD24PGACXX-7-10[[BR]]
+BC1KBKACXX-5-5[[BR]]
+BD1NYRACXX-3-1[[BR]]
+BC1KBKACXX-5-2[[BR]]
+AD1NFNACXX-4-8[[BR]]
+BD1NYRACXX-2-18[[BR]]
+=== LifeLines ===
+http://www.molgenis.org/wiki/DeepNoteworthyObservations
+LLDeep_0063
+The corresponding RNA-seq sample, AC1C40ACXX-4-4 (old id: 103001429206), has only 76% of reads aligned. It was flagged by MixupMapper as a sample mix-up and also shows many discordant genotypes when using SNVMix.
+LLDeep_0350
+The corresponding RNA-seq sample, AD1GWFACXX-4-15 (old id: 103001383279), was not flagged by MixupMapper. However, it shows many discordant genotypes when using SNVMix.
+It has both high XIST and high chromosome Y expression levels. The average heterozygosity across all samples is 49% (stdev = 1.9%), whereas sample LLDeep_0350 (103001383279) has a heterozygosity rate of 72%. This points to a contaminated sample in which a male and a female sample have likely been mixed in very similar proportions, hence the high expression levels of both XIST and chromosome Y genes.[[BR]]
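To make the size of this deviation concrete, the short sketch below expresses the 72% heterozygosity rate as a distance from the cohort mean in standard deviations. The three input values are taken from the text above; the variable names are chosen for this example only, and the z-score is used here as an illustration, not as the cut-off applied by the authors.

```python
# Values quoted on this page (percent)
mean_het = 49.0    # average heterozygosity over all samples
sd_het = 1.9       # standard deviation over all samples
sample_het = 72.0  # LLDeep_0350 / 103001383279

# Distance from the mean in units of standard deviations
z_score = (sample_het - mean_het) / sd_het
print(f"LLDeep_0350 lies {z_score:.1f} standard deviations above the mean")
```

A deviation of roughly 12 standard deviations is far beyond what sampling noise would explain, which is consistent with the contamination interpretation.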
+[[BR]]
+A file with genotype concordance and heterozygosity rates on imputed genotypes can be found [raw-attachment:genotype_concordance_heterozygosity_rate_imputed_RS_CODAM_LLS.xlsx here]. [[BR]]
+=== CODAM ===
+'''eQTL mapping (gene level) results:'''
+unique cis-regulated genes.[[BR]][[BR]]
+'''Samples that failed the QC:'''
+(RNA-seq ids: AD10W1ACXX-8-11, CODAM-102-130804): mix-up mapper + genotype concordance;
+(RNA-seq ids: AD10W1ACXX-5-18, CODAM-156-130804): mix-up mapper + genotype concordance;
+It looks like RNA-seq sample ids were swapped for these two samples (see: http://www.bbmriwiki.nl/wiki/BIOS_QualityControl/BIOS_QualityControlRun1 of 12-December-2013)[[BR]]
+A file with genotype concordance and heterozygosity rates on imputed genotypes can be found [raw-attachment:genotype_concordance_heterozygosity_rate_imputed_RS_CODAM_LLS.xlsx here]. [[BR]]
+=== RS ===
+'''eQTL mapping (gene level) results:'''
+unique cis-regulated genes.[[BR]][[BR]]
+'''Samples that failed the QC:'''
+8190002 (RNA-seq ids: AD1NNNACXX-4-18, RS-287-130804): mix-up mapper + genotype concordance;
+(AC1JV9ACXX-1-13, RS-761-130804): mix-up mapper + genotype concordance;
+(BC1JTJACXX-6-7, RS-442-130804): genotype concordance;
+(BC1KAVACXX-8-13, RS-55-130804): genotype concordance + heterozygosity rate;
+~~6734 (RS-502-130804): genotype concordance + heterozygosity rate;~~ (passed QC in the first run data)[[BR]]
+A file with genotype concordance and heterozygosity rates on imputed genotypes can be found [raw-attachment:genotype_concordance_heterozygosity_rate_imputed_RS_CODAM_LLS.xlsx here]. [[BR]]
+[[BR]]
+= Data QC based on median correlations of gene counts from each sample to all other samples =
+Samples with much lower median correlations to all other samples.[[BR]]
+For methods see: http://www.bbmriwiki.nl/wiki/gene_exon_transcript_count [[BR]]
+||=Sample =||=Median correlation =||
+||AC1JV9ACXX.1.10 ||0.0471 ||
+||AD1NE2ACXX.5.22 ||0.1174 ||
+||AD2D8RACXX.3.3 ||0.8028 ||
+||AD2D8RACXX.6.3 ||0.8093 ||
+||AD2D8RACXX.1.3 ||0.8257 ||
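The general idea behind this check can be sketched as follows: correlate the gene counts of each sample with those of every other sample and take the median of those correlations. This is a minimal illustration only. The use of Pearson correlation on untransformed counts, the function names and the toy count vectors are assumptions; the actual method is described on the linked page.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def median_correlations(counts):
    """Median correlation of each sample to all other samples."""
    result = {}
    for sample, values in counts.items():
        others = [pearson(values, other_values)
                  for other, other_values in counts.items() if other != sample]
        result[sample] = median(others)
    return result


# Hypothetical toy counts: sample id -> gene counts (same gene order everywhere)
counts = {
    "sample_a": [10, 200, 35, 80, 5],
    "sample_b": [12, 190, 40, 75, 6],
    "sample_c": [9, 210, 30, 85, 4],
    "sample_d": [150, 8, 90, 3, 60],  # deliberately unlike the others
}

medians = median_correlations(counts)
for sample, value in sorted(medians.items(), key=lambda item: item[1]):
    print(f"{sample}\t{value:.4f}")
```

Sorting by the median puts the samples that resemble the rest of the cohort least at the top of the output.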
+= Outliers to be removed based on QC stats and PC analysis =
+Updated: 12-December-2013[[BR]]
+Analysis by: Peter-Bram 't Hoen[[BR]]
+[[BR]]
+Too few reads:[[BR]]
+AC1JV9ACXX-1-10[[BR]]
+AD1NE2ACXX-5-22[[BR]]
+BD1NW4ACXX-3-27[[BR]]
+[[BR]]
+Other reasons: See http://www.bbmriwiki.nl/wiki/BIOS_QualityControl/BIOS_QualityControlRun [[BR]]
+||=Sample =||=Reason =||
+||BD1NYRACXX-6-10 ||too low percentage of mapped reads, outlier on principal components 1, 4, 5 and 6 ||
+||AD2CJPACXX-8-9 ||low exon correlation, outlier on principal components 1, 11 and 14 ||
+||BD1NR9ACXX-7-27 ||low percentage of mapped reads, outlier on principal component 4, likely degraded ||
+= Outliers to be removed based on gender-specific expression analysis =
+Updated: 12-December-2013[[BR]]
+Analysis by: Peter-Bram 't Hoen[[BR]]
+[[BR]]
+The normalized gene expression values (edgeR TMM method, expressed as cpm) for XIST and for the sum of all protein-coding Y-chromosomal genes were used to check for contamination between samples of different gender. The script can be found [raw-attachment:gender_analysis.r here]. In addition to LL sample AD1GWFACXX-4-15, the following samples (all from LLS) came up and appeared to be contaminated:[[BR]]
+BC1KBKACXX-5-1[[BR]]
+BC1KBKACXX-5-3[[BR]]
+BC1KBKACXX-5-4[[BR]]
+BC1KBKACXX-5-6[[BR]]
+BC1KBKACXX-5-8[[BR]]
+BD1NW4ACXX-8-5[[BR]]
+BD1NYRACXX-2-16[[BR]]
+BD1NYRACXX-2-27[[BR]]
+BD1NYRACXX-4-19[[BR]]
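The logic of this check can be sketched as below. This is not the attached gender_analysis.r script, which is written in R and not reproduced here. The function name, the cut-off values and the toy cpm values are hypothetical and only illustrate the principle that a sample with both high XIST and high Y-chromosomal expression is suspicious.

```python
def classify_sex_signal(xist_cpm, y_cpm, xist_cutoff=10.0, y_cutoff=10.0):
    """Classify a sample from XIST cpm and summed Y-chromosomal cpm.

    The cut-offs are hypothetical placeholders; in practice they would be
    chosen from the observed distribution of both values.
    """
    high_xist = xist_cpm >= xist_cutoff
    high_y = y_cpm >= y_cutoff
    if high_xist and high_y:
        return "possible contamination"
    if high_xist:
        return "female-like"
    if high_y:
        return "male-like"
    return "undetermined"


# Hypothetical toy values: sample id -> (XIST cpm, summed Y-gene cpm)
samples = {
    "toy_female": (120.0, 0.4),
    "toy_male": (0.2, 95.0),
    "toy_mixed": (60.0, 48.0),
}

for sample_id, (xist, y_sum) in samples.items():
    print(sample_id, classify_sex_signal(xist, y_sum))
```

A mixture of a male and a female sample would show both signals at once, which is the pattern described above for AD1GWFACXX-4-15.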