| 1 | = Data QC based on mix-up mapping and concordance of imputed genotypes with genotypes called from RNAseq data = |
| 2 | We used three approaches for the QC: |
| 3 | |
| 4 | 1. mix-up mapper: matching genotypes with expression for each sample; |
| 5 | 1. genotype concordance: calculating the concordance of imputed genotypes with genotypes called from RNAseq data; |
| 6 | 1. heterozygosity rate. |
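
The second and third checks can be illustrated with a minimal sketch. It assumes genotypes are coded as alternate-allele dosage (0, 1, 2) with `None` for missing calls, and that both rates are computed over called sites only; the page does not state how the denominators were defined, so both the coding and the function names are illustrative assumptions, not the pipeline that was actually run.

```python
from typing import Optional, Sequence

# Assumed coding: alternate-allele dosage 0, 1 (heterozygous) or 2; None = no call.
Genotype = Optional[int]


def concordance(a: Sequence[Genotype], b: Sequence[Genotype]) -> float:
    """Fraction of identical calls among sites called in both genotype sets."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no sites called in both genotype sets")
    return sum(x == y for x, y in pairs) / len(pairs)


def heterozygosity_rate(g: Sequence[Genotype]) -> float:
    """Fraction of heterozygous calls among all called sites."""
    called = [x for x in g if x is not None]
    if not called:
        raise ValueError("no called sites")
    return sum(x == 1 for x in called) / len(called)


# Made-up toy data for one sample (not taken from the cohorts on this page).
imputed = [0, 1, 2, 1, None, 0]
rnaseq = [0, 1, 1, 1, 2, None]

print(concordance(imputed, rnaseq))   # 0.75 (3 of the 4 jointly called sites agree)
print(heterozygosity_rate(rnaseq))    # 0.6  (3 of the 5 called sites are heterozygous)
```

A sample with low concordance to its own imputed genotypes points to a mix-up, while an unusually high heterozygosity rate in the RNA-seq calls points to contamination.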
| 7 | |
| 8 | The blacklist of samples that do not pass these quality checks can be found in the attachment. |
| 9 | |
| 10 | === LLS === |
| 11 | MixupMapper detected 5 swaps and 7 samples with a wrong genotype. The swaps will be performed and the 7 genotypes replaced. This leaves 7 samples (pers_id: 2014, 3142, 3144, 2634, 2890, 3126 and 3150) without a genotype; these should be removed. |
| 12 | |
| 13 | ||=geno_id =||=run_id =||=Best Match (geno_id) =||=Best Match (run_id) =||= Action =|| |
| 14 | || 561 ||BD2CPRACXX-1-12 || 563 ||BD2CPRACXX-1-21 || swap || |
| 15 | || 563 ||BD2CPRACXX-1-21 || 561 ||BD2CPRACXX-1-12 || swap || |
| 16 | || 974 ||BD24PGACXX-8-25 || 978 ||BD24PGACXX-7-8 || swap || |
| 17 | || 978 ||BD24PGACXX-7-8 || 974 ||BD24PGACXX-8-25 || swap || |
| 18 | ||1841 ||AD2CJPACXX-6-9 ||1842 ||AD2CJPACXX-5-1 || swap || |
| 19 | ||1842 ||AD2CJPACXX-5-1 ||1841 ||AD2CJPACXX-6-9 || swap || |
| 20 | ||2585 ||AD2DATACXX-3-21 ||3273 ||AD2DATACXX-3-22 || swap || |
| 21 | ||3273 ||AD2DATACXX-3-22 ||2585 ||AD2DATACXX-3-21 || swap || |
| 22 | ||3411 ||BD2D5MACXX-3-7 ||3413 ||BD2D5MACXX-4-15 || swap || |
| 23 | ||3413 ||BD2D5MACXX-4-15 ||3411 ||BD2D5MACXX-3-7 || swap || |
| 24 | ||2928 ||AD2DATACXX-8-1 ||2014 ||BD2CPRACXX-1-22 || replace genotype || |
| 25 | ||3126 ||AD1NFNACXX-8-25 ||3142 ||BD1NYRACXX-2-15 || replace genotype || |
| 26 | ||3142 ||BD1NYRACXX-2-15 ||3144 ||AD1NAMACXX-7-19 || replace genotype || |
| 27 | ||3194 ||AD2DATACXX-4-5 ||2634 ||AD2DATACXX-4-9 || replace genotype || |
| 28 | || 311 ||BD1NW4ACXX-7-13 ||2890 ||BD1NYRACXX-5-23 || replace genotype || |
| 29 | || 905 ||AD1NFNACXX-8-27 ||3126 ||AD1NFNACXX-8-25 || replace genotype || |
| 30 | ||6039 ||AD1NE2ACXX-5-22 ||3150 ||BD24PGACXX-5-5 || replace genotype || |
| 31 | |
| 32 | |
| 33 | ==== Possibly contaminated samples ==== |
| 34 | |
| 35 | Outliers that show a high heterozygosity rate in genotypes called from RNA-seq data. [[BR]][[BR]] |
| 36 | Also present in gender-specific analysis (see below):[[BR]] |
| 37 | BC1KBKACXX-5-6[[BR]] |
| 38 | BD1NW4ACXX-8-5[[BR]] |
| 39 | BD1NYRACXX-2-16[[BR]] |
| 40 | BC1KBKACXX-5-3[[BR]] |
| 41 | BC1KBKACXX-5-1[[BR]] |
| 42 | BD1NYRACXX-2-27[[BR]] |
| 43 | BD1NYRACXX-4-19[[BR]] |
| 44 | BC1KBKACXX-5-4[[BR]][[BR]] |
| 45 | Possible gender-neutral contaminations:[[BR]] |
| 46 | BC1KBKACXX-3-12[[BR]] |
| 47 | BC1KBKACXX-5-7[[BR]] |
| 48 | BD24PGACXX-7-10[[BR]] |
| 49 | BC1KBKACXX-5-5[[BR]] |
| 50 | BD1NYRACXX-3-1[[BR]] |
| 51 | BC1KBKACXX-5-2[[BR]] |
| 52 | AD1NFNACXX-4-8[[BR]] |
| 53 | BD1NYRACXX-2-18[[BR]] |
| 55 | === LifeLines === |
| 56 | http://www.molgenis.org/wiki/DeepNoteworthyObservations |
| 57 | |
| 58 | LLDeep_0063 |
| 59 | |
| 60 | The corresponding RNA-seq sample, AC1C40ACXX-4-4 (old id: 103001429206), has only 76% of reads aligned. It is flagged by MixupMapper as a sample mix-up and also shows many discordant genotypes when using SNVMix. |
| 61 | |
| 62 | |
| 63 | LLDeep_0350 |
| 64 | |
| 65 | The corresponding RNA-seq sample, AD1GWFACXX-4-15 (old id: 103001383279), is not flagged by MixupMapper. However, it shows many discordant genotypes when using SNVMix. |
| 66 | |
| 67 | It has both high XIST and high chromosome Y expression levels. The average heterozygosity rate across all samples is 49% (stdev = 1.9%), whereas sample LLDeep_0350 (103001383279) has a heterozygosity rate of 72%. This points to a contaminated sample in which a male and a female sample have likely been mixed in very similar proportions, hence the high expression levels of both XIST and chromosome Y genes.[[BR]] |
| 68 | [[BR]] |
| 69 | The file with genotype concordance and heterozygosity rates on imputed genotypes can be found [raw-attachment:genotype_concordance_heterozygosity_rate_imputed_RS_CODAM_LLS.xlsx here]. [[BR]] |
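
To put the heterozygosity figures quoted for LLDeep_0350 in perspective, the deviation from the cohort average can be expressed in standard deviations. This is simple arithmetic on the three numbers reported above; the variable names are illustrative.

```python
# Values as reported on this page (percent heterozygous calls).
mean_het = 49.0     # average over all samples
sd_het = 1.9        # standard deviation over all samples
sample_het = 72.0   # LLDeep_0350 (103001383279)

# Number of standard deviations the sample lies above the cohort mean.
z = (sample_het - mean_het) / sd_het
print(round(z, 1))  # 12.1
```

A deviation of roughly 12 standard deviations is far beyond what sampling variation would explain, which is consistent with the contamination reading given above.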
| 70 | |
| 71 | |
| 72 | === CODAM === |
| 73 | '''eQTL mapping (gene level) results:''' |
| 74 | |
| 75 | 6804 unique cis-regulated genes.[[BR]][[BR]] |
| 76 | |
| 77 | '''Samples that failed the QC:''' |
| 78 | |
| 79 | 2345 (RNA-seq ids: AD10W1ACXX-8-11, CODAM-102-130804): mix-up mapper + genotype concordance; |
| 80 | |
| 81 | 2495 (RNA-seq ids: AD10W1ACXX-5-18, CODAM-156-130804): mix-up mapper + genotype concordance; |
| 82 | |
| 83 | It looks like RNA-seq sample ids were swapped for these two samples (see: http://www.bbmriwiki.nl/wiki/BIOS_QualityControl/BIOS_QualityControlRun1 of 12-December-2013)[[BR]] |
| 84 | |
| 85 | The file with genotype concordance and heterozygosity rates on imputed genotypes can be found [raw-attachment:genotype_concordance_heterozygosity_rate_imputed_RS_CODAM_LLS.xlsx here]. [[BR]] |
| 86 | |
| 87 | === RS === |
| 88 | '''eQTL mapping (gene level) results:''' |
| 89 | |
| 90 | 7708 unique cis-regulated genes.[[BR]][[BR]] |
| 91 | |
| 92 | '''Samples that failed the QC:''' |
| 93 | |
| 94 | 8190002 (RNA-seq ids: AD1NNNACXX-4-18, RS-287-130804): mix-up mapper + genotype concordance; |
| 95 | |
| 96 | 9353 (AC1JV9ACXX-1-13, RS-761-130804): mix-up mapper + genotype concordance; |
| 97 | |
| 98 | 3520 (BC1JTJACXX-6-7, RS-442-130804): genotype concordance; |
| 99 | |
| 100 | 562 (BC1KAVACXX-8-13, RS-55-130804): genotype concordance + heterozygosity rate; |
| 101 | |
| 102 | ~~6734 (RS-502-130804): genotype concordance + heterozygosity rate;~~ (passed QC in the first run data)[[BR]] |
| 103 | |
| 104 | The file with genotype concordance and heterozygosity rates on imputed genotypes can be found [raw-attachment:genotype_concordance_heterozygosity_rate_imputed_RS_CODAM_LLS.xlsx here]. [[BR]] |
| 105 | [[BR]] |
| 106 | |
| 107 | = Data QC based on median correlations of gene counts from each sample to all other samples = |
| 108 | [[BR]] Samples with much lower median correlations to all other samples [[BR]] For methods see: http://www.bbmriwiki.nl/wiki/gene_exon_transcript_count [[BR]] AC1JV9ACXX.1.10 0.0471[[BR]] AD1NE2ACXX.5.22 0.1174[[BR]] AD2D8RACXX.3.3 0.8028[[BR]] AD2D8RACXX.6.3 0.8093[[BR]] AD2D8RACXX.1.3 0.8257[[BR]] [[BR]] |
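
The idea behind this check can be sketched as follows. The actual method is described on the linked page; this sketch assumes Pearson correlation on log2(count + 1) values, which may differ from what was used, and the sample ids and counts in the example are made up.

```python
import math
import random
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equally long numeric lists (non-constant input)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def median_correlations(counts):
    """Map each sample id to the median correlation with all other samples.

    counts: dict of sample id -> list of gene counts, same gene order everywhere.
    """
    # Assumption: correlate on log2(count + 1) to dampen highly expressed genes.
    logged = {s: [math.log2(c + 1) for c in v] for s, v in counts.items()}
    return {
        s: median(pearson(logged[s], logged[t]) for t in logged if t != s)
        for s in logged
    }


# Toy data: four samples sharing one expression profile plus one unrelated sample.
rng = random.Random(0)
profile = [rng.uniform(1, 1000) for _ in range(200)]
counts = {
    f"good_{i}": [g * rng.uniform(0.9, 1.1) for g in profile] for i in range(4)
}
counts["noisy"] = [rng.uniform(1, 1000) for _ in range(200)]

result = median_correlations(counts)
print(min(result, key=result.get))  # the sample least correlated with the rest
```

Using the median rather than the mean keeps a single bad sample from pulling down the score of every other sample.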
| 109 | |
| 110 | = Outliers to be removed based on QC stats and PC analysis = |
| 111 | [[BR]] Updated: 12-December-2013[[BR]] Analysis by: Peter-Bram 't Hoen[[BR]] Too few reads: [[BR]] AC1JV9ACXX-1-10[[BR]] AD1NE2ACXX-5-22[[BR]] BD1NW4ACXX-3-27[[BR]] Other reasons: See http://www.bbmriwiki.nl/wiki/BIOS_QualityControl/BIOS_QualityControlRun [[BR]] BD1NYRACXX-6-10 too low percentage of mapped reads, outlier on principal component 1,4,5,6[[BR]] AD2CJPACXX-8-9 low exon correlation, outlier on principal component 1,11,14[[BR]] BD1NR9ACXX-7-27 low percentage of mapped reads, outlier on principal component 4, likely degraded[[BR]] |
| 112 | |
| 113 | = Outliers to be removed based on gender-specific expression analysis = |
| 114 | Updated: 12-December-2013[[BR]] Analysis by: Peter-Bram 't Hoen[[BR]] The normalized gene expression values (edgeR TMM method, expressed as cpm) for XIST and for the sum of all protein-coding Y-chromosomal genes were used to check for contamination between samples of different gender. The script can be found [raw-attachment:gender_analysis.r here]. In addition to LifeLines (LL) sample AD1GWFACXX-4-15, the following samples (all from LLS) came up and appear to be contaminated:[[BR]] BC1KBKACXX-5-1[[BR]] BC1KBKACXX-5-3[[BR]] BC1KBKACXX-5-4[[BR]] BC1KBKACXX-5-6[[BR]] BC1KBKACXX-5-8[[BR]] BD1NW4ACXX-8-5[[BR]] BD1NYRACXX-2-16[[BR]] BD1NYRACXX-2-27[[BR]] BD1NYRACXX-4-19[[BR]] |
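
The logic of the gender-specific check can be sketched in a few lines. This is not the attached `gender_analysis.r` script; the function name, the cutoffs and the cpm values below are hypothetical, and in practice the cutoffs would be read off the observed XIST versus Y-sum distribution.

```python
def classify_sex_expression(xist_cpm, y_sum_cpm, xist_cutoff, y_cutoff):
    """Label a sample from XIST cpm and summed Y-chromosomal gene cpm.

    Cutoffs are hypothetical and would have to be chosen from the data.
    """
    high_xist = xist_cpm >= xist_cutoff
    high_y = y_sum_cpm >= y_cutoff
    if high_xist and high_y:
        # Female-like and male-like signal together: candidate male/female mixture.
        return "possible contamination"
    if high_xist:
        return "female-like"
    if high_y:
        return "male-like"
    return "ambiguous"


# Made-up cpm values, for illustration only.
samples = {
    "sample_A": (150.0, 0.5),
    "sample_B": (0.2, 90.0),
    "sample_C": (120.0, 70.0),
}
for name, (xist, y_sum) in samples.items():
    print(name, classify_sex_expression(xist, y_sum, xist_cutoff=10.0, y_cutoff=10.0))
```

Samples that end up in the "possible contamination" class are the ones to cross-check against the heterozygosity outliers listed for LLS.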