List of flagged samples run 1
Author: Peter-Bram 't Hoen
Date: 5-February-2014
There are three fields in the metadatabase to flag samples that should not be used in the final analysis: QC of the sequencing, genotype concordance and contamination. Here we report the final list of samples which are flagged in the metadatabase.
QC flag
These three samples are flagged because the number of reads is too low:
Too few reads:
AC1JV9ACXX-1-10
AD1NE2ACXX-5-22
BD1NW4ACXX-3-27
Other reasons: See http://www.bbmriwiki.nl/wiki/FgQualityControl/FgQualityControlRun1
BD1NYRACXX-6-10: too low a percentage of mapped reads; outlier on principal components 1, 4, 5 and 6
AD2CJPACXX-8-9: low exon correlation; outlier on principal components 1, 11 and 14
BD1NR9ACXX-7-27: low percentage of mapped reads; outlier on principal component 4; likely degraded
Genotype concordance
This is the list of the samples from the different cohorts that have a mismatch between RNA and DNA genotypes that could not be corrected since there is no additional lab evidence suggesting a sample swap. For more information, see http://www.bbmriwiki.nl/wiki/FgSampleBlacklist
CODAM:
AD10W1ACXX-5-18
AD10W1ACXX-8-11
LL:
AC1C40ACXX-4-4
AD1GWFACXX-4-15
RS:
AD1NNNACXX-4-18
AC1JV9ACXX-1-13
BC1JTJACXX-6-7
BC1KAVACXX-8-13
LLS:
BD1NW4ACXX-7-13
BD1NYRACXX-2-27
BC1KBKACXX-5-8
AD1NAMACXX-7-19
BC1KBKACXX-5-6
AD2DATACXX-8-1
BD2D5MACXX-4-15
BD2D5MACXX-3-7
AD2DATACXX-3-21
AD2DATACXX-4-5
AD1NFNACXX-8-25
BD24PGACXX-7-8
BD2CPRACXX-1-12
AD2DATACXX-3-22
BD24PGACXX-7-10
BD2CPRACXX-1-21
BC1KBKACXX-5-3
AD1NFNACXX-8-27
BD24PGACXX-8-25
BC1KBKACXX-5-7
BD24PGACXX-6-12
BD1NW4ACXX-8-5
BC1KBKACXX-3-12
BC1KBKACXX-5-1
AD2CJPACXX-6-9
BD1NYRACXX-2-16
AD2CJPACXX-5-1
AD1NE2ACXX-4-22
BC1KBKACXX-5-2
BC1KBKACXX-5-5
BD1NYRACXX-3-1
BD2D5MACXX-2-25
BD1NYRACXX-4-19
AD1NFNACXX-4-8
BD1NYRACXX-2-15
BC1KBKACXX-5-4
AD1NFNACXX-5-15
Contaminations
No contaminations have been detected for CODAM, RS and LL. The only outlier samples in heterozygosity rate had already been identified as mix-ups. See: http://www.bbmriwiki.nl/wiki/FgSampleBlacklist
The following LLS samples have a high heterozygosity rate based on genotypes called from RNAseq data (all of them also have low genotype concordance; see the list above):
BD1NYRACXX-2-27
BC1KBKACXX-5-8
BC1KBKACXX-5-6
BD24PGACXX-7-10
BC1KBKACXX-5-3
BC1KBKACXX-5-7
BD1NW4ACXX-8-5
BC1KBKACXX-5-1
BD1NYRACXX-2-16
AD1NE2ACXX-4-22
BC1KBKACXX-5-2
BC1KBKACXX-5-5
BD1NYRACXX-3-1
BD1NYRACXX-4-19
AD1NFNACXX-4-8
BC1KBKACXX-5-4
AD1NFNACXX-5-15
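The statement that every high-heterozygosity LLS sample also appears in the LLS genotype concordance list can be checked mechanically. The snippet below is a minimal sketch: the sample IDs are copied from the two lists on this page, and the variable names are illustrative only, not taken from the metadatabase.

```python
# LLS samples from the genotype concordance list above (IDs copied from this page).
lls_genotype_discordant = set("""
BD1NW4ACXX-7-13 BD1NYRACXX-2-27 BC1KBKACXX-5-8 AD1NAMACXX-7-19 BC1KBKACXX-5-6
AD2DATACXX-8-1 BD2D5MACXX-4-15 BD2D5MACXX-3-7 AD2DATACXX-3-21 AD2DATACXX-4-5
AD1NFNACXX-8-25 BD24PGACXX-7-8 BD2CPRACXX-1-12 AD2DATACXX-3-22 BD24PGACXX-7-10
BD2CPRACXX-1-21 BC1KBKACXX-5-3 AD1NFNACXX-8-27 BD24PGACXX-8-25 BC1KBKACXX-5-7
BD24PGACXX-6-12 BD1NW4ACXX-8-5 BC1KBKACXX-3-12 BC1KBKACXX-5-1 AD2CJPACXX-6-9
BD1NYRACXX-2-16 AD2CJPACXX-5-1 AD1NE2ACXX-4-22 BC1KBKACXX-5-2 BC1KBKACXX-5-5
BD1NYRACXX-3-1 BD2D5MACXX-2-25 BD1NYRACXX-4-19 AD1NFNACXX-4-8 BD1NYRACXX-2-15
BC1KBKACXX-5-4 AD1NFNACXX-5-15
""".split())

# LLS samples with a high heterozygosity rate (IDs copied from this page).
lls_high_heterozygosity = set("""
BD1NYRACXX-2-27 BC1KBKACXX-5-8 BC1KBKACXX-5-6 BD24PGACXX-7-10 BC1KBKACXX-5-3
BC1KBKACXX-5-7 BD1NW4ACXX-8-5 BC1KBKACXX-5-1 BD1NYRACXX-2-16 AD1NE2ACXX-4-22
BC1KBKACXX-5-2 BC1KBKACXX-5-5 BD1NYRACXX-3-1 BD1NYRACXX-4-19 AD1NFNACXX-4-8
BC1KBKACXX-5-4 AD1NFNACXX-5-15
""".split())

# Any ID left here would be a high-heterozygosity sample that is NOT
# in the genotype concordance list.
missing = lls_high_heterozygosity - lls_genotype_discordant

print("high heterozygosity:", len(lls_high_heterozygosity))
print("genotype discordant:", len(lls_genotype_discordant))
print("not covered by the genotype concordance list:", sorted(missing))
```

An empty last line of output means the heterozygosity list is a subset of the genotype concordance list, as stated above.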
Overview RNAseq data April 14th 2014
Table I: Overview of the RNAseq data.
| Biobank | Person Id | RNAseq run Id | Data part of RP3 [1] | Passed QC |
|---|---|---|---|---|
| CODAM | 192 | 191 | 191 | 188 |
| LL | 761 | 630 | 630 | 628 |
| LLS | 821 | 720 | 697 (23) [2] | 664 |
| NTR | 3300 | - | - | - |
| RS | 768 | 658 | 658 | 652 |
| Total | 5842 | 2199 | 2176 | 2132 |

[1] All these samples are also part of data freeze 1.
[2] 23 samples are not part of RP3.
The RNAseq data that passed QC contain 29 merged runs: 2 for CODAM, 17 for LL, 4 for LLS and 6 for RS.
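The totals in Table I and the merged-run count can be re-derived from the per-biobank numbers. This is only an arithmetic sketch using the values as printed on this page; the dictionary layout and names are illustrative.

```python
# Table I rows: biobank -> (person ids, RNAseq run ids, part of RP3, passed QC).
# NTR has no RNAseq numbers in the table; its "-" cells are stored as None.
table_i = {
    "CODAM": (192, 191, 191, 188),
    "LL": (761, 630, 630, 628),
    "LLS": (821, 720, 697, 664),
    "NTR": (3300, None, None, None),
    "RS": (768, 658, 658, 652),
}
reported_totals = (5842, 2199, 2176, 2132)

# Sum each column, treating missing cells as zero.
computed_totals = tuple(
    sum(row[col] or 0 for row in table_i.values()) for col in range(4)
)

# Merged runs per biobank, as listed below Table I.
merged_runs = {"CODAM": 2, "LL": 17, "LLS": 4, "RS": 6}

print("column totals match:", computed_totals == reported_totals)
print("merged runs:", sum(merged_runs.values()))
```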
Table II: Details of the samples that did not pass quality control.
| Biobank | Not Passed QC | Bad quality | Genotype Discordance | Contaminated |
|---|---|---|---|---|
| CODAM | 3 | 1 | 2 | - |
| LL | 2 | - | 2 | - |
| LLS | 34 [1] | 3 | 32 [2] | 17 |
| RS | 6 | 2 | 4 | - |
| Total | 45 | 6 | 40 | 17 |

[1] One sample not part of RP3.
[2] Three LLS swaps identified by MixupMapper have been corrected. Additional evidence confirmed that these LLS samples were indeed swapped. These six samples are not included in the Genotype Discordance counts for LLS.
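Table II can be cross-checked against Table I: per biobank, the number of RP3 samples minus the samples that did not pass QC should give the "Passed QC" count. The sketch below assumes that the one LLS sample mentioned in footnote 1 is counted in "Not Passed QC" but not in the RP3 column; that reading is an interpretation of the footnote, not something stated explicitly on this page.

```python
# From Table I: biobank -> (part of RP3, passed QC).
table_i = {"CODAM": (191, 188), "LL": (630, 628), "LLS": (697, 664), "RS": (658, 652)}

# From Table II: biobank -> (not passed QC, bad quality, genotype discordance,
# contaminated). "-" cells are stored as 0.
table_ii = {
    "CODAM": (3, 1, 2, 0),
    "LL": (2, 0, 2, 0),
    "LLS": (34, 3, 32, 17),
    "RS": (6, 2, 4, 0),
}
reported_totals = (45, 6, 40, 17)

# Assumption: failed samples that fall outside RP3 (footnote 1 of Table II).
failed_outside_rp3 = {"LLS": 1}

computed_totals = tuple(sum(row[col] for row in table_ii.values()) for col in range(4))

# RP3 samples minus the failed samples that are inside RP3.
implied_passed = {
    biobank: table_i[biobank][0] - (row[0] - failed_outside_rp3.get(biobank, 0))
    for biobank, row in table_ii.items()
}
reported_passed = {biobank: row[1] for biobank, row in table_i.items()}

print("Table II totals match:", computed_totals == reported_totals)
print("implied passed-QC counts match Table I:", implied_passed == reported_passed)
```

The reason columns do not have to add up to "Not Passed QC" for LLS, because the contaminated samples are described above as also having low genotype concordance.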
Table III: Overview genotype data (GoNL version 5 imputed)
| Biobank | Person Id | Genotype [1] | Overlapping with Passed QC RNAseq data |
|---|---|---|---|
| CODAM | 192 | 188 (4) [2] | 184 (4) [3] |
| LL | 761 | 756 (5) | 626 (2) [4] |
| LLS | 821 | 765 (56) | 654 (10) [5] |
| NTR | 3300 | 1886 (1414) | - |
| RS | 768 | 768 | 652 |
| Total | 5842 | 4363 (1479) | 2116 (16) |

[1] Genotype information was extracted from the biobank genotype sample information files (GoNL version 5 imputed data), directly extracted from the SRM.
[2] Numbers between brackets indicate the difference between the number of person ids and the number of persons with genotype data available.
[3] Missing genotypes for: CODAM-2037, CODAM-2074, CODAM-2175 and CODAM-2555.
[4] Missing genotypes for: LL-LLDeep_0127 and LL-LLDeep_0519.
[5] These 10 samples are GoNL samples for which no genotype data was provided.
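The bracketed numbers in Table III can be verified against the person counts and against the "Passed QC" column of Table I. The snippet is a small consistency sketch with the values as printed on this page; names and layout are illustrative.

```python
# Table III: biobank -> (person ids, with genotype data, without genotype data).
genotypes = {
    "CODAM": (192, 188, 4),
    "LL": (761, 756, 5),
    "LLS": (821, 765, 56),
    "NTR": (3300, 1886, 1414),
    "RS": (768, 768, 0),
}

# biobank -> (passed QC from Table I, passed QC with genotype data,
#             passed QC without genotype data from Table III).
overlap = {
    "CODAM": (188, 184, 4),
    "LL": (628, 626, 2),
    "LLS": (664, 654, 10),
    "RS": (652, 652, 0),
}

# Persons with and without genotype data should add up to the person ids.
genotype_rows_ok = all(total == have + lack for total, have, lack in genotypes.values())
# Passed-QC samples with and without genotype data should add up to passed QC.
overlap_rows_ok = all(passed == have + lack for passed, have, lack in overlap.values())

genotype_totals = tuple(sum(row[col] for row in genotypes.values()) for col in range(3))
overlap_totals = tuple(sum(row[col] for row in overlap.values()) for col in range(3))

print("genotype rows consistent:", genotype_rows_ok, genotype_totals)
print("overlap rows consistent:", overlap_rows_ok, overlap_totals)
```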
LLS update
For LLS, the detected sample swaps have been swapped back. MixupMapper and the genotype concordance check, both run on the final sample sheets, show no mix-ups and no strong outliers.