Changes between Version 9 and Version 10 of DataConcordance


Timestamp:
Apr 21, 2011 4:31:15 PM (14 years ago)
Author:
laurent
Comment:

--

  • DataConcordance

    v9 v10  
    6565Below is a chart showing the shared and unique SNPs in the two datasets, regardless of their genotypes. As expected, the vast majority of the SNPs are shared between the datasets; a relatively high number of SNPs are found only in Groningen (a majority of which are unfiltered false positives), and a small number of SNPs are unique to the BGI dataset (to be investigated).
    6666
    67 [[Image(bgi_groningen_loci_concordance.jpg)]]
     67[[Image(bgi.snps.comparison.jpg)]]
    6868
    6969Investigation showed that the three least concordant individuals had a problem during the processing of one of their lanes, leaving them with 2/3 of the normal coverage. The figures should be updated once the lanes have been processed and these individuals corrected.
     
    7272The following chart shows the genotype concordance on the shared SNPs between BGI and Groningen datasets.
    7373
    74 [[Image(bgi_groningen_concordance.jpg)]]
     74[[Image(pilot.bgi.nosex.concordance.jpg)]]
    7575
    7676Note: The chart above does not take sex chromosomes into account, because an artifact introduced by the way BGI mapped the Y chromosome showed all males as completely discordant over the sex chromosomes.
     
    9090The following chart shows the genotype concordance on the 165K Immunochip loci left after QC.
    9191
    92 [[Image(groningen_immunochip_concordance.jpg)]]
     92[[Image(pilot.immuno_seq.concordance.v2.jpg)]]
    9393
    9494The 5 least concordant individuals can be explained as follows:
     
    101101The graph below shows a preliminary analysis of the "types" of discordance observed. An important caveat has to be taken into account: VCFTools only reports sites where the alleles perfectly match. This means that all monomorphic sites in one dataset that are polymorphic in the other will not appear. This was especially problematic since we compared each sequenced sample separately against the whole Immunochip dataset. As a result almost all homozygous reference sites in the sequence data were not reported by VCFTools. All the discordant sites that did not have perfectly matching alleles are reported below as 'unknown' as it has yet to be investigated what discordance "type" they belong to.
    102102
    103 [[Image(groningen_immunochip_discordance_matrix.jpg)]
     103[[Image(pilot.immuno.seq.gen.concordance.test.jpg)]
    104104
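The caveat described above for the discordance graph can be illustrated with a minimal sketch. It is not VCFTools code and does not reproduce its algorithm; it only models the behaviour described in the text, under the assumption that a site is comparable only when both datasets list exactly the same alleles. The function name `classify_site` and the example sites are hypothetical.

```python
from collections import Counter


def classify_site(seq_alleles, seq_gt, chip_alleles, chip_gt):
    """Classify one site shared by a sequenced sample and the chip dataset.

    Assumption (illustrative only): genotypes are compared only when the
    allele sets match exactly; anything else is reported as 'unknown'.
    """
    if set(seq_alleles) != set(chip_alleles):
        return "unknown"
    # Genotypes are unordered allele pairs, so compare them sorted.
    if sorted(seq_gt) == sorted(chip_gt):
        return "concordant"
    return "discordant"


# Hypothetical sites: (seq alleles, seq genotype, chip alleles, chip genotype)
sites = [
    (("A", "G"), ("A", "G"), ("A", "G"), ("G", "A")),  # same heterozygous call
    (("A", "G"), ("G", "G"), ("A", "G"), ("A", "G")),  # calls differ
    # Homozygous reference in a single sequenced sample: the site is
    # monomorphic there but polymorphic on the chip, so the alleles differ.
    (("A",), ("A", "A"), ("A", "G"), ("A", "A")),
]

counts = Counter(classify_site(*site) for site in sites)
print(dict(counts))
```

In the third site the two genotypes agree, yet the site still ends up as 'unknown' because the single sequenced sample carries only the reference allele. This is the situation the text describes for homozygous reference sites when each sample is compared separately against the whole Immunochip dataset.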
    105105== BGI / Immunochip ==