Changes between Version 2 and Version 3 of ChipBasedQcPipelineIdea
Timestamp: Sep 26, 2010 7:35:41 PM
ChipBasedQcPipelineIdea
v2   v3
66   66     Genotypic table (name: VCF_genotypes_yyyy.mm.dd.txt). Tab-delimited file containing following information. Header line:
67   67
68        - ID SNPV GTVCF GQ DP BATCH ????
     68   + ID SNPV GTVCF GQ DP ...
69   69
70   70     Next lines should all contain XXX tab-delimited values. Use “.” (dot) for missing.
…    …
73   73     * GTVCF: genotype with alleles in alphabetic order, <two characters, each either “A”, “C”, “G” or “T”>. This can be done by mapping the numbers provided in VCF GT field to REF and ALT and then ordering.
74   74     * GQ, DP: directly from VCF file
75        - * BATCH …
76        -
77        - HERE WE NEED TO DECIDE WHAT POTENTIALLY QUALITY-AFFECTING VARIABLES (SUCH AS BATCH) WE NEED TO TAKE INTO ACCOUNT
     75   + * …: factors potentially associated with quality of the sequencing data, summarized in FactorsRelatedToSeqDataQuality.
78   76
79   77     Merge chip and VCF genotypic tables (“chip_genotypes_yyyy.mm.dd.txt” and “VCF_genotypes_yyyy.mm.dd.txt”) using ID and SNPV as key variables. Keep all chip genotypes, substituting missing (“.”) when no information is available from VCF. Name the table “merged_chip_and_VCF_genotypes_yyy.mm.dd.txt”.
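
The GTVCF mapping and the merge step described in the page text could be sketched roughly as follows. This is only an illustration, not the pipeline's actual code: the function names gtvcf and merge_tables are hypothetical, the chip table is assumed to carry a genotype column called GTCHIP (its real layout is not shown in this diff), and only the GTVCF, GQ and DP columns are handled because the remaining VCF-side columns are left open ("...") in v3.

```python
import csv
import io

MISSING = "."  # the page asks for a dot when a value is missing


def gtvcf(ref, alt, gt):
    """Map a VCF GT value such as '0/1' to two allele letters in alphabetic order."""
    alleles = [ref] + alt.split(",")          # index 0 is REF, 1..n are ALT alleles
    indices = gt.replace("|", "/").split("/")  # treat phased and unphased alike
    if len(indices) != 2 or not all(i.isdigit() for i in indices):
        return MISSING                         # e.g. './.' or a haploid call
    try:
        called = [alleles[int(i)] for i in indices]
    except IndexError:
        return MISSING                         # GT refers to an allele that is not listed
    if any(a not in ("A", "C", "G", "T") for a in called):
        return MISSING                         # indels and symbolic alleles are skipped
    return "".join(sorted(called))


def merge_tables(chip_rows, vcf_rows, vcf_columns):
    """Keep every chip row; add VCF columns matched on (ID, SNPV), '.' if absent."""
    vcf_by_key = {(r["ID"], r["SNPV"]): r for r in vcf_rows}
    merged = []
    for chip in chip_rows:
        match = vcf_by_key.get((chip["ID"], chip["SNPV"]), {})
        row = dict(chip)
        for col in vcf_columns:
            row[col] = match.get(col, MISSING)
        merged.append(row)
    return merged


# Tiny in-memory stand-ins for the two tab-delimited files (made-up values).
chip_text = "ID\tSNPV\tGTCHIP\ns1\trs1\tAG\ns2\trs1\tGG\n"
vcf_text = "ID\tSNPV\tGTVCF\tGQ\tDP\ns1\trs1\tAG\t99\t30\n"

chip_rows = list(csv.DictReader(io.StringIO(chip_text), delimiter="\t"))
vcf_rows = list(csv.DictReader(io.StringIO(vcf_text), delimiter="\t"))
merged = merge_tables(chip_rows, vcf_rows, ["GTVCF", "GQ", "DP"])
```

In this sketch the merge behaves like a left join on the chip table, which is one way to read "keep all chip genotypes": sample s2 has no VCF record, so its GTVCF, GQ and DP fields are filled with a dot.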