Changes between Version 9 and Version 10 of GoNL_Immunochip_Data_Preparation
- Timestamp: Jul 1, 2011 4:14:23 PM (13 years ago)
GoNL_Immunochip_Data_Preparation
Line numbers are given for v9 and v10; "-" marks a line removed in v10, "+" a line added in v10.

v9  v10
23  23
24  24     ''fastaFromBed'' needs a [http://genome.ucsc.edu/FAQ/FAQformat.html#format1 UCSC BED] file as input. This file is tab-delimited and contains 3 columns: Chrom, Start_seq, End_seq. As we are only interested in specific loci, Start_seq and End_seq are 1 base apart so that only the locus of interest is reported in the output file. This file can easily be generated from either the initial VCF file or the PLINK BIM file:
25      -   * From VCF: grep -v '^#' in.vcf | awk '{OFS="\t";print $1,$2,$2+1}' > out.bed
    25  +
    26  +   * From VCF: grep -v '!^#' in.vcf | awk '{OFS="\t";print $1,$2,$2+1}' > out.bed
26  27
27  28     Once you have the input file, simply run ''fastaFromBed'' on it, giving the Human Reference Genome corresponding to the chip data as the other input. For more information on ''fastaFromBed'', see the [http://code.google.com/p/bedtools/ BEDTools] manual.
28  29
29      -  === Re-arrange Ref/Alt alleles based on the Human Genome Reference ===
30  30     Now that we have the Human Reference Genome loci, it is trivial to re-arrange the alleles so that the Ref and Alt alleles correspond to the Human Genome Reference. I wrote a small script, ''align-vcf-to-ref.pl'', that does the work given the correct input. Note that when flipping the order of alleles in the VCF Ref/Alt columns, one must also flip the genotypes accordingly.
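The allele re-arrangement described in the last paragraph can be sketched as follows. This is a minimal illustration in Python, not the ''align-vcf-to-ref.pl'' script itself, whose source is not shown on this page. It assumes biallelic SNPs, that the genotype (GT) is the first colon-separated sub-field of each sample column, and that the reference base for the locus has already been extracted. The function names `flip_sample` and `align_record_to_ref` are hypothetical. Records where neither allele matches the reference (for example, strand problems) are returned as `None` rather than resolved, and any INFO or per-sample values that depend on allele order are left untouched.

```python
# Translation table that swaps allele indices 0 and 1 in a biallelic genotype.
_SWAP = str.maketrans("01", "10")


def flip_sample(sample):
    """Swap allele indices in the GT sub-field of one sample column.

    Assumes GT is the first colon-separated sub-field. Separators ('/' or '|')
    and missing calls ('.') pass through unchanged.
    """
    gt, sep, rest = sample.partition(":")
    return gt.translate(_SWAP) + sep + rest


def align_record_to_ref(fields, ref_base):
    """Return a copy of a VCF record whose REF column matches ref_base.

    fields   -- list of tab-split VCF columns (CHROM, POS, ID, REF, ALT, ...)
    ref_base -- base found at this locus in the reference FASTA (any case)
    Returns None when neither REF nor ALT matches the reference base.
    """
    ref, alt = fields[3].upper(), fields[4].upper()
    ref_base = ref_base.upper()  # reference sequence may be soft-masked (lowercase)
    if ref == ref_base:
        return list(fields)  # already aligned, nothing to do
    if alt == ref_base:
        out = list(fields)
        out[3], out[4] = fields[4], fields[3]            # swap REF and ALT
        out[9:] = [flip_sample(s) for s in fields[9:]]  # flip every genotype
        return out
    return None  # neither allele matches; not handled in this sketch


record = "1\t1000\trs1\tG\tA\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1\t./.".split("\t")
aligned = align_record_to_ref(record, "a")
print("\t".join(aligned))
```

In this example the reference base matches the ALT allele, so REF and ALT are exchanged and every sample genotype is recoded to match.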