Changes between Version 13 and Version 14 of GoNL_Immunochip_Data_Preparation

Jul 5, 2011 12:30:39 PM



  • GoNL_Immunochip_Data_Preparation

    v13 v14  
    4 4 This page describes the steps needed to produce an Hg19 VCF file containing the GoNL Immunochip data, starting from the raw/QC'ed Immunochip data in PED format. The procedure uses the tools available in early 2011 and should become much simpler once PLINK/Seq is released.
    6 Here, the procedure is shown for a FORWARD strand PED file. If you have a TOP/TOP PED file, you will still need to correct for strand.
      6 Here, the procedure is shown for a FORWARD strand PED file. If you have a TOP/TOP PED file, you will not be able to include strand-ambiguous SNPs (A/T, C/G) in your output.
    8 8 = PED to VCF =
    16 16 This pre-compiled version runs only on 64-bit Linux machines, and some dependency problems may occur.
    17 18 == Correct the initial VCF file ==
    18 19 The initial VCF file produced by PLINK 1.08 contains the right information, but it is not actually in VCF format. PLINK files specify the Ref/Alt alleles relative to the dataset, whereas VCF specifies the Ref/Alt alleles relative to the Human Genome Reference the data is aligned to. To correct the initial VCF file, its alleles must be rewritten so that they are relative to the Human Genome Reference rather than to the dataset.
    29 30  * fastaFromBed -fi reference.fa -bed in.bed -fo -tab
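The BED input for this command can be derived from the VCF positions. The following is a minimal Python sketch, not one of the in-house tools: the function names are hypothetical, and it assumes single-base SNP sites, chromosome names that match the FASTA headers, and the default `chrom:start-end<TAB>sequence` layout of the `-tab` output. VCF positions are 1-based, while BED intervals are 0-based and half-open, so the start coordinate is shifted by one.

```python
def vcf_site_to_bed(chrom, pos, ref_len=1):
    """Return a BED interval (0-based, half-open) for a 1-based VCF POS."""
    start = pos - 1
    return chrom, start, start + ref_len


def write_bed(sites, path):
    """Write one BED line per (chrom, pos) site; 'sites' is any iterable."""
    with open(path, "w") as handle:
        for chrom, pos in sites:
            handle.write("%s\t%d\t%d\n" % vcf_site_to_bed(chrom, pos))


def parse_tab_line(line):
    """Parse one line of tab-delimited output (assumed 'chrom:start-end<TAB>seq').

    Returns (chrom, 1-based position, upper-cased reference sequence).
    """
    name, seq = line.rstrip("\n").split("\t")
    chrom, span = name.rsplit(":", 1)
    start, _end = span.split("-")
    return chrom, int(start) + 1, seq.upper()
```

Upper-casing the sequence matters because soft-masked reference files report repeats in lower case.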
    30 32 === Re-arrange Ref/Alt alleles based on the Human Genome Reference ===
    31 33 Now that we have the Human Reference Genome loci, it is straightforward to re-arrange the alleles so that the Ref and Alt alleles correspond to the Human Genome Reference. I wrote a small script, '''', that does the work provided the correct input. Note that when flipping the order of alleles in the VCF Ref/Alt columns, one must also flip the genotypes accordingly.
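To illustrate what flipping the genotypes along with the Ref/Alt columns involves, here is a minimal Python sketch. It is not the script mentioned above; the function names are hypothetical, and it assumes biallelic sites, a record already split into its tab-separated VCF columns, and GT as the first FORMAT field.

```python
def swap_ref_alt(fields):
    """Swap REF and ALT in a biallelic VCF record and recode its genotypes.

    'fields' is the list of tab-separated columns of one VCF data line.
    """
    out = list(fields)
    out[3], out[4] = fields[4], fields[3]
    recode = {"0": "1", "1": "0"}
    for i in range(9, len(out)):            # sample columns start at index 9
        parts = out[i].split(":")
        # Only the GT sub-field changes; '.', '/' and '|' are left untouched.
        parts[0] = "".join(recode.get(c, c) for c in parts[0])
        out[i] = ":".join(parts)
    return out


def align_to_reference(fields, ref_base):
    """Return the record with REF equal to ref_base, or None if neither allele matches."""
    ref, alt, ref_base = fields[3].upper(), fields[4].upper(), ref_base.upper()
    if ref == ref_base:
        return list(fields)
    if alt == ref_base:
        return swap_ref_alt(fields)
    return None
```

A record for which neither allele matches the reference base returns `None`, leaving the caller to decide whether to drop the site or to try the opposite strand.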
      35 If using '''', one also has the option to flip the strand where relevant, if the data was not all on the forward strand. Note that doing so removes all strand-ambiguous SNPs (A/T, C/G).
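The reason strand-ambiguous SNPs have to be dropped is that an A/T or C/G pair looks identical after complementing, so the reference base cannot reveal which strand the data is on. The following Python sketch shows one way such a check could work; it is an illustration under that assumption, not the behaviour of the script itself, and the names `is_strand_ambiguous` and `resolve_strand` are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(a1, a2):
    """True for A/T and C/G pairs, whose complement is the same pair."""
    return COMPLEMENT.get(a1.upper()) == a2.upper()


def resolve_strand(ref, alt, ref_base):
    """Orient a biallelic SNP against the reference base.

    Returns (ref, alt, strand_flipped, alleles_swapped), or None when the
    SNP is strand-ambiguous, is not a plain A/C/G/T SNP, or matches the
    reference on neither strand.
    """
    ref, alt, ref_base = ref.upper(), alt.upper(), ref_base.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT:
        return None
    if is_strand_ambiguous(ref, alt):
        return None
    flipped = False
    if ref_base not in (ref, alt):
        # Neither allele matches on this strand: try the complement.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
        flipped = True
        if ref_base not in (ref, alt):
            return None
    swapped = alt == ref_base
    if swapped:
        ref, alt = alt, ref
    return ref, alt, flipped, swapped
```

When `alleles_swapped` is true, the genotypes of the record need to be recoded as well.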
    33 37 === [Optional] Update SNP IDs ===
    59 63  1. Remove all SNPs that are not present in the new reference VCF file (using plink --extract)
    60 64  1. Use the liftover VCF as input to the '''' tool.
    61 66 == Downloads ==
    62 67 All tools developed in-house at UMCG are available here: [[]]