Introduction
The purpose of this run is to test the efficiency of the existing imputation pipelines on the Grid.
Datasets
Reference
The reference dataset was created from the raw VCF files of the 1000 Genomes project.
- Download the VCF files from: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521
- Export only the SNPs (filtering out the indels and SVs) from the VCF data using vcftools, and convert them to impute2 format (hap and legend files):
    vcftools \
        --gzvcf ALL.chr1.phase1_release_v2.20101123.snps_indels_svs.vcf.gz \
        --keep-INFO LCSNP --keep-INFO EXSNP --keep-INFO SNP \
        --IMPUTE \
        --out ALL.chr1.phase1_release_v2.20101123.snps_indels_svs.
- Alternatively, we could have used the 1000 Genomes reference panel that is already in impute2 format (legend and hap files) from the impute2 website: http://mathgen.stats.ox.ac.uk/impute/impute_v2.html#download_reference_data
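The download and conversion steps above are shown for chromosome 1 only. A minimal sketch of how they might be repeated for every autosome follows. It assumes that the files for chromosomes 2 to 22 follow the same naming pattern as the chr1 file, that wget is used for the download, and that the output prefix is written without a trailing dot. The helper names (vcf_prefix, run, convert_chr) and the DRY_RUN switch are hypothetical and only serve the illustration. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: fetch and convert one 1000 Genomes VCF per autosome.
# Assumption: chromosomes 2-22 follow the chr1 file-name pattern.

BASE_URL="ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default), 0 = execute them

# Build the file stem (also used as the vcftools output prefix) for a chromosome.
vcf_prefix() {
    echo "ALL.chr$1.phase1_release_v2.20101123.snps_indels_svs"
}

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Download one chromosome and convert it with the vcftools options used above.
convert_chr() {
    prefix="$(vcf_prefix "$1")"
    run wget "$BASE_URL/$prefix.vcf.gz"
    run vcftools \
        --gzvcf "$prefix.vcf.gz" \
        --keep-INFO LCSNP --keep-INFO EXSNP --keep-INFO SNP \
        --IMPUTE \
        --out "$prefix"
}

# Loop over the autosomes 1..22.
chr=1
while [ "$chr" -le 22 ]; do
    convert_chr "$chr"
    chr=$((chr + 1))
done
```

Running the script as is lists the 44 commands without touching the network. Setting DRY_RUN=0 would execute them, which requires wget and vcftools on the PATH and enough disk space for the per-chromosome files.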
Study panel
The study panel is an artificial genotype dataset that contains the SNP set of the Illumina Hap550 platform. We generated it with the following steps:
- Download the genetic map for the b37 release of the human genome from the impute2 website: http://mathgen.stats.ox.ac.uk/impute/impute_v2.html#download_reference_data
- Download and install hapgen2: https://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html
- Download the list of SNPs in the Hap550 platform:
The study panel is an artificial genotype dataset. Created by