Test imputation pipeline – bbmri

Version 6 (modified by a.kanterakis, 14 years ago)

Introduction

The purpose of this run is to test the efficiency of the existing imputation pipelines on the Grid.

Datasets

Reference

The reference dataset was created from the raw VCF data of the 1000 Genomes Project.

Download the VCF files from: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521
Export only the SNPs (filtering out the indels and SVs) from the VCF data with vcftools and convert them to impute2 format (hap and legend files):
```
vcftools \
--gzvcf ALL.chr1.phase1_release_v2.20101123.snps_indels_svs.vcf.gz \
--keep-INFO LCSNP --keep-INFO EXSNP --keep-INFO SNP \
--IMPUTE \
--out ALL.chr1.phase1_release_v2.20101123.snps_indels_svs.
```
- Alternatively, we could have used the 1000 Genomes reference panel in impute2 format (legend and hap files) from the impute2 website: http://mathgen.stats.ox.ac.uk/impute/impute_v2.html#download_reference_data
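
The export command above covers chromosome 1 only. A minimal sketch of repeating it for all autosomes follows. It assumes that the other chromosomes use the same file-name pattern as chr1 (not confirmed on this page) and that the files are already in the working directory. The helper name `vcf_prefix` is hypothetical. The loop only prints the commands (dry run); remove the `echo` to execute them.

```shell
#!/bin/sh
# Assumption: every autosome follows the chr1 file-name pattern shown above.
# vcf_prefix is a hypothetical helper, not part of vcftools.
vcf_prefix() {
    printf 'ALL.chr%s.phase1_release_v2.20101123.snps_indels_svs' "$1"
}

chr=1
while [ "$chr" -le 22 ]; do
    prefix=$(vcf_prefix "$chr")
    # Dry run: print the vcftools call; drop "echo" to actually run it.
    # The trailing dot after --out mirrors the original chr1 command.
    echo vcftools \
        --gzvcf "${prefix}.vcf.gz" \
        --keep-INFO LCSNP --keep-INFO EXSNP --keep-INFO SNP \
        --IMPUTE \
        --out "${prefix}."
    chr=$((chr + 1))
done
```

Printing the commands first makes it easy to check the generated file names against the FTP listing before starting 22 long-running conversions.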

Study panel

The study panel is an artificial genotype dataset that contains the SNP set of the Illumina Hap550 platform. It was generated with the following steps:

Download the genetic map for the b37 release of the human genome from the impute2 website: http://mathgen.stats.ox.ac.uk/impute/impute_v2.html#download_reference_data
Download and install hapgen2: https://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html
Download the list of SNPs in the Hap550 platform:
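
With the genetic map and the reference haplotypes in place, a hapgen2 run could look roughly like the sketch below. This is not the command used for this test: the file names, the disease-locus position and the sample counts are placeholders chosen for illustration, and the helper `hapgen2_cmd` is hypothetical. hapgen2 expects a disease locus (`-dl position risk_allele het_RR hom_RR`); setting both relative risks to 1 simulates genotypes without a disease effect.

```shell
#!/bin/sh
# Hypothetical helper: prints a hapgen2 command line for one chromosome.
# All file names below are placeholders, not the files used in this test.
hapgen2_cmd() {
    chr=$1      # chromosome number
    dl_pos=$2   # placeholder: base-pair position of a SNP in the legend file
    n_ctrl=$3   # number of control individuals to simulate
    n_case=$4   # number of case individuals to simulate
    printf 'hapgen2 -m %s -l %s -h %s -o %s -dl %s 1 1 1 -n %s %s\n' \
        "genetic_map_chr${chr}.txt" \
        "reference_chr${chr}.legend" \
        "reference_chr${chr}.hap" \
        "study_chr${chr}" \
        "$dl_pos" "$n_ctrl" "$n_case"
}

# Dry run: print the command only. DL_POS is a made-up placeholder
# that has to be replaced by a real position before running hapgen2.
hapgen2_cmd 1 DL_POS 500 500
```

Restricting the simulated genotypes to the Hap550 SNP list is a separate step that this sketch does not cover.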

