Changes between Version 23 and Version 24 of ImputationPipeline


Timestamp:
Dec 1, 2011 11:52:06 AM (12 years ago)
Author:
a.kanterakis
  • ImputationPipeline

    v23 v24  
    1 = Imputation pipeline =
    2 [[TOC()]]
    3 
    4 There are at the moment two collaborating initiatives:
    5 
    6  * VU: Mathijs Kattenberg and Joukejan Hottenga using IMPUTE
    7  * UMCG: Lude Franke, Harm-Jan Westra, George Byelas, Morris Swertz using Beagle
    8 
    9 The objective is to bring these pipelines into the same space so they can be properly compared and optimized.
    10 
    11 TODO: describe the protocols here;
    12 == Description from Harm-Jan ==
    13 
    14 '''Also: ImputationTool'''
    15 
    16 The imputation pipeline has been reduced to only a few steps. To facilitate the QC and conversion steps, I've bundled our conversion tools into a single program called ImputationTool.jar.
    17 
    18 Here I briefly describe the steps of the new pipeline. Using placeholders, I also show what the commands could look like if you implemented this as a shell script (or Java program). These examples can serve as the complete execution steps of the pipeline.
    19 
    20 '' Commands to run locally: ''
    21 === Step 1 ===
    22  * According to Harm-Jan: if the dataset is in binary plink format, use plink --recode to convert it back to ped+map
    23  * Transform the .bed, .bim and .fam files into ASCII format
    24 {{{
    25 /Users/alexandroskanterakis/Tools/plink/plink-1.07-mac-intel/plink --bfile /Users/alexandroskanterakis/Data/Finnish_cohort/CD_Finnuncorr.maf05 --ped /Users/alexandroskanterakis/Data/Finnish_cohort/CD_Finnuncorr.maf05.ped --map CD_Finnuncorr.maf05.map --recode
    26 
    27 }}}
    28  * Produces the files:
    29 {{{
    30 -rw-r--r--   1 alexandroskanterakis  staff    11761120 Oct  5 12:11 plink.map
    31 -rw-r--r--   1 alexandroskanterakis  staff  4898208052 Oct  5 12:11 plink.ped
    32 }}}
    33  * Real execution time: 26m20.723s
    34  * Creates a rather big file: plink.ped is 4.6G
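The conversion in Step 1 could be wrapped in a small Python helper. This is only a sketch: the function names and the `out_prefix` argument are assumptions made for illustration, and it uses plink's `--out` option to name the output rather than relying on the default `plink.ped` / `plink.map` names listed above.

```python
import subprocess


def build_plink_recode_command(plink_path, bfile_prefix, out_prefix):
    """Return the plink argument list that recodes a binary fileset to ped+map.

    bfile_prefix is the common path of the .bed/.bim/.fam files without the
    extension; out_prefix (hypothetical name) is where plink should write
    <out_prefix>.ped and <out_prefix>.map.
    """
    return [plink_path, "--bfile", bfile_prefix, "--recode", "--out", out_prefix]


def run_plink_recode(plink_path, bfile_prefix, out_prefix):
    # check=True raises CalledProcessError if plink exits with a non-zero status.
    subprocess.run(
        build_plink_recode_command(plink_path, bfile_prefix, out_prefix),
        check=True,
    )
```

Keeping the command construction separate from its execution makes it possible to log or inspect the exact command before a run that may take around half an hour.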
    35 
    36 
    37 === Step 2 ===
    38 
    39  * Convert the dataset to TriTyper format, if it is in ped+map format.
    40 {{{
    41 java -Xmx4g -jar /Users/alexandroskanterakis/Tools/imputation/ImputationTool/dist/ImputationTool.jar --mode pmtt --in /Users/alexandroskanterakis/Data/Finnish_cohort/ --out /Users/alexandroskanterakis/Data/Finnish_cohort/
    42 
    43 real    74m52.547s
    44 user    38m14.646s
    45 sys     5m25.472s
    46 
    47 }}}
    48  * Files Created:
    49 {{{
    50 -rw-r--r--   1 alexandroskanterakis  staff  2449071024 Oct  5 15:05 GenotypeMatrix.dat
    51 -rw-r--r--   1 alexandroskanterakis  staff       46196 Oct  5 13:54 Individuals.txt
    52 -rw-r--r--   1 alexandroskanterakis  staff       98791 Oct  5 13:54 PhenotypeInformation.txt
    53 -rw-r--r--   1 alexandroskanterakis  staff    10771996 Oct  5 13:54 SNPMappings.txt
    54 -rw-r--r--   1 alexandroskanterakis  staff     5006368 Oct  5 13:54 SNPs.txt
    55 }}}
    56 
    57 === Step 3 ===
    58  * Compare the dataset to be imputed to the reference dataset (for example HapMap2 release 24, also in TriTyper format), and remove any SNPs for which the haplotypes differ from, or do not correlate with, the reference dataset. Also remove any SNP that is not in the reference. Save the output as ped+map.
    59 {{{
    60 java -Xmx4g -jar /Users/alexandroskanterakis/Tools/imputation/ImputationTool/dist/ImputationTool.jar --mode ttpmh --in /Users/alexandroskanterakis/Data/Finnish_cohort/ --hap /Users/alexandroskanterakis/Data/HapMap2-r24-CEU/ --out /Users/alexandroskanterakis/Data/Finnish_cohort/referenceOutput/
    61 
    62 real    60m53.623s
    63 user    30m35.172s
    64 sys     2m34.325s
    65 }}}
    66 
    67  * Created Files (for each chromosome):
    68 {{{
    69 -rw-r--r--    1 alexandroskanterakis  staff    221184 Oct  5 16:07 chr1.dat
    70 -rw-r--r--    1 alexandroskanterakis  staff   1004358 Oct  5 16:06 chr1.excludedsnps.txt
    71 -rw-r--r--    1 alexandroskanterakis  staff    442368 Oct  5 16:07 chr1.map
    72 -rw-r--r--    1 alexandroskanterakis  staff    441350 Oct  5 16:07 chr1.markersBeagleFormat
    73 -rw-r--r--    1 alexandroskanterakis  staff   5802372 Oct  5 16:07 chr1.ped
    74 -rw-r--r--    1 alexandroskanterakis  staff    117708 Oct  5 16:06 chr1.warningsnps.txt
    75 }}}
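Steps 2 and 3 call the same jar with different modes, so both commands can be generated by one helper. The sketch below only reuses the options that appear in the commands above (`--mode`, `--in`, `--hap`, `--out`); the function name and the `max_heap` argument are assumptions, and no other ImputationTool options are implied.

```python
def build_imputation_tool_command(jar_path, mode, in_dir, out_dir,
                                  hap_dir=None, max_heap="4g"):
    """Return the java argument list for one ImputationTool run.

    mode is "pmtt" (ped+map to TriTyper, Step 2) or "ttpmh" (TriTyper to
    ped+map with comparison against a reference, Step 3). hap_dir is the
    reference dataset directory and is only passed when given.
    """
    command = ["java", "-Xmx" + max_heap, "-jar", jar_path,
               "--mode", mode, "--in", in_dir]
    if hap_dir is not None:
        command += ["--hap", hap_dir]
    command += ["--out", out_dir]
    return command


# Example: the Step 3 command, with shortened illustrative paths.
step3 = build_imputation_tool_command(
    "ImputationTool.jar", "ttpmh",
    in_dir="/data/Finnish_cohort/",
    out_dir="/data/Finnish_cohort/referenceOutput/",
    hap_dir="/data/HapMap2-r24-CEU/",
)
```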
    76 === Steps 4-9 ===
    77  * beagle: imputes in batches of samples (imputes the whole genome in subsets of samples)
    78  * impute: imputes all samples at once for a subset of the genome (select a window)
    79  4. split the ped files in batches of 300 samples
    80 {{{
    81 mkdir -p "$datasetLocation/batches/"
    82 split -a 2 -l "$batchSize" "$pedAndMapOutputLocation" "$batchOutputLocation"
    83 }}}
    84  5. run linkage2beagle to convert the ped and map files to beagle format
    85 {{{
    86 for batch in $batches   # $batches: assumed variable listing the batch suffixes created by split
    87 do
    88       java -Xmx7g -jar linkage2beagle.jar data=$batchOutputLocation/chr$chromosome.dat pedigree=$batchOutputLocation/chr$chromosome.ped.$batch  beagle=$beagleLocation/chr$chromosome.bgl.$batch
    89 done
    90 }}}
    91 
    92 '' Commands to run in server: ''
    93  6. run the actual imputation on the batches on the cluster (needs hapmap to be recoded to beagle format as well, but I have these files for you)
    94 {{{
    95 for batch in $batches   # $batches: assumed variable listing the batch suffixes created by split
    96 do
    97         java -Xmx11g -Djava.io.tmpdir=$TMPDIR -jar beagle.jar unphased=$beagleLocation/chr$chromosome.bgl.$batch phased=$referenceLocation/HM2_Chr$chromosome-BEAGLE markers=$referenceLocation/markers_Chr$chromosome.txt missing=0 out=$outputLocation/Chr$chromosome/chr$chromosome-$batch
    98 done
    99 }}}
    100 
    101 '' Commands to run locally: ''
    102  7. convert the beagle imputed files into trityper format
    103 {{{
    104 java -Xmx4g -jar ImputationTool.jar bttb $outputLocation Chr/ChrCHROMOSOME-BATCH $imputedTriTyperLocation $numSamples   
    105 }}}
    106  8. correlate the imputed snps to the snps in the original dataset
    107 {{{
    108 java -Xmx4g -jar ImputationTool.jar corr $trityperOutputLocation $datasetName $imputedTriTyperLocation $imputedDatasetName
    109 }}}
    110  9. (if needed) convert to other formats (plink dosage / ped+map)
    111 
    112 That's basically it. A lot simpler than the previous version, don't you think? The required tool is attached to this e-mail, but might still be a bit buggy. Any recommendations are therefore more than welcome.
    113 
    114 == IMPUTE2 pipeline ==
    115 impute2 accepts only gen and sample files as input, so some format conversions may be needed before running it. If the initial datasets are PED and MAP files, split them per chromosome with DividePedMapToChromosomes and then use ConvertListsOfPedAndMapFilesToGenAndSample to convert them to gen and sample files. If the initial datasets are in BED/BIM/FAM format, first use ConvertBedBimFamToPedMap to convert them to PED and MAP files and then proceed as above. Once these conversion steps are done, use UseImpute2WithOnePhasedReferencePanelForCompleteChromosomeInBatches to perform the imputation. This method has to be run once for each chromosome.
    116 
    117 {{{#!graphviz
    118 
    119 digraph g {
    120 
    121         size="10,10"
    122 
    123         node [shape=box,style=filled,color=white]
    124         "BED/BIM/FAM Files"
    125         "PED/MAP Files"
    126         "PED/MAP CHR1,CHR2,..."
    127         "GEN/Sample CHR1, CHR2,..."
    128         "GEN/Sample Imputed results"
    129 
    130         "Recombination map"
    131         "Known haplotypes"
    132         "Information about the Reference SNPs"
    133 
    134         node [shape=ellipse,color=yellow]
    135         ConvertBedBimFamToPedMap
    136         DividePedMapToChromosomes
    137         ConvertListsOfPedAndMapFilesToGenAndSample
    138         UseImpute2WithOnePhasedReferencePanelForCompleteChromosomeInBatches
    139         UseImpute2WithOnePhasedReferencePanel
    140 
    141         subgraph cluster_0 {
    142 
    143                 style=filled;
    144                 color=lightgrey;
    145 
    146                 "BED/BIM/FAM Files" -> ConvertBedBimFamToPedMap -> "PED/MAP Files"
    147                 "PED/MAP Files" ->  DividePedMapToChromosomes ->  "PED/MAP CHR1,CHR2,..."
    148                 "PED/MAP CHR1,CHR2,..." ->  ConvertListsOfPedAndMapFilesToGenAndSample ->  "GEN/Sample CHR1, CHR2,..."
    149 
    150                 label = "Convert Input Files to Gen / Sample format";
    151         }
    152 
    153         subgraph cluster_1 {
    154 
    155                 style=filled;
    156                 color=lightgrey;
    157 
    158                 "Recombination map"
    159                 "Known haplotypes"
    160                 "Information about the Reference SNPs"
    161 
    162                 label = "Reference data"
    163         }
    164 
    165         subgraph cluster_2 {
    166 
    167                 style=filled;
    168                 color=lightgrey;
    169 
    170                 "Recombination map" -> UseImpute2WithOnePhasedReferencePanelForCompleteChromosomeInBatches
    171                 "Known haplotypes" -> UseImpute2WithOnePhasedReferencePanelForCompleteChromosomeInBatches
    172                 "Information about the Reference SNPs" -> UseImpute2WithOnePhasedReferencePanelForCompleteChromosomeInBatches
    173                 "GEN/Sample CHR1, CHR2,..." -> UseImpute2WithOnePhasedReferencePanelForCompleteChromosomeInBatches
    174                 UseImpute2WithOnePhasedReferencePanelForCompleteChromosomeInBatches -> UseImpute2WithOnePhasedReferencePanel [label="For each batch"]
    175                 UseImpute2WithOnePhasedReferencePanel -> "GEN/Sample Imputed results"
    176 
    177                 label = "Imputation";
    178         }
    179 
    180 }
    181 
    182 }}}
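The flow in the diagram can be summarised as a small lookup that returns which of the scripts to run for a given input format. This is a sketch only: the function name and the format labels "bed", "ped" and "gen" are assumptions chosen for illustration, while the returned step names are the scripts described on this page.

```python
def conversion_steps(input_format):
    """Return the names of the scripts to run, in order, for an input format.

    input_format is "bed" (BED/BIM/FAM), "ped" (PED/MAP) or "gen" (GEN/Sample).
    """
    current = input_format.lower()
    steps = []
    if current == "bed":
        steps.append("ConvertBedBimFamToPedMap")
        current = "ped"
    if current == "ped":
        steps.append("DividePedMapToChromosomes")
        steps.append("ConvertListsOfPedAndMapFilesToGenAndSample")
        current = "gen"
    if current != "gen":
        raise ValueError("unknown input format: %r" % input_format)
    # The imputation itself has to be run once per chromosome.
    steps.append("UseImpute2WithOnePhasedReferencePanelForCompleteChromosomeInBatches")
    return steps
```

For example, `conversion_steps("ped")` skips the plink conversion from binary format and starts at the per-chromosome split.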
    183 === ConvertBedBimFamToPedMap ===
    184 This script converts BED, BIM, FAM files to PED and MAP by using plink. The "Path to BED, BIM, FAM file" parameter should contain the path and the common file name of these files, without the extension. For example if the files are:
    185 
    186  * /path/to/FILE1.maf05.bed
    187  * /path/to/FILE1.maf05.bim
    188  * /path/to/FILE1.maf05.fam
    189 
    190 Then the parameter value should be: '''/path/to/FILE1.maf05'''
    191 
    192 '''Note:''' creates rather big plink.ped files
    193 
    194 ==== Parameters ====
    195  * Path to plink executable (example: /Users/alexandroskanterakis/Tools/plink/plink-1.07-mac-intel/plink)
    196  * Path to BED, BIM, FAM file (example: /Users/alexandroskanterakis/Data/Finnish_cohort/CD_Finnuncorr.maf05)
    197  * Filename of exported PED file (example: /Users/alexandroskanterakis/Data/Finnish_cohort/CD_Finnuncorr.maf05.ped)
    198  * Filename of exported MAP file (example: /Users/alexandroskanterakis/Data/Finnish_cohort/CD_Finnuncorr.maf05.map)
    199 
    200 ==== Example of usage: ====
    201 
    202 {{{
    203 #!div style="font-size: 80%"
    204 Code highlighting:
    205   {{{#!python
    206 ConvertBedBimFamToPedMap(
    207 
    208     plink_path="/Users/alexandroskanterakis/Tools/plink/plink-1.07-mac-intel/plink",
    209     bbf_path="/Users/alexandroskanterakis/Data/Finnish_cohort/CD_Finnuncorr.maf05", 
    210     ped_path="/Users/alexandroskanterakis/Data/Finnish_cohort/CD_Finnuncorr.maf05.ped",
    211     map_path="/Users/alexandroskanterakis/Data/Finnish_cohort/CD_Finnuncorr.maf05.map")
    212   }}}
    213 }}}
    214 
    215 ==== Source code ====
    216   http://www.bbmriwiki.nl/svn/Imputation/impute2/ConvertBedBimFamToPedMap.py
    217 
    218 === DividePedMapToChromosomes ===
    219 This script divides a pair of PED and MAP files to chromosomes by using plink (http://pngu.mgh.harvard.edu/~purcell/plink/).
    220 
    221 ==== Parameters ====
    222  * path to plink (example: /Users/alexandroskanterakis/Tools/plink/plink-1.07-mac-intel/plink) 
    223  * path to map file (example: /Users/alexandroskanterakis/Data/Finnish_cohort/CD_Finnuncorr.maf05.map) 
    224  * path to ped file (example: /Users/alexandroskanterakis/Data/Finnish_cohort/CD_Finnuncorr.maf05.ped) 
    225  * Directory where files will be exported (example: /Users/alexandroskanterakis/Data/Finnish_cohort/DividedChromosomes) 
    226  * Suffix of the exported files (example: output_) 
    227 
    228 ==== Example ====
    229 
    230 {{{
    231 #!div style="font-size: 80%"
    232 Code highlighting:
    233   {{{#!python
    234 DividePedMapToChromosomes(
    235         plink_path= "/Users/alexandroskanterakis/Tools/plink/plink-1.07-mac-intel/plink",
    236         map_path="/Users/alexandroskanterakis/Data/Finnish_cohort/CD_Finnuncorr.maf05.map",
    237         ped_path="/Users/alexandroskanterakis/Data/Finnish_cohort/CD_Finnuncorr.maf05.ped",
    238         output_path="/Users/alexandroskanterakis/Data/Finnish_cohort/DividedChromosomes",
    239         suffix="output_")
    240   }}}
    241 }}}
    242 
    243 ==== Source code ====
    244 http://www.bbmriwiki.nl/svn/Imputation/impute2/DividePedMapToChromosomes.py
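The linked script is not reproduced here, but one way such a split could be driven from Python is to build one plink command per chromosome with `--chr`. The sketch below is an assumption about how this might be done, not the actual implementation: the function name is hypothetical, and the output naming (directory, then the `suffix` value, then the chromosome label) is a guess.

```python
import os


def build_divide_commands(plink_path, ped_path, map_path, output_path, suffix,
                          chromosomes=None):
    """Return one plink argument list per chromosome.

    Each command extracts a single chromosome from the ped+map pair and
    writes it as <output_path>/<suffix><chromosome>.ped and .map.
    """
    if chromosomes is None:
        # Autosomes 1-22 plus the sex chromosomes.
        chromosomes = [str(n) for n in range(1, 23)] + ["X", "Y"]
    commands = []
    for chromosome in chromosomes:
        out_prefix = os.path.join(output_path, suffix + chromosome)
        commands.append([plink_path, "--ped", ped_path, "--map", map_path,
                         "--chr", chromosome, "--recode", "--out", out_prefix])
    return commands
```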
    245 
    246 === ConvertListsOfPedAndMapFilesToGenAndSample ===
    247 Converts many Ped and Map genotype files usually used by [http://pngu.mgh.harvard.edu/~purcell/plink/ plink] to gen and sample files usually used by [http://mathgen.stats.ox.ac.uk/impute/impute.html impute], [http://www.stats.ox.ac.uk/~marchini/software/gwas/chiamo.html chiamo], [http://www.stats.ox.ac.uk/~marchini/software/gwas/hapgen.html hapgen] and [http://www.stats.ox.ac.uk/~marchini/software/gwas/snptest.html snptest]. The conversion tool that is used is [http://www.well.ox.ac.uk/~cfreeman/software/gwas/gtool.html gtool]. All Python lists have to have the same size.
    248 
    249 ==== Parameters ====
    250  * gtoolPath : Path to gtool (example: /Users/alexandroskanterakis/Tools/gtool/gtool)
    251  * pedInputListOfFiles : Python list of input ped files
    252  * mapInputListOfFiles : Python list of input map files
    253  * sampleOutputListOfFiles : Python list of output sample files
    254  * genOutputListOfFiles : Python list of output gen files
    255 
    256 ==== Example ====
    257 
    258 {{{
    259 #!div style="font-size: 80%"
    260 Code highlighting:
    261   {{{#!python
    262 output_divided_path = "/path/to/output/files"
    263 suffix = "_chr"
    264 
    265 ConvertListsOfPedAndMapFilesToGenAndSample(
    266                 gtoolPath = "/Users/alexandroskanterakis/Tools/gtool/gtool",
    267                 pedInputListOfFiles= [output_divided_path+ "/" + suffix + "_" + str(x) + ".ped" for x in list(range(1,23))+['X', 'Y']],
    268                 mapInputListOfFiles=[output_divided_path+ "/" + suffix + "_" + str(x) + ".map" for x in list(range(1,23))+['X', 'Y']],
    269                 sampleOutputListOfFiles=[output_divided_path+ "/" + suffix + "_" + str(x) + ".sample" for x in list(range(1,23))+['X', 'Y']],
    270                 genOutputListOfFiles=[output_divided_path+ "/" + suffix + "_" + str(x) + ".gen" for x in list(range(1,23))+['X', 'Y']]
    271                 )
    272   }}}
    273 }}}
    274 
    275 ==== Source code ====
    276 http://www.bbmriwiki.nl/svn/Imputation/impute2/ConvertListsOfPedAndMapFilesToGenAndSample.py
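Because all four lists must have the same length and the same chromosome order, it may be safer to generate them from a single chromosome list. The helper below is a hypothetical convenience and not part of the linked script. It follows the file naming of the example above. Note that `range(1, 23)` is needed to include chromosome 22.

```python
def chromosome_file_lists(directory, prefix):
    """Return a dict with one file list per extension, all in the same order.

    Keys are "ped", "map", "sample" and "gen"; each list has one entry per
    chromosome (1-22, X, Y), named <directory>/<prefix>_<chromosome>.<ext>.
    """
    chromosomes = [str(n) for n in range(1, 23)] + ["X", "Y"]
    lists = {}
    for extension in ("ped", "map", "sample", "gen"):
        lists[extension] = [
            directory + "/" + prefix + "_" + chromosome + "." + extension
            for chromosome in chromosomes
        ]
    return lists


files = chromosome_file_lists("/path/to/output/files", "_chr")
```

The four values can then be passed as `pedInputListOfFiles`, `mapInputListOfFiles`, `sampleOutputListOfFiles` and `genOutputListOfFiles`.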
    277 === UseImpute2WithOnePhasedReferencePanel ===
    278 Run impute2 (https://mathgen.stats.ox.ac.uk/impute/impute_v2.html) with one phased reference panel.
    279 
    280 ==== Parameters ====
    281  * impute2Path: Path of the impute2 tool (example: /Users/alexandroskanterakis/Tools/imputation/impute_v2.1.0_MacOSX_Intel/impute2)
    282  * mParameter: Fine-scale recombination map for the region to be analyzed. (example: /Users/alexandroskanterakis/Data/HAPMAP_1000GP/hapmap3_r2_plus_1000g_jun2010_b36_ceu/genetic_map_chr1_combined_b36.txt)
    283  * hParameter: File of known haplotypes, with one row per SNP and one column per haplotype (example: /Users/alexandroskanterakis/Data/HAPMAP_1000GP/hapmap3_r2_plus_1000g_jun2010_b36_ceu/hapmap3.r2.b36.allMinusPilot1CEU.chr1.snpfilt.haps)
    284  * lParameter: Legend file(s) with information about the SNPs in the -h file(s) (example: /Users/alexandroskanterakis/Data/HAPMAP_1000GP/hapmap3_r2_plus_1000g_jun2010_b36_ceu/hapmap3.r2.b36.allMinusPilot1CEU.chr1.snpfilt.legend)
    285  * gParameter: File containing genotypes for a study cohort that we want to impute: (example: /Users/alexandroskanterakis/Data/Finnish_cohort/DividedChromosomes/Divided_1.gen)
    286  * startPos: Start position of the genomic interval to use for inference (example: 1)
    287  * endPos: End position of the genomic interval to use for inference (example: 5000000)
    288  * Ne: Effective size of the population (commonly denoted as Ne in the population genetics literature) from which your dataset was sampled (example: 11418)
    289  * oParameter: Name of main output file. (example: /Users/alexandroskanterakis/Data/Finnish_cohort/DividedChromosomes/Divided_1.impute2)
    290 
    291 ==== Example ====
    292 {{{
    293 #!div style="font-size: 80%"
    294 Code highlighting:
    295   {{{#!python
    296 UseImpute2WithOnePhasedReferencePanel(
    297         impute2Path="/Users/alexandroskanterakis/Tools/imputation/impute_v2.1.0_MacOSX_Intel/impute2",
    298         mParameter="/Users/alexandroskanterakis/Data/HAPMAP_1000GP/hapmap3_r2_plus_1000g_jun2010_b36_ceu/genetic_map_chr1_combined_b36.txt",
    299         hParameter="/Users/alexandroskanterakis/Data/HAPMAP_1000GP/hapmap3_r2_plus_1000g_jun2010_b36_ceu/hapmap3.r2.b36.allMinusPilot1CEU.chr1.snpfilt.haps",
    300         lParameter="/Users/alexandroskanterakis/Data/HAPMAP_1000GP/hapmap3_r2_plus_1000g_jun2010_b36_ceu/hapmap3.r2.b36.allMinusPilot1CEU.chr1.snpfilt.legend",
    301         gParameter="/Users/alexandroskanterakis/Data/Finnish_cohort/DividedChromosomes/Divided_1.gen",
    302         startPos=1,
    303         endPos=5000000,
    304         Ne=11418,
    305         oParameter="/Users/alexandroskanterakis/Data/Finnish_cohort/DividedChromosomes/Divided_1.impute2")
    306   }}}
    307 }}}
    308 
    309 ==== Source code ====
    310 http://www.bbmriwiki.nl/svn/Imputation/impute2/UseImpute2WithOnePhasedReferencePanel.py
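The parameters above map directly onto impute2 command-line options (`-m`, `-h`, `-l`, `-g`, `-int`, `-Ne`, `-o`). As an illustration of that mapping, and not as a copy of the linked script, a command could be assembled like this. The function name is hypothetical.

```python
def build_impute2_command(impute2Path, mParameter, hParameter, lParameter,
                          gParameter, startPos, endPos, Ne, oParameter):
    """Return the impute2 argument list for one genomic interval."""
    return [
        impute2Path,
        "-m", mParameter,                      # recombination map
        "-h", hParameter,                      # known (phased) haplotypes
        "-l", lParameter,                      # legend for the haplotype file
        "-g", gParameter,                      # study genotypes to impute
        "-int", str(startPos), str(endPos),    # interval used for inference
        "-Ne", str(Ne),                        # effective population size
        "-o", oParameter,                      # main output file
    ]
```

The resulting list can be passed to `subprocess.run(command, check=True)` to execute a single interval.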
    311 
    312 === UseImpute2WithOnePhasedReferencePanelForCompleteChromosomeInBatches ===
    313 Run impute2 (https://mathgen.stats.ox.ac.uk/impute/impute_v2.html) with one phased reference panel. Divide the chromosome in batches.
    314 
    315 ==== Parameters ====
    316  * impute2Path: Path of impute2 (example: /Users/alex/Tools/impute2/impute_v2.1.2_MacOSX_Intel/impute2)
    317  * mParameter: Fine-scale recombination map for the region to be analyzed (-m parameter) (example: /Volumes/Data2/Impute/alex/hapmap3_r2_plus_1000g_jun2010_b36_ceu/genetic_map_chr1_combined_b36.txt)
    318  * hParameter: File of known haplotypes, with one row per SNP and one column per haplotype (-h parameter) (example: /Volumes/Data2/Impute/alex/hapmap3_r2_plus_1000g_jun2010_b36_ceu/hapmap3.r2.b36.allMinusPilot1CEU.chr1.snpfilt.haps)
    319  * lParameter: Legend file(s) with information about the SNPs in the -h file(s) (-l parameter) (example: /Volumes/Data2/Impute/alex/hapmap3_r2_plus_1000g_jun2010_b36_ceu/hapmap3.r2.b36.allMinusPilot1CEU.chr1.snpfilt.legend)
    320  * gParameter: File containing genotypes for a study cohort that we want to impute (-g parameter) (example: /Volumes/Data2/Impute/alex/gen_sample/chr1.gen)
    321  * sizeOfBatch: Size of each batch (example: 5000000)
    322  * Ne: Effective size of the population (commonly denoted as Ne in the population genetics literature) from which your dataset was sampled (example: 11418)
    323  * oParameter: Name of main output file (-o parameter) (example: /Volumes/Data2/Impute/alex/gen_sample/)
    324  * suffix: Suffix for the output files (example: chr_1_)
    325  * indexOfChromosome: Chromosome (1-22, X, Y, M) (example: 1)
    326 
    327 ==== Example ====
    328 {{{
    329 #!div style="font-size: 80%"
    330 Code highlighting:
    331   {{{#!python
    332 UseImpute2WithOnePhasedReferencePanelForCompleteChromosomeInBatches(
    333         impute2Path = "/Users/alex/Tools/impute2/impute_v2.1.2_MacOSX_Intel/impute2" ,
    334         mParameter = "/Volumes/Data2/Impute/alex/hapmap3_r2_plus_1000g_jun2010_b36_ceu/genetic_map_chr1_combined_b36.txt" ,
    335         hParameter = "/Volumes/Data2/Impute/alex/hapmap3_r2_plus_1000g_jun2010_b36_ceu/hapmap3.r2.b36.allMinusPilot1CEU.chr1.snpfilt.haps" ,
    336         lParameter = "/Volumes/Data2/Impute/alex/hapmap3_r2_plus_1000g_jun2010_b36_ceu/hapmap3.r2.b36.allMinusPilot1CEU.chr1.snpfilt.legend" ,
    337         gParameter = "/Volumes/Data2/Impute/alex/gen_sample/chr1.gen" ,
    338         sizeOfBatch = 5000000 ,
    339         Ne = 11418 ,
    340         oParameter = "/Volumes/Data2/Impute/alex/gen_sample/" ,
    341         suffix = "chr_1_" ,
    342         indexOfChromosome = "1")
    343   }}}
    344 }}}
    345 
    346 ==== Source Code ====
    347 http://www.bbmriwiki.nl/svn/Imputation/impute2/UseImpute2WithOnePhasedReferencePanelForCompleteChromosomeInBatches.py
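Dividing a chromosome into batches of `sizeOfBatch` base pairs amounts to generating consecutive start and end positions, each of which can be passed to UseImpute2WithOnePhasedReferencePanel as `startPos` and `endPos`. The sketch below assumes that the last position of the chromosome is known (for instance from the legend file). How the linked script determines it is not described here, so `last_position` is a hypothetical argument.

```python
def batch_intervals(last_position, size_of_batch):
    """Return 1-based, inclusive (start, end) intervals covering a chromosome.

    The final interval is clipped to last_position.
    """
    if last_position < 1 or size_of_batch < 1:
        raise ValueError("last_position and size_of_batch must be positive")
    intervals = []
    start = 1
    while start <= last_position:
        end = min(start + size_of_batch - 1, last_position)
        intervals.append((start, end))
        start = end + 1
    return intervals
```

With `sizeOfBatch = 5000000`, the first interval is (1, 5000000), matching the `startPos` and `endPos` values of the single-interval example.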
    348 == BEAGLE pipeline ==
    349 
    350 TODO: paste shell script descriptions of each step
    351 
    352 == Discussion ==
    353 
    354 === Mixing platforms may influence imputation results ===
    355 
    356 We ran into some trouble, resulting in our test statistic being highly inflated (which is indicative of false positive results). We thought of some possible causes which might explain this effect, although we should still test them:
    357  * '''SNPs with bad imputation quality''': we should remove SNPs with an R2 value < 0.90 prior to GWAS. These values are stored alongside the beagle imputation output. Taking a more stringent cutoff seemed to decrease the inflation, although you lose half of the SNPs.
    358  * '''batch effects caused by overrepresentation of a certain haplotype within an imputation batch''': for each batch of samples, beagle estimates a best-fitting model to predict the genotypes of the missing SNPs, which depends on both the input data and the reference dataset. Cases and controls should therefore be randomly distributed across the batches. Another option is to use impute rather than beagle, since its batches are across parts of the genome instead of samples.
    359  * '''difference in source platform''': different platforms have different SNP content. When you impute datasets coming from different platforms, the resulting model, which is based on the input data, also differs. When associating traits in a GWAS meta-analysis, these differences may account for a platform-specific effect. We should therefore remove the SNPs that do not overlap between such platforms prior to imputation, and impute the samples after combining the datasets. This would remove such a platform bias, although it would also cause a huge loss of available SNPs when the overlap between platforms is small. However, in my opinion, this problem is similar to the batch effect problem and can possibly be resolved by randomizing the sample content of the batches: the model will then possibly be fitted to the data that is available. In any case, the datasets that are used in a meta-analysis should be imputed together.
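Two of the remedies above are easy to prototype: randomizing which samples end up in which batch, and dropping SNPs below an R2 cutoff. The following is a sketch under stated assumptions. The function names are hypothetical, and the R2 input is assumed to be whitespace-separated lines with the marker name in the first column and its R2 value in the second; check this against the actual beagle output before use.

```python
import random


def randomized_batches(sample_ids, batch_size, seed=None):
    """Shuffle the samples and cut them into batches of batch_size.

    Shuffling first spreads cases and controls (and source platforms)
    randomly across the batches. seed makes the shuffle reproducible.
    """
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + batch_size]
            for i in range(0, len(shuffled), batch_size)]


def well_imputed_snps(r2_lines, threshold=0.90):
    """Return the marker names whose R2 is at least threshold."""
    kept = []
    for line in r2_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        try:
            r2 = float(fields[1])
        except ValueError:
            continue  # skip lines without a numeric R2, such as a header
        # NaN compares as False here, so markers without an estimate are dropped.
        if r2 >= threshold:
            kept.append(fields[0])
    return kept
```

Whether randomization actually removes the inflation still has to be tested, as noted above; the code only makes that experiment easy to set up.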