Workflow 3: sample level variant calling
This workflow calls variants for each sample, including:
- sample level recalibration
- sample level realignment
N.B. No sample level MarkDuplicates is needed, as lanes = libraries.
Workflow inputs:
- lane.chr.recal.sorted.bam - for all lanes of the sample: deduplicated, recalibrated, realigned, sorted and indexed bams (3)
- sample.chip.vcf - genotypes called from genotype chip
Reference:
- genome.chr.fasta - reference genome split by chromosome
- genome.chr.realign.intervals - targets for realignment per chromosome
- genome.chr.dbsnpXYZ.rod - known SNP variants, here from dbSNP
- genome.chr.indelsXYZ.vcf - known indels, here from 1KG
Workflow outputs:
- sample.chr.bam - merged bam files per sample
- sample.chr.realign.intervals - realignment target intervals
- sample.chr.realigned.bam - realigned
- sample.chr.matesfixed.bam - fixed pairs in realignment
- sample.chr.indels.vcf - raw indels called
- sample.chr.indels.bed - raw indels annotations
- sample.chr.indels.txt - output from the indel calling
- sample.chr.indels.filtered.bed - indels filtered
- sample.chr.snps.vcf - raw snps called
- sample.chr.snps.filtered.vcf - snps filtered
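The naming scheme above can be sketched in a few lines of Python. This is only an illustration: it assumes that `sample` and `chr` in the file names are placeholders to be substituted, and the sample name `S01` and chromosome `chr1` in the usage note are hypothetical.

```python
# Suffixes of the workflow outputs, in the order listed above.
OUTPUT_SUFFIXES = [
    "bam",
    "realign.intervals",
    "realigned.bam",
    "matesfixed.bam",
    "indels.vcf",
    "indels.bed",
    "indels.txt",
    "indels.filtered.bed",
    "snps.vcf",
    "snps.filtered.vcf",
]


def sample_outputs(sample, chrom):
    """Return the expected output file names for one sample and chromosome.

    Assumes 'sample' and 'chr' in the documented names are placeholders.
    """
    return [f"{sample}.{chrom}.{suffix}" for suffix in OUTPUT_SUFFIXES]
```

For example, `sample_outputs("S01", "chr1")` starts with `S01.chr1.bam` and ends with `S01.chr1.snps.filtered.vcf`.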
merge-lanes
Merge lanes into one sample bam
tool: | samtools merge |
inputs: | lane.chr.recal.sorted.bam |
outputs: | sample.chr.bam |
docs: | http://samtools.sourceforge.net/samtools.shtml |
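A minimal sketch of how the merge command could be assembled, assuming the `samtools merge <out.bam> <in1.bam> <in2.bam> ...` form from the samtools documentation. The function name and the lane identifiers are hypothetical; the file names follow the inputs and outputs listed above.

```python
def merge_lanes_cmd(sample, chrom, lanes):
    """Build the argument list for merging all lane bams of one sample.

    'lanes' is a list of lane identifiers (hypothetical names); each lane is
    assumed to have a file named <lane>.<chrom>.recal.sorted.bam.
    """
    if not lanes:
        raise ValueError("at least one lane bam is required")
    out_bam = f"{sample}.{chrom}.bam"
    lane_bams = [f"{lane}.{chrom}.recal.sorted.bam" for lane in lanes]
    # samtools merge takes the output file first, then the input files.
    return ["samtools", "merge", out_bam] + lane_bams
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. Building a list rather than a shell string avoids quoting problems with unusual sample names.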
RealignerTargetCreator
Create realignment targets based on the data (so not only on the known sites)
tool: | GenomeAnalysisTK.jar -T RealignerTargetCreator |
inputs: | sample.chr.bam genome.chr.fa dbsnpXYz.chr.rod indelsXYZ.vcf |
outputs: | sample.chr.realign.intervals |
doc: | http://www.broadinstitute.org/gsa/wiki/index.php/Local_realignment_around_indels#Creating_Intervals |
IndelRealigner
Realign based on the realignment targets from the previous step
tool: | GenomeAnalysisTK.jar -T IndelRealigner |
inputs: | sample.chr.bam genome.chr.realign.intervals genome.chr.dbsnpXYZ.rod genome.chr.indelsXYZ.vcf |
outputs: | sample.chr.realigned.bam |
doc: | http://www.broadinstitute.org/gsa/wiki/index.php/Local_realignment_around_indels#Realigning |
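The two realignment steps could be chained roughly as follows. This is a hedged sketch, not the pipeline's actual implementation: it uses only the common GATK arguments `-T`, `-R`, `-I`, `-o` and `-targetIntervals`, and it leaves out the dbSNP and known-indel arguments because their syntax depends on the GATK version. The jar location and the helper names are assumptions. The intervals file is a parameter, so either the sample level or the genome level intervals can be passed in.

```python
GATK_JAR = "GenomeAnalysisTK.jar"  # assumption: jar is in the working directory


def gatk_cmd(walker, reference, bam, extra):
    """Build a GATK argument list for one walker (hypothetical helper)."""
    return ["java", "-jar", GATK_JAR, "-T", walker, "-R", reference, "-I", bam] + extra


def realigner_target_creator_cmd(sample, chrom, reference):
    """Create realignment targets from the merged sample bam."""
    bam = f"{sample}.{chrom}.bam"
    intervals = f"{sample}.{chrom}.realign.intervals"
    return gatk_cmd("RealignerTargetCreator", reference, bam, ["-o", intervals])


def indel_realigner_cmd(sample, chrom, reference, intervals):
    """Realign the merged sample bam around the given target intervals."""
    bam = f"{sample}.{chrom}.bam"
    realigned = f"{sample}.{chrom}.realigned.bam"
    extra = ["-targetIntervals", intervals, "-o", realigned]
    return gatk_cmd("IndelRealigner", reference, bam, extra)
```

Both functions only build argument lists, so the commands can be inspected or logged before anything is run.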
FixMateInformation
See the description in workflow 2; here it is applied per sample
inputs: | sample.chr.realigned.bam |
outputs: | sample.chr.matesfixed.bam |
IndelGenotyperV2
Call indels
tool: | GenomeAnalysisTK.jar -T IndelGenotyperV2 |
inputs: | sample.chr.matesfixed.bam genome.chr.fa |
outputs: | sample.chr.indels.vcf sample.chr.indels.bed sample.chr.indels.txt |
doc: | http://www.broadinstitute.org/gsa/wiki/index.php/Indel_Genotyper_V2.0 |
filterSingleSampleCalls
Filter indels
tool: | filterSingleSampleCalls.pl |
inputs: | sample.chr.indels.bed |
outputs: | sample.chr.indels.filtered.bed |
doc: | http://www.broadinstitute.org/gsa/wiki/index.php/Firehose_Parameters#SampleIndelGenotyper |
UnifiedGenotyper
Call SNPs
tool: | GenomeAnalysisTK.jar -T UnifiedGenotyper |
inputs: | sample.chr.matesfixed.bam genome.chr.fa dbsnpXYz.chr.rod |
outputs: | sample.chr.snps.vcf |
doc: | http://www.broadinstitute.org/gsa/wiki/index.php/Firehose_Parameters#SetUnifiedGenotypertoEval |
makeIndelMask
Make indel mask
tool: | makeIndelMask.py |
inputs: | sample.chr.indels.bed |
outputs: | sample.chr.indels.mask.bed |
doc: | http://www.broadinstitute.org/gsa/wiki/index.php/Indel_Genotyper_V2.0#Creating_a_indel_mask_file |
VariantFiltration
Filter variants to get the best calls possible
tool: | GenomeAnalysisTK.jar -T VariantFiltration |
inputs: | sample.chr.snps.vcf genome.chr.fa dbsnpXYz.chr.rod |
outputs: | sample.chr.snps.filtered.vcf |
doc: | http://www.broadinstitute.org/gsa/wiki/index.php/Best_Practice_Variant_Detection_with_the_GATK_v2#Integrating_analyses:_getting_the_best_call_set_possible |
MergeVcfs
ChipVcf
Produce vcf for the chips
VariantEval
Create summary information on the variations called for evaluation. Run per sample.snps.filtered.vcf against chip.
tool: | GenomeAnalysisTK.jar -T VariantEval |
inputs: | sample.snps.vcf sample.chip.vcf genome.chr.fa dbsnpXYz.chr.rod |
outputs: | sample.snps.eval |
doc: | http://www.broadinstitute.org/gsa/wiki/index.php/VariantEval |
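The order of the steps follows from which step produces the files that another step consumes. The sketch below derives one valid execution order from the per-chromosome inputs and outputs listed in the step tables. It is an illustration under assumptions: reference files are left out, IndelRealigner is assumed to consume the sample level intervals from the previous step, and only steps with documented inputs and outputs are included. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# step name -> (inputs, outputs), taken from the tables above (reference files omitted)
STEPS = {
    "merge-lanes": (["lane.chr.recal.sorted.bam"], ["sample.chr.bam"]),
    "RealignerTargetCreator": (["sample.chr.bam"], ["sample.chr.realign.intervals"]),
    "IndelRealigner": (
        ["sample.chr.bam", "sample.chr.realign.intervals"],  # assumption: sample intervals
        ["sample.chr.realigned.bam"],
    ),
    "FixMateInformation": (["sample.chr.realigned.bam"], ["sample.chr.matesfixed.bam"]),
    "IndelGenotyperV2": (
        ["sample.chr.matesfixed.bam"],
        ["sample.chr.indels.vcf", "sample.chr.indels.bed", "sample.chr.indels.txt"],
    ),
    "filterSingleSampleCalls": (["sample.chr.indels.bed"], ["sample.chr.indels.filtered.bed"]),
    "UnifiedGenotyper": (["sample.chr.matesfixed.bam"], ["sample.chr.snps.vcf"]),
    "makeIndelMask": (["sample.chr.indels.bed"], ["sample.chr.indels.mask.bed"]),
    "VariantFiltration": (["sample.chr.snps.vcf"], ["sample.chr.snps.filtered.vcf"]),
}


def execution_order(steps):
    """Return the step names in an order that respects file dependencies."""
    # Map every output file to the step that produces it.
    producer = {out: name for name, (_, outs) in steps.items() for out in outs}
    # A step depends on the producers of its inputs; workflow inputs have no producer.
    graph = {
        name: {producer[f] for f in ins if f in producer}
        for name, (ins, _) in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())
```

Steps without a mutual dependency, such as UnifiedGenotyper and IndelGenotyperV2, may appear in either order, which also shows where the workflow could run in parallel.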
Discussion:
- Do we call SNPs based on the filtered indels or the raw indels?
- Should we realign AGAIN after the merge of lanes?
- BAQ?
- MINDEL/PINDEL?
Dindel
Dindel has finally been released.
The article can be found here: http://www.ncbi.nlm.nih.gov/pubmed/20980555
The tool can be downloaded from: http://www.sanger.ac.uk/resources/software/dindel/
Questions:
- Are we going to implement this tool also, besides Pindel?
- If so, where (at which step) is this tool implemented?