wiki:SnpCallingPipeline/VariantCalling

Version 2 (modified by freerkvandijk, 14 years ago)

Workflow 3: sample level variant calling

This workflow calls variants per sample. It includes:

  • sample level recalibration
  • sample level realignment

N.B. No sample-level MarkDuplicates is needed, because each lane corresponds to one library.

Workflow inputs:

  • lane.chr.recal.sorted.bam - for all sample lanes: dedupped, recalibrated, realigned, sorted and indexed bams (3)
  • sample.chip.vcf - genotypes called from genotype chip

Reference:

  • genome.chr.fasta - reference genome split on chromosome
  • genome.chr.realign.intervals - targets for realignment per chromosome
  • genome.chr.dbsnpXYZ.rod - known SNP variants, here from dbSNP
  • genome.chr.indelsXYZ.vcf - known indels, here from 1KG

Workflow outputs:

  • sample.chr.bam - merged bam files per sample
  • sample.chr.realign.intervals - realignment target intervals
  • sample.chr.realigned.bam - realigned
  • sample.chr.matesfixed.bam - fixed pairs in realignment
  • sample.chr.indels.vcf - raw indels called
  • sample.chr.indels.bed - raw indels annotations
  • sample.chr.indels.txt - output from the indel calling
  • sample.chr.indels.filtered.bed - indels filtered
  • sample.chr.snps.vcf - raw snps called
  • sample.chr.snps.filtered.vcf - snps filtered
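
The output names above follow a sample.chr.suffix pattern. A minimal Python sketch of that convention; the helper name `sample_outputs` and the sample and chromosome labels in the example are hypothetical and not part of the pipeline:

```python
# Suffixes copied from the workflow outputs list.
OUTPUT_SUFFIXES = [
    "bam",
    "realign.intervals",
    "realigned.bam",
    "matesfixed.bam",
    "indels.vcf",
    "indels.bed",
    "indels.txt",
    "indels.filtered.bed",
    "snps.vcf",
    "snps.filtered.vcf",
]


def sample_outputs(sample, chrom):
    """Return the expected output file names for one sample and chromosome."""
    return [f"{sample}.{chrom}.{suffix}" for suffix in OUTPUT_SUFFIXES]


if __name__ == "__main__":
    # "sampleA" and "chr1" are illustrative labels only.
    for name in sample_outputs("sampleA", "chr1"):
        print(name)
```

A list like this can be used to check, after a run, that every expected file exists for every sample and chromosome.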

merge-lanes

Merge lanes into one sample bam

tool: samtools merge
inputs: lane.chr.recal.sorted.bam
outputs: sample.chr.bam
docs: http://samtools.sourceforge.net/samtools.shtml
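
A minimal sketch of this step as a Python command builder. The function name `merge_lanes_cmd` and the example file names are hypothetical; the argument order (output first, then the input BAMs) follows the samtools merge usage in the manual linked above.

```python
def merge_lanes_cmd(sample_bam, lane_bams):
    """Build the argument list: samtools merge <out.bam> <in1.bam> <in2.bam> ..."""
    if not lane_bams:
        raise ValueError("at least one lane BAM is required")
    return ["samtools", "merge", sample_bam, *lane_bams]


if __name__ == "__main__":
    # Hypothetical lane files for one sample and one chromosome.
    cmd = merge_lanes_cmd(
        "sampleA.chr1.bam",
        ["lane1.chr1.recal.sorted.bam", "lane2.chr1.recal.sorted.bam"],
    )
    print(" ".join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once samtools is on the PATH.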

RealignerTargetCreator

Create realignment targets from the data itself (so not only from known sites)

tool: GenomeAnalysisTK.jar -T RealignerTargetCreator
inputs: sample.chr.bam
genome.chr.fa
genome.chr.dbsnpXYZ.rod
genome.chr.indelsXYZ.vcf
outputs: sample.chr.realign.intervals
doc: http://www.broadinstitute.org/gsa/wiki/index.php/Local_realignment_around_indels#Creating_Intervals
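
A hedged sketch of how this step could be assembled as a command line from Python. The function name `target_creator_cmd` and the `java -jar` invocation are assumptions. Only the core GATK arguments (-T, -R, -I, -o) are shown; the arguments for the known SNPs and indels are left out on purpose, because their syntax differs between GATK releases and should be taken from the linked documentation.

```python
def target_creator_cmd(bam, reference, intervals_out, gatk_jar="GenomeAnalysisTK.jar"):
    """Build the argument list for GATK RealignerTargetCreator.

    Known-sites arguments are intentionally omitted (version dependent).
    """
    return [
        "java", "-jar", gatk_jar,
        "-T", "RealignerTargetCreator",
        "-R", reference,      # reference genome for this chromosome
        "-I", bam,            # merged sample BAM
        "-o", intervals_out,  # realignment target intervals
    ]


if __name__ == "__main__":
    # Hypothetical file names following the sample.chr naming used on this page.
    print(" ".join(target_creator_cmd(
        "sampleA.chr1.bam", "genome.chr1.fa", "sampleA.chr1.realign.intervals")))
```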

IndelRealigner

Realign around the targets created in the previous step

tool: GenomeAnalysisTK.jar -T IndelRealigner
inputs: sample.chr.bam
sample.chr.realign.intervals
genome.chr.dbsnpXYZ.rod
genome.chr.indelsXYZ.vcf
outputs: sample.chr.realigned.bam
doc: http://www.broadinstitute.org/gsa/wiki/index.php/Local_realignment_around_indels#Realigning
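
A hedged sketch of the realignment command, built the same way. The function name `indel_realigner_cmd` and the `java -jar` invocation are assumptions; the intervals file is passed in as a parameter, and the known-sites arguments are again omitted because their syntax depends on the GATK release.

```python
def indel_realigner_cmd(bam, reference, intervals, realigned_bam,
                        gatk_jar="GenomeAnalysisTK.jar"):
    """Build the argument list for GATK IndelRealigner."""
    return [
        "java", "-jar", gatk_jar,
        "-T", "IndelRealigner",
        "-R", reference,
        "-I", bam,
        "-targetIntervals", intervals,  # targets to realign around
        "-o", realigned_bam,
    ]


if __name__ == "__main__":
    # Hypothetical file names following the sample.chr naming used on this page.
    print(" ".join(indel_realigner_cmd(
        "sampleA.chr1.bam", "genome.chr1.fa",
        "sampleA.chr1.realign.intervals", "sampleA.chr1.realigned.bam")))
```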

FixMateInformation

See the description in workflow 2; here it is applied at the sample level.

inputs: sample.chr.realigned.bam
outputs: sample.chr.matesfixed.bam

IndelGenotyperV2

Call indels

tool: GenomeAnalysisTK.jar -T IndelGenotyperV2
inputs: sample.chr.matesfixed.bam
genome.chr.fa
outputs: sample.chr.indels.vcf
sample.chr.indels.bed
sample.chr.indels.txt
doc: http://www.broadinstitute.org/gsa/wiki/index.php/Indel_Genotyper_V2.0
http://www.broadinstitute.org/gsa/wiki/index.php/Firehose_Parameters#SampleIndelGenotyper

filterSingleSampleCalls

Filter indels

tool: filterSingleSampleCalls.pl
inputs: sample.chr.indels.bed
outputs: sample.chr.indels.filtered.bed
doc: http://www.broadinstitute.org/gsa/wiki/index.php/Firehose_Parameters#SampleIndelGenotyper

UnifiedGenotyper

Call SNPs

tool: GenomeAnalysisTK.jar -T UnifiedGenotyper
inputs: sample.chr.matesfixed.bam
genome.chr.fa
genome.chr.dbsnpXYZ.rod
outputs: sample.chr.snps.vcf
doc: http://www.broadinstitute.org/gsa/wiki/index.php/Firehose_Parameters#SetUnifiedGenotypertoEval
http://www.broadinstitute.org/gsa/wiki/index.php/Unified_genotyper

makeIndelMask

Make indel mask

tool: makeIndelMask.py
inputs: sample.chr.indels.bed
outputs: sample.chr.indels.mask.bed
doc: http://www.broadinstitute.org/gsa/wiki/index.php/Indel_Genotyper_V2.0#Creating_a_indel_mask_file

VariantFiltration

Filter variants to get the best calls possible

tool: GenomeAnalysisTK.jar -T VariantFiltration
inputs: sample.chr.snps.vcf
genome.chr.fa
genome.chr.dbsnpXYZ.rod
outputs: sample.chr.snps.filtered.vcf
doc: http://www.broadinstitute.org/gsa/wiki/index.php/Best_Practice_Variant_Detection_with_the_GATK_v2#Integrating_analyses:_getting_the_best_call_set_possible
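
The steps described so far are linked through their input and output file names, so a valid execution order can be derived from them. A minimal sketch using Python's standard `graphlib` (Python 3.9+). The `STEPS` table is transcribed from this page with reference files left out; treating the intervals of RealignerTargetCreator as the input of IndelRealigner is an assumption based on that step's description, and `step_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# step name -> (inputs, outputs); reference files are omitted.
STEPS = {
    "merge-lanes": (["lane.chr.recal.sorted.bam"], ["sample.chr.bam"]),
    "RealignerTargetCreator": (["sample.chr.bam"], ["sample.chr.realign.intervals"]),
    # Assumption: uses the intervals created by the previous step.
    "IndelRealigner": (["sample.chr.bam", "sample.chr.realign.intervals"],
                       ["sample.chr.realigned.bam"]),
    "FixMateInformation": (["sample.chr.realigned.bam"], ["sample.chr.matesfixed.bam"]),
    "IndelGenotyperV2": (["sample.chr.matesfixed.bam"],
                         ["sample.chr.indels.vcf", "sample.chr.indels.bed",
                          "sample.chr.indels.txt"]),
    "filterSingleSampleCalls": (["sample.chr.indels.bed"],
                                ["sample.chr.indels.filtered.bed"]),
    "UnifiedGenotyper": (["sample.chr.matesfixed.bam"], ["sample.chr.snps.vcf"]),
    "makeIndelMask": (["sample.chr.indels.bed"], ["sample.chr.indels.mask.bed"]),
    "VariantFiltration": (["sample.chr.snps.vcf"], ["sample.chr.snps.filtered.vcf"]),
}


def step_order(steps):
    """Return one valid execution order, derived from file dependencies."""
    # Map every output file to the step that produces it.
    producer = {out: name for name, (_, outs) in steps.items() for out in outs}
    # A step depends on the producers of its inputs (if any step produces them).
    graph = {
        name: {producer[f] for f in ins if f in producer}
        for name, (ins, _) in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(" -> ".join(step_order(STEPS)))
```

Steps without a mutual dependency, such as UnifiedGenotyper and IndelGenotyperV2, may come out in either order and could run in parallel.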

MergeVcfs

ChipVcf

Produce a VCF from the genotype chip data

VariantEval

Create summary statistics on the called variants for evaluation. Run for each sample.snps.filtered.vcf against the chip genotypes.

tool: GenomeAnalysisTK.jar -T VariantEval
inputs: sample.snps.vcf
sample.chip.vcf
genome.chr.fa
genome.chr.dbsnpXYZ.rod
outputs: sample.snps.eval
doc: http://www.broadinstitute.org/gsa/wiki/index.php/VariantEval
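
To illustrate the idea of evaluating calls against the chip, here is a simplified Python sketch of genotype concordance at sites present in both call sets. It is not a reimplementation of VariantEval, which reports its own set of metrics; the function names and the dictionary representation of genotypes are hypothetical.

```python
def normalize(genotype):
    """Treat "0/1", "1/0" and "0|1" as the same unphased genotype."""
    return tuple(sorted(genotype.replace("|", "/").split("/")))


def genotype_concordance(calls, chip):
    """Fraction of shared sites where both genotypes agree.

    calls, chip: dict mapping (chrom, pos) -> genotype string such as "0/1".
    Returns None when the two sets share no sites.
    """
    shared = calls.keys() & chip.keys()
    if not shared:
        return None
    matches = sum(
        1 for site in shared if normalize(calls[site]) == normalize(chip[site])
    )
    return matches / len(shared)


if __name__ == "__main__":
    # Made-up genotypes, for illustration only.
    calls = {("1", 100): "0/1", ("1", 200): "1/1", ("1", 300): "0/0"}
    chip = {("1", 100): "1/0", ("1", 200): "0/1", ("2", 5): "0/0"}
    print(genotype_concordance(calls, chip))
```

Sites called in only one of the two sets are ignored here; a fuller evaluation would also count them.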

Discussion:

  • Do we call SNPs based on the filtered indels or the raw indels?
  • Should we realign again after merging the lanes?
  • BAQ?
  • MINDEL/PINDEL?

Dindel

Dindel has finally been released.

The article can be found here: http://www.ncbi.nlm.nih.gov/pubmed/20980555
The tool can be downloaded from: http://www.sanger.ac.uk/resources/software/dindel/

Questions:

  • Are we also going to implement this tool, besides Pindel?
  • If so, where is this tool implemented?