= Workflow 3: sample level variant calling =
[[TOC()]]

This workflow calls variants per sample. It includes:
 * sample level recalibration
 * sample level realignment

N.B. no sample level MarkDuplicates is needed as lanes = libraries.

Workflow inputs:
 * lane.chr.recal.sorted.bam - for all sample lanes: deduplicated, recalibrated, realigned, sorted and indexed bams (3)
 * sample.chip.vcf - genotypes called from genotype chip

Reference:
 * genome.chr.fasta - reference genome split on chromosome
 * genome.chr.realign.intervals - targets for realignment per chromosome
 * genome.chr.dbsnpXYZ.rod - known snp variants, here from dbSNP
 * genome.chr.indelsXYZ.vcf - known indels, here from 1KG

Workflow outputs:
 * sample.chr.bam - merged bam files per sample
 * sample.chr.realign.intervals - realignment target intervals
 * sample.chr.realigned.bam - realigned
 * sample.chr.matesfixed.bam - fixed pairs in realignment
 * sample.chr.indels.vcf - raw indels called
 * sample.chr.indels.bed - raw indels annotations
 * sample.chr.indels.txt - output from the indel calling
 * sample.chr.indels.filtered.bed - indels filtered
 * sample.chr.snps.vcf - raw snps called
 * sample.chr.snps.filtered.vcf - snps filtered

== merge-lanes ==
Merge lanes into one sample bam

||tool: ||samtools merge ||
||inputs: ||lane.chr.recal.sorted.bam ||
||outputs: ||sample.chr.bam ||
||docs: ||http://samtools.sourceforge.net/samtools.shtml ||

== !RealignerTargetCreator ==
Create realignment targets based on the data (so not only on the known sites)

||tool: ||GenomeAnalysisTK.jar -T RealignerTargetCreator ||
||inputs: ||sample.chr.bam [[BR]]genome.chr.fa [[BR]]dbsnpXYZ.chr.rod [[BR]]indelsXYZ.vcf ||
||outputs: ||sample.chr.realign.intervals ||
||doc: ||http://www.broadinstitute.org/gsa/wiki/index.php/Local_realignment_around_indels#Creating_Intervals ||

== !IndelRealigner ==
Realign based on the realignment targets from the previous step

||tool: ||GenomeAnalysisTK.jar -T IndelRealigner ||
||inputs: ||sample.chr.bam [[BR]]genome.chr.realign.intervals [[BR]]genome.chr.dbsnpXYZ.rod [[BR]]genome.chr.indelsXYZ.vcf ||
||outputs: ||sample.chr.realigned.bam ||
||doc: ||http://www.broadinstitute.org/gsa/wiki/index.php/Local_realignment_around_indels#Realigning ||

== !FixMateInformation ==
See description in workflow2, now applied per sample

||inputs: ||sample.chr.realigned.bam ||
||outputs: ||sample.chr.matesfixed.bam ||

== IndelGenotyperV2 ==
Call indels

||tool: ||GenomeAnalysisTK.jar -T IndelGenotyperV2 ||
||inputs: ||sample.chr.matesfixed.bam [[BR]]genome.chr.fa ||
||outputs: ||sample.chr.indels.vcf [[BR]]sample.chr.indels.bed [[BR]]sample.chr.indels.txt ||
||doc: ||http://www.broadinstitute.org/gsa/wiki/index.php/Indel_Genotyper_V2.0 [[BR]]http://www.broadinstitute.org/gsa/wiki/index.php/Firehose_Parameters#SampleIndelGenotyper ||

== filterSingleSampleCalls ==
Filter indels

||tool: ||filterSingleSampleCalls.pl ||
||inputs: ||sample.chr.indels.bed ||
||outputs: ||sample.chr.indels.filtered.bed ||
||doc: ||http://www.broadinstitute.org/gsa/wiki/index.php/Firehose_Parameters#SampleIndelGenotyper ||

== !UnifiedGenotyper ==
Call SNPs

||tool: ||GenomeAnalysisTK.jar -T UnifiedGenotyper ||
||inputs: ||sample.chr.matesfixed.bam [[BR]]genome.chr.fa [[BR]]dbsnpXYZ.chr.rod ||
||outputs: ||sample.chr.snps.vcf ||
||doc: ||http://www.broadinstitute.org/gsa/wiki/index.php/Firehose_Parameters#SetUnifiedGenotypertoEval [[BR]]http://www.broadinstitute.org/gsa/wiki/index.php/Unified_genotyper ||

== makeIndelMask ==
Make indel mask

||tool: ||makeIndelMask.py ||
||inputs: ||sample.chr.indels.bed ||
||outputs: ||sample.chr.indels.mask.bed ||
||doc: ||http://www.broadinstitute.org/gsa/wiki/index.php/Indel_Genotyper_V2.0#Creating_a_indel_mask_file ||

== !VariantFiltration ==
Filter variants to get the best calls possible

||tool: ||GenomeAnalysisTK.jar -T VariantFiltration ||
||inputs: ||sample.chr.snps.vcf [[BR]]genome.chr.fa [[BR]]dbsnpXYZ.chr.rod ||
||outputs: ||sample.chr.snps.filtered.vcf ||
||doc: ||http://www.broadinstitute.org/gsa/wiki/index.php/Best_Practice_Variant_Detection_with_the_GATK_v2#Integrating_analyses:_getting_the_best_call_set_possible ||

== !MergeVcfs ==

== !ChipVcf ==
Produce vcf for the chips

== !VariantEval ==
Create summary information on the called variants for evaluation. Run per sample.snps.filtered.vcf against the chip genotypes.

||tool: ||GenomeAnalysisTK.jar -T VariantEval ||
||inputs: ||sample.snps.vcf [[BR]]sample.chip.vcf [[BR]]genome.chr.fa [[BR]]dbsnpXYZ.chr.rod ||
||outputs: ||sample.snps.eval ||
||doc: ||http://www.broadinstitute.org/gsa/wiki/index.php/VariantEval ||

Discussion:

> Do we call SNPs based on the filtered indels or the raw indels?
> Should we realign AGAIN after merge of lanes?
> BAQ?
> MINDEL/PINDEL?

== Dindel ==
Dindel has finally been released.

The article can be found here: http://www.ncbi.nlm.nih.gov/pubmed/20980555

The tool can be downloaded from: http://www.sanger.ac.uk/resources/software/dindel/

Questions:
 * Are we going to implement this tool as well, besides Pindel?
 * If so, where (at which step) should this tool be implemented?
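
The step tables earlier on this page form a dependency chain: each step consumes files that an earlier step (or the workflow inputs) must provide. The following is a minimal Python sketch of that chain, useful as a sanity check when steps are reordered or a tool such as Dindel is slotted in. It is an illustration only, not part of the pipeline: the names `STEPS`, `WORKFLOW_INPUTS` and `check_order` are hypothetical, file names follow the workflow outputs list, and reference files (genome, dbSNP, known indels, genome-level realignment intervals) are assumed to be always available and are therefore left out.

```python
# Ordered steps as (name, sample-level inputs, outputs), taken from the
# step tables on this page. Reference files are assumed to exist and are
# deliberately omitted (assumption of this sketch).
STEPS = [
    ("merge-lanes", ["lane.chr.recal.sorted.bam"], ["sample.chr.bam"]),
    ("RealignerTargetCreator", ["sample.chr.bam"],
     ["sample.chr.realign.intervals"]),
    ("IndelRealigner", ["sample.chr.bam"], ["sample.chr.realigned.bam"]),
    ("FixMateInformation", ["sample.chr.realigned.bam"],
     ["sample.chr.matesfixed.bam"]),
    ("IndelGenotyperV2", ["sample.chr.matesfixed.bam"],
     ["sample.chr.indels.vcf", "sample.chr.indels.bed",
      "sample.chr.indels.txt"]),
    ("filterSingleSampleCalls", ["sample.chr.indels.bed"],
     ["sample.chr.indels.filtered.bed"]),
    ("UnifiedGenotyper", ["sample.chr.matesfixed.bam"],
     ["sample.chr.snps.vcf"]),
    ("makeIndelMask", ["sample.chr.indels.bed"],
     ["sample.chr.indels.mask.bed"]),
    ("VariantFiltration", ["sample.chr.snps.vcf"],
     ["sample.chr.snps.filtered.vcf"]),
]

# Files that exist before the workflow starts.
WORKFLOW_INPUTS = {"lane.chr.recal.sorted.bam", "sample.chip.vcf"}


def check_order(steps, workflow_inputs):
    """Return (step, file) pairs for inputs not yet available when a step runs."""
    available = set(workflow_inputs)
    problems = []
    for name, inputs, outputs in steps:
        for filename in inputs:
            if filename not in available:
                problems.append((name, filename))
        # Outputs become available to all later steps.
        available.update(outputs)
    return problems


if __name__ == "__main__":
    for step, filename in check_order(STEPS, WORKFLOW_INPUTS):
        print(f"{step}: missing input {filename}")
```

With the steps in the order listed on this page, `check_order` reports no missing inputs. Dropping `merge-lanes` makes both `RealignerTargetCreator` and `IndelRealigner` report `sample.chr.bam` as missing, which is the kind of gap the check is meant to surface.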