Introduction
This is a SNP annotation pipeline.
PrepareGFFFilesFromBGIForSeattleSeqAnnotation
Preprocesses GFF files coming from BGI for the SeattleSeq Annotation tool. It replaces "alleles" with "allele" and adds the line "# autoFile testAuto.txt" at the top of the file.
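The two preprocessing steps can be sketched as follows. This is an illustration only, not the pipeline's source: it assumes that the replacement is a plain text substitution and that the autoFile line is simply prepended. The helper name prepare_gff_text and the sample GFF line are made up.

```python
AUTO_FILE_LINE = "# autoFile testAuto.txt"


def prepare_gff_text(gff_text):
    """Return GFF text preprocessed as described above (assumed behaviour)."""
    # Assumption: a plain substring replacement of "alleles" with "allele".
    corrected = gff_text.replace("alleles", "allele")
    # Put the autoFile line on the first line of the file.
    return AUTO_FILE_LINE + "\n" + corrected


if __name__ == "__main__":
    # Made-up GFF line, for illustration only.
    sample = "chr1\tBGI\tSNP\t1000\t1000\t.\t+\t.\talleles=A/G\n"
    print(prepare_gff_text(sample))
```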
Parameters
- GFFFilename : Input filename
- outputGFFFilename: Output filename
Example
PrepareGFFFilesFromBGIForSeattleSeqAnnotation("/Users/alexandroskanterakis/Data/CD_china/000057.snp.Q20.gff", "/Users/alexandroskanterakis/Data/CD_china/000057.snp.Q20.gff")
Source Code
AnnotateVarianListFileViaSeattleSeqAnnotation
Annotates files with variants through SeattleSeq Annotation: http://gvs.gs.washington.edu/SeattleSeqAnnotation/ . The Java code that wraps the web forms is provided by SeattleSeq Annotation: http://gvs.gs.washington.edu/SeattleSeqAnnotation/SubmitSeattleSeqAnnotationAutoJob.java . This method wraps that Java wrapper and provides a Python implementation. To run it, there must be a directory named "jars" under the current path, containing the following jar files:
- httpunit.jar
- js-1.6R5.jar
- junit-3.8.1.jar
- nekohtml-0.9.5.jar
- xercesImpl-2.6.1.jar
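Before submitting a job it may help to verify that the jar files listed above are in place. The following is a minimal sketch, not part of the pipeline: the function names missing_jars and build_classpath are hypothetical, and how the Python wrapper actually builds its Java classpath is not documented here.

```python
import os

# Jar files listed above; the directory name "jars" also comes from the text.
REQUIRED_JARS = [
    "httpunit.jar",
    "js-1.6R5.jar",
    "junit-3.8.1.jar",
    "nekohtml-0.9.5.jar",
    "xercesImpl-2.6.1.jar",
]


def missing_jars(jars_dir="jars"):
    """Return the required jar files that are not present in jars_dir."""
    return [name for name in REQUIRED_JARS
            if not os.path.isfile(os.path.join(jars_dir, name))]


def build_classpath(jars_dir="jars"):
    """Join the jar paths with the platform's classpath separator."""
    return os.pathsep.join(os.path.join(jars_dir, name) for name in REQUIRED_JARS)


if __name__ == "__main__":
    absent = missing_jars()
    if absent:
        print("Missing jar files: " + ", ".join(absent))
    else:
        print("Classpath: " + build_classpath())
```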
Parameters
For a complete list of parameters, please check the SeattleSeq Annotation website and the example below.
Example
AnnotateVarianListFileViaSeattleSeqAnnotation(
    inputFile="/Users/alexandroskanterakis/Data/SNP/chr1.snp.Q20.gff",
    outputFile="/Users/alexandroskanterakis/Tools/annotation/seattleseqannotation/output.txt",
    eMail="alexandros.kanterakis@gmail.com",
    fileFormat="GFF",
    geneData="CCDS2008",
    allelesMaq="true",
    allelesDBSNP="true",
    scorePhastCons="true",
    consScoreGERP="true",
    chimpAllele="true",
    CNV="true",
    geneList="true",
    HapMapFreqType="HapMapFreqMinor",
    hasGenotypes="true",
    dbSNPValidation="true",
    repeats="true",
    proteinSequence="true",
    polyPhen="true",
    clinicalAssociation="true"
)
Source Code
AddPolyphenAnnotationToSNPsFromSeattleSeqAnnotationOutputs
This method takes a list of files that have been generated by the SeattleSeq Annotation tool and a list of tabular files that contain chromosome and position columns. It adds the PolyPhen annotation that is contained in the former list of files to the latter.
Parameters
- listOfSeattleSeqAnnotationOutputs: List of SeattleSeq Annotation files that we want to take the PolyPhen annotation from
- listOfFileToBeAnnotated: List of files with chromosome and position information.
- chromosomeColumn: The Chromosome column of the files to be annotated
- positionColumn: The position column of the files to be annotated
- outputDir: The directory where the generated files will be stored
- outputSuffix: The suffix of the output files.
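The idea behind these parameters is a lookup keyed on chromosome and position. The sketch below illustrates that idea on rows that are already split into columns. It is not the pipeline's implementation: the function names, the "NA" placeholder, the 0-based column indices and the sample rows are all assumptions, and the column layout of real SeattleSeq Annotation output is not reproduced here.

```python
def index_by_position(rows, chromosome_column, position_column, value_column):
    """Map (chromosome, position) to the annotation value found in rows."""
    index = {}
    for row in rows:
        key = (row[chromosome_column], row[position_column])
        index[key] = row[value_column]
    return index


def add_annotation(rows, index, chromosome_column, position_column, missing="NA"):
    """Append the indexed value to every row, or `missing` when there is none."""
    annotated = []
    for row in rows:
        key = (row[chromosome_column], row[position_column])
        annotated.append(row + [index.get(key, missing)])
    return annotated


if __name__ == "__main__":
    # Made-up annotation rows: chromosome, position, PolyPhen prediction.
    seattle_rows = [["1", "1000", "benign"], ["2", "500", "probably-damaging"]]
    # Made-up rows to annotate, with chromosome in column 2 and position in column 3.
    target_rows = [["a", "b", "1", "1000"], ["c", "d", "3", "42"]]
    index = index_by_position(seattle_rows, 0, 1, 2)
    for row in add_annotation(target_rows, index, 2, 3):
        print("\t".join(row))
```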
Example
dataDir = "/Users/alexandroskanterakis/Data/CD_china"
tabDir = dataDir + "/genomeWideExcluding/genomeWideExcluding360-02"

# Sample identifiers of the SeattleSeq Annotation output files
samples = [
    "000057", "000074", "000159", "000363", "030042", "030101",
    "960313", "960318", "CD0316-04", "CD0316-05", "CD0322-07", "CD0322-08",
    "CD0326-03", "CD0326-07", "CD0360-02", "CD0360-05", "CD0360-06", "CD0376-02",
    "CD0376-05", "CD0398-011", "CD0398-012", "CD2018-03", "CD2018-06", "CD5000-001",
    "CD5059-001", "CD5063-001", "CD5065-001", "CD5066-001", "CD5067-005", "CD5084-007",
    "CD5096-001", "CD5116-001", "CD5166-005", "CD5174-001", "CD5176-001", "CD5217-001",
    "CD5252-001", "CD5257-005", "CD5258-002"
]

# One SeattleSeq Annotation output file per sample
listOfSeattleSeqAnnotationOutputs = [
    "%s/%s.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt" % (dataDir, sample)
    for sample in samples
]

# tab_1.txt ... tab_22.txt
filesToBeAnnotated = ["%s/tab_%d.txt" % (tabDir, i) for i in range(1, 23)]

AddPolyphenAnnotationToSNPsFromSeattleSeqAnnotationOutputs(
    listOfSeattleSeqAnnotationOutputs=listOfSeattleSeqAnnotationOutputs,
    listOfFileToBeAnnotated=filesToBeAnnotated,
    chromosomeColumn=2,
    positionColumn=3,
    outputDir=dataDir + "/Intersection",
    outputSuffix="_polyphenExample.txt",
    numberOfFirstLinesToIgnore=1
)
Source Code
AnnotateListOfChromosomePositionFilesWithGOFromBioMartEnsembl
Creates Gene Ontology (http://www.geneontology.org/) annotation for a list of files that contain at least a chromosome column and a position column.
Parameters
- listOfFilesToAnnotate: Python list of filenames to be annotated
- numberOfFirstLinesToIgnoreInFileToAnnotate: The number of lines at the top of each file to be annotated that are skipped (for example a header line)
- chromosomeColumnOfFilesToAnnotate: The index of the chromosome column in the file to be annotated (starting from 0)
- positionColumnOfFilesToAnnotate: The index of the position column in the file to be annotated (starting from 0)
- resolveDuplicateValuesFunctionInFileToBeAnnotated: Controls what happens when two lines in the file to be annotated have the same chromosome and position. If it is not set to None, the function assigned to this parameter is called
- fileWithGOAnnotation: The file that has been downloaded from BioMart and contains the GO annotation.
- fileWithGOAnnotationChromosomeColumn: The column that contains the chromosome in the fileWithGOAnnotation
- fileWithGOAnnotationStartColumn: The column that contains the start of the transcript in the fileWithGOAnnotation
- fileWithGOAnnotationEndColumn: The column that contains the end of the transcript in the fileWithGOAnnotation
- columnsWithGOAnnotationComaSeparated: The columns that contain the annotations that we want to add in the fileWithGOAnnotation. Example: "2,3,4"
- numberOfFirstLinesToIgnoreInGOAnnotationFile
- outputDirectory
- outputSuffix: The output file will be: outputDirectory/(basename of inputFile)+outputSuffix
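The parameters above describe a lookup of each position against transcript intervals. A minimal sketch of that lookup follows. It is not the pipeline's code: the function names are hypothetical, the interval bounds are assumed to be inclusive, and the output name is assumed to drop the input file's extension before the suffix is added (this matches names such as tab_1_GO.txt used elsewhere on this page, but is not confirmed).

```python
import os


def annotations_at(chromosome, position, transcripts):
    """Return the annotations of all transcripts whose interval contains position.

    transcripts is a list of (chromosome, start, end, annotation) tuples.
    Assumption: start and end are both inclusive.
    """
    hits = []
    for t_chromosome, start, end, annotation in transcripts:
        if t_chromosome == chromosome and start <= position <= end:
            hits.append(annotation)
    return hits


def output_path(input_file, output_directory, output_suffix):
    """Build outputDirectory/(basename of inputFile)+outputSuffix.

    Assumption: the extension of the input file is removed first.
    """
    base = os.path.splitext(os.path.basename(input_file))[0]
    return os.path.join(output_directory, base + output_suffix)


if __name__ == "__main__":
    # Made-up transcript intervals and annotation labels.
    transcripts = [("1", 100, 200, "termA"), ("1", 150, 300, "termB")]
    print(annotations_at("1", 160, transcripts))
    print(output_path("/data/tab_1.txt", "/data/out", "_GO.txt"))
```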
Example
tabDir = "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02"

# tab_1.txt ... tab_22.txt
fileList = ["%s/tab_%d.txt" % (tabDir, i) for i in range(1, 23)]

AnnotateListOfChromosomePositionFilesWithGOFromBioMartEnsembl(
    listOfFilesToAnnotate=fileList,
    numberOfFirstLinesToIgnoreInFileToAnnotate=1,
    chromosomeColumnOfFilesToAnnotate=2,
    positionColumnOfFilesToAnnotate=3,
    fileWithGOAnnotation="/Users/alexandroskanterakis/Data/Ensembl/GENE_START_END_GO_FROM_ENSEMBL_36.txt",
    fileWithGOAnnotationChromosomeColumn=1,
    fileWithGOAnnotationStartColumn=2,
    fileWithGOAnnotationEndColumn=3,
    columnsWithGOAnnotationComaSeparated="4,5,6,7,8,9",
    numberOfFirstLinesToIgnoreInGOAnnotationFile=1,
    outputDirectory=tabDir,
    outputSuffix="_GO.txt"
)
Source Code
CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames
Creates allele frequency annotation from a list of VCF files for tabular files that contain at least a chromosome and a position column. It requires the xapian (http://xapian.org) Python package and vcftools (http://vcftools.sourceforge.net/).
Parameters
- pathToVCFTools: Path where vcftools is installed
- listOfVCFFiles: python list of VCF files where the annotation will come from
- listOfFilenamesToBeAnnotated
- outputPreffix
- xapianIndexDirectory. If None it will create a system temporary directory.
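To make clear what an allele frequency derived from a VCF file means, the sketch below counts non-reference alleles in the genotype (GT) fields of a single VCF data line. This is a pure-Python illustration under stated assumptions and does not reproduce what the pipeline does with vcftools and xapian; the function name and the sample line are made up.

```python
def alternate_allele_frequency(vcf_line):
    """Return (chromosome, position, frequency of non-reference alleles)."""
    fields = vcf_line.rstrip("\n").split("\t")
    chromosome, position = fields[0], int(fields[1])
    # The ninth column (index 8) is FORMAT; the sample columns follow it.
    gt_index = fields[8].split(":").index("GT")
    alt_count = 0
    total = 0
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        # Alleles are separated by "/" (unphased) or "|" (phased).
        for allele in genotype.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing call, not counted
            total += 1
            if allele != "0":
                alt_count += 1
    frequency = float(alt_count) / total if total else None
    return chromosome, position, frequency


if __name__ == "__main__":
    # Made-up VCF data line with four samples.
    line = "1\t1000\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\t0/0\t./."
    print(alternate_allele_frequency(line))
```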
Example
import wikipl
from wikipl import CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames

VCFFilenames_Example = [
    "/Users/alexandroskanterakis/Data/1000GP/vol1.ftp.pilot_data.release.2010_07.exon.snps/CEU.exon.2010_03.genotypes.vcf",
    "/Users/alexandroskanterakis/Data/1000GP/vol1.ftp.pilot_data.release.2010_07.exon.snps/YRI.exon.2010_03.genotypes.vcf"
]

tabDir = "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02"

# tab_1.txt ... tab_22.txt
filesToBeAnnotated = ["%s/tab_%d.txt" % (tabDir, i) for i in range(1, 23)]

CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames(
    pathToVCFTools="/Users/alexandroskanterakis/Tools/vcftools/cpp/vcftools",
    listOfVCFFiles=VCFFilenames_Example,
    listOfFilenamesToBeAnnotated=filesToBeAnnotated,
    outputPreffix="_AlleleFrequencyExample.txt",
    xapianIndexDirectory="/Users/alexandroskanterakis/Data/CD_china/Intersection/xapianDB_Example"
)
Source Code
MergeHorizontallyFilesAccordingToCommonColumns
Merges files horizontally according to common columns.
Parameters
- listOfFilenamesToBeAnnotated: Python list of filenames to be annotated.
- listOfColumnsFromFileToBeAnnotated: Python list of columns that we want to keep from the files to be annotated
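- listOfListsOfInputFilenames: Python list of Python lists of input filenames (one list per annotation source)
- listOfAnnotationFileColumns: Python list of Python lists with the columns to take from each annotation source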
- listOfFirstLinesToIgnore: Python list of first lines to ignore from each annotation file
- listOfOutputFilenames
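A horizontal merge of this kind can be sketched on rows that are already split into columns. The code below is an illustration, not the pipeline's implementation: the function name merge_rows, the explicit key_columns argument, the "NA" placeholder and the sample rows are assumptions, and it supposes that the key columns sit at the same indices in every file.

```python
def merge_rows(base_rows, key_columns, annotation_tables, annotation_columns,
               missing="NA"):
    """Append selected annotation columns to each base row.

    base_rows          -- rows of the file to be annotated
    key_columns        -- indices of the columns shared by all files
    annotation_tables  -- one list of rows per annotation file
    annotation_columns -- per annotation file, the column indices to copy
    """
    # Build one lookup table per annotation file, keyed on the common columns.
    indexes = []
    for rows in annotation_tables:
        indexes.append(dict((tuple(r[c] for c in key_columns), r) for r in rows))

    merged = []
    for row in base_rows:
        key = tuple(row[c] for c in key_columns)
        out = list(row)
        for index, columns in zip(indexes, annotation_columns):
            match = index.get(key)
            for c in columns:
                out.append(match[c] if match is not None else missing)
        merged.append(out)
    return merged


if __name__ == "__main__":
    # Made-up rows: chromosome, position, then file-specific columns.
    base = [["1", "100", "x"], ["1", "200", "y"]]
    polyphen = [["1", "100", "benign"]]
    go = [["1", "200", "a", "b"], ["1", "100", "c", "d"]]
    for row in merge_rows(base, [0, 1], [polyphen, go], [[2], [2, 3]]):
        print("\t".join(row))
```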
Example
tabDir = "/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02"
tabNumbers = range(1, 23)  # tab_1 ... tab_22

filesToBeAnnotated = ["%s/tab_%d.txt" % (tabDir, i) for i in tabNumbers]
filesAnnotation1 = ["%s/tab_%d_polyphen.txt" % (tabDir, i) for i in tabNumbers]
filesAnnotation2 = ["%s/tab_%d_GO.txt" % (tabDir, i) for i in tabNumbers]
filesAnnotation3 = ["%s/tab_%d_AlleleFrequency.txt" % (tabDir, i) for i in tabNumbers]
filesOutput123Annotated = ["%s/tab_%d_Annotated.txt" % (tabDir, i) for i in tabNumbers]

MergeHorizontallyFilesAccordingToCommonColumns(
    listOfFilenamesToBeAnnotated=filesToBeAnnotated,
    # listOfColumnsFromFileToBeAnnotated=range(39),
    listOfColumnsFromFileToBeAnnotated=[2, 3],
    listOfListsOfInputFilenames=[filesAnnotation1, filesAnnotation2, filesAnnotation3],
    listOfAnnotationFileColumns=[
        [2],
        [2, 3, 4, 5, 6, 7],
        [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
    ],
    listOfOutputFilenames=filesOutput123Annotated
)
Source Code
ANNOVAR Annotation pipeline
- About ANNOVAR: http://www.openbioinformatics.org/annovar/
- Download ANNOVAR http://www.openbioinformatics.org/annovar/download/annovar.latest.tar.gz more information: http://www.openbioinformatics.org/annovar/annovar_download.html
- ANNOVAR is already installed and configured in gbicdev: /data/home/data/alex/ANNOVAR/
- Download annotation files for both the hg18 and hg19 releases (most of them come from UCSC):
- Gene based: http://www.openbioinformatics.org/annovar/annovar_gene.html
- Region based: http://www.openbioinformatics.org/annovar/annovar_region.html
- Filter based: http://www.openbioinformatics.org/annovar/annovar_filter.html
- These files have already been downloaded on gbicdev: /data/home/data/alex/ANNOVAR/humandb_hg18/ , /data/home/data/alex/ANNOVAR/humandb_hg19/
- To Annotate a VCF (version 4.0) file, use: http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/ANNOVAR.py This is a wrapper for the ANNOVAR tool.
- Usage:
- Example: python ANNOVAR.py --pathToANNOVAR /data/home/akanterakis/tools/ANNOVAR/annovar --VCFFilename /data/home/data/pdeelen/Celiac40ExomsProject/SequenceData/sequence0605_41.index_hg18.snps.filtered.vcf --outputFilename /data/home/data/alex/ANNOVAR/annotated/output.txt --outputDirectory /data/home/data/alex/ANNOVAR/annotated/ --buildver hg18 --annotationDirectory /data/home/data/alex/ANNOVAR/humandb_hg18/ --geneBasedAnnotations refgene,knowngene,ensgene --regionBasedAnnotations band,segdup,dgv,gwascatalog --filterBasedAnnotations snp130 --customAnnotations kantale
- Options:
- --pathToANNOVAR: The path to the installed ANNOVAR tool
- --VCFFilename: The path to the VCF file to be annotated
- --outputFilename: The output annotated file
- --outputDirectory: output directory
- --buildver: Either hg18 or hg19
- --annotationDirectory: The directory where the annotation files are
- --geneBasedAnnotations: Gene based annotations according to ANNOVAR (comma separated, no spaces)
- --regionBasedAnnotations: Region based annotations according to ANNOVAR (comma separated, no spaces)
- --filterBasedAnnotations: Filter based annotations according to ANNOVAR (comma separated, no spaces)
- --customAnnotations: Custom annotations. These should be GFF3 files http://www.sequenceontology.org/gff3.shtml. The list of files should be comma separated, no spaces
- --dummy: If set to True, the script does nothing except print the commands it would run. Default value: False
- --verbose: True/False (Default: True)
- Additional options for Gene Ontology annotation and allele frequency from VCF files are under development.
- Extra annotation files have been generated in GFF3 format to be used with the customAnnotation parameter:
- /data/home/data/alex/ANNOVAR/annotated/sequence0605_41.index_hg18.snps.filtered.vcf.sequence0605_41.index_hg18.snps.filtered.vcf.AlleleFrequency_AlleleFrequecy.hg18_gff3
- /data/home/data/alex/ANNOVAR/annotated/sequence0605_41.index_hg18.snps.filtered.vcf.sequence0605_41.index_hg18.snps.filtered.vcf.GO_GeneOntology.hg18_gff3
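When ANNOVAR.py has to be called for many VCF files, the command line can be assembled from Python. The sketch below uses only option names documented above. The helper names build_annovar_command and run are hypothetical, and the dry_run flag belongs to this sketch (it is separate from the script's own --dummy option).

```python
import subprocess


def build_annovar_command(options, script="ANNOVAR.py"):
    """Turn a dict of option name -> value into an argument list."""
    command = ["python", script]
    for name in sorted(options):
        value = options[name]
        if isinstance(value, (list, tuple)):
            # Lists are passed comma separated, with no spaces.
            value = ",".join(value)
        command.extend(["--" + name, str(value)])
    return command


def run(command, dry_run=True):
    """Print the command, or execute it when dry_run is False."""
    if dry_run:
        print(" ".join(command))
        return None
    return subprocess.call(command)


if __name__ == "__main__":
    # Values taken from the usage example above.
    options = {
        "buildver": "hg18",
        "annotationDirectory": "/data/home/data/alex/ANNOVAR/humandb_hg18/",
        "geneBasedAnnotations": ["refgene", "knowngene", "ensgene"],
        "regionBasedAnnotations": ["band", "segdup", "dgv", "gwascatalog"],
        "filterBasedAnnotations": ["snp130"],
    }
    run(build_annovar_command(options))
```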
Related work
- http://www.svaproject.org/
- http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1001074
Links and resources
Attachments (2)
- AnnotateIntersectionOverview.zip (760.6 KB), added 14 years ago: Netbeans project for Annotation tool
- lib.zip (693.7 KB), added 14 years ago: lib for AnnotateIntersectionOverview