Introduction
This is a SNP annotation pipeline.
PrepareGFFFilesFromBGIForSeattleSeqAnnotation
Preprocesses GFF files from the BGI institute for the SeattleSeq Annotation tool. It replaces alleles with allele and adds the line # autoFile testAuto.txt at the top of the file.
Parameters
- GFFFilename : Input filename
- outputGFFFilename: Output filename
Example
PrepareGFFFilesFromBGIForSeattleSeqAnnotation("/Users/alexandroskanterakis/Data/CD_china/000057.snp.Q20.gff", "/Users/alexandroskanterakis/Data/CD_china/000057.snp.Q20.gff")
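The preprocessing described above can be sketched in a few lines of Python. This is not the pipeline's actual source: the function name prepare_gff_sketch is hypothetical, and the sketch assumes that a plain substring replacement of alleles by allele is what is intended.

```python
AUTO_FILE_HEADER = "# autoFile testAuto.txt"


def prepare_gff_sketch(gff_filename, output_gff_filename):
    """Write a copy of a GFF file with the header line added (sketch only)."""
    # Read everything first so that input and output may be the same path,
    # as in the example call above.
    with open(gff_filename) as handle:
        lines = handle.read().splitlines()

    with open(output_gff_filename, "w") as handle:
        # The header line goes at the top of the file.
        handle.write(AUTO_FILE_HEADER + "\n")
        for line in lines:
            # Assumption: a plain substring replacement is sufficient.
            handle.write(line.replace("alleles", "allele") + "\n")
```

Reading the whole file before writing keeps the sketch safe when both arguments point to the same file.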
Source Code
AnnotateVarianListFileViaSeattleSeqAnnotation
Annotates files of variants through SeattleSeq Annotation: http://gvs.gs.washington.edu/SeattleSeqAnnotation/ . The Java code that wraps the web forms is provided by SeattleSeq Annotation: http://gvs.gs.washington.edu/SeattleSeqAnnotation/SubmitSeattleSeqAnnotationAutoJob.java . This method wraps that Java wrapper and provides a Python interface to it. To run it, the current directory must contain a subdirectory named "jars" with the following jar files:
- httpunit.jar
- js-1.6R5.jar
- junit-3.8.1.jar
- nekohtml-0.9.5.jar
- xercesImpl-2.6.1.jar
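Before calling the method it can help to check that the "jars" directory is complete. The sketch below only uses the jar names listed above; the helper names missing_jars and build_classpath are hypothetical, and how the pipeline actually launches Java is not documented here.

```python
import os

# Jar files listed in the documentation above.
REQUIRED_JARS = [
    "httpunit.jar",
    "js-1.6R5.jar",
    "junit-3.8.1.jar",
    "nekohtml-0.9.5.jar",
    "xercesImpl-2.6.1.jar",
]


def missing_jars(jars_dir="jars"):
    """Return the required jar files that are absent from jars_dir."""
    return [name for name in REQUIRED_JARS
            if not os.path.isfile(os.path.join(jars_dir, name))]


def build_classpath(jars_dir="jars"):
    """Join the current directory and the required jars into a classpath string."""
    entries = ["."] + [os.path.join(jars_dir, name) for name in REQUIRED_JARS]
    # os.pathsep is ':' on Unix-like systems and ';' on Windows.
    return os.pathsep.join(entries)
```

An empty list from missing_jars() means every jar named above was found.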
Parameters
For a complete list of parameters, please check the SeattleSeq Annotation website and the example below.
Example
# Values are written as strings on the assumption that they are passed
# through unchanged to the SeattleSeq submission form.
# Repeated keyword arguments (scorePhastCons, geneList) are listed once,
# because Python rejects a call that repeats a keyword.
AnnotateVarianListFileViaSeattleSeqAnnotation(
inputFile="/Users/alexandroskanterakis/Data/SNP/chr1.snp.Q20.gff",
outputFile="/Users/alexandroskanterakis/Tools/annotation/seattleseqannotation/output.txt",
eMail="alexandros.kanterakis@gmail.com",
fileFormat="GFF",
geneData="CCDS2008",
allelesMaq="true",
allelesDBSNP="true",
scorePhastCons="true",
consScoreGERP="true",
chimpAllele="true",
CNV="true",
geneList="true",
HapMapFreqType="HapMapFreqMinor",
hasGenotypes="true",
dbSNPValidation="true",
repeats="true",
proteinSequence="true",
polyPhen="true",
clinicalAssociation="true"
)
Source Code
AddPolyphenAnnotationToSNPsFromSeattleSeqAnnotationOutputs
This method takes a list of files generated by the SeattleSeq Annotation tool and a list of tabular files that contain chromosome and position columns. It adds the PolyPhen annotation contained in the former list of files to the latter.
Parameters
- listOfSeattleSeqAnnotationOutputs: List of SeattleSeq Annotation output files from which the PolyPhen annotation is taken
- listOfFileToBeAnnotated: List of files with chromosome and position information.
- chromosomeColumn: The Chromosome column of the files to be annotated
- positionColumn: The position column of the files to be annotated
- outputDir: The directory where the generated files will be stored
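- outputSuffix: The suffix of the output files.
- numberOfFirstLinesToIgnore: Number of leading lines (for example a header line) to skip; used in the example below.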
Example
listOfSeattleSeqAnnotationOutputs = [
"/Users/alexandroskanterakis/Data/CD_china/000057.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/000074.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/000159.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/000363.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/030042.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/030101.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/960313.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/960318.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0316-04.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0316-05.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0322-07.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0322-08.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0326-03.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0326-07.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0360-02.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0360-05.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0360-06.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0376-02.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0376-05.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0398-011.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD0398-012.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD2018-03.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD2018-06.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5000-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5059-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5063-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5065-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5066-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5067-005.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5084-007.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5096-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5116-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5166-005.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5174-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5176-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5217-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5252-001.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5257-005.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt",
"/Users/alexandroskanterakis/Data/CD_china/CD5258-002.snp.Q20.alleleCorrection.autoFile.SeattleOutput.txt"
]
filesToBeAnnotated = [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22.txt"
]
AddPolyphenAnnotationToSNPsFromSeattleSeqAnnotationOutputs(
listOfSeattleSeqAnnotationOutputs=listOfSeattleSeqAnnotationOutputs,
listOfFileToBeAnnotated=filesToBeAnnotated,
chromosomeColumn=2,
positionColumn=3,
outputDir="/Users/alexandroskanterakis/Data/CD_china/Intersection",
outputSuffix="_poluphenExample.txt",
numberOfFirstLinesToIgnore=1
)
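The core of this step is a lookup keyed on chromosome and position. A minimal sketch of that join on rows already split into columns follows; the function name and the three seattle_* column arguments are hypothetical, since the column layout of the SeattleSeq output is not described on this page.

```python
def add_polyphen_sketch(seattle_rows, target_rows,
                        seattle_chrom_col, seattle_pos_col, seattle_polyphen_col,
                        chromosome_column, position_column, missing="NA"):
    """Append a PolyPhen value to each target row, matched on (chromosome, position)."""
    lookup = {}
    for row in seattle_rows:
        key = (row[seattle_chrom_col], row[seattle_pos_col])
        # Keep the first value seen for a position (an arbitrary choice for this sketch).
        lookup.setdefault(key, row[seattle_polyphen_col])

    annotated = []
    for row in target_rows:
        key = (row[chromosome_column], row[position_column])
        # Rows without a match receive the placeholder value.
        annotated.append(row + [lookup.get(key, missing)])
    return annotated
```

The inputs are left unmodified; each returned row is a new list with one extra column.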
Source code
AnnotateListOfChromosomePositionFilesWithGOFromBioMartEnsembl
Creates Gene Ontology (http://www.geneontology.org/) annotation for a list of files that contain at least a chromosome column and a position column.
Parameters
- listOfFilesToAnnotate: Python list of filenames to be annotated
- numberOfFirstLinesToIgnoreInFileToAnnotate: Number of leading lines to skip in each file to be annotated
- chromosomeColumnOfFilesToAnnotate: The index of the chromosome column in the file to be annotated (starting from 0)
- positionColumnOfFilesToAnnotate: The index of the position column in the file to be annotated (starting from 0)
- resolveDuplicateValuesFunctionInFileToBeAnnotated: Controls what happens when two lines in the file to be annotated have the same chromosome and position. If it is not None, the function assigned to this parameter is called
- fileWithGOAnnotation: The file that has been downloaded from BioMart and contains the GO annotation.
- fileWithGOAnnotationChromosomeColumn: The column that contains the chromosome in the fileWithGOAnnotation
- fileWithGOAnnotationStartColumn: The column that contains the start of the transcript in the fileWithGOAnnotation
- fileWithGOAnnotationEndColumn: The column that contains the end of the transcript in the fileWithGOAnnotation
- columnsWithGOAnnotationComaSeparated: The columns of the fileWithGOAnnotation that hold the annotations to be added, as a comma-separated string. Example: "2,3,4"
- numberOfFirstLinesToIgnoreInGOAnnotationFile: Number of leading lines to skip in the fileWithGOAnnotation
- outputDirectory: The directory where the output files are written
- outputSuffix: The output file will be: outputDirectory/(basename of inputFile)+outputSuffix
Example
fileList= [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22.txt"
]
AnnotateListOfChromosomePositionFilesWithGOFromBioMartEnsembl(
listOfFilesToAnnotate=fileList,
numberOfFirstLinesToIgnoreInFileToAnnotate=1,
chromosomeColumnOfFilesToAnnotate=2,
positionColumnOfFilesToAnnotate=3,
fileWithGOAnnotation="/Users/alexandroskanterakis/Data/Ensembl/GENE_START_END_GO_FROM_ENSEMBL_36.txt",
fileWithGOAnnotationChromosomeColumn=1,
fileWithGOAnnotationStartColumn=2,
fileWithGOAnnotationEndColumn=3,
columnsWithGOAnnotationComaSeparated="4,5,6,7,8,9",
numberOfFirstLinesToIgnoreInGOAnnotationFile=1,
outputDirectory="/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02",
outputSuffix="_GO.txt"
)
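Conceptually, a position receives the GO annotation of every transcript on the same chromosome whose start and end enclose it. The sketch below illustrates that interval lookup on rows already split into columns; the function names are hypothetical and the linear scan is chosen for brevity, not because the pipeline is known to work this way.

```python
from collections import defaultdict


def build_interval_table(go_rows, chrom_col, start_col, end_col, annotation_cols):
    """Group (start, end, annotations) tuples by chromosome."""
    table = defaultdict(list)
    for row in go_rows:
        annotations = [row[c] for c in annotation_cols]
        table[row[chrom_col]].append(
            (int(row[start_col]), int(row[end_col]), annotations))
    return table


def annotate_position(table, chromosome, position):
    """Return the annotations of all intervals that contain the position."""
    position = int(position)
    # Both interval ends are treated as inclusive (an assumption of this sketch).
    return [annotations
            for start, end, annotations in table.get(chromosome, [])
            if start <= position <= end]
```

For genome-wide inputs a sorted list with bisection or an interval tree would scale better than scanning every interval per query.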
Source Code
CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames
Creates allele frequency annotation from a list of VCF files for tabular files that contain at least a chromosome and a position column. It requires the xapian (http://xapian.org) Python package and vcftools (http://vcftools.sourceforge.net/).
Parameters
- pathToVCFTools: Path where vcftools is installed
- listOfVCFFiles: Python list of VCF files where the annotation will come from
- listOfFilenamesToBeAnnotated
- outputPreffix
- xapianIndexDirectory: Directory for the xapian index. If None, a system temporary directory is created.
Example
import wikipl
from wikipl import CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames
VCFFilenames_Example = [
"/Users/alexandroskanterakis/Data/1000GP/vol1.ftp.pilot_data.release.2010_07.exon.snps/CEU.exon.2010_03.genotypes.vcf",
"/Users/alexandroskanterakis/Data/1000GP/vol1.ftp.pilot_data.release.2010_07.exon.snps/YRI.exon.2010_03.genotypes.vcf"
]
filesToBeAnnotated = [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22.txt"
]
CreateAlleleFrequencyAnnotationFilesForTabularFilenamesFromVCFFilenames(
pathToVCFTools = "/Users/alexandroskanterakis/Tools/vcftools/cpp/vcftools",
listOfVCFFiles=VCFFilenames_Example,
listOfFilenamesToBeAnnotated=filesToBeAnnotated,
outputPreffix="_AlleleFrequencyExample.txt",
xapianIndexDirectory="/Users/alexandroskanterakis/Data/CD_china/Intersection/xapianDB_Example"
)
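To make clear what an allele frequency annotation contains, the sketch below computes per-allele frequencies from the genotype (GT) fields of a single VCF data line. It is written for Python 3 and deliberately uses neither vcftools nor xapian, so it is a stand-in for illustration rather than the method this pipeline uses; the function name is hypothetical.

```python
from collections import Counter


def allele_frequencies(vcf_line):
    """Return ((chrom, pos), {allele: frequency}) for one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
    alleles = [ref] + alt.split(",")

    # The FORMAT column (index 8) says where GT sits inside each sample field.
    gt_index = fields[8].split(":").index("GT")

    counts = Counter()
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        # Phased (|) and unphased (/) genotypes are counted the same way.
        for code in genotype.replace("|", "/").split("/"):
            if code != ".":  # '.' marks a missing call
                counts[alleles[int(code)]] += 1

    total = sum(counts.values())
    frequencies = {a: counts[a] / total for a in alleles} if total else {}
    return (chrom, pos), frequencies
```

The returned (chromosome, position) key is the same pair used to match rows of the tabular files.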
Source code
MergeHorizontallyFilesAccordingToCommonColumns
Merges files horizontally according to common columns.
Parameters
- listOfFilenamesToBeAnnotated: Python list of filenames to be annotated.
- listOfColumnsFromFileToBeAnnotated: Python list of columns that we want to keep from the files to be annotated
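- listOfListsOfInputFilenames: Python list of Python lists of input filenames, one inner list per annotation source
- listOfAnnotationFileColumns: Python list of Python lists of columns to take from each annotation source (see the example below)
- listOfFirstLinesToIgnore: Python list with the number of first lines to ignore in each annotation file
- listOfOutputFilenames: Python list of output filenames, one per file to be annotated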
Example
filesToBeAnnotated = [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22.txt"
]
filesAnnotation1 = [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21_polyphen.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22_polyphen.txt"
]
filesAnnotation2 = [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21_GO.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22_GO.txt"
]
filesAnnotation3 = [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21_AlleleFrequency.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22_AlleleFrequency.txt"
]
filesOutput123Annotated = [
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_1_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_2_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_3_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_4_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_5_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_6_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_7_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_8_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_9_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_10_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_11_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_12_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_13_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_14_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_15_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_16_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_17_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_18_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_19_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_20_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_21_Annotated.txt",
"/Users/alexandroskanterakis/Data/CD_china/genomeWideExcluding/genomeWideExcluding360-02/tab_22_Annotated.txt"
]
MergeHorizontallyFilesAccordingToCommonColumns(
listOfFilenamesToBeAnnotated=filesToBeAnnotated,
# listOfColumnsFromFileToBeAnnotated=range(39),
listOfColumnsFromFileToBeAnnotated = [2,3],
listOfListsOfInputFilenames=[filesAnnotation1,filesAnnotation2,filesAnnotation3],
listOfAnnotationFileColumns=[[2],[2,3,4,5,6,7],[2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]],
listOfOutputFilenames=filesOutput123Annotated
)
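A horizontal merge of this kind can be pictured as a left join on the common columns. The sketch below works on rows already split into columns and assumes, purely for illustration, that every annotation row starts with the key values (for example chromosome and position) in its first columns; the function name and the "NA" placeholder are hypothetical.

```python
def merge_horizontally_sketch(base_rows, key_columns, annotation_sources, missing="NA"):
    """Left-join annotation columns onto base rows.

    annotation_sources is a list of (rows, value_columns) pairs.
    """
    n_keys = len(key_columns)
    lookups = []
    for rows, value_columns in annotation_sources:
        # Assumption: the first n_keys columns of each annotation row are the key.
        lookup = {tuple(row[:n_keys]): [row[c] for c in value_columns]
                  for row in rows}
        lookups.append((lookup, len(value_columns)))

    merged = []
    for row in base_rows:
        key = tuple(row[c] for c in key_columns)
        out = list(key)
        for lookup, width in lookups:
            # Pad with placeholders when a source has no row for this key.
            out.extend(lookup.get(key, [missing] * width))
        merged.append(out)
    return merged
```

Padding with placeholders keeps every output row the same width, which matters for tabular files.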
Source Code
ANNOVAR Annotation pipeline
- About ANNOVAR: http://www.openbioinformatics.org/annovar/
- Download ANNOVAR: http://www.openbioinformatics.org/annovar/download/annovar.latest.tar.gz (more information: http://www.openbioinformatics.org/annovar/annovar_download.html)
- ANNOVAR is already installed and configured in gbicdev: /data/home/data/alex/ANNOVAR/
- Download the annotation files (for both the hg18 and hg19 releases; most of them come from UCSC):
- Gene based: http://www.openbioinformatics.org/annovar/annovar_gene.html
- Region based: http://www.openbioinformatics.org/annovar/annovar_region.html
- Filter based: http://www.openbioinformatics.org/annovar/annovar_filter.html
- These files have already been downloaded in gbicdev: /data/home/data/alex/ANNOVAR/humandb_hg18/ , /data/home/data/alex/ANNOVAR/humandb_hg19/
- To annotate a VCF (version 4.0) file, use: http://www.bbmriwiki.nl/svn/SequenceAnnotation/Scripts/ANNOVAR.py . This is a wrapper for the ANNOVAR tool.
- Usage:
- Example: python ANNOVAR.py --pathToANNOVAR /data/home/akanterakis/tools/ANNOVAR/annovar --VCFFilename /data/home/data/pdeelen/Celiac40ExomsProject/SequenceData/sequence0605_41.index_hg18.snps.filtered.vcf --outputFilename /data/home/data/alex/ANNOVAR/annotated/output.txt --outputDirectory /data/home/data/alex/ANNOVAR/annotated/ --buildver hg18 --annotationDirectory /data/home/data/alex/ANNOVAR/humandb_hg18/ --geneBasedAnnotations refgene,knowngene,ensgene --regionBasedAnnotations band,segdup,dgv,gwascatalog --filterBasedAnnotations snp130 --customAnnotations kantale
- Options:
- --pathToANNOVAR: The path to the installed ANNOVAR tool
- --VCFFilename: The path to the VCF file to be annotated
- --outputFilename: The output annotated file
- --outputDirectory: output directory
- --buildver: Either hg18 or hg19
- --annotationDirectory: The directory where the annotation files are
- --geneBasedAnnotations: Gene-based annotations according to ANNOVAR (comma-separated, no spaces)
- --regionBasedAnnotations: Region-based annotations according to ANNOVAR (comma-separated, no spaces)
- --filterBasedAnnotations: Filter-based annotations according to ANNOVAR (comma-separated, no spaces)
- --customAnnotations: Custom annotations. These should be GFF3 files (http://www.sequenceontology.org/gff3.shtml). The list of files should be comma-separated, with no spaces
- --dummy: If set to True, the script does not run anything and only prints the commands it would execute. Default value: False
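When the wrapper is called from another Python script, the command line can be assembled from the documented options. The sketch below only builds the argument list; the function name build_annovar_command is hypothetical, and the option names are taken from the list above.

```python
def build_annovar_command(path_to_annovar, vcf_filename, output_filename,
                          output_directory, buildver, annotation_directory,
                          gene_based=(), region_based=(), filter_based=(),
                          custom=(), script="ANNOVAR.py"):
    """Return the ANNOVAR.py call as a list suitable for subprocess."""
    command = [
        "python", script,
        "--pathToANNOVAR", path_to_annovar,
        "--VCFFilename", vcf_filename,
        "--outputFilename", output_filename,
        "--outputDirectory", output_directory,
        "--buildver", buildver,
        "--annotationDirectory", annotation_directory,
    ]
    optional_lists = [
        ("--geneBasedAnnotations", gene_based),
        ("--regionBasedAnnotations", region_based),
        ("--filterBasedAnnotations", filter_based),
        ("--customAnnotations", custom),
    ]
    for flag, values in optional_lists:
        if values:
            # The wrapper expects comma-separated values without spaces.
            command += [flag, ",".join(values)]
    return command
```

The resulting list can be passed to subprocess.call(command); passing a list instead of a single string avoids shell quoting problems with paths.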
Related work
- http://www.svaproject.org/
- http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1001074
Links and resources
Attachments (2)
- AnnotateIntersectionOverview.zip (760.6 KB), added 15 years ago: NetBeans project for Annotation tool
- lib.zip (693.7 KB), added 15 years ago: lib for AnnotateIntersectionOverview
