Meta-exon annotation
To create the meta-exon annotation, the following steps were taken:
- The exon annotation from Ensembl Biomart v.71 was downloaded. The file contained the following columns:
chromosome, exon start, exon end, Ensembl exon id, Ensembl gene id, gene name, strand.
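- All additional contigs (GL*, LRG*, etc.) were removed so that only the ordinary chromosomes (1-22, X, Y, MT) remained. This was done with the custom script cutStrangeChr.py (see attachment).
- The Biomart file was converted to BED format (start coordinate shifted to 0-based, strand recoded from 1/-1 to +/-, and exon id, gene id and gene name joined with ":" into the name field) and sorted by chromosome and start coordinate.
- Exons were merged using the mergeBed tool from the BEDTools suite, keeping the names of the merged exons (option -nms).
- The resulting file was converted to GTF format by the custom script mergedBed_to_gtf.py (see attachment), which retains the strand information.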
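The filtering and BED conversion steps can be sketched in Python as follows. This is only an illustration, not the attached cutStrangeChr.py: the function name biomart_row_to_bed and the example identifiers are hypothetical, and the sketch assumes a tab-separated input with the seven columns listed above. The coordinate and strand handling mirrors the awk command used on this page.

```python
# Ordinary chromosomes kept by the filtering step: 1-22, X, Y, MT.
ORDINARY_CHROMOSOMES = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


def biomart_row_to_bed(line):
    """Convert one tab-separated Biomart row to a BED6 row.

    Returns None for rows on other contigs (GL*, LRG*, ...). A header line
    is dropped the same way, because its first field is not a chromosome name.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, start, end, exon_id, gene_id, gene_name, strand = fields
    if chrom not in ORDINARY_CHROMOSOMES:
        return None
    # Biomart encodes the strand as 1 / -1; BED uses + / -.
    bed_strand = "-" if strand == "-1" else "+"
    # BED starts are 0-based, Biomart starts are 1-based; the end is unchanged.
    bed_start = str(int(start) - 1)
    name = ":".join((exon_id, gene_id, gene_name))
    return "\t".join((chrom, bed_start, end, name, ".", bed_strand))


if __name__ == "__main__":
    # Made-up example row; the identifiers are placeholders, not real Ensembl ids.
    print(biomart_row_to_bed("1\t100\t200\tEXON_A\tGENE_ID_A\tGENE_A\t-1"))
```

Joining the exon id, gene id and gene name into the BED name field is what lets the later GTF conversion trace each meta-exon back to its source exons.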
The final commands to generate the meta-exon annotation were the following:
./cutStrangeChr.py biomart_export.txt | awk 'BEGIN {FS="\t"}; {OFS="\t"}; {if ($7 == "-1") $7 = "-"; else $7 = "+"}; {print $1, $2 - 1, $3, $4 ":" $5 ":" $6, ".", $7}' | sort -k1,1n -k2,2n | mergeBed -nms -d -1 -i stdin > biomart_export.merged.tmp
./mergedBed_to_gtf.py biomart_export.merged.tmp biomart_export.txt | sort -k1,1n -k4,4n > meta-exons_v71_cut_sorted_18-04-14.gtf
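The merging step can be illustrated with a short Python sketch. It is not mergeBed's implementation; it assumes that -d -1 merges only intervals that overlap by at least one base (so directly adjacent exons stay separate) and that -nms joins the names of the merged features with semicolons, which should be checked against the documentation of the BEDTools version in use. The function name merge_overlapping is hypothetical. As in the command above, strand is ignored during merging.

```python
def merge_overlapping(intervals):
    """Merge BED intervals that overlap by at least one base.

    intervals: iterable of (chrom, start, end, name) tuples with 0-based,
    half-open coordinates. Returns a list of (chrom, start, end, names)
    tuples, where names is the ';'-joined list of merged feature names.
    """
    merged = []
    # Sort by chromosome, then start, so overlapping intervals are neighbours.
    for chrom, start, end, name in sorted(intervals, key=lambda r: (r[0], r[1], r[2])):
        last = merged[-1] if merged else None
        # start < end of the previous interval means a true overlap;
        # start == end (book-ended intervals) is not merged.
        if last is not None and last[0] == chrom and start < last[2]:
            last[2] = max(last[2], end)
            last[3].append(name)
        else:
            merged.append([chrom, start, end, [name]])
    return [(c, s, e, ";".join(names)) for c, s, e, names in merged]


if __name__ == "__main__":
    # Made-up intervals: a and b overlap, c only touches the end of b.
    example = [("1", 0, 100, "a"), ("1", 50, 150, "b"), ("1", 150, 200, "c")]
    for row in merge_overlapping(example):
        print(*row, sep="\t")
```

Because the merged names are kept, each meta-exon still lists every exon it was built from, and the strand can be looked up again from the original Biomart export.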
Last modified on Sep 19, 2016 4:48:45 PM