Groningen cluster
People UMCG: Morris, Freerk, more?
Description: still to be written. It should cover the code templates, automatic PBS script generation, and job submission and monitoring.
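As a rough illustration of automatic PBS script generation from a template, here is a minimal sketch. It uses Python's `string.Template` as a stand-in for the actual template engine, and the job name, command, core count and walltime are hypothetical placeholders rather than values from the Groningen setup.

```python
from string import Template

# Hypothetical template: the #PBS directives are standard PBS/Torque syntax,
# but the concrete values are placeholders, not the cluster's real settings.
PBS_TEMPLATE = Template("""\
#!/bin/bash
#PBS -N ${job_name}
#PBS -l nodes=1:ppn=${cores}
#PBS -l walltime=${walltime}

${command}
""")


def render_pbs_script(job_name, command, cores=1, walltime="01:00:00"):
    """Fill in the template and return the PBS script as a string."""
    return PBS_TEMPLATE.substitute(
        job_name=job_name, cores=cores, walltime=walltime, command=command
    )


# Example: a script for one (made-up) alignment job.
script = render_pbs_script("bwa_align", "echo aligning", cores=4, walltime="10:00:00")
print(script)
```

The rendered text could then be written to a file and handed to `qsub`; submission and monitoring are left out of this sketch.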
Port applications to Dutch Life Science Grid
People
- AMC: Antoine van Kampen, Barbera van Schaik, Silvia D Olabarriaga, Mark Santcroos
- Sara/BiGGrid: Tom Visser, more?
- UMCG: Morris, Freerk
Description: The software will be implemented as workflow components. The workflows will run on the Dutch Life Science Grid.
- Information about the infrastructure: http://www.bioinformaticslaboratory.nl/twiki/bin/view/EBioScience/
- Getting started: http://www.bioinformaticslaboratory.nl/twiki/bin/view/EBioScience/GettingStarted
Implemented workflow components at AMC
The following workflow components are already available. We can expand the list with Pindel and (parts of) the GATK pipeline.
- Splitting of fastq files
- Building a BWA index on the genome sequence (base space and color space)
- BWA for shotgun reads (base space and color space). Parameter sweeps are possible. Output is in bam format
- Merge bam results
- Samtools pileup
- Varscan (pileup to snp, indel and cns)
- Bam2coverage creates a UCSC wiggle file to display the genome coverage (per 50kbp)
- Coverage-per-base determines the coverage for every base in the genome and it summarizes the results (coverage versus frequency)
- Annovar (currently working on the implementation). This is a pipeline to annotate variants (gene, dbsnp, hapmap, 1000g, conservation, etc)
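The first component in the list, splitting of fastq files, can be sketched as follows. This is not the AMC implementation, only an assumption of how a line-based split might work. The function name and the chunk naming scheme are hypothetical. The line count per chunk must be a multiple of 4 so that no four-line fastq record is cut in half.

```python
def split_fastq(path, lines_per_file=8_000_000, prefix="chunk"):
    """Split a FASTQ file into chunks of at most lines_per_file lines.

    Returns the list of chunk file names in order. The naming scheme
    (prefix_0000.fastq, ...) is an assumption made for this sketch.
    """
    if lines_per_file <= 0 or lines_per_file % 4 != 0:
        raise ValueError("lines_per_file must be a positive multiple of 4")

    chunk_names = []
    dst = None
    with open(path) as src:
        for line_number, line in enumerate(src):
            # Start a new chunk every lines_per_file lines.
            if line_number % lines_per_file == 0:
                if dst is not None:
                    dst.close()
                name = f"{prefix}_{len(chunk_names):04d}.fastq"
                dst = open(name, "w")
                chunk_names.append(name)
            dst.write(line)
    if dst is not None:
        dst.close()
    return chunk_names
```

The file is streamed line by line, so memory use stays small even for large chunk sizes. Gzip-compressed input would need `gzip.open` instead of `open`.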
Implemented components of the Groningen pipeline
A more detailed description will follow later.
- BwaIllumina (done) - pe00-bwa-align-pair1.ftl, pe01-bwa-align-pair2.ftl, pe02-bwa-sampe.ftl, pe03-sam-to-bam.ftl, pe04-sam-sort.ftl
- MarkDuplicates (done) - pe05-mark-duplicates.ftl
- PicardQC (partly done) - pe04b-picardQC.ftl. The R environment is not up and running yet, so the .pdf, .hist and .bamindexstats outputs cannot be produced. We will continue with the other components and fix this later. The attachment r-environment.txt contains info about the required R packages.
- GatkGenerateIntervalFile (in progress) - see e-mail Freerk on Dec 13, 2010
- ReAlign (in progress) - pe06-realign.ftl
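The contents of the .ftl templates are not shown on this page. Judging only by their names, the BwaIllumina steps (pe00 to pe04) probably correspond to a command sequence like the one sketched below. The file names, the `sample` argument and the default of 4 threads are assumptions. The function only builds the command strings and does not run them.

```python
def paired_end_commands(reference, fastq1, fastq2, sample, threads=4):
    """Return the shell commands for one paired-end lane, in execution order.

    Intermediate file names are derived from `sample`; this naming is
    hypothetical and not taken from the actual templates.
    """
    sai1, sai2 = f"{sample}_1.sai", f"{sample}_2.sai"
    sam, bam = f"{sample}.sam", f"{sample}.bam"
    return [
        # pe00 / pe01: align each read of the pair separately
        f"bwa aln -t {threads} {reference} {fastq1} > {sai1}",
        f"bwa aln -t {threads} {reference} {fastq2} > {sai2}",
        # pe02: combine both alignments into paired-end SAM
        f"bwa sampe {reference} {sai1} {sai2} {fastq1} {fastq2} > {sam}",
        # pe03: convert SAM to BAM
        f"samtools view -bS {sam} > {bam}",
        # pe04: sort the BAM (legacy samtools 0.1.x syntax with an output
        # prefix; samtools 1.x uses "samtools sort -o out.bam in.bam")
        f"samtools sort {bam} {sample}.sorted",
    ]


for command in paired_end_commands("ref.fa", "lane_1.fq", "lane_2.fq", "lane"):
    print(command)
```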
To be implemented
- The components of the Groningen pipeline that are not yet implemented as workflow components
- Pindel
Data access rights
To restrict data access to the smallest possible group of people, we have created a subgroup "gvnl" within the "vlemed" Virtual Organisation (VO). To join this subgroup, a person needs a Grid certificate and membership of the "vlemed" VO. The following page explains how to obtain a certificate and how to join the "vlemed" VO: http://www.bioinformaticslaboratory.nl/twiki/bin/view/EBioScience/EBioInfra#Access
For more information about data access see http://www.bioinformaticslaboratory.nl/twiki/bin/view/EBioScience/DataManagement
Things to address
- Available disk space on the grid storage elements / worker nodes
Workflow execution
Run on the mini pilot. Split lines-per-file: 8,000,000. Start: 18-12-2010 16:40
Lane | Split, BWA, merge | Comments | Elapsed time (s) |
A4a_L4_HUModqRBUDIBAPE | done | [lfn://lfc.grid.sara.nl:5010/grid/vlemed/gvnl/Data/split_100826_I176_FC804GCABXX_L6_HUModqRBUDIBAPE_1/bwa_gatk_index_basespace_human_g1k_v37.fasta.tar.gz_bwa_4_threads.txt/A4a_HUModqRBUDIBAPE_L4/mergebam lfc] | 22740 |
A4a_L6_HUModqRADDIAAPE | done | [lfn://lfc.grid.sara.nl:5010/grid/vlemed/gvnl/Data/split_100826_I124_FC20813ABXX_L7_HUModqRADDIAAPE_1/bwa_gatk_index_basespace_human_g1k_v37.fasta.tar.gz_bwa_4_threads.txt/A4a_HUModqRADDIAAPE_L6/mergebam lfc] | 37750 |
A4a_L7_HUModqRBVDIBAPE | failed | | |
A4b_L3_HUModqRAFDIBAPE | done | [lfn://lfc.grid.sara.nl:5010/grid/vlemed/gvnl/Data/split_100829_I168_FC804GBABXX_L1_HUModqRAFDIBAPE_1/bwa_gatk_index_basespace_human_g1k_v37.fasta.tar.gz_bwa_4_threads.txt/A4b_HUModqRAFDIBAPE_L3/mergebam lfc] | 32256 |
A4b_L6_HUModqRBTDIBAPE | failed | | |
R2A _L1_HUModqRADDIBAPE | failed | | |
R2A _L1_HUModqRAFDIBAPE | done | [lfn://lfc.grid.sara.nl:5010/grid/vlemed/gvnl/Data/split_100809_I125_FC2083DABXX_L1_HUModqRAFDIBAPE_1/bwa_gatk_index_basespace_human_g1k_v37.fasta.tar.gz_bwa_4_threads.txt/R2A_HUModqRAFDIBAPE_L1/mergebam lfc] | 35819 |
R2A _L5_HUModqRAEDIAAPE | failed | | |
R2B _L3_HUModqRBTDIAAPE | done | [lfn://lfc.grid.sara.nl:5010/grid/vlemed/gvnl/Data/split_100809_I174_FC2085PABXX_L6_HUModqRBTDIAAPE_1/bwa_gatk_index_basespace_human_g1k_v37.fasta.tar.gz_bwa_4_threads.txt/R2B_HUModqRBTDIAAPE_L3/mergebam lfc] | 23754 |
R2B _L4_HUModqRBUDIAAPE | failed | pair 1 not in correct gzip format? | |
R2B _L6_HUModqRBTDIBAPE | failed | | |
R2C _L2_HUModqRBUDIBAPE | done | [lfn://lfc.grid.sara.nl:5010/grid/vlemed/gvnl/Data/split_100810_I171_FC20828ABXX_L3_HUModqRBUDIBAPE_1/bwa_gatk_index_basespace_human_g1k_v37.fasta.tar.gz_bwa_4_threads.txt/R2C_HUModqRBUDIBAPE_L2/mergebam lfc] | 40763 |
R2C _L2_HUModqRBVDIBAPE | failed | | |
R2C _L7_HUModqRBVDIAAPE | done | [lfn://lfc.grid.sara.nl:5010/grid/vlemed/gvnl/Data/split_100810_I128_FC2087PABXX_L7_HUModqRBVDIAAPE_1/bwa_gatk_index_basespace_human_g1k_v37.fasta.tar.gz_bwa_4_threads.txt/R2C_HUModqRBVDIAAPE_L7/mergebam lfc] | 39374 |
Unknown_L6_HUModqRBUDIAAPE | done | [lfn://lfc.grid.sara.nl:5010/grid/vlemed/gvnl/Data/split_100804_I124_FC201RNABXX_L6_HUModqRBUDIAAPE_1/bwa_gatk_index_basespace_human_g1k_v37.fasta.tar.gz_bwa_4_threads.txt/Unknown_HUModqRBUDIAAPE_L6/mergebam lfc] | 30307 |
Note: The Merge component is red (marked as failed) in the finished workflows, but the bam files were produced successfully.
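To make the elapsed times in the table above easier to read, the sketch below converts them from seconds to hours, minutes and seconds and computes a simple summary. The numbers are copied from the eight lanes marked "done". The helper name `format_hms` is introduced only for this illustration.

```python
# Elapsed times (s) of the eight lanes marked "done" in the table above.
elapsed_seconds = [22740, 37750, 32256, 35819, 23754, 40763, 39374, 30307]
total_lanes = 15  # rows in the table, done and failed together


def format_hms(seconds):
    """Format a duration in seconds as h:mm:ss (fractions are truncated)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


mean_seconds = sum(elapsed_seconds) / len(elapsed_seconds)
print(f"done:    {len(elapsed_seconds)} of {total_lanes} lanes")
print(f"fastest: {format_hms(min(elapsed_seconds))}")
print(f"slowest: {format_hms(max(elapsed_seconds))}")
print(f"mean:    {format_hms(mean_seconds)}")
```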
BWA alignment on mini pilot without splitting the data
- 19-12-2010 14:00 running
PicardQC on finished bam files
- 20-12-2010 11:30 done - elapsed time (10 bam files) 8730 s
- Runtime and disk space usage: see the attachment log-picardqc-20101220.ods
Coverage-per-base on finished bam files
- 20-12-2010 12:40 running
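The summary that Coverage-per-base produces (coverage versus frequency) can be illustrated with a small sketch. It assumes the per-base depths have already been extracted from the bam file into a sequence of integers. The function names and the example depths are invented for illustration and do not come from the actual component.

```python
from collections import Counter


def coverage_histogram(depths):
    """Map each coverage value to the number of bases that have it."""
    return Counter(depths)


def fraction_covered(histogram, min_depth):
    """Fraction of bases with coverage of at least min_depth."""
    total = sum(histogram.values())
    if total == 0:
        return 0.0
    covered = sum(count for depth, count in histogram.items() if depth >= min_depth)
    return covered / total


# Invented example: depths for eight consecutive bases.
depths = [0, 0, 3, 5, 5, 5, 12, 12]
hist = coverage_histogram(depths)
for depth in sorted(hist):
    print(depth, hist[depth])
```

For a whole genome the depths would be streamed rather than held in a list; `Counter.update` accepts any iterable, so the same approach works chunk by chunk.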
Mark-duplicates on finished bam files
- 20-12-2010 12:47 done - elapsed time 10614 s
Alternatives
Clusters
- Groningen
- Leiden
- Huygens
- Lisa
- Philips
- DAS
Grid
Attachments (3)
- r-environment.txt (134 bytes): info about R packages
- log-picardqc-20101220.ods (17.1 KB)
- log-fastqc20110423.xls (118.0 KB)