Changes between Version 3 and Version 4 of CoverageAnalysisPipeline


Timestamp: Oct 27, 2010 8:22:03 PM
Author: Barbera van Schaik
Comment: --

--- CoverageAnalysisPipeline (v3)
+++ CoverageAnalysisPipeline (v4)
  = Coverage Analysis Pipeline =
- TODO. Suggested parties to take this up: Antoine van Kampen, Barbera van Schaik, Silvia D Olabarriaga, Mark Santcroos, AMC
+ Antoine van Kampen, Barbera van Schaik, Silvia D Olabarriaga, Mark Santcroos, AMC

- = Workflows =
+ = Coverage per base =

- == Create grid directory and change permissions ==
+ [[Image(CoveragePerBase.png)]]

- [[Image(CreateGridDirectory.png)]]
-    * Creates a directory on the LFC (LCG File Catalog)
-    * Changes the permissions so that the directory is inaccessible to the group and others
+ '''Description:''' Calculates the coverage per base and creates a text file that you can use to make a histogram (coverage versus frequency).

- == Create a BWA index on the database ==
+ '''Input:'''
+ * !SortedBamFile, a sorted BAM file
+ * !ChromInfoTxt, e.g. the chromInfo.txt file from the UCSC database

- [[Image(bwaIndexDatabase.png, 50%)]]
+ '''Output:'''
+ * coverageHistogram: a summary of coveragePerBase that gives an overview of the genome coverage of the sequencing experiment. Load it into Excel, Calc, gnuplot or a similar program to make a graph.

- Gunzips the FASTA file, builds the BWA index and tar-gzips the results.
-
- == Split fastq file ==
-
- [[Image(splitFastq.png, 50%)]]
-
- Splits a large (gzipped) fastq file into several smaller files with the Unix command 'split'. The results are uploaded to the directory specified in 'gridOutputDir'.
-
- == Alignment with BWA on each split file ==
-
- [[Image(BWAparam.png, 50%)]]
-
- Runs BWA with adjustable parameter settings:
-    * Matches sequence reads to a reference database
-    * Converts sai to sam
-    * Converts sam to bam
-    * Sorts the bam file
-    * Indexes the sorted bam file
-    * Tar-gzips all results, including the intermediate files
-
- == Merge bam files ==
-
- [[Image(MergeIndexSNPcall.png, 50%)]]
-
-    * Downloads all bai, bam, sam and tar.gz files from the gridInputDirectory
-    * Extracts (gunzip and untar) the tar.gz files, if present
-    * Gunzips the reference file (fasta format)
-    * Merges all _sorted.bam files
-    * Builds an index on the merged file
-    * Calls SNPs and makes a selection; the output is in pileup format
-    * Converts the pileup format to bed format
-
- == SNP calling with VarScan, determine coverage ==
-
- [[Image(Coverage_Varscan_BaseCoverage.png)]]
-
-    * Creates a pileup file (with samtools pileup -f) and sends the output to VarScan, which calls SNPs, indels and copy number variations
-    * Calculates coverage per 50 kbp
-    * Calculates coverage per base
+ '''Status:''' Implemented on the grid. The source code is available.
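The v3 "Create grid directory and change permissions" step makes a new LFC directory inaccessible to the group and others. In POSIX mode terms that means clearing the group and "other" bits, for example turning 0755 into 0700. The following is a minimal sketch of that bit arithmetic with Python's `stat` module, applied to a local directory as a stand-in. The helper names `owner_only` and `make_private_dir` are hypothetical, and the actual LFC call used by the workflow is not given on this page.

```python
import os
import stat


def owner_only(mode: int) -> int:
    """Clear every group and "other" permission bit, keeping the owner's bits."""
    return mode & ~(stat.S_IRWXG | stat.S_IRWXO)


def make_private_dir(path: str) -> int:
    """Create `path` if needed, restrict it to its owner and return the new mode.

    Local-filesystem analogue only: the LFC directory in the workflow is
    managed through the grid middleware, not through os.chmod.
    """
    os.makedirs(path, exist_ok=True)
    current = stat.S_IMODE(os.stat(path).st_mode)
    new_mode = owner_only(current)
    os.chmod(path, new_mode)
    return new_mode


# A directory created as rwxr-xr-x (0o755) ends up as rwx------ (0o700).
print(stat.filemode(stat.S_IFDIR | owner_only(0o755)))  # drwx------
```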
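The v3 "Split fastq file" step uses the Unix `split` command. Because a FASTQ read normally occupies four lines, a line-based split keeps reads intact only when the chunk size is a multiple of four lines (for example `zcat reads.fastq.gz | split -l 4000000 - part_`). Below is a minimal Python sketch of the same idea. It assumes 4-line records and uses a hypothetical `partNNN.fastq.gz` naming scheme; the upload to 'gridOutputDir' is left out.

```python
import gzip
import itertools
import os
from typing import List


def split_fastq_gz(path: str, out_dir: str, reads_per_chunk: int, prefix: str = "part") -> List[str]:
    """Split a gzipped FASTQ file into gzipped chunks that contain whole reads.

    Assumes the common 4-line record layout (header, sequence, '+', quality),
    so every chunk holds 4 * reads_per_chunk lines.
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be positive")
    lines_per_chunk = 4 * reads_per_chunk
    outputs: List[str] = []
    with gzip.open(path, "rt") as fin:
        for i in itertools.count():
            # Take the next block of lines; an empty list means end of file.
            chunk = list(itertools.islice(fin, lines_per_chunk))
            if not chunk:
                break
            if len(chunk) % 4:
                raise ValueError(f"{path}: last FASTQ record is incomplete")
            out_path = os.path.join(out_dir, f"{prefix}{i:03d}.fastq.gz")
            with gzip.open(out_path, "wt") as fout:
                fout.writelines(chunk)
            outputs.append(out_path)
    return outputs
```

Holding one chunk in memory keeps the sketch short; for very large chunks a streaming writer would be the better choice.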
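The v3 index, alignment and merge steps chain BWA and samtools. The sketch below only assembles the command lines. It assumes single-end reads (hence `bwa samse`) and samtools 0.1.x syntax, where `samtools sort in.bam prefix` writes `prefix.bam`; newer samtools releases use `samtools sort -o out.bam in.bam` instead. `aln_opts` stands in for the workflow's "adjustable parameter settings", and the optional `algorithm` argument exists because BWA's manual recommends `bwa index -a bwtsw` for large genomes such as human. File names and the `run` helper are hypothetical; the exact options the workflow used are not documented on this page.

```python
import os
import subprocess
from typing import Iterable, List, Optional, Tuple

# A command is (argv, path that receives stdout, or None to leave stdout alone).
Command = Tuple[List[str], Optional[str]]


def index_commands(ref_gz: str, archive: str, algorithm: Optional[str] = None) -> List[Command]:
    """Gunzip the reference FASTA, build the BWA index and tar-gzip it.

    Assumes the reference sits alone in its own directory, so archiving that
    directory captures every file `bwa index` writes; `archive` should be
    outside that directory.
    """
    if not ref_gz.endswith(".gz"):
        raise ValueError("expected a gzipped reference (*.gz)")
    ref = ref_gz[: -len(".gz")]
    ref_dir = os.path.dirname(ref) or "."
    algo = ["-a", algorithm] if algorithm else []
    return [
        (["gunzip", ref_gz], None),
        (["bwa", "index", *algo, ref], None),
        (["tar", "-czf", archive, "-C", ref_dir, "."], None),
    ]


def alignment_commands(
    ref: str, fastq: str, stem: str, aln_opts: Optional[List[str]] = None
) -> List[Command]:
    """Single-end BWA alignment of one split FASTQ file, ending in a tarball."""
    sai, sam, bam = f"{stem}.sai", f"{stem}.sam", f"{stem}.bam"
    sorted_bam = f"{stem}_sorted.bam"
    return [
        (["bwa", "aln", *(aln_opts or []), ref, fastq], sai),  # reads -> sai
        (["bwa", "samse", ref, sai, fastq], sam),               # sai -> sam
        (["samtools", "view", "-bS", sam], bam),                # sam -> bam
        (["samtools", "sort", bam, f"{stem}_sorted"], None),    # -> <stem>_sorted.bam (0.1.x)
        (["samtools", "index", sorted_bam], None),              # -> <stem>_sorted.bam.bai
        (["tar", "-czf", f"{stem}_results.tar.gz",
          sai, sam, bam, sorted_bam, sorted_bam + ".bai"], None),  # intermediates included
    ]


def merge_commands(files: Iterable[str], merged: str = "merged.bam") -> List[Command]:
    """Merge every *_sorted.bam among the downloaded files, then index the result."""
    sorted_bams = sorted(f for f in files if f.endswith("_sorted.bam"))
    if not sorted_bams:
        raise ValueError("no *_sorted.bam files to merge")
    return [
        (["samtools", "merge", merged, *sorted_bams], None),
        (["samtools", "index", merged], None),
    ]


def run(commands: List[Command]) -> None:
    """Execute the commands in order; check=True stops at the first failure."""
    for argv, stdout_path in commands:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "wb") as out:
                subprocess.run(argv, stdout=out, check=True)
```

For example, `run(alignment_commands("ref.fa", "part000.fastq", "part000", ["-t", "4"]))` would execute the alignment chain locally once `bwa` and `samtools` are on the PATH. Keeping command construction separate from execution makes the step order easy to inspect and test.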
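The last bullet of the v3 "Merge bam files" step converts pileup output to BED. The key detail is the coordinate system: pileup positions are 1-based, whereas BED intervals are 0-based and half-open, so the base at pileup position p becomes the interval [p-1, p). Below is a minimal sketch, assuming the classic `samtools pileup -c` column order (chromosome, position, reference base, consensus base, ...). It skips indel lines, which that format marks with `*` as the reference base. The `REF>CONS` name column is a hypothetical choice.

```python
from typing import Iterable, Iterator


def pileup_to_bed(lines: Iterable[str]) -> Iterator[str]:
    """Convert pileup lines to 4-column BED lines (chrom, start, end, name).

    Assumes columns 1-4 are chromosome, 1-based position, reference base and
    consensus base. Indel lines (reference base '*') are skipped here.
    """
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, cons = fields[0], int(fields[1]), fields[2], fields[3]
        if ref == "*":
            continue
        # BED is 0-based, half-open: 1-based position p -> [p - 1, p)
        yield f"{chrom}\t{pos - 1}\t{pos}\t{ref}>{cons}"
```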
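The v3 VarScan step also reports coverage per 50 kbp. One plausible reading, assumed here rather than taken from the workflow, is the mean depth over consecutive, non-overlapping 50,000 bp windows along each chromosome. The sketch takes hypothetical `(chrom, position, depth)` records, with 1-based positions as in pileup output, plus chromosome lengths, and counts bases without a record as depth 0.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_BP = 50_000  # assumed meaning of "coverage per 50kbp"


def window_coverage(
    depths: Iterable[Tuple[str, int, int]],
    chrom_sizes: Dict[str, int],
    window: int = WINDOW_BP,
) -> List[Tuple[str, int, int, float]]:
    """Mean depth per non-overlapping window as (chrom, start, end, mean).

    Windows use 0-based, half-open [start, end) coordinates; positions in
    `depths` are 1-based. Bases without a record count as depth 0, and the
    last window of a chromosome may be shorter than `window`.
    """
    totals: Dict[Tuple[str, int], int] = defaultdict(int)
    for chrom, pos, depth in depths:
        if chrom in chrom_sizes:
            totals[(chrom, (pos - 1) // window)] += depth
    rows: List[Tuple[str, int, int, float]] = []
    for chrom, size in chrom_sizes.items():
        for idx in range((size + window - 1) // window):  # ceiling division
            start = idx * window
            end = min(start + window, size)
            rows.append((chrom, start, end, totals[(chrom, idx)] / (end - start)))
    return rows
```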
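The v4 "Coverage per base" workflow summarises per-base coverage as a coverage-versus-frequency histogram. Below is a minimal sketch, assuming per-base depths have already been extracted from `SortedBamFile` as `(chrom, position, depth)` records. It also assumes `ChromInfoTxt` supplies chromosome lengths in its first two tab-separated columns, as in UCSC chromInfo.txt. Bases that never appear in the records are counted as zero coverage, which is one plausible reason the chromosome lengths are an input. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def read_chrom_sizes(lines: Iterable[str]) -> Dict[str, int]:
    """Chromosome name -> length from chromInfo.txt-style lines (first two columns)."""
    sizes: Dict[str, int] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and not fields[0].startswith("#"):
            sizes[fields[0]] = int(fields[1])
    return sizes


def coverage_histogram(
    depths: Iterable[Tuple[str, int, int]], chrom_sizes: Dict[str, int]
) -> Dict[int, int]:
    """Coverage value -> number of bases with that coverage.

    `depths` holds (chrom, 1-based position, depth) records, one per position;
    every base of a known chromosome without a record counts as depth 0.
    """
    hist: Counter = Counter()
    reported: Counter = Counter()  # positions with a record, per chromosome
    for chrom, _pos, depth in depths:
        if chrom not in chrom_sizes:
            continue  # ignore contigs missing from chromInfo.txt
        hist[depth] += 1
        reported[chrom] += 1
    # Unreported bases are uncovered.
    hist[0] += sum(size - reported[chrom] for chrom, size in chrom_sizes.items())
    return dict(sorted(hist.items()))


def format_histogram(hist: Dict[int, int]) -> str:
    """Tab-separated 'coverage<TAB>frequency' lines, ready for plotting."""
    return "".join(f"{cov}\t{freq}\n" for cov, freq in hist.items())
```

The two-column output of `format_histogram` can be loaded directly into a spreadsheet or gnuplot, matching the intended use of coverageHistogram.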