Changes between Version 1 and Version 2 of ProjectOverview


Timestamp:
Sep 5, 2010 8:41:22 PM
Author:
Morris Swertz
Comment:

--

  • ProjectOverview

    v1 v2  
    11= Project Overview =
    2 The BBMRI/Bioinformatics rainbow project was awarded based on the attached project proposal:
     2The BBMRI/Bioinformatics rainbow project was awarded based on the attached project proposal.
    33
    4 = Project aim =
    5 The next stage of epidemiological and genetic research will depend critically on large collections of high quality samples and data, also known as biobanks. This ambitious BBMRI-NL project aims to facilitate large scale data enrichment and data pooling between Dutch biobanks which will allow leading participation in the next generation of international etiological research. This will be achieved via harmonization and enrichment of bioinformatics databases, methods, models, software and tools, in particular focusing on high throughput analysis of sequencing and genome-wide association studies in the context of the ‘Genoom van Nederland (GvNL)’ rainbow project.
    6 
    7 
    8 == Background ==
    9 In the past decades the Dutch university medical centers and research institutes have collected broad and deep collections of over 400.000 individuals, not including major new initiatives like the !LifeLines project which will follow 165.000 individuals throughout the next 30 years. Currently ~200 organizations are in charge of the Dutch biobanks ranging from large and broad (inter-institute) cohorts to small and deep departmental boutique disease biobanks. __Each of these biobanks has collected for each participating individual a unique set of materials, including tissue-derived samples (like blood plasma, serum, urine and/or DNA among others) and/or phenotypic data in the form of responses to questionnaires, results of measurements and information from hospital information systems.__
    10 
    11 
    12 == Challenges ==
    13 The science of biobanking involves major bioinformatics and IT challenges at multiple levels, and it is now clear that significant development is required to enable multiple biobanks to work together in a highly effective and integrated way. Novel high throughput measurement methods like SNP-chip based genome-wide association studies (GWAS) and next generation sequencing (NGS) enable massive genetic and molecular profiling of samples at an unprecedented rate. __Suitable software infrastructures are needed to enable integrated analysis of all these data with sufficient statistical power to unravel the complex interplay of genetic and environmental factors in determination of human health and disease.__
    14 
    15 
    16 == Mission ==
    17 This joint BBMRI-NL and NBIC project brings together a team of leading bioinformatics researchers on a mission to remove technical barriers to the integration and exploitation of the wealth of phenotype and genotype data available in the biobank community. This will involve the research & generation of suitable software protocols, models, formats, databases, hardware and tools building on, and in collaboration with, national NBIC/!BioAssist biobanking, sequencing and interoperability task forces, Parelsnoer, Mondriaan, CTMM and !LifeLines/Target and international efforts BBMRI-EU, ELIXIR, 1000 Genomes project, European Bioinformatics Institute, EU-GEN2PHEN (genotype to phenotype), P3G (Public Population Project in Genomics: Genome Canada and Genome Quebec), EU-GENECURE (GENomic !StratEgies for Treatment and Prevention of Cardiovascular Death in Uraemia and End-stage REnal Disease: FP6), ENGAGE (European Network for Genetic and Genomic Epidemiology: FP7), EU-BioSHARE (Biobank Standardisation and Harmonisation for Research Excellence in the European Union), EU-NMD-chip (neuromuscular diseases chip, FP7), EU-!TechGene (Technological innovation of high throughput molecular diagnostics of clinically and molecularly heterogeneous genetic disorders: FP7) and GEFOS (Genetic Factors of Osteoporosis: FP7). __The aim is to harmonize data management, exchange and protocols for existing data within BBMRI-NL and to enrich Dutch biobanks with new models, software and tools for next generation data with scalable data archives, flexible and large scale processing pipelines and easy-to-connect systems for data exchange.__
    18 
     4Highlights:
    195
    206== Expected output ==
    217This project aims to produce the bioinformatics resources needed by BBMRI-NL participating biobanks and rainbow and complementation projects, most notably in the context of Genoom van Nederland:
    228
    23 1.       Sequence data management, QC and analysis pipelines to produce and share a Dutch catalog of variants.
     91.      Sequence data management, QC and analysis pipelines to produce and share a Dutch catalog of variants.
    2410
    25 2.       GWAS data management, QC and imputation to produce a Dutch GWAS control cohort
     112.      GWAS data management, QC and imputation to produce a Dutch GWAS control cohort
    2612
    27 3.       Dutch (inter)national biobank catalog and data exchange formats
     133.      Dutch (inter)national biobank catalog and data exchange formats
    2814
    29 4.       Scalable and easy to maintain software and web access tools underlying 1-3.
     154.      Scalable and easy to maintain software and web access tools underlying 1-3.
    3016
    3117All these resources will be made publicly available both as __centralized, secured, web accessible national services__, i.e. central hubs assembled in partnership to support the rainbow projects, and as __downloadable and customizable ‘tools-in-a-box’__ meant for local installation by biobanks and their local projects (local hubs). This project will develop in parallel the scientific, professional and physical infrastructures needed to effectively communicate expertise, procedures and tools between all Dutch biobanks as well as the provision of bioinformatics experts building on the infrastructure organized in the Netherlands Bioinformatics Center (NBIC) !BioAssist program. This group will work in coordination with the BBMRI-NL ethical-legal working group to develop a code of practice and guidelines for large scale harmonized data pooling and for the use of data from multiple biobanks.
    32 
    33  
    34 
    3518
    3619= 5. Approach =
    3720This project will combine a hub-and-spoke research & development organization to harmonize data between biobanks together with the provision of experts who will provide innovative model-driven software methods to efficiently produce ready-to-use software infrastructures needed by biologists and researchers. This includes:
    3821
     22== Agile hub-and-spoke organization ==
     23At the core of BBMRI there is the vision to develop all resources in a hub and spokes manner such that we maximize use of local expertise and innovation and minimize duplicated efforts and barriers to integration via centralized harmonization and enrichment. The smallest hubs within the Dutch biobank landscape are the individual biobanks, the larger hubs the participating institutes, and the largest hubs are central deployment of key data and analysis resources (which again can connect to pan-European hubs).  This project will mirror this organization to bridge between biomedical researchers, bioinformaticians and hardcore software engineers to ensure the multi-disciplinary interplay needed:
    3924
    40 == Agile hub-and-spoke organization ==
    41 At the core of BBMRI there is the vision to develop all resources in a hub and spokes manner such that we maximize use of local expertise and innovation and minimize duplicated efforts and barriers to integration via centralized harmonization and enrichment. The smallest hubs within the Dutch biobank landscape are the individual biobanks, the larger hubs the participating institutes, and the largest hubs are central deployment of key data and analysis resources (which again can connect to pan-European hubs).  This project will mirror this organization to bridge between biomedical researchers, bioinformaticians and hardcore software engineers to ensure the multi-disciplinary interplay needed:
     25·         A __central engineering team of hardcore programmers__ is responsible for the overarching infrastructure and will ensure harmonization of tools, pipelines and databases between working groups. This group will function as one of the eight NBIC task forces and will meet every week to ensure knowledge and method transfer.
    4226
    43 ·         A __central engineering team of hardcore programmers__ is responsible for the overarching infrastructure and will ensure harmonization of tools, pipelines and databases between working groups. This group will function as one of the eight NBIC task forces and will meet every week to ensure knowledge and method transfer.
     27·         __Participating experts will host programmers and scientific staff to pilot the planned tools and pipelines__ in close support to (their) BBMRI-NL complementation and rainbow projects. These bioinformaticians will be organized in themed working groups as described in appendix 1. Each working group will have a lead programmer that is part of the central engineering team. All members will meet monthly and will have weekly Skype meetings.
    4428
    45 ·         __Participating experts will host programmers and scientific staff to pilot the planned tools and pipelines__ in close support to (their) BBMRI-NL complementation and rainbow projects. These bioinformaticians will be organized in themed working groups as described in appendix 1. Each working group will have a lead programmer that is part of the central engineering team. All members will meet monthly and will have weekly Skype meetings.
    46 
    47 ·         __This project is strongly linked with leading international sister projects to avoid duplicated efforts __and efficiently achieve these aims by having project members participating in, or staying at, institutes like European Bioinformatics Institute (1KG, EGA, !ArrayExpress), Netherlands Bioinformatics Center (NGS, eScience, CWA), projects like EU-GEN2PHEN, EU-BIOSHARE, OMII-UK, ESFRI/ELIXIR, Parelsnoer, Mondriaan, CTMM, TIFN, NPC, NMC, P3G, Human Variome Project and open source collaborations like !ObiBa, MOLGENIS/XGAP, ABEL and Concept Web Alliance.
    48 
     29·         __This project is strongly linked with leading international sister projects to avoid duplicated efforts __and efficiently achieve these aims by having project members participating in, or staying at, institutes like European Bioinformatics Institute (1KG, EGA, !ArrayExpress), Netherlands Bioinformatics Center (NGS, eScience, CWA), projects like EU-GEN2PHEN, EU-BIOSHARE, OMII-UK, ESFRI/ELIXIR, Parelsnoer, Mondriaan, CTMM, TIFN, NPC, NMC, P3G, Human Variome Project and open source collaborations like !ObiBa, MOLGENIS/XGAP, ABEL and Concept Web Alliance.
    4930
    5031== Model driven software ==
    5132Flexible model driven software development as described in Swertz & Jansen (2007) has proven to be an efficient method to rapidly produce harmonized software infrastructures for life scientists while sharing the best models, software and tools notwithstanding large variation in research aims. This project will build and extend upon open source implementations of these methods such as MOLGENIS and Galaxy focusing on:
    5233
    53 ·         Implementing __extensible standard data models and software components__ developed internationally (we co-piloted data models for microarrays, QTLs, GWAS studies [Swertz 2010], and phenotypes in EU consortia like GEN2PHEN and EBI and participated in international GWAS and sequencing initiatives like the 1KG project).
     34·        Implementing __extensible standard data models and software components__ developed internationally (we co-piloted data models for microarrays, QTLs, GWAS studies [Swertz 2010], and phenotypes in EU consortia like GEN2PHEN and EBI and participated in international GWAS and sequencing initiatives like the 1KG project).
    5435
    55 ·         Making tools and protocols reusable in __a user-friendly catalog of bioinformatics tools and workflows __that captures all necessary inputs, outputs, optimization properties and user interactions in models to automatically incorporate existing tools (building on or inspired by Taverna and Galaxy).
     36·        Making tools and protocols reusable in __a user-friendly catalog of bioinformatics tools and workflows __that captures all necessary inputs, outputs, optimization properties and user interactions in models to automatically incorporate existing tools (building on or inspired by Taverna and Galaxy).
    5637
    57 ·         __Generating automatically from these data and tool models the scalable back-ends and front-ends needed__. This automatic procedure ensures harmonized software results building on industry standard databases for metadata and innovative approaches like cloud computing activities at SARA/Amsterdam, CIT/Groningen and BigGRID/Rotterdam to connect to the scalable compute power and storage needed.
     38·        __Generating automatically from these data and tool models the scalable back-ends and front-ends needed__. This automatic procedure ensures harmonized software results building on industry standard databases for metadata and innovative approaches like cloud computing activities at SARA/Amsterdam, CIT/Groningen and BigGRID/Rotterdam to connect to the scalable compute power and storage needed.
    5839
    59 ·         __Easing the finding and integration of resources using semantic and ontology technologies__ such as developed at EBI and NBIC/Concept Web Alliance to build bridges between data and tools, tapping into existing ontologies for data (e.g. HPO for human phenotype ontology) and for analysis protocols to help users and system developers bring tools together.
     40·        __Easing the finding and integration of resources using semantic and ontology technologies__ such as developed at EBI and NBIC/Concept Web Alliance to build bridges between data and tools, tapping into existing ontologies for data (e.g. HPO for human phenotype ontology) and for analysis protocols to help users and system developers bring tools together.
    6041
     42== Ready-to-use databases and tools ‘in-a-box’ that can federate into national resources ==
     43As detailed below in the description of work section, this project aims to develop novel, or incorporate internationally proven, key bioinformatics tools, databases, models and software such that they can be re-used by the smallest hubs (to accommodate and improve local research and complementation projects) up to the largest hubs (supporting rainbow projects, starting with Genoom van Nederland). By sharing the same components between all hubs we provide an effective path to
    6144
    62 == Ready-to-use databases and tools ‘in-a-box’ that can federate into national resources  ==
    63 As detailed below in the description of work section, this project aims to develop novel, or incorporate internationally proven, key bioinformatics tools, databases, models and software such that they can be re-used by the smallest hubs (to accommodate and improve local research and complementation projects) up to the largest hubs (supporting rainbow projects, starting with Genoom van Nederland). By sharing the same components between all hubs we provide an effective path to
     45·         harmonize and enrich available data management, exchange and analysis protocols
    6446
    65 ·         harmonize and enrich available data management, exchange and analysis protocols
     47·         avoid duplicated efforts between local hubs
    6648
    67 ·         avoid duplicated efforts between local hubs
     49·         make it more likely that everyone’s needs are supported
    6850
    69 ·         make it more likely that everyone’s needs are supported
     51·         improve quality because more users test the available bioinformatics infrastructure
    7052
    71 ·         improve quality because more users test the available bioinformatics infrastructure
     53·         preserve flexibility to go beyond standardization and accommodate specific local needs.
    7254
    73 ·         preserve flexibility to go beyond standardization and accommodate specific local needs.
    74 
    75  
    76 
    77 '''__Harmonization / Enrichment__ '''(please tick)''''''
    78 
    79  
    80 
    81 
    82 = 6. Biobanks involved in this grant request =
    83 ''' '''
    84 
    85 '''Name of biobank (1):  '''All biobanks having GWAS data'''  Principal Investigator: '''nvt
    86 
    87 '''Current number of samples: '''400.000'''                    Started in: '''2007 with BBMRI
    88 
    89 '''Short description:'''
    90 
    91 In the Netherlands, GWA data are currently available from >85.000 individuals, geographically distributed throughout the country. This material is an excellent foundation to study whether regional differences have any relationship with genetic differences. In the Genoom van Nederland project this has been taken as the starting point to select individuals for sequencing. This project will take the result of this sequencing project to elucidate Dutch subpopulations and genetic variation and use this information to enrich available GWAS data.
    92 
    93 '''Content '''(please tick)''':   '''
    94 
    95 '''Phenotypic data''': clinical / anthropometric / lifestyle / environment / biomarkers / medication / family / …
    96 
    97 '''Biomaterials''': DNA / RNA / Plasma / Serum / Urine /
    98 
    99 '''‘Omics’ data''': transcriptomics / proteomics / DNA sequence / GWA / metabolomics /
    100 
    101 '''                '''
    102 
    103 '''7. Added value of the project for BBMRI-NL. '''Please explain and indicate which biobanks, biobank researchers or other stakeholders will profit. ''''''
    104 
    105 The bioinformatics rainbow connects to 6 of the goals as formulated in the so-called “meerjarenplan” (multi-year plan):
    106 
    107 1.       Setting up an efficient and integrated national infrastructure that connects and enriches existing biobanks in the Netherlands
    108 
    109 2.       Becoming the umbrella organization and the ‘face’ of the Dutch biobanks
    110 
    111 3.       Forming a strong national hub for BBMRI-EU, for both population biobanks and clinical biobanks.
    112 
    113 4.       Ensuring optimal access to material and data in existing biobanks for scientific research, in a way that fully respects the privacy and autonomy of donors/participants
    114 
    115 5.       Optimal alignment with, and use of results from, existing initiatives.
    116 
    117 6.       Facilitating future multidisciplinary scientific research into the origin and course of multifactorial disorders, for the benefit of new concepts for prevention, diagnostics and treatment
    118 
    119 
    120 =   =
    121 
    122 = 8. Duration of project:                   =
     55=  =
     56= 8. Duration of project: =
    123573 years
    124 
    125  
    12658
    12759Planning (matching GvNL planning where appropriate)
    12860
    129 Short read archive                                                                                         Month 0 – 8      
     61Short read archive                                                                                         Month 0 – 8
    13062
    131 Biobank catalog pilot                                                                                    Month 0 – 6
     63Biobank catalog pilot                                                                                    Month 0 – 6
    13264
    133 Sequence analysis Phase 1 (GvNL)                                                        Month 4 – 16
     65Sequence analysis Phase 1 (GvNL)                                                        Month 4 – 16
    13466
    135 Harmonized exchange formats                                                                               Month 6 - 24
     67Harmonized exchange formats                                                                              Month 6 - 24
    13668
    137 Establish variation QC and analysis pipeline                                       Month 8 – 20
     69Establish variation QC and analysis pipeline                                      Month 8 – 20
    13870
    139 Sequence analysis Phase 2 (GvNL)                                                        Month 8 – 20
     71Sequence analysis Phase 2 (GvNL)                                                        Month 8 – 20
    14072
    141 Variation catalog/Dutch !HapMap                                                           Month 20
     73Variation catalog/Dutch !HapMap                                                          Month 20
    14274
    143 GWAS data release server                                                                        Month 0 – 12
     75GWAS data release server                                                                        Month 0 – 12
    14476
    145 GWAS QC and imputation protocols                                                     Month 6 – 20
     77GWAS QC and imputation protocols                                                    Month 6 – 20
    14678
    147 Dutch GWAS Control Cohort (DGCC)                                                    Month 12 – 24
     79Dutch GWAS Control Cohort (DGCC)                                                    Month 12 – 24
    14880
    149 Imputation of available GWA data (GvNL)                                          Month 20 – 30
     81Imputation of available GWA data (GvNL)                                          Month 20 – 30
    15082
    151 Make sequence data available (GvNL)                                                  Month 12 – 30
     83Make sequence data available (GvNL)                                                  Month 12 – 30
    15284
    153 GWAS analysis tools catalog                                                                     Month 12 – 36
     85GWAS analysis tools catalog                                                                    Month 12 – 36
    15486
    155 Web access tools                                                                                           Month 22 – 30
     87Web access tools                                                                                          Month 22 – 30
    15688
    157 Integrated DGCC and Variation catalog web access tools            Month 24 – 36
    158 
    159 ''' '''
     89Integrated DGCC and Variation catalog web access tools            Month 24 – 36
    16090
    16191'''9. Deliverables'''
     
    16393D1 Sequencing
    16494
    165 ·         __Short Read Archive (GvNL)__ – a database and user interface to manage and trace next generation sequencing data, associated sample annotations (metadata) and intermediate- and end-results.
     95·        __Short Read Archive (GvNL)__ – a database and user interface to manage and trace next generation sequencing data, associated sample annotations (metadata) and intermediate- and end-results.
    16696
    167 ·         __Variation analysis and QC pipelines (GvNL)__ – harmonization and enrichment of available processing pipelines for quality control and variation analysis for (exome) re-sequencing projects.
     97·        __Variation analysis and QC pipelines (GvNL)__ – harmonization and enrichment of available processing pipelines for quality control and variation analysis for (exome) re-sequencing projects.
    16898
    169 ·         __Variation catalog/Dutch !HapMap (GvNL) __– release of the enrichment results of variation analysis of the GvNL 1000 genomes as produced using above tools as imputation data source.
    170 
    171  
     99·         __Variation catalog/Dutch !HapMap (GvNL) __– release of the enrichment results of variation analysis of the GvNL 1000 genomes as produced using above tools as imputation data source.
    172100
    173101D2 Genome-wide association analysis
    174102
    175 ·         __GWAS data release server__– database and user interfaces to manage and query GWAS data, in particular to create GWAS (control cohort imputation) data releases.
     103·        __GWAS data release server__– database and user interfaces to manage and query GWAS data, in particular to create GWAS (control cohort imputation) data releases.
    176104
    177 ·         __GWAS QC and imputation protocols__ – harmonization and enrichment of tools and pipelines to verify and clean GWAS data sets and produce data sets ready for analysis by the researcher.
     105·        __GWAS QC and imputation protocols__ – harmonization and enrichment of tools and pipelines to verify and clean GWAS data sets and produce data sets ready for analysis by the researcher.
    178106
    179 ·         __GWAS data analysis __– a catalog of established protocols and bioinformatic pipeline implementations thereof for GWAS analysis.
     107·        __GWAS data analysis __– a catalog of established protocols and bioinformatic pipeline implementations thereof for GWAS analysis.
    180108
    181 ·         __GWAS control cohort and DGCC (GvNL) __– collection of BBMRI-NL GWAS data into the DGCC database and release of imputed datasets using the variation catalog produced by GvNL
    182 
    183  
     109·         __GWAS control cohort and DGCC (GvNL) __– collection of BBMRI-NL GWAS data into the DGCC database and release of imputed datasets using the variation catalog produced by GvNL
    184110
    185111D3 Biobank (meta)data finding and exchange
    186112
    187 ·         __Biobank and biobankers catalogue __– central index of biobanks with aggregate metadata on biobank contents (protocols, features observed, optionally (aggregate) data) and semantic search functionality to enable researchers to find biobank(er)s and samples.
     113·        __Biobank and biobankers catalogue __– central index of biobanks with aggregate metadata on biobank contents (protocols, features observed, optionally (aggregate) data) and semantic search functionality to enable researchers to find biobank(er)s and samples.
    188114
    189 ·         __Harmonized data exchange formats __– harmonization of syntaxes / file formats to transfer sample annotations, phenotypic data and molecular data between biobanks and/or central hubs.
     115·        __Harmonized data exchange formats __– harmonization of syntaxes / file formats to transfer sample annotations, phenotypic data and molecular data between biobanks and/or central hubs.
    190116
    191 ·         __Pseudonymization system __– to ensure privacy of participants is protected and legal/ethical requirements are addressed (in collaboration with Parelsnoer).
    192 
    193  
     117·         __Pseudonymization system __– to ensure privacy of participants is protected and legal/ethical requirements are addressed (in collaboration with Parelsnoer).
    194118
    195119D4 Core software platform (support of above to prevent reinvented wheels)
    196120
    197 ·         __Flexible ‘model-driven’ software platform__ – which makes it possible to efficiently produce, configure and maintain all data models, databases, compute services and pipelines needed.
     121·        __Flexible ‘model-driven’ software platform__ – which makes it possible to efficiently produce, configure and maintain all data models, databases, compute services and pipelines needed.
    198122
    199 ·         __Large data platform__ – to harmonize how to deal with the GWAS and NGS data within data archives (storage), algorithms (runtime) and data exchange (network)
     123·        __Large data platform__ – to harmonize how to deal with the GWAS and NGS data within data archives (storage), algorithms (runtime) and data exchange (network)
    200124
    201 ·         __Flexible compute pipeline platform__ – to harmonize how large scale analyses are run, with suitable user interfaces, so that individual pipelines need not deal with the difficulty of running algorithms on clusters, grids or clouds
     125·        __Flexible compute pipeline platform__ – to harmonize how large scale analyses are run, with suitable user interfaces, so that individual pipelines need not deal with the difficulty of running algorithms on clusters, grids or clouds
    202126
    203 ·         __Web access tools__ – harmonized user interfaces and programming interfaces to provide a single point of access to all the resources developed in this project.
    204 
    205  
    206 
    207  
     127·         __Web access tools__ – harmonized user interfaces and programming interfaces to provide a single point of access to all the resources developed in this project.