Gen3 Community Forum Agenda July 6, 2023
July 6, 2023 (US)/July 7, 2023 (AU/NZ)
Gen3 supports a flexible graph-based data model that can be customized for a wide variety of projects and use cases. At this community event, we will hear from several data commons operators about how they created their dictionaries and about the tools and processes they use to update and configure them. The event will include the following presentations:
Introduction to Gen3 Data Models (Michael Fitzsimons, Robert Grossman - Center for Translational Data Science, University of Chicago)
Presentations from Data Commons
Streamlining Gen3 Data Dictionaries: Python Tools and Google Sheets for simple, automated and efficient dictionary development - We will describe our Python Gen3 schema mapping library and how it enables an automated workflow to edit, test, validate and publish Gen3 data dictionaries, using a Google Sheet as input. We will then describe how we applied these tools to develop a data dictionary for the Australian Cardiovascular disease Data Commons.
Marion Shadbolt - Australian BioCommons
Spreadsheet-based data ingest with Gen3 dictionary-based validation - Aotearoa Genomic Data Repository users are given spreadsheet templates for data ingest because they offer a more straightforward user experience than the native submission portal. This talk provides an overview of our particular use cases and motivations, and demonstrates an extensible validation tool that checks metadata captured in a spreadsheet against an arbitrary Gen3 dictionary.
Eirian Perkins - New Zealand eScience Infrastructure (NeSI)
Evolution of the MIDRC Data Model - We will present how the data model for the Medical Imaging Data Resource Center (MIDRC) was created and how it is maintained. Topics will include a brief introduction to the MIDRC project, considerations when creating a new data model, the processes used to build and maintain the model, major changes to the model and how data were migrated, and the introduction and maintenance of derived data elements.
Chris Meyer - Center for Translational Data Science, University of Chicago
Versioning, migrations, and data release processes in the Pediatric Cancer Data Commons - The Pediatric Cancer Data Commons (PCDC) supports multiple independent consortia within a single Gen3 instance and currently has data on more than 35,000 patients. In this session, we will discuss the PCDC’s approach to data set versioning, data releases, and data migrations, highlighting some of the operational impacts of our approach.
Brian Furner - Data for the Common Good, University of Chicago