The AI Data Readiness Challenge for the NCI CRDC seeks to engage the research community in identifying considerations and gaps related to the AI readiness of CRDC data. It also aims to promote best practices for generating future data that is compatible with downstream AI applications, including the level of standardization and curation needed to make data AI-ready, and for efficiently sharing CRDC-hosted data with the community. This Challenge is part of a larger effort to ensure that CRDC data and resources meet the needs of AI-based research.


Key Dates

  • Registration period
    2/5/2024 — 3/1/2024
    Register your team and project for the challenge at https://www.challenge.gov

  • Challenge period
    3/4/2024 — 3/22/2024

  • Judging period
    3/25/2024 — 4/5/2024

  • Awards announced
    TBD


Computational Platform

The computational platform for the CRDC AI Data Readiness Challenge is the Seven Bridges Cancer Genomics Cloud (CGC). This platform is provided by the National Cancer Institute. Each competitor team will be issued cloud credits to support analysis during the challenge phase.

  • How to sign up for the CGC: Register at https://cgc.sbgenomics.com using the same email address you used at Challenge.gov to register for the challenge. This will ensure timely provision of cloud credits.

  • After you register, you will be added to your competitor team’s Challenge Project. If you are not added to a project space within 48 hours of registering, email support@velsera.com with the subject line “CRDC AI Data Readiness Challenge.”

    • This Project space is where you will conduct analyses, and will also serve as your final submission once the challenge ends.

    • The Project will be in “read-only” mode until the challenge period begins.

    • The Project will change to “write and execute” mode during the challenge.

    • The Project will be put back into “read-only” mode when the judging period begins.

  • Before the challenge begins, the CGC will invite registrants to a live, brief training session to assist registrants in navigating the platform.

Below, we’ve assembled some resources to help you make the most of your cloud credits and your challenge entry submission. Part 1 covers what to do before the challenge starts, Part 2 covers how to get support during the challenge, and Part 3 covers how to ensure your project is ready for submission at the end of the challenge.


Part 1: What to do during the pre-challenge phase

During Phase One (the Registration Phase), office hours will be held every Wednesday from 1:00 — 2:00 pm Eastern time (link TBD).

Complete the self-paced onboarding lessons

This series of videos teaches the basics of using the Seven Bridges Cancer Genomics Cloud (CGC), powered by Velsera. The CGC is part of NCI’s Cancer Research Data Commons, a cloud-based data science infrastructure that connects data sets with analytics tools so that researchers can share, integrate, analyze, and visualize cancer research data to drive scientific discovery.

Learn about controlled data access through dbGaP

Access to controlled-access (i.e., protected) data is granted via the database of Genotypes and Phenotypes (dbGaP). Controlled-access data primarily includes raw sequencing data such as BAM or FASTQ files, as well as VCF files and protected MAF files. To gain access to these files, a user must apply through dbGaP for access to each individual project. Each project has a Data Access Committee (DAC) that approves or rejects data access requests. Before applying through dbGaP, users also need to obtain an eRA Commons ID for authentication purposes.

Learn how to estimate your cloud computing costs

Learning to estimate and manage your cloud costs will prepare you to effectively budget your cloud credits to complete the challenge.
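
As a rough illustration of the budgeting arithmetic, here is a minimal Python sketch. The function names, the cost model (compute hours plus storage), and every rate used with it are hypothetical placeholders, not CGC pricing or a CGC API. Substitute the instance and storage prices shown on the platform for your own project.

```python
def estimate_task_cost(hourly_rate_usd, runtime_hours,
                       storage_gb=0.0, storage_rate_usd_per_gb_month=0.0,
                       storage_months=0.0):
    """Estimate the cost of one analysis run (hypothetical cost model).

    Compute cost is the instance hourly rate times the runtime;
    storage cost is the data size times a per-GB monthly rate times
    the number of months the data is kept.
    """
    compute_cost = hourly_rate_usd * runtime_hours
    storage_cost = storage_gb * storage_rate_usd_per_gb_month * storage_months
    return compute_cost + storage_cost


def runs_within_budget(credits_usd, cost_per_run_usd):
    """Return how many whole runs fit within the available credits."""
    if cost_per_run_usd <= 0:
        raise ValueError("cost_per_run_usd must be positive")
    return int(credits_usd // cost_per_run_usd)


if __name__ == "__main__":
    # All numbers below are made-up placeholders for illustration only.
    cost = estimate_task_cost(hourly_rate_usd=0.5, runtime_hours=4,
                              storage_gb=100,
                              storage_rate_usd_per_gb_month=0.02,
                              storage_months=1)
    print(f"Estimated cost per run: ${cost:.2f}")
    print(f"Runs that fit in the budget: {runs_within_budget(100, cost)}")
```

Running a small test task first and plugging its measured runtime into an estimate like this one gives a more reliable figure than guessing the runtime up front.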

Part 2: Support during the challenge

During the competition, exclusive support will be available to Challenge participants, both live and asynchronously.

  • Challenge Office Hours:

    • Every Monday and Wednesday from 12:00 to 1:00 PM Eastern time (link TBD)

  • Challenge Slack Space:
    Registrants will receive an invite to the following Slack channels:

    • nci_ai-data-readiness-challenge

    • nci_ai-data-readiness-seven-bridges

In addition, all competitors will have access to the same support available to general users of the CGC. This support includes technical support and guidance on using the platform, but does not include bioinformatics consulting services.

Part 3: Preparing your project for submission

The Challenge Project that the CGC team created for you will serve as your final submission.

For the full details on the submission requirements, see the documentation and follow the checklist available at Challenge.gov.

Your Challenge Project will return to “read-only” mode at 5:00 PM Eastern time on 3/22/2024.