ANALYZE PETABYTES OF PUBLIC CANCER DATA IMMEDIATELY AND SECURELY

ALL YOU NEED IS AN INTERNET CONNECTION!

With the growing ease and falling cost of sequencing technologies, there has been an explosion in the production of ‘omics data. This has resulted in a rapidly growing collection of genomes, exomes, transcriptomes, etc. Combined with proteomics and imaging data, these datasets require enormous storage capacity to house and high-performance computing capacity to process and analyze. Prior to the launch of the CGC, researchers who wanted to compute over a large dataset, or analyze their own data alongside it, had to download the dataset to their own hardware or high-performance computing cluster.

The CGC allows researchers to immediately and securely access public data on the cloud, including raw and processed data from whole genome, whole exome, RNA, microRNA, bisulfite sequencing, proteomics and imaging studies. Both Open Access and Controlled Access data are available.

Data Browser example using TCGA data.

The Data Browser feature on the CGC allows researchers to quickly and easily search across more than 100 different properties to find exactly the data they are interested in. Researchers using the CGC can search for cases and data by their associated clinical metadata, and use a visual case explorer to browse the mutation status and expression levels of a gene in all patients with a particular disease. They can then retrieve all files associated with these patients, filter them further by metadata, and run analyses on them.
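
As a rough illustration of the search-then-filter workflow described above, here is a minimal Python sketch. It does not use the CGC or its API: the FileRecord fields, the sample records and the helper names find_cases and files_for_cases are all hypothetical, chosen only to show the idea of selecting cases by clinical metadata and then narrowing their files by further metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical metadata for one file; not the CGC's actual schema."""
    name: str
    case_id: str
    disease: str
    data_type: str
    access: str


# Made-up records standing in for a catalog of public files.
RECORDS = [
    FileRecord("a.bam", "case-1", "Breast Invasive Carcinoma", "WGS", "Controlled"),
    FileRecord("a.txt", "case-1", "Breast Invasive Carcinoma", "Gene expression", "Open"),
    FileRecord("b.bam", "case-2", "Lung Adenocarcinoma", "WGS", "Controlled"),
    FileRecord("c.txt", "case-3", "Breast Invasive Carcinoma", "Gene expression", "Open"),
]


def matches(record, criteria):
    """True if the record equals every requested metadata value."""
    return all(getattr(record, key) == value for key, value in criteria.items())


def find_cases(records, **criteria):
    """Return sorted IDs of cases with at least one file matching all criteria."""
    return sorted({r.case_id for r in records if matches(r, criteria)})


def files_for_cases(records, case_ids, **criteria):
    """Return files belonging to the given cases, filtered further by metadata."""
    wanted = set(case_ids)
    return [r for r in records if r.case_id in wanted and matches(r, criteria)]


# Step 1: find cases by clinical metadata.
cases = find_cases(RECORDS, disease="Breast Invasive Carcinoma")
# Step 2: pull the files for those cases, keeping only Open Access ones.
open_files = files_for_cases(RECORDS, cases, access="Open")
print(cases, [f.name for f in open_files])
```

On the CGC itself these steps are performed through the Data Browser interface against the hosted datasets, so nothing needs to be downloaded or indexed locally.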

The CGC democratizes cancer research. Scientists anywhere with an internet connection can manipulate and compute on large cancer datasets to further their research. There is no need to provision, set up and maintain servers for storage and computation, and no time or bandwidth is spent waiting for data to download.