Learn from TCGA data and other public datasets

The Cancer Genome Atlas (TCGA) is one of the largest and most complete cancer genomics datasets available. However, at more than 2.5 petabytes in size, TCGA is challenging to use. Housing it requires large storage facilities, and processing it requires high-performance computing capacity. These concerns are common to many large public datasets. Prior to the launch of the CGC, researchers who wanted to compute over a large dataset, or analyze their own data alongside it, had to download the dataset to their own hardware.

Analyze public data immediately and securely

The CGC allows researchers to immediately and securely access public data on the cloud, including raw and processed data from whole genome, whole exome, RNA, microRNA, bisulfite sequencing, proteomics and array-based studies. Both Open Access and Controlled Access data are available.

Explore public data using smart queries

Researchers using the CGC can search for cases and data by their associated clinical metadata, and use a visual case explorer to browse the mutation status and expression levels of a gene across all patients with a particular disease. They can then retrieve all files associated with these patients, filter them further by metadata, and run analyses on them.

Additionally, the Data Browser feature on the CGC allows researchers to quickly and easily search across more than 100 different properties to find exactly the data they are interested in. The Data Browser queries a rich knowledge base containing more than 140 clinical, biospecimen and analytical properties that can be used to describe cancer omics datasets such as TCGA and CPTAC.
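
The query pattern described above (find cases by metadata, retrieve every file for those cases, then filter again) can be illustrated with a minimal Python sketch. This is not the CGC's API: the records, the field names (`case_id`, `disease`, `experimental_strategy`, `access`) and the helper functions are all hypothetical, chosen only to show the shape of the workflow on a small in-memory list.

```python
# Hypothetical file records. Field names and values are illustrative only
# and are NOT the CGC's actual metadata schema.
FILES = [
    {"name": "a.bam", "case_id": "case-001", "disease": "BRCA",
     "experimental_strategy": "WXS", "access": "Controlled"},
    {"name": "a_expr.txt", "case_id": "case-001", "disease": "BRCA",
     "experimental_strategy": "RNA-Seq", "access": "Open"},
    {"name": "b_expr.txt", "case_id": "case-002", "disease": "LUAD",
     "experimental_strategy": "RNA-Seq", "access": "Open"},
    {"name": "c.bam", "case_id": "case-003", "disease": "BRCA",
     "experimental_strategy": "WXS", "access": "Controlled"},
]


def filter_by_metadata(records, **criteria):
    """Keep records whose metadata equals every given key/value pair."""
    return [
        r for r in records
        if all(r.get(key) == value for key, value in criteria.items())
    ]


def files_for_cases(records, case_ids):
    """Return every record that belongs to one of the given cases."""
    wanted = set(case_ids)
    return [r for r in records if r["case_id"] in wanted]


# Step 1: find cases of one disease that have RNA-Seq data.
hits = filter_by_metadata(FILES, disease="BRCA", experimental_strategy="RNA-Seq")
case_ids = {r["case_id"] for r in hits}

# Step 2: retrieve all files associated with those cases.
case_files = files_for_cases(FILES, case_ids)

# Step 3: filter further by metadata, here keeping only open-access files.
open_files = filter_by_metadata(case_files, access="Open")
```

On the platform itself these steps run against the hosted knowledge base rather than a local list, so no data has to be downloaded first.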

All you need is an internet connection

The CGC democratizes cancer research. Scientists anywhere with an internet connection can manipulate and compute on large cancer datasets to further their research. There is no need to provision, set up and maintain servers for storage and computation, and no time or bandwidth is spent waiting for data to download.