Release notes
Recently published apps
MaxQuant is a tool for quantitative proteomics, designed for analysing large mass-spectrometric datasets. It takes files with high-resolution, quantitative MS data and produces information about the quantification of proteins and PTMs. It can be used for analysing data derived from any of the major relative quantification techniques (label-free quantification (LFQ), MS1-level labelling and isobaric MS2-level labelling). Furthermore, it provides quantification algorithms for all common forms of tandem mass tag (TMT) and isobaric tags for relative and absolute quantitation (iTRAQ) labelling (including higher-plex TMT and multinotch MS3 quantification).
GENESIS Association Results Plotting creates Manhattan and QQ plots from GENESIS association test results, with additional filtering and stratification options available. With its default options, this app is part of the GENESIS association testing workflows. After the association testing is completed, users can fine-tune the Manhattan and QQ plots by running this app separately.
Release notes
Recently published apps
The following apps were upgraded to CWL1 and had their versions updated as well:
GATK
Picard
VEP toolkit and workflow
Release notes
Foundation Medicine data available on the CGC
The Foundation Medicine dataset is now available on the CGC and accessible through the Data Browser. The dataset contains genomic profiling data from approximately 18,000 adult patients with a diverse array of cancers.
Release notes
GDC Datasets version update
As of March 17, GDC datasets available through the Data Browser and the API correspond to GDC Data Release 28.0.
Release notes
Recently published apps
GATK Somatic SNVs and INDELs (Mutect2) 4.1.9.0 can be used to detect SNVs and INDELs in one or more tumor samples from a single individual, with or without a matched normal sample. Mutect2 calls variants via local assembly of haplotypes, which treats whole haplotypes and read pairs, rather than single bases, as the atomic units of biological variation and sequencing evidence, and thereby improves variant calling.
GATK Somatic Create Mutect2 Panel of Normals 4.1.9.0 workflow creates a panel of normals (germline and artifactual sites) for use in other GATK workflows. It takes multiple normal sample callsets produced by GATK Somatic SNVs and INDELs 4.1.9.0 (Mutect2 workflow) tumor-only mode (although it is called tumor-only, normal samples are given as the input) and collates sites present in two or more samples into a sites-only VCF.
Both workflows are modeled on the official GATK WDLs.
Release notes
Improved project organization with project tags
In order to improve the organization and findability of projects, project tags have been introduced to the CGC.
Project Admins can now assign tags to projects via the API or through the visual interface. These tags can be used to filter the list of all projects, to categorize projects, and for general custom organization of projects.
The maximum number of tags for a single project is 15, while the maximum number of characters in a single tag is 36.
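The two limits above can be checked before a tag list is sent to the platform. The following is a minimal sketch, not part of the CGC API: the function name and the shape of the returned messages are hypothetical, and only the two numeric limits come from this note.

```python
# Limits stated in the release note above.
MAX_TAGS_PER_PROJECT = 15
MAX_TAG_LENGTH = 36


def validate_project_tags(tags):
    """Return a list of problems found; an empty list means the tags fit both limits.

    Hypothetical helper for illustration only; the platform may apply
    further rules that are not described in this note.
    """
    problems = []
    if len(tags) > MAX_TAGS_PER_PROJECT:
        problems.append(
            f"{len(tags)} tags given, at most {MAX_TAGS_PER_PROJECT} allowed"
        )
    for tag in tags:
        if len(tag) > MAX_TAG_LENGTH:
            problems.append(
                f"tag {tag!r} has {len(tag)} characters, "
                f"at most {MAX_TAG_LENGTH} allowed"
            )
    return problems
```

For example, `validate_project_tags(["rna-seq", "tcga"])` returns an empty list, while a list of 16 tags or a 37-character tag each produce one message.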
PDC data update on the CGC
PDC data on the CGC has been updated with the following PDC Data Releases:
V1.0.24 (February 5, 2021)
V1.0.22 (January 5, 2021)
V1.0.21 (December 15, 2020)
See more information about the history and contents of each PDC data update on the CGC.
GDC Datasets version update
As of February 22, GDC datasets available through the Data Browser and the API correspond to GDC Data Release 27.0.
Release notes
Recently published apps
The following tools were updated to their latest versions and upgraded to CWL1.x:
HISAT2-StringTie workflow
StringTie
HISAT2
Trimmomatic
Tabix
SBG FASTQ Merge
The following new apps were published, in CWL1.x:
Exomiser 12.1.0 - tool for prioritizing variants from WES and WGS data.
VEP Slivar Trios Rare Diseases Analysis workflow - analyzes WES and WGS family variants.
Clustering and Gene Marker Identification with Seurat 3.2.2 - clustering and gene marker identification analysis starting from gene-cell UMI or read counts.
xCell 1.3 - tool for cell type enrichment analysis, which takes gene expression data and performs analysis for 64 immune and stromal cell types.
MBASED 1.18.0 tool - used for performing allele-specific expression analysis.
MBASED workflow - based on the MBASED tool, with added phasing and VEP annotation, this workflow makes allele-specific expression analysis easier to run.
elPrep 4.1.6 - high-performance tool for preparing SAM/BAM files for variant calling in sequencing pipelines, which can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, calculating and applying base quality score recalibration, etc.
Kraken2 2.0.9 - taxonomic sequence classifier that assigns taxonomic labels to DNA sequences.
Bracken 2.5 - uses the taxonomic assignments made by Kraken/Kraken2, along with information about the genomes themselves, to estimate abundance at the species/genus level, or above.
Release notes
RAS-CRDC Integration Phase 1 completed
The Researcher Auth Service (RAS), sponsored by The Office of Data Science Strategy, is a service provided by NIH's Center for Information Technology (CIT) to facilitate access to NIH’s open and controlled data assets and repositories in a consistent and user-friendly manner.
The RAS initiative is advancing data infrastructure and ecosystem goals defined in the NIH Strategic Plan for Data Science. RAS has adopted the Global Alliance for Genomics and Health (GA4GH) standards for integration of researcher-focused applications and data repositories over the OIDC platform.
The goal for this effort is to coordinate all cloud stacks and use RAS identically across systems. The NCI CRDC (Cancer Research Data Commons) stack was chosen for the pilot phase to create a phased approach that should achieve the larger goals of federated data access using GA4GH Passports, with a focus on how this fits in with NIH data in general.
Phase 1 is now complete and introduces a change to the login flow when using eRA Commons:
When choosing login with eRA Commons on the CGC, you will now be redirected to the NIH RAS login screen instead of iTrust.
Other than the login flow change, user experience on the CGC remains the same.
Recently published apps
GATK Broad Best Practice Variant Calling From uBAM - This workflow combines two Broad Best Practice workflows, BAM processing and variant calling, into one.
Functional Equivalence WGS - This workflow processes WGS data according to the functional equivalence standard.
Release notes
New password validation rules
To maintain a high level of security and prevent unauthorized access to the CGC, we are introducing password checking against a database of commonly used passwords and passwords compromised in data breaches across the Internet, all of which are considered unsafe. If the password you enter exists in this database, you will need to choose a different one.
Additionally, please note that the entire password validation process takes place within the Seven Bridges infrastructure, which ensures an additional level of security as there are no third-party services involved. This password validation mechanism applies when trying to perform the following actions:
Sign up for a new account.
Set up a new password to replace an expired one.
Change the account password.
Please note that this does not apply to accounts using external login providers.
Release notes
Task queueing improvements
We have made the following changes to the task queueing process that should improve the queueing logic and contribute to faster completion of initialized tasks and analyses:
If you reach your parallel instance limit with running tasks, and there are both tasks and Data Cruncher analyses waiting in the queue, Data Cruncher analyses will be first to execute once instances become available.
If the parallel instance limit is reached with tasks that are being executed, when an instance becomes available it will first be allocated to running multi-instance or scattered tasks if they need additional instances. If there are no such tasks, the instance will be allocated to other task(s) that are next up in the queue. This will enable faster completion of multi-instance and scattered tasks and help avoid breaks in their execution.
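The allocation order described above can be sketched as a small priority function. This is an illustrative model only, not the platform's scheduler: the function and queue names are hypothetical, and the way the two rules are combined (running multi-instance or scattered tasks first, then queued Data Cruncher analyses, then queued tasks) is an assumption, since the note states each rule separately.

```python
from collections import deque


def allocate_free_instance(running_needing_instances, queued_analyses, queued_tasks):
    """Decide what receives an instance that has just become available.

    Each argument is a deque of names in queue order. Returns a
    (kind, name) tuple, or None when nothing is waiting.
    The combined ordering below is an assumption made for illustration.
    """
    # Running multi-instance or scattered tasks that need more instances go first.
    if running_needing_instances:
        return ("running_task", running_needing_instances.popleft())
    # Queued Data Cruncher analyses are started before queued tasks.
    if queued_analyses:
        return ("data_cruncher_analysis", queued_analyses.popleft())
    # Otherwise the next task in the queue gets the instance.
    if queued_tasks:
        return ("queued_task", queued_tasks.popleft())
    return None


if __name__ == "__main__":
    running = deque(["scatter-job"])
    analyses = deque(["notebook-1"])
    tasks = deque(["task-a", "task-b"])
    # Simulate four instances becoming available one after another.
    for _ in range(4):
        print(allocate_free_instance(running, analyses, tasks))
```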
Recently published apps
GENESIS LocusZoom visualizes association testing results using the LocusZoom standalone software. This app is a wrapper around LocusZoom standalone that enables it to work with the outputs of GENESIS association pipelines. The main goal of this app is to visualize the results of the GENESIS Single Variant Association Test. In addition, regions from sliding window or aggregate tests with p-values below a certain threshold can be displayed in a separate track.
SBG Loci Snapshoter generates screenshots of specific regions across all aligned files provided as inputs. It uses the IGV batch functionality to create PNG images of the desired loci across multiple samples. The tool was developed primarily for post-association visualisation of associated variants across a subset of the CRAM files used to obtain those variants.
Release notes
CDS integration on the CGC
The Cancer Data Service (CDS) is a data repository under the NCI's Cancer Research Data Commons (CRDC) infrastructure for storing cancer research data generated by NCI-funded programs. Its data is stored in the database of Genotypes and Phenotypes (dbGaP) provided by the National Center for Biotechnology Information (NCBI). CDS hosts datasets that contain controlled-access data, with access permissions controlled by dbGaP. This release brings three CDS datasets to the CGC (GECCO, PPTC and LCCC-1108) and enables researchers to easily use CDS data on the CGC.
Release notes
Data Cruncher Stability and Usability Improvements
Your experience with Data Cruncher just got better thanks to the following improvements:
Use the full potential of RStudio as it is now officially out of the BETA stage and its stable release is available in Data Cruncher.
Maintain full control over your workspace integration capabilities in a more secure environment - your Data Cruncher sessions are now run on a separate domain providing even better security isolation and privacy control of your favorite third-party integrated development environments.
Have a better insight into your session initialization phase with a more informative loading experience.
Release notes
Recently published apps
The Strelka2 Somatic workflow and the Strelka2 Germline tool have been published to the CGC in CWL1.0. Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs. For better calling results, the structural variant caller Manta has been added to the Somatic workflow.
Release notes
Upload files directly through the visual interface
Uploads just became way easier! To enable easy and convenient small-scale file uploads, we have added new functionality that allows you to upload files directly through the CGC's visual interface. To upload files from your local storage, navigate to the desired project and click the Add files button. You will notice a new tab called Your Computer, which, in addition to the file upload itself, provides all standard upload-related features such as naming conflict resolution, file tagging and upload progress tracking.
Release notes
GA4GH WES and DRS support
Through its engagement in GA4GH (The Global Alliance for Genomics and Health), Seven Bridges actively works with platform development partners and industry leaders to develop standards that will facilitate interoperability.
The GA4GH Cloud Work Stream helps the genomics and health communities take full advantage of modern cloud environments. Its initial focus is on 'bringing the algorithms to the data', by creating standards for defining, sharing, and executing portable workflows. Standards under discussion include workflow definition languages, tool encapsulation, cloud-based task and workflow execution, and cloud-agnostic abstraction of data access.
CGC provides support for the following standards:
WES API
The Workflow Execution Service (WES) API describes a standard programmatic way to run and manage workflows. Having this standard API supported by multiple execution engines will let people run the same workflow using various execution platforms running on various clouds/environments.
The following API paths are available as a part of the Seven Bridges implementation of WES API:
DRS API - AuthN/Z Update
The Data Repository Service (DRS) API provides a generic interface to data repositories so data consumers, including workflow systems, can access data in a single, standard way regardless of where it's stored and how it's managed.
With this release, the authN/Z method has changed to better reflect the specification's recommendations:
All API requests need to have the HTTP header X-SBG-Auth-Token, which you should set to your authentication token.
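As a minimal sketch of the header requirement above, the following builds (but does not send) a DRS object request with the X-SBG-Auth-Token header set. The base URL is a placeholder, not the CGC endpoint, and the helper names are hypothetical. The `/objects/{object_id}` path follows the GA4GH DRS v1 specification and is assumed here rather than taken from this note.

```python
import urllib.request
from urllib.parse import quote

# Placeholder base URL for illustration only; substitute the real DRS endpoint.
DRS_BASE_URL = "https://drs.example.org/ga4gh/drs/v1"


def auth_headers(token):
    """Headers carrying the authentication token, as required by the note above."""
    return {"X-SBG-Auth-Token": token}


def build_drs_object_request(object_id, token, base_url=DRS_BASE_URL):
    """Build a GET request for one DRS object without sending it.

    The path layout is assumed from the GA4GH DRS v1 specification.
    """
    url = f"{base_url}/objects/{quote(object_id, safe='')}"
    return urllib.request.Request(url, headers=auth_headers(token), method="GET")
```

A request built this way could then be sent with `urllib.request.urlopen(request)`, with the token read from a secure location rather than hard-coded in a script.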
The following API paths are available as part of the Seven Bridges implementation of DRS API:
Recently published apps
The updated GENESIS apps are now available in our public apps gallery. The new release includes:
New Docker image v.2.8.1.
Updated input and output descriptions.
Comprehensive benchmarking included in the apps description.
Standard output included in the task logs.
Other minor changes.
Release notes
Recently published apps
The EPACTS 3.4.2 toolkit has been published in CWL1.0. EPACTS (Efficient and Parallelizable Association Container Toolbox) is a versatile software pipeline to perform various statistical tests for identifying genome-wide association from sequence data through a user-friendly interface, both to scientific analysts and to method developers.
Release notes
Recently published apps
Plink1.90 and Plink2.0 have been published in CWL1.0. Plink is a widely used open-source tool for genome-wide association studies and research in population genetics. It can rapidly manipulate and analyze large datasets in their entirety; data management, summary statistics, population stratification and association analysis are only some of Plink's domains of function.
Release notes
Recently published apps
MultiQC has been upgraded to CWL1.0, updated to the latest version, and published on all environments.
Release notes
CWL 1.1 Now Available on the CGC
CGC now supports Common Workflow Language (CWL) v1.1. The new version of CWL brings some minor improvements, most notably:
Limit for execution time of a command line tool. If execution time exceeds the defined limit, the tool will fail in order to prevent additional costs.
Option to disable memoization for a specific tool. If you want to use memoization for a workflow run, but want to skip memoization in a specific tool, CWL1.1 introduces a feature that allows this.
Ability to declare optional secondary files. A task is not prevented from running if an optional secondary file is not available in the project.
CWL 1.1 also includes a number of other minor features and improvements. For the detailed change log, please see the CWL CommandLineTool specification and the CWL Workflow specification.
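The three features listed above might look as follows in a CWL 1.1 tool description. This is a sketch under stated assumptions: the tool is written as a Python dictionary and printed as JSON (CWL documents can be written in JSON as well as YAML), and the command, input name and time limit are placeholders. The field names `ToolTimeLimit`/`timelimit`, `WorkReuse`/`enableReuse` and `secondaryFiles` with `required` come from the CWL v1.1 specification. `WorkReuse` is assumed to be the CWL feature this note calls memoization.

```python
import json

# Illustrative CommandLineTool; command, input name and limit are placeholders.
tool = {
    "cwlVersion": "v1.1",
    "class": "CommandLineTool",
    "baseCommand": ["samtools", "idxstats"],
    "requirements": [
        # Fail the tool if it runs longer than one hour (value in seconds).
        {"class": "ToolTimeLimit", "timelimit": 3600},
        # Do not reuse earlier results for this tool (assumed to map to memoization).
        {"class": "WorkReuse", "enableReuse": False},
    ],
    "inputs": {
        "alignment": {
            "type": "File",
            # Optional secondary file: a missing index does not block the run.
            "secondaryFiles": [{"pattern": ".bai", "required": False}],
            "inputBinding": {"position": 1},
        }
    },
    "outputs": {"stats": {"type": "stdout"}},
}

print(json.dumps(tool, indent=2))
```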
Release notes
Deprecation of Previous Generation Instance Types
To further optimize user workloads, we have decided to deprecate some of the older Previous Generation AWS instances.
We are removing support for the following AWS instance types from the CGC:
c3.large
c3.xlarge
c3.2xlarge
c3.4xlarge
c3.8xlarge
m3.medium
m3.large
m3.xlarge
m3.2xlarge
r3.large
r3.xlarge
r3.2xlarge
r3.4xlarge
r3.8xlarge
If you have an app, task or Data Cruncher analysis with one of these types set as its instance type hint, it will automatically be migrated to the most appropriate newer instance type.
If you are explicitly setting some of these types via the API, please update your scripts before September 15th.
Please see the full list of supported instance types and the official recommendation for Upgrade Paths from AWS.
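When updating scripts that set instance types explicitly, a simple check against the list above can flag values that need changing. This is a minimal sketch: the function name is hypothetical, and the caller is assumed to have already collected the instance type strings from their own scripts or configuration.

```python
# Instance types listed as deprecated in this release note.
DEPRECATED_INSTANCE_TYPES = frozenset({
    "c3.large", "c3.xlarge", "c3.2xlarge", "c3.4xlarge", "c3.8xlarge",
    "m3.medium", "m3.large", "m3.xlarge", "m3.2xlarge",
    "r3.large", "r3.xlarge", "r3.2xlarge", "r3.4xlarge", "r3.8xlarge",
})


def find_deprecated(instance_types):
    """Return the deprecated instance types found, deduplicated and sorted."""
    return sorted({t for t in instance_types if t in DEPRECATED_INSTANCE_TYPES})
```

For example, `find_deprecated(["c3.large", "c5.xlarge", "r3.8xlarge"])` returns `["c3.large", "r3.8xlarge"]`, where `c5.xlarge` stands in for any type that is not on the deprecated list.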