Divya Sain

Release notes

Recently published apps

MaxQuant is a tool for quantitative proteomics, designed for analysing large mass-spectrometric datasets. It takes files with high-resolution, quantitative MS data and produces information about the quantification of proteins and PTMs. It can be used to analyse data derived from any of the major relative quantification techniques: label-free quantification (LFQ), MS1-level labelling and isobaric MS2-level labelling. Furthermore, it provides quantification algorithms for all common forms of tandem mass tag (TMT) and isobaric tags for relative and absolute quantitation (iTRAQ) labelling, including higher-plex TMT and multinotch MS3 quantification.

GENESIS Association Results Plotting creates Manhattan and QQ plots from GENESIS association test results, with additional filtering and stratification options available. With its default options, this app is part of the GENESIS association testing workflows; however, once association testing is complete, users can fine-tune the Manhattan and QQ plots by running this app separately.

Recently published apps

The following apps were upgraded to CWL1 and had their versions updated as well:

  • GATK

  • Picard

  • VEP toolkit and workflow

Foundation Medicine data available on the CGC

The Foundation Medicine dataset is now available through the Data Browser on the CGC. The dataset contains genomic profiling data from approximately 18,000 adult patients with a diverse array of cancers.

GDC Datasets version update

As of March 17, GDC datasets available through the Data Browser and the API correspond to GDC Data Release 28.0.

Recently published apps

  • GATK Somatic SNVs and INDELs (Mutect2) 4.1.9.0 can be used to detect SNVs and INDELs in one or more tumor samples from a single individual, with or without a matched normal sample. Mutect2 calls variants through local assembly of haplotypes, which means that whole haplotypes and read pairs, rather than single bases, serve as the atomic units of biological variation and sequencing evidence, improving variant calling.

  • GATK Somatic Create Mutect2 Panel of Normals 4.1.9.0 workflow creates a panel of normals (germline and artifactual sites) for use in other GATK workflows. It takes multiple normal sample callsets produced by the GATK Somatic SNVs and INDELs (Mutect2) 4.1.9.0 workflow in tumor-only mode (despite the name, normal samples are given as the input) and collates sites present in two or more samples into a sites-only VCF.

Both workflows are modeled on the official GATK WDLs.

Improved project organization with project tags

In order to improve the organization and findability of projects, project tags have been introduced to the CGC.

Project Admins can now assign tags to projects via the API or through the visual interface. These tags can be used to filter the list of all projects, to categorize projects, and for general custom organization of projects.

The maximum number of tags for a single project is 15, while the maximum number of characters in a single tag is 36.
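
The limits above can be checked before tags are sent to the platform. The following is a minimal sketch, not part of the CGC API: the function name `validate_project_tags` and the constant names are hypothetical, and only the two numeric limits come from this note.

```python
# Limits stated in the release note.
MAX_TAGS_PER_PROJECT = 15
MAX_TAG_LENGTH = 36


def validate_project_tags(tags):
    """Return a list of problems; an empty list means the tags fit the limits.

    Hypothetical helper for illustration only, not a CGC API call.
    """
    problems = []
    if len(tags) > MAX_TAGS_PER_PROJECT:
        problems.append(
            f"{len(tags)} tags given, at most {MAX_TAGS_PER_PROJECT} allowed"
        )
    for tag in tags:
        if len(tag) > MAX_TAG_LENGTH:
            problems.append(
                f"tag {tag!r} has {len(tag)} characters, "
                f"at most {MAX_TAG_LENGTH} allowed"
            )
    return problems


# Two short tags are within both limits, so no problems are reported.
print(validate_project_tags(["rna-seq", "pilot-2021"]))
```

Running a check like this locally gives a readable message before any request is made; the platform may still apply rules not described in this note.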

PDC data update on the CGC

PDC data on the CGC has been updated with the following PDC Data Releases:

  • V1.0.24 (February 5, 2021)

  • V1.0.22 (January 5, 2021)

  • V1.0.21 (December 15, 2020)

See more information about the history and contents of each PDC data update on the CGC.

GDC Datasets version update

As of February 22, GDC datasets available through the Data Browser and the API correspond to GDC Data Release 27.0.

Recently published apps

The following tools were updated to their latest versions and upgraded to CWL1.x:

  • HISAT2-StringTie workflow

  • StringTie

  • HISAT2

  • Trimmomatic

  • Tabix

  • SBG FASTQ Merge


The following new apps were published, in CWL1.x:

  • Exomiser 12.1.0 - tool for prioritizing variants from WES and WGS data.

  • VEP Slivar Trios Rare Diseases Analysis workflow - analyzes WES and WGS family variants.

  • Clustering and Gene Marker Identification with Seurat 3.2.2 - clustering and gene marker identification analysis starting from gene-cell UMI or read counts.

  • xCell 1.3 - tool for cell type enrichment analysis, which takes gene expression data and performs analysis for 64 immune and stromal cell types.

  • MBASED 1.18.0 tool - used for performing allele-specific expression analysis.

  • MBASED workflow - based on the MBASED tool, with added phasing and VEP annotation, the workflow makes it easier to run allele-specific expression analysis.

  • elPrep 4.1.6 - high-performance tool for preparing SAM/BAM files for variant calling in sequencing pipelines, which can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, calculating and applying base quality score recalibration, etc.

  • Kraken2 2.0.9 - taxonomic sequence classifier that assigns taxonomic labels to DNA sequences.

  • Bracken 2.5 - uses the taxonomic assignments made by Kraken/Kraken2, along with information about the genomes themselves, to estimate abundance at the species/genus level, or above.

RAS-CRDC Integration Phase 1 completed

The Researcher Auth Service (RAS), sponsored by The Office of Data Science Strategy, is a service provided by NIH's Center for Information Technology (CIT) to facilitate access to NIH’s open and controlled data assets and repositories in a consistent and user-friendly manner.

The RAS initiative is advancing data infrastructure and ecosystem goals defined in the NIH Strategic Plan for Data Science. RAS has adopted the Global Alliance for Genomics and Health (GA4GH) standards for integration of researcher-focused applications and data repositories over the OIDC platform.

The goal for this effort is to coordinate all cloud stacks and use RAS identically across systems. The NCI CRDC (Cancer Research Data Commons) stack was chosen for the pilot phase to create a phased approach that should achieve the larger goals of federated data access using GA4GH Passports, with a focus on how this fits in with NIH data in general.

Phase 1 is now complete and introduces a change to the login flow when using eRA Commons:

  • When choosing login with eRA Commons on the CGC, you will now be redirected to the NIH RAS login screen instead of iTrust.

  • Other than the login flow change, user experience on the CGC remains the same.

Recently published apps

  • GATK Broad Best Practice Variant Calling From uBAM - This workflow combines two Broad Best Practice workflows, BAM processing and variant calling, into one.

  • Functional Equivalence WGS - This workflow processes WGS data according to the functional equivalence standard.

New password validation rules

To maintain a high level of security and prevent unauthorized access to the CGC, we are introducing password checks against a database of commonly used passwords and passwords that were compromised in data breaches across the Internet, both of which are considered unsafe. If the password you enter exists in this database, you will need to use a different one.

Additionally, please note that the entire password validation process takes place within the Seven Bridges infrastructure, which ensures an additional level of security as there are no third-party services involved. This password validation mechanism applies when trying to perform the following actions:

  • Sign up for a new account.

  • Set up a new password to replace an expired one.

  • Change the account password.

Please note that this does not apply to accounts using external login providers.
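
The general idea of checking a password against a locally held list, with no third-party service involved, can be sketched as follows. This is an illustration only and not the actual Seven Bridges implementation: the use of SHA-256 digests, the tiny in-memory set, and the names `sha256_hex`, `UNSAFE_PASSWORD_DIGESTS` and `is_unsafe` are all assumptions.

```python
import hashlib


def sha256_hex(password):
    """Return the SHA-256 digest of a password as a hex string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical stand-in for the database of unsafe passwords, held locally
# so that the lookup never leaves the system performing the check.
UNSAFE_PASSWORD_DIGESTS = {
    sha256_hex(p) for p in ("123456", "password", "qwerty")
}


def is_unsafe(password):
    """Return True if the password appears in the local unsafe list."""
    return sha256_hex(password) in UNSAFE_PASSWORD_DIGESTS


print(is_unsafe("password"))
print(is_unsafe("a long passphrase that is not in the list"))
```

A sign-up or password-change form would reject the entry whenever the check returns True and ask for a different password.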

Task queueing improvements

We have made the following changes to the task queueing process that should improve the queueing logic and contribute to faster completion of initialized tasks and analyses:

  • If you reach your parallel instance limit with running tasks, and there are both tasks and Data Cruncher analyses waiting in the queue, Data Cruncher analyses will be first to execute once instances become available.

  • If the parallel instance limit is reached with tasks that are being executed, when an instance becomes available it will first be allocated to running multi-instance or scattered tasks if they need additional instances. If there are no such tasks, the instance will be allocated to other task(s) that are next up in the queue. This will enable faster completion of multi-instance and scattered tasks and help avoid breaks in their execution.
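
The allocation order described above can be sketched as a small priority function. This is an illustrative model, not the platform's scheduler. The `Job` class and `pick_next` function are hypothetical, and the relative order of the two rules (running tasks that need more instances before queued Data Cruncher analyses) is an assumption, since the note does not state how the rules combine.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    kind: str                          # "task" or "data_cruncher"
    running: bool = False
    needs_more_instances: bool = False


def pick_next(jobs):
    """Pick the job that receives a freed instance, or None if none waits.

    `jobs` is assumed to be in queue order.
    """
    # 1. Running multi-instance or scattered tasks that need more instances.
    for job in jobs:
        if job.kind == "task" and job.running and job.needs_more_instances:
            return job
    # 2. Queued Data Cruncher analyses go ahead of queued tasks.
    for job in jobs:
        if job.kind == "data_cruncher" and not job.running:
            return job
    # 3. Otherwise, the next queued task.
    for job in jobs:
        if job.kind == "task" and not job.running:
            return job
    return None


jobs = [
    Job("align", "task"),
    Job("notebook", "data_cruncher"),
    Job("scatter-call", "task", running=True, needs_more_instances=True),
]
print(pick_next(jobs).name)
```

With the three jobs shown, the freed instance goes to the running scattered task; without it, the Data Cruncher analysis would be picked before the queued task.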

Recently published apps

GENESIS LocusZoom visualizes association testing results using the LocusZoom standalone software. This app is a wrapper around the LocusZoom standalone software that enables it to work with the outputs of GENESIS association pipelines. The main goal of this app is to visualize the results of the GENESIS Single Variant Association Test; however, regions from sliding window or aggregate tests with p-values below a certain threshold can be displayed in a separate track.

SBG Loci Snapshoter generates screenshots of specific regions across all aligned files provided as inputs. It uses the IGV batch functionality to create PNG images of desired loci across multiple samples. The tool was developed primarily for post-association visualisation of associated variants across a subset of the CRAM files used to obtain those variants.

CDS integration on the CGC

The Cancer Data Service (CDS) is a data repository under the NCI's Cancer Research Data Commons (CRDC) infrastructure for storing cancer research data generated by NCI-funded programs. Its data is stored in the Database of Genotypes and Phenotypes (dbGaP) provided by the National Center for Biotechnology Information (NCBI). CDS hosts datasets that contain controlled access data, with access permissions being controlled by dbGaP. This release brings three CDS datasets (GECCO, PPTC and LCCC-1108) to the CGC and enables researchers to easily use CDS data on the CGC.

Data Cruncher Stability and Usability Improvements

Your experience with Data Cruncher just got better thanks to the following improvements:

  • Use the full potential of RStudio as it is now officially out of the BETA stage and its stable release is available in Data Cruncher.

  • Maintain full control over your workspace integration capabilities in a more secure environment - your Data Cruncher sessions are now run on a separate domain providing even better security isolation and privacy control of your favorite third-party integrated development environments.

  • Have a better insight into your session initialization phase with a more informative loading experience.

Recently published apps

The Strelka2 Somatic workflow and the Strelka2 Germline tool have been published to the CGC in CWL1.0. Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs. For better calling results, the structural variant caller Manta has been added to the Somatic workflow.

Upload files directly through the visual interface

Uploads just became way easier! To enable easy and convenient small-scale file uploads, we have added new functionality that allows you to upload files directly through the CGC's visual interface. To upload files from your local storage, navigate to the desired project and click the Add files button. You will notice a new tab called Your Computer, which, apart from the file upload itself, provides all standard upload-related features such as naming conflict resolution, file tagging and tracking of upload progress.

GA4GH WES and DRS support

Through its engagement in GA4GH (The Global Alliance for Genomics and Health), Seven Bridges actively works with platform development partners and industry leaders to develop standards that will facilitate interoperability.

The GA4GH Cloud Work Stream helps the genomics and health communities take full advantage of modern cloud environments. Its initial focus is on 'bringing the algorithms to the data', by creating standards for defining, sharing, and executing portable workflows. Standards under discussion include workflow definition languages, tool encapsulation, cloud-based task and workflow execution, and cloud-agnostic abstraction of data access.

The CGC provides support for the following standards:

WES API

The Workflow Execution Service (WES) API describes a standard programmatic way to run and manage workflows. Having this standard API supported by multiple execution engines will let people run the same workflow using various execution platforms running on various clouds/environments.

The following API paths are available as a part of the Seven Bridges implementation of WES API:

Learn more

DRS API - AuthN/Z Update

The Data Repository Service (DRS) API provides a generic interface to data repositories so data consumers, including workflow systems, can access data in a single, standard way regardless of where it's stored and how it's managed.

With this release, the authN/Z method has changed to better reflect the specification's recommendations:

All API requests need to have the HTTP header X-SBG-Auth-Token which you should set to your authentication token.
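
A request carrying this header can be prepared as in the sketch below. The URL is a hypothetical placeholder, since the actual base URL and paths are not listed in this note, and the helper name `build_auth_headers` is likewise an assumption. Only the header name `X-SBG-Auth-Token` comes from the text above. The request is built but not sent.

```python
import urllib.request

# Hypothetical placeholder; the real base URL and paths are not given here.
DRS_OBJECT_URL = "https://example.invalid/ga4gh/drs/v1/objects/OBJECT_ID"


def build_auth_headers(token):
    """Return the header that every API request needs, per the note above."""
    if not token:
        raise ValueError("an authentication token is required")
    return {"X-SBG-Auth-Token": token}


# Build (but do not send) a request that carries the header.
# urllib normalizes the capitalization of header names when storing them;
# HTTP header names are case-insensitive, so this does not change meaning.
request = urllib.request.Request(
    DRS_OBJECT_URL, headers=build_auth_headers("MY_TOKEN")
)
print(request.header_items())
```

Replace `MY_TOKEN` with your own authentication token and the placeholder URL with a real API path before sending the request.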

The following API paths are available as part of the Seven Bridges implementation of DRS API:

Learn more

Recently published apps

The updated GENESIS apps are now available in our public apps gallery. The new release includes:

  • New Docker image v.2.8.1.

  • Updated input and output descriptions.

  • Comprehensive benchmarking included in the app descriptions.

  • Standard output included in the task logs.

  • Other minor changes.

Recently published apps

The EPACTS 3.4.2 toolkit has been published in CWL1.0. EPACTS (Efficient and Parallelizable Association Container Toolbox) is a versatile software pipeline for performing various statistical tests to identify genome-wide associations from sequence data, through an interface that is user-friendly for both scientific analysts and method developers.

Recently published apps

Plink1.90 and Plink2.0 have been published in CWL1.0. Plink is a widely used open-source tool for genome-wide association studies and research in population genetics. It can rapidly manipulate and analyze large datasets in their entirety; data management, summary statistics, population stratification and association analysis are only some of Plink's domains of function.

Recently published apps

MultiQC has been upgraded to CWL1.0, updated to the latest version, and published on all environments.

CWL 1.1 Now Available on the CGC

The CGC now supports Common Workflow Language (CWL) v1.1. The new version of CWL brings some minor improvements, most notably:

  • Limit for execution time of a command line tool. If execution time exceeds the defined limit, the tool will fail in order to prevent additional costs.

  • Option to disable memoization for a specific tool. If you want to use memoization for a workflow run, but want to skip memoization in a specific tool, CWL1.1 introduces a feature that allows this.

  • Ability to declare optional secondary files. A task is not prevented from running if an optional secondary file is not available in the project.

  • A number of other minor features and improvements. For the detailed change log, please see the CWL CommandLineTool specification and the CWL Workflow specification.
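
The three features listed above can be seen together in a small tool description. The sketch below builds a CWL v1.1 CommandLineTool as a Python dictionary and prints it as JSON, which is a valid serialization of CWL documents. It assumes that the time limit maps to the `ToolTimeLimit` requirement, that disabling memoization maps to the `WorkReuse` requirement, and that the command, input name and one-hour limit are illustrative values only.

```python
import json

tool = {
    "cwlVersion": "v1.1",
    "class": "CommandLineTool",
    "baseCommand": ["ls", "-l"],       # illustrative command only
    "requirements": [
        # Fail the tool if it runs longer than 3600 seconds.
        {"class": "ToolTimeLimit", "timelimit": 3600},
        # Do not reuse outputs of earlier runs for this tool.
        {"class": "WorkReuse", "enableReuse": False},
    ],
    "inputs": {
        "alignment": {                 # hypothetical input name
            "type": "File",
            # Optional secondary file: the run proceeds if it is absent.
            "secondaryFiles": [{"pattern": ".bai", "required": False}],
            "inputBinding": {"position": 1},
        }
    },
    "outputs": {},
}

print(json.dumps(tool, indent=2))
```

The printed JSON can be saved to a file and inspected or validated with a CWL v1.1 aware tool; how each requirement behaves on a given platform should be confirmed against that platform's documentation.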

Deprecation of Previous Generation Instance Types

To further optimize user workloads, we have decided to deprecate some of the older Previous Generation AWS instances.

We are removing support for the following AWS instance types from the CGC:

  • c3.large

  • c3.xlarge

  • c3.2xlarge

  • c3.4xlarge

  • c3.8xlarge

  • m3.medium

  • m3.large

  • m3.xlarge

  • m3.2xlarge

  • r3.large

  • r3.xlarge

  • r3.2xlarge

  • r3.4xlarge

  • r3.8xlarge

If you have an app, task or Data Cruncher analysis with one of these types set as the instance type hint, it will automatically be migrated to use the most appropriate newer instance type.

If you are explicitly setting some of these types via the API, please update your scripts before September 15th.
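
To find scripts that still reference the deprecated types, a simple lookup against the list above may help. This is a minimal sketch; the function name `find_deprecated` is hypothetical, and the replacement instance types are deliberately not suggested here because this note does not list them.

```python
# Instance types listed above as being removed from the CGC.
DEPRECATED_INSTANCE_TYPES = frozenset({
    "c3.large", "c3.xlarge", "c3.2xlarge", "c3.4xlarge", "c3.8xlarge",
    "m3.medium", "m3.large", "m3.xlarge", "m3.2xlarge",
    "r3.large", "r3.xlarge", "r3.2xlarge", "r3.4xlarge", "r3.8xlarge",
})


def find_deprecated(instance_types):
    """Return the sorted, de-duplicated deprecated types found in the input."""
    return sorted(
        {t for t in instance_types if t.lower() in DEPRECATED_INSTANCE_TYPES}
    )


# Instance types collected from your own scripts (example values).
used = ["c3.xlarge", "c5.2xlarge", "m3.medium", "c3.xlarge"]
print(find_deprecated(used))
```

Any type reported by the check should be replaced in your scripts with a currently supported instance type.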

Please see the full list of supported instance types and the official recommendation for Upgrade Paths from AWS.
