Release notes
Recently published apps
BROAD Best Practices RNA-Seq
This workflow represents the GATK Best Practices for SNP and INDEL calling on RNA-Seq data. Starting from an unmapped BAM file, it performs alignment to the reference genome, followed by marking duplicates, reassigning mapping qualities, base recalibration, variant calling and variant filtering. We used Broad’s best practice script in WDL format as a reference to create the BROAD Best Practices RNA-Seq Variant Calling 4.1.0.0 workflow in CWL version 1.0.
BROAD Best Practices Somatic CNV Panel Workflow
BROAD Best Practices Somatic CNV Panel Workflow is used for creating a panel of normals (PON) given a group of normal samples. Using read coverage collected over specified intervals, this workflow creates a panel of normals HDF5 file which is used in BROAD Best Practices Somatic CNV Pair Workflow for standardizing and denoising read counts. This workflow represents a CWL implementation of Broad’s best practice CNV panel WDL workflow.
BROAD Best Practices Somatic CNV Pair Workflow
BROAD Best Practices Somatic CNV Pair Workflow is used for detecting copy number variants (CNVs) as well as allelic segments. Given a tumor and optional matched normal sample, as well as panel of normals (PON) file, this workflow models and calls CNV segments. This workflow represents a CWL implementation of Broad’s best practice CNV pair WDL workflow.
Release notes
Memoization (beta)
When defining task execution settings, you can now enable memoization. Achieve significant time and cost optimization of your project workload by letting the CGC reuse existing results of your previous runs. Memoization can be enabled at project or task level, where the task-level setting overrides the project-level one.
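The override rule described above can be sketched as follows. This is only an illustration of the precedence logic: the function and parameter names are hypothetical and not part of the CGC API, and a task-level value of `None` is assumed to mean "not set".

```python
from typing import Optional


def effective_memoization(project_setting: bool,
                          task_setting: Optional[bool] = None) -> bool:
    """Return the memoization setting that applies to a single task run.

    Hypothetical helper: `task_setting` is None when the task does not
    specify its own preference.
    """
    # An explicit task-level choice wins over the project-level default.
    if task_setting is not None:
        return task_setting
    # Otherwise the task inherits the project-level setting.
    return project_setting


# Project enables memoization, task says nothing: project default applies.
print(effective_memoization(project_setting=True))
# Project enables memoization, task disables it: task override wins.
print(effective_memoization(project_setting=True, task_setting=False))
```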
Multiple datasets selection for querying in Data Browser
You can now select more than one dataset when starting a query in Data Browser and query the selected datasets simultaneously.
Improved organization of Public Reference Files
The Public Reference Files gallery has been renamed to Public Files and split into two categories: Public Reference Files, which holds all common reference files, and Public Test Files, which contains common test samples.
Recently published apps
GDC DNASeq Harmonization Workflow
The GDC DNASeq Harmonization Workflow is developed by the National Cancer Institute's Genomic Data Commons. It is used for harmonization of genomic data for datasets such as The Cancer Genome Atlas (TCGA) and is publicly available on the CGC.
Release notes
Added support for GPU Instances
The first family of GPU instances we’re introducing is Amazon EC2 P2. P2 Instances are powerful, scalable instances that provide GPU-based parallel compute capabilities. Designed for general-purpose GPU compute applications using CUDA and OpenCL, these instances are ideally suited for machine learning, molecular modeling, genomics, rendering, and other workloads requiring massive parallel floating point processing power.
Release notes
Spot Instances enabled by default on project creation
In order to promote execution cost optimization, Spot Instances are now enabled by default when creating a new project through the visual interface or the API, unless you have specifically set otherwise. This setting can later be changed from the project settings page, or overridden per task on the draft task page. Learn more about Spot Instances on the CGC.
Support for asynchronous bulk actions through the API
As a part of adding full support for folders and improving scalability, we have introduced asynchronous file system actions through the API. Currently supported actions are copy and delete, and these are enabled for both files and folders. There are five new API endpoints for async bulk actions which can be used for issuing copy and delete commands and for getting job statuses.
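Because these actions are asynchronous, a client typically submits a job and then checks its status until it completes. The sketch below shows only that polling step. It is not the CGC client: `wait_for_job` and `get_job` are hypothetical names, `get_job` stands for whatever call fetches the job status, and the state strings `SUBMITTED` and `RUNNING` are assumptions to verify against the API reference.

```python
import time
from typing import Callable, Dict


def wait_for_job(get_job: Callable[[], Dict],
                 poll_seconds: float = 5.0,
                 max_polls: int = 60) -> Dict:
    """Poll an asynchronous job until it leaves the pending states.

    `get_job` is a caller-supplied function that returns the current job
    description as a dict with a "state" key (assumed shape).
    """
    for _ in range(max_polls):
        job = get_job()
        # Assumed pending states; anything else is treated as terminal.
        if job.get("state") not in ("SUBMITTED", "RUNNING"):
            return job
        time.sleep(poll_seconds)
    raise TimeoutError("job did not finish within the polling budget")
```

Passing the status call in as a function keeps the polling logic independent of any particular HTTP library or client.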
Improved layout of the draft task page
In order to streamline the preparation process for task execution, both file inputs and app settings will now be available as two columns under the same tab named Task Inputs on the draft task page. Spot Instance configuration will be moved to the second tab on the draft task page, named Execution Settings. This tab will also serve as the central and unique location for all settings related to task execution that will be added in the future.
Release notes
Recently published apps
Metagenomics WGS Functional Profiling - HUMAnN2
HUMAnN2 (the HMP Unified Metabolic Analysis Network) is a tool used for efficiently and accurately determining the presence/absence and abundance of metabolic pathways in a microbial community from metagenomic sequencing data. It introduces a novel tiered search algorithm that provides highly accurate profiles for characterized members of microbial communities, with fallback to translated search for uncharacterized members.
The Metagenomics WGS Functional Profiling - HUMAnN2 workflow provides a complete functional profiling analysis of input samples and is designed to analyze several metagenomics samples in parallel.
Release notes
Data Cruncher - RStudio (beta)
In addition to JupyterLab, Data Cruncher now supports one more development environment, RStudio. You can choose between the two environments when setting up your Data Cruncher analysis.
Also, file saving rules have been deprecated, so all analysis files will be automatically saved in your analysis workspace on the CGC, regardless of their size or extension.
Learn more about Data Cruncher and the available environments from our documentation.
Release notes
Updates to the TCGA, TARGET and CCLE datasets
As part of Seven Bridges' ongoing partnership with the National Cancer Institute (NCI), authorized researchers can access valuable public datasets generated by the TCGA, TARGET, and CCLE initiatives through the CGC. Seven Bridges collaborates with the NCI Genomic Data Commons (GDC) on an ongoing basis to ensure alignment between the datasets available through the GDC and the CGC. In keeping with this, updated versions of the TCGA, TARGET, and CCLE datasets have been released on the CGC. As of February 11, the legacy TCGA and CCLE datasets available through the CGC are fully aligned with those in the GDC Legacy Archive, and the TCGA GRCh38 and TARGET GRCh38 datasets are fully aligned with GDC Data Release 14.0.
Release notes
Folders as task inputs and outputs
When selecting inputs for a task, you will now be able to select an entire folder for input ports that are set up to take folders as input values. This means that such input ports will take all files from the root of the selected folder and its subfolders. Folders can now also be displayed as app outputs, provided that the app itself is configured to produce output data in folder(s). This feature is available for CWL 1.0 apps only.
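To illustrate what "all files from the root of the selected folder and its subfolders" means, here is a minimal local analogy using Python's standard library. It runs against a temporary directory on disk, not against CGC project storage, and `files_in_folder` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path
from typing import List


def files_in_folder(folder: Path) -> List[Path]:
    """Return every file under `folder`, including files in its subfolders."""
    # rglob("*") walks the whole tree; directories themselves are excluded.
    return sorted(p for p in folder.rglob("*") if p.is_file())


# Demo on a throwaway local directory with two levels of nesting.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub" / "deeper").mkdir(parents=True)
    (root / "a.bam").touch()
    (root / "sub" / "b.bam").touch()
    (root / "sub" / "deeper" / "c.bam").touch()
    names = [p.relative_to(root).as_posix() for p in files_in_folder(root)]
    print(names)
```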
Release notes
Computation backend improvements
We are making some improvements to our computation backend. These changes affect sbg:draft-2 tasks only, mostly bringing some of their behaviors and capabilities in line with CWL 1.0 tasks.
Recently published apps
DeepVariant 0.7.2 is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data. DeepVariant is highly accurate, robust, flexible and easy to use. To use DeepVariant on the CGC, supply it with reads and a reference, and select the desired model (WGS or WES).
Release notes
Deprecation of Previous Generation instance types
To further optimize user workloads, we have decided to deprecate some of the oldest Previous Generation AWS instances.
Release notes
Query projects by name in the Python API client
If you are using the sevenbridges-python API client, you are now also able to query projects by project name. When querying with the name parameter, partial matching is performed and the result is returned as a list. See an example in the List Projects section of sevenbridges-python documentation.
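A minimal sketch of such a query is shown below. It assumes that `api` is an already configured `sevenbridges.Api` instance, that `api.projects.query` accepts the `name` keyword as described above, and that the returned collection exposes `.all()` for iterating over every page of results. `find_projects_by_name` is a hypothetical wrapper, not part of the library.

```python
from typing import Any, List


def find_projects_by_name(api: Any, name: str) -> List[Any]:
    """Return all projects whose name partially matches `name`.

    `api` is expected to be a configured sevenbridges.Api instance
    (assumption; any object with the same interface works).
    """
    # query() returns one page of results; all() iterates over every page.
    return list(api.projects.query(name=name).all())
```

Since matching is partial, a short search term may return several projects, so callers should be prepared to handle a list with more than one entry, or an empty list.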
Release notes
Folder options for FTP/HTTP(S) import
When importing data from an FTP or HTTP(S) server to the CGC, you are now able to import an entire folder structure in the exact same form as it appears on its source server, or choose to "flatten" the structure and import the files only. Also, the CGC now provides a selection of naming conflict resolution options for FTP/HTTP(S) imports. If an item that is being imported has the same name as an item already present at the target location, you can select whether to skip importing the item, overwrite the existing one or auto-rename the one that is being imported. Learn more.
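The three conflict resolution options can be sketched as a small decision function. This is an illustration only: `resolve_name` and the strategy strings are hypothetical, and the `_1_` prefix used for auto-renaming is an assumed naming scheme, not necessarily the one the CGC applies.

```python
from typing import Optional, Set


def resolve_name(name: str, existing: Set[str], strategy: str) -> Optional[str]:
    """Return the name to import under, or None when the item is skipped."""
    if name not in existing:
        # No conflict: import under the original name.
        return name
    if strategy == "skip":
        return None
    if strategy == "overwrite":
        # The imported item replaces the existing one under the same name.
        return name
    if strategy == "auto-rename":
        # Assumed scheme: add a numeric prefix until the name is unique.
        counter = 1
        while f"_{counter}_{name}" in existing:
            counter += 1
        return f"_{counter}_{name}"
    raise ValueError(f"unknown strategy: {strategy}")


existing_names = {"sample.bam", "_1_sample.bam"}
for option in ("skip", "overwrite", "auto-rename"):
    print(option, resolve_name("sample.bam", existing_names, option))
```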
Release notes
R client updates
The sevenbridges-r API client has been updated with several sets of API calls. Specifically, the new additions include the Folders, Enterprise, Bulk (partial), Actions and Markers (Advance Access) APIs. The R client now also supports setting execution hints per task run when drafting new tasks.
New file search options
When searching for files within a project, you now have the option to search within the current folder only, without including subfolders and their content. Until now, a search always covered the current folder (or project root) and all of its subfolders, with the Path column shown in the search results. This remains the default behavior.
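The difference between the two search scopes can be sketched over a plain list of file paths. Everything here is hypothetical: `search_files` is an illustrative name, paths use `/` as the separator, and matching is assumed to be a case-sensitive substring match on the file name.

```python
from posixpath import dirname
from typing import List


def search_files(paths: List[str], term: str,
                 current_folder: str = "",
                 include_subfolders: bool = True) -> List[str]:
    """Return paths whose file name contains `term`, scoped to a folder.

    An empty `current_folder` stands for the project root.
    """
    prefix = current_folder.strip("/")
    matches = []
    for path in paths:
        parent = dirname(path)
        # A file is in scope if it sits directly in the current folder, or,
        # when subfolders are included, anywhere below it.
        in_scope = parent == prefix or (
            include_subfolders
            and (prefix == "" or parent.startswith(prefix + "/"))
        )
        file_name = path.rsplit("/", 1)[-1]
        if in_scope and term in file_name:
            matches.append(path)
    return matches


project_paths = [
    "sample.bam",
    "run1/sample.bam",
    "run1/qc/sample.bam.bai",
    "run2/notes.txt",
]
print(search_files(project_paths, "sample"))
print(search_files(project_paths, "sample", include_subfolders=False))
```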
Release notes
Open apps for editing in web editors more easily
The option to edit apps in the tool or workflow editor on the Platform is now more accessible through the visual interface. When the Edit option is clicked for an app, all sbg:draft-2 apps will be opened in the corresponding web editor by default. However, you still have the option to edit sbg:draft-2 apps in Rabix Composer, our standalone app editor from the Rabix toolkit. For now, Rabix Composer also remains the only editor for CWL 1.0 apps.
Recently published apps
Control-FREEC 11.5
Control-FREEC analyzes copy-number alterations in exome and whole-genome DNA sequencing. This tool computes, normalizes and segments copy number and beta allele frequency (BAF) profiles, then calls copy number alterations and LOH.
Release notes
Define compute resources per task run (API)
When creating a task via the API, you are now able to set instance type (top level) and maximum number of parallel instances for your execution without the need to create a new version of the app.
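As a rough sketch, a task creation request body with per-run compute settings might be assembled like this. The field names `execution_settings`, `instance_type` and `max_parallel_instances`, as well as the example project, app and instance values, are assumptions for illustration and should be checked against the API reference. `build_task_body` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Optional


def build_task_body(name: str, project: str, app: str, inputs: Dict[str, Any],
                    instance_type: Optional[str] = None,
                    max_parallel_instances: Optional[int] = None) -> Dict[str, Any]:
    """Assemble a task creation body with optional per-run compute settings."""
    body: Dict[str, Any] = {
        "name": name,
        "project": project,
        "app": app,
        "inputs": inputs,
    }
    settings: Dict[str, Any] = {}
    if instance_type is not None:
        settings["instance_type"] = instance_type
    if max_parallel_instances is not None:
        settings["max_parallel_instances"] = max_parallel_instances
    # Only include the block when at least one setting was requested, so the
    # app's own defaults apply otherwise.
    if settings:
        body["execution_settings"] = settings
    return body


# Placeholder identifiers, not real projects or apps.
example = build_task_body(
    name="my-task",
    project="my-username/my-project",
    app="my-username/my-project/my-app",
    inputs={},
    instance_type="c4.2xlarge",
    max_parallel_instances=2,
)
print(json.dumps(example, indent=2))
```

Keeping the compute settings in the request body rather than in the app is what removes the need to create a new app version for each resource change.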
Folders become a standard API feature
Folders are no longer in the Advance Access stage and are available as a standard feature on the API.
Release notes
Activate Spot Instances on project creation
When creating a new project through the visual interface, you are now able to set your preference for Spot Instance usage in the Create a project dialog. This setting can later be changed from the project settings page, or overridden per task on the draft task page. Learn more about Spot Instances on the CGC.
Release notes
Data Cruncher - JupyterLab Beta
Data Cruncher is now using the latest release of the JupyterLab environment. All available preinstalled libraries have been updated and some new ones have been added.
We have also made more improvements in order to reduce analysis initialization time. It shouldn't take more than four minutes to spin up your Data Cruncher analysis, regardless of the chosen compute resources.
Release notes
Updates to the TCGA, TARGET, and CCLE datasets
As part of Seven Bridges' ongoing partnership with the National Cancer Institute (NCI), authorized researchers can access valuable public datasets generated by the TCGA, TARGET, and CCLE initiatives through the CGC. Seven Bridges collaborates with the NCI Genomic Data Commons (GDC) on an ongoing basis to ensure alignment between the datasets available through the GDC and the CGC. In keeping with this, updated versions of the TCGA, TARGET, and CCLE datasets have been released on the CGC. As of July 10, the legacy TCGA and CCLE datasets available through the CGC are fully aligned with those in the GDC Legacy Archive, and the TCGA GRCh38 and TARGET GRCh38 datasets are fully aligned with GDC Data Release 11.0.
Release notes
Personal Genome Project UK (PGP-UK) pilot dataset
The Personal Genome Project UK (PGP-UK) pilot dataset includes in-depth multi-omics profiling of thirteen participants. The participants have been profiled using whole genome sequencing (WGS) of DNA from whole blood, whole genome bisulphite sequencing (WGBS) of DNA methylation from whole blood, deep and shallow sequencing of RNA from whole blood using RNA-seq, and DNA methylation array profiling of both whole blood and saliva using the HumanMethylation450 BeadChip from Illumina.
The PGP-UK pilot dataset is now available on the CGC under the Public projects tab in the top navigation bar.
Release notes
Set null or empty values for app settings
When defining app settings prior to execution, you are now able to set null or empty values for the available inputs. This is possible using the two new buttons placed next to the inputs under the Define App Settings tab.