Release notes
PyImageJ: Python wrapper for ImageJ2 project is public on CGC
The PyImageJ: Python wrapper for ImageJ2 project is now publicly available on CGC. It serves as a comprehensive tutorial for users interested in leveraging PyImageJ for image analysis and processing, and features one Data Studio interactive analysis, written in Python, with step-by-step demonstrations and examples showcasing the integrative capabilities of PyImageJ.
UmetaFlow: Untargeted Metabolomics Workflow for Data Processing and Analysis published
We've just published the UmetaFlow: Untargeted Metabolomics Workflow for Data Processing and Analysis Public Project. This project was developed in collaboration with the OpenMS team and serves as a comprehensive tutorial for users interested in metabolomics data analysis. It includes a Data Studio interactive analysis, written in Python, with step-by-step demonstrations and examples highlighting UmetaFlow's capabilities.
UmetaFlow utilizes the pyOpenMS package, a Python wrapper for the OpenMS algorithms, allowing integration with other commonly used Python data science modules. This allows for interactive computing, easy data exploration and visualization, and rapid prototyping of new analytical steps.
Variant Annotation Apps Public Project published
Exciting news about an update to one of our public projects. We recently published the Variant Annotation Apps public project which is meant to replace the Variant Browser. This project is a starting point for users to annotate their VCF files and contains public files and apps to get started.
FusionCatcher 1.33 now available
The FusionCatcher tool has been published on the CGC. FusionCatcher identifies novel or known somatic fusion genes, translocations, and chimeras in RNA-seq data.
exceRpt WF published
The exceRpt pipeline has been published on the CGC. The workflow performs preprocessing, identification of small RNAs, and visualisation of the results.
Cell Ranger tools published
Cell Ranger Aggr, Cell Ranger Reanalyze, Cell Ranger VDJ and Cell Ranger Count tools from Cell Ranger toolkit v8.0.1 are published on the CGC.
Cell Ranger is a set of analysis pipelines that perform sample demultiplexing, barcode processing, single cell 3' and 5' gene counting, V(D)J transcript sequence assembly and annotation, and Feature Barcode analysis from single cell data.
Severus and GeneFuse published
Severus 1.1 and GeneFuse 0.8.0 have been published on the CGC.
Severus 1.1 calls somatic structural variants in long-read data, while GeneFuse detects and visualizes known gene fusions from FASTQ files.
JuLI and FuSeq_WES apps published
JuLI (0.1.6.2) and FuSeq_WES (1.0.0, 4c55ebb) have been published on the CGC.
JuLI and FuSeq_WES are callers for gene fusions in targeted/WES data. JuLI controlpanel and FuSeq_WES Prepare References utility tools have also been published.
Release notes
Here are the latest tool updates in the Public Apps gallery on the CGC:
Regenie - Whole genome regression analysis tool.
Vcfanno - Variant annotator.
SHAPEIT5 - Phasing toolkit that includes: SHAPEIT5 phase_common, SHAPEIT5 phase_rare, SHAPEIT5 ligate.
PLINK2 - Population genetics toolkit.
UMI-tools - UMI manipulation toolkit: UMI-tools Count-tab, UMI-tools Count, UMI-tools Group, UMI-tools Dedup, UMI-tools Extract, UMI-tools Whitelist.
BCFtools: BCFtools Annotate, BCFtools Call, BCFtools CNV, BCFtools Concat, BCFtools Consensus, BCFtools Convert, BCFtools CSQ, BCFtools Filter, BCFtools GTcheck, BCFtools Index, BCFtools Isec, BCFtools Mpileup, BCFtools Query, BCFtools Reheader, BCFtools Roh, BCFtools Sort, BCFtools Stats, BCFtools View, BCFtools Merge, BCFtools Norm.
Indexcov - Fast QC checks from BAI/CRAI files.
Variant Effect Predictor - Variant annotator.
AnnotSV - Annotation and ranking of structural variants.
Personal Cancer Genome Reporter - Functional annotation and classification of somatic variants.
Release notes
R API client library - sevenbridges2
We are excited to announce the release of our new R API client library - sevenbridges2, version 0.2.0. The name might not be groundbreaking, but our revamped R API client library certainly is! This initiative was driven by the need to modernise our R API offering, ensuring a more efficient and feature-rich experience for our users. Here are the key highlights and improvements:
Modernisation and Optimisation: The new sevenbridges2 R API client leverages R6 classes to enhance performance and maintainability. It features optimisations that reduce RAM usage and enhance runtime performance.
Expanded API Coverage: Initially focusing on essential functionalities such as Users, Projects, Files, Volumes, Billing, Apps, Tasks, and Bulk Actions, this release sets the foundation for further development. Future updates will include advanced features like async and enterprise-level actions to enrich our API capabilities.
User Experience: The new client introduces intuitive methods designed to enhance usability across various user scenarios, from local scripting to R Shiny app development, making interactions with our platform more efficient and scalable.
Enhanced Documentation: Unlike its predecessor, the sevenbridges2 R API client library includes comprehensive roxygen documentation for each method, providing clear and detailed explanations that enhance usability and streamline maintenance.
Now available for installation:
CRAN: Install the sevenbridges2 package using install.packages("sevenbridges2") (available starting 2024-07-01).
GitHub: Explore the latest updates and contribute on our GitHub repository: https://github.com/sbg/sevenbridges2
Release notes
The New File Browser is now live
We are thrilled to introduce the new version of the File Browser, designed to enhance user experience with several exciting improvements. Here's what's new:
508 Compliant Page - Committed to accessibility, our New File Browser now meets the 508 compliance standards, ensuring a more inclusive experience for all users.
Infinite Scroll - Page-by-page navigation is a thing of the past. With the introduction of infinite scroll, you can effortlessly browse through files and folders without any interruptions.
Retrieve File/Folder ID - The ID of any file or folder can now easily be obtained within the visual interface.
Enhanced Filtering Options - New categories for filtering files and folders are now available ("type", "downloadable", and "status"), making it easier for users to find exactly what they are looking for.
Shift+Click Selection - Selecting multiple files is now effortless. Use Shift+Click to quickly select a range of files and folders.
Recently published apps
Somatic small variant callers for long read data, ClairS (0.2.0) and ClairS-TO (0.1.0) (for matched tumor-normal pairs and tumor-only data, respectively) have been published to the CGC.
Release notes
Recently published apps
We have published the GCTA 1.94.1 tool on the CGC. GCTA (Genome-wide Complex Trait Analysis) is a suite of tools for various genetic analyses using genome-wide data. It was initially developed to estimate the proportion of phenotypic variance explained by all genome-wide SNPs for a complex trait, but has been greatly extended for many other analyses of data from genome-wide association studies (GWASs).
Release notes
Recently published apps
snM3C pipeline
The snM3C pipeline is designed for profiling 3D genome structure and DNA methylation in single cell data as a part of the Human Cell Atlas and the WARP BRAIN Initiative.
The snM3C pipeline performs:
Demultiplexing (by the Demultiplexing custom tool)
Reads sorting (by the Sort custom tool)
Reads trimming (by Cutadapt)
Paired-end reads alignment (by Hisat-3n)
Separating unmapped, uniquely aligned, and multi-aligned reads (by Separate unmapped reads wrapped around a custom script)
Splitting unmapped reads by enzyme cut site (by Split unmapped reads wrapped around a custom script)
Alignment of the unmapped, single-end reads (by Hisat-3n)
Removing the overlapping reads (by Remove overlap read parts wrapped around a custom script)
Merging mapped reads from single- and paired-end alignments (by Samtools Merge)
Removing duplicate reads (by Picard MarkDuplicates)
Calling chromatin contacts (by Call chromatin contacts wrapped around the custom script)
Creating ALLC files (by Allcools bam-to-allc)
Creating summary output (by Allcools extract-allc)
All tools are wrapped specifically for this workflow and use a retagged us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1 Docker image.
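The stage order listed above can also be written down as a simple ordered structure, for example to check which tools a run involves. This is only a sketch: the stage and tool names are copied from the list above, while the variable and function names are hypothetical and not part of the workflow itself.

```python
# (stage, tool) pairs copied from the list above, in execution order.
SNM3C_STAGES = [
    ("Demultiplexing", "Demultiplexing"),
    ("Reads sorting", "Sort"),
    ("Reads trimming", "Cutadapt"),
    ("Paired-end reads alignment", "Hisat-3n"),
    ("Separating unmapped, uniquely aligned, and multi-aligned reads",
     "Separate unmapped reads"),
    ("Splitting unmapped reads by enzyme cut site", "Split unmapped reads"),
    ("Alignment of the unmapped, single-end reads", "Hisat-3n"),
    ("Removing the overlapping reads", "Remove overlap read parts"),
    ("Merging mapped reads from single- and paired-end alignments",
     "Samtools Merge"),
    ("Removing duplicate reads", "Picard MarkDuplicates"),
    ("Calling chromatin contacts", "Call chromatin contacts"),
    ("Creating ALLC files", "Allcools bam-to-allc"),
    ("Creating summary output", "Allcools extract-allc"),
]


def tools_in_order(stages):
    """Return each tool once, in the order it first appears (hypothetical helper)."""
    return list(dict.fromkeys(tool for _, tool in stages))


unique_tools = tools_in_order(SNM3C_STAGES)
for number, (stage, tool) in enumerate(SNM3C_STAGES, start=1):
    print(f"{number:2d}. {stage} (by {tool})")
```

Hisat-3n appears twice in the stage list because it performs both the paired-end alignment and the later alignment of the split, single-end reads.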
DeepSomatic 1.6.1
DeepSomatic is an extension of DeepVariant for calling somatic variants from matched tumor-normal data. The tool is still in active development and only WGS data is currently supported.
SortMeRNA 4.3.6
SortMeRNA is a local sequence alignment tool for filtering, mapping and OTU clustering. The main applications of SortMeRNA are filtering rRNA from metatranscriptomic data, and OTU-picking and taxonomy assignment available through QIIME v1.9+.
dupRadar 1.32.0
The dupRadar tool is intended for duplication rate quality control of RNA-Seq data. It gives insight into the duplication problem by graphically relating the expression level of each gene to its duplication rate.
Release notes
Recently published apps
Here are the new apps published in our Public Apps gallery:
ASCAT 3.1.2 tools (ASCAT prepareTargetedSeq, ASCAT prepareHTS and ASCAT). ASCAT prepareTargetedSeq prepares SNP references for ASCAT processing of targeted sequencing data. ASCAT prepareHTS prepares sequencing data (WGS, WES or targeted) for ASCAT. ASCAT infers tumor ploidy, purity and allele-specific copy number profiles.
JAFFAL 2.3 tool. JAFFAL is used to detect fusion genes from long-read (PacBio and ONT) transcriptome sequencing with high accuracy, overcoming the challenges posed by higher error rates in long-read data.
Ballgown 2.34.0 toolkit. Ballgown is a package designed to facilitate flexible differential expression analysis of RNA-Seq data. It also provides functions to organize, visualize, and analyze the expression measurements for transcriptome assembly.
Apps with version updates
StringTie 2.2.1 toolkit. StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. StringTie Merge tool merges/assembles GTF/GFF transcript files into a non-redundant set of transcripts. This tool should be used after StringTie transcript assembling of each sample in the experiment.
Release notes
Recently published apps
We published MSIsensor-pro v0.1.2 in our Public Apps gallery:
MSIsensor-pro v0.1.2 toolkit (an updated version of MSIsensor):
MSIsensor-pro scan - a tool for cataloging homopolymer and microsatellite sites in the reference genome. It prepares the reference for MSIsensor-pro msi.
MSIsensor-pro msi - a tool for detecting and scoring somatic microsatellite changes. Designed to work with paired tumor-normal data.
MSIsensor-pro baseline - a preprocessing tool for microsatellite instability (MSI) detection using only tumor sequencing data.
MSIsensor-pro pro - a tool for evaluating microsatellite instability (MSI) using tumor-only data.
We also published the following ASCAT 3.1.2 tools:
ASCAT prepareTargetedSeq prepares SNP references for ASCAT processing of targeted sequencing data.
ASCAT prepareHTS prepares sequencing data (WGS, WES or targeted) for ASCAT. ASCAT infers tumor ploidy, purity and allele-specific copy number profiles.
Recently updated apps
We updated the following apps from the MSIsensor v0.6 toolkit:
MSIsensor scan - a tool for cataloging homopolymer and microsatellite sites in the reference genome. It prepares the reference for MSIsensor msi.
MSIsensor msi - a tool for detecting and scoring somatic microsatellite changes. Designed to work with paired tumor-normal data.
Release notes
Recently published apps
We’ve published the following new apps on the CGC:
FusionInspector (v2.8.0), a tool that performs validation of fusion transcript predictions. FusionInspector is a part of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT). It takes a list of potential fusion genes (obtained by executing any fusion transcript prediction tool), extracts the genomic regions corresponding to the fusion partners, and creates mini-fusion-contigs that hold the gene pairs in the suggested fused orientation. The original reads align to these putative fusion contigs. In the fusion-gene context, fusion-supporting reads that would typically align as split reads or discordant pairs should align as concordant ‘normal’ reads. Reads that span fragments and reads containing fusion breakpoints that support each fusion, are recognized, reported, and scored accordingly.
Arriba (v2.4.0), a tool for the detection of gene fusions from RNA-Seq data. Arriba is designed to work with STAR aligner-processed data, and the post-alignment runtime is typically a few minutes long. Arriba does not require reducing the --alignIntronMax parameter of STAR to identify fusions resulting from focal deletions, in contrast to many other fusion detection methods that are based on STAR. Its intended application was in the context of clinical research. As such, high sensitivity and fast runtimes were crucial design requirements. Arriba can identify structural rearrangements other than gene fusions that may have clinical significance. These include viral integration sites, internal tandem duplications, whole exon duplications, and truncations of genes (i.e., breakpoints in introns and intergenic regions).
Recently updated apps
We also updated the following apps:
DESeq2 tool (v1.40.1) that performs differential gene expression analysis across two or more study conditions. DESeq2 performs differential gene expression analysis using negative binomial generalized linear models. It analyzes estimated read counts from several samples, each belonging to one of two or more conditions under study, searching for systematic changes between conditions, as compared to within-condition variability.
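The negative binomial model mentioned above lets the variance of a read count exceed its mean, which is what separates it from a plain Poisson model. Below is a minimal sketch of that mean-variance relationship, variance = mean + dispersion * mean^2. The function name and the numbers are illustrative only, and this is not DESeq2 code.

```python
def nb_variance(mean, dispersion):
    """Variance of a negative binomial count with the given mean and
    dispersion: variance = mean + dispersion * mean**2."""
    if mean < 0 or dispersion < 0:
        raise ValueError("mean and dispersion must be non-negative")
    return mean + dispersion * mean ** 2


# With zero dispersion the variance equals the mean (the Poisson case).
poisson_like = nb_variance(100.0, 0.0)
# A positive dispersion adds a term that grows with the square of the mean.
overdispersed = nb_variance(100.0, 0.25)
print(poisson_like, overdispersed)
```

The quadratic term is why highly expressed genes can vary far more between replicates than counting noise alone would predict.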
Release notes
Cloud Cost Estimator now available on the CGC
We have added the capability to predict and understand the cost of analyses before running them, for a set of selected apps available in the Public Apps gallery. The estimation is based on the following parameters:
Use of spot instances. Prices of using spot and on-demand instances differ and affect the final task price.
Total input file size. The size of input files affects task running time, which impacts the total task cost.
Type of instance that is used to run the task. The cost of running different types of instances depends on their type and resources (available compute power, memory, etc.). Note that estimations are available only for instances with default resource configuration, as defined by cloud providers, and won't be available if the default resource values are changed. See available Amazon Web Services and Google Cloud Platform instances on the CGC.
Please note that the estimated costs are AN ESTIMATE ONLY AND NOT A COST GUARANTEE. The costs shown in the estimates are only an approximation. Final costs may change after all the task elements have been accounted for.
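As a rough illustration of how the three parameters above could combine, here is a minimal sketch. It is not the CGC estimator: the class and function names, the linear runtime model, and all prices and rates are hypothetical placeholders chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical instance description; prices are made-up placeholders."""
    name: str
    on_demand_per_hour: float  # USD per hour, illustrative only
    spot_per_hour: float       # USD per hour, illustrative only


def estimate_task_cost(instance, input_gb, hours_per_gb, use_spot):
    """Toy estimate: runtime is assumed to grow linearly with input size,
    and cost = runtime * hourly price of the chosen pricing model."""
    if input_gb < 0 or hours_per_gb < 0:
        raise ValueError("input size and rate must be non-negative")
    runtime_hours = input_gb * hours_per_gb
    hourly = instance.spot_per_hour if use_spot else instance.on_demand_per_hour
    return round(runtime_hours * hourly, 2)


example = Instance("example-instance", on_demand_per_hour=2.0, spot_per_hour=0.5)
on_demand = estimate_task_cost(example, input_gb=40, hours_per_gb=0.25, use_spot=False)
spot = estimate_task_cost(example, input_gb=40, hours_per_gb=0.25, use_spot=True)
print(on_demand, spot)
```

In this toy model the same 40 GB task costs less on a spot instance only because the placeholder spot price is lower. Real runtimes rarely scale linearly with input size, which is one reason an estimate can differ from the final cost.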
New public project on the CGC
To further increase the versatility and usability of available analyses, we also published the Vitessce Demo Notebook as a part of the Integrative Single-cell Data Visualization with Vitessce: User Guide public project on the CGC. This project serves as a comprehensive tutorial for users interested in leveraging Vitessce for the visualization and analysis of single-cell data. It features one Data Studio interactive analysis, written in Python, with step-by-step demonstrations and examples showcasing the integrative capabilities of Vitessce Python API.
Recently published apps
We have also published new workflows for processing Nanopore data:
ONT Flowcell Processing - aligns (Minimap2), sorts (Samtools) and quality checks (NanoPlot, Samtools Flagstat, Mosdepth, GATK ComputeLongReadMetrics) input Nanopore data from a single flowcell.
ONT WGS Variant Calling - merges (Sambamba), calls variants (Clair3, Sniffles2) and quality checks (Mosdepth, NanoPlot) input BAM files from Nanopore data.
Release notes
Recently updated apps
We have updated the following tools on the CGC:
Exomiser 13.3.0 – used to identify candidate causative variants from WES or WGS patient VCF data and phenotype HPO terms.
PharmCAT 2.8.3 toolkit:
PharmCAT VCF Preprocess – prepares an input VCF file for PharmCAT.
PharmCAT – takes a single-sample VCF file and returns a report with guideline variants.
Sambamba 1.0.1 toolkit:
Sambamba Index – creates a BAI or FAI index for the provided BAM/FASTA file.
Sambamba Slice – copies a slice (region) of the coordinate sorted and indexed input file in BAM or FASTA format.
Sambamba Sort – sorts alignments in BAM format.
Sambamba Markdup – marks or removes duplicate reads from an input BAM file.
Sambamba Flagstat – creates read flag statistics from a BAM file.
Sambamba Merge – merges alignments in BAM format.
Sambamba View – inspects and filters alignments in SAM/BAM format.
Clair3 1.0.4 – calls small germline variants from data generated by Nanopore, PacBio or Illumina sequencing technologies.
Release notes
Azure now available on the CGC
To help you reduce costs and run analyses more effectively and efficiently in compute locations that are closer to your data, we have introduced a new Azure region on Cancer Genomics Cloud (CGC). The Azure South Central US region can now be selected as the project location when creating a new project on the CGC, meaning that data in such projects will be stored and processed using Azure cloud capacities. On top of that, we have also added support for attaching Azure storage buckets as volumes to the CGC.
Single and Global logout flows defined by SAML protocol are now available for SSO
Users who access the CGC through Single Sign-On (SSO) can now perform Single (IdP-initiated) logout to log out of multiple SSO sessions in a single click. It is also now possible to initiate the Global (SP-initiated) logout flow from the CGC.
Recently published apps
We have published the following tools in our Public Apps gallery:
Tximport, a tool that imports and summarizes transcript-level estimates for transcript and gene-level analysis based on the tximport R/Bioconductor package. It is designed to simplify the import of transcript-level abundances, estimated counts, and effective lengths from a variety of upstream tools, for downstream transcript-level or gene-level analysis.
Three tools from the SplAdder (3.0.4) toolkit:
Five tools from the Qualimap 2.3 toolkit:
Six tools from the RSeQC 5.0.1 toolkit:
The Tidyproteomics 1.5.2 toolkit
Recently updated apps
TopHat2, a tool that aligns RNA-Seq reads to a genome to identify exon-exon splice junctions, just got updated to version 2.2.1 and upgraded to CWL version 1.2 (was previously available in CWL draft-2).
Release notes
Improved error messages for volume imports
To provide you with more detailed information about each import from an attached volume and enable you to resolve import issues independently, we have added improved notifications in the recently implemented Activity center, available by clicking Open activity center in the Activity feed. When any of the items from a particular import fails, you will be able to see an error message and a corresponding error code for each of the items, allowing you to understand and try to fix the issue. Furthermore, a description and link to the relevant documentation will be provided for each import from a volume.
Recently published apps
The Change-O 1.3.0 toolkit is the latest new toolkit addition in our Public Apps gallery. It includes the following apps:
DefineClones - assigns Ig sequences into clonal groups.
BuildTrees - creates IgPhyML input files.
ParseDb - parses and updates input database files.
AlignRecords - performs multiple alignment of sequence fields.
AssignGenes - assigns V(D)J gene annotations.
MakeDb - creates standardized database output from the input germline alignment results.
CreateGermlines - reconstructs germline V(D)J sequences for alignment data.
ConvertDb - parses input tab-delimited database files and converts them to different output formats.
Recently updated apps
We updated the Broad Institute’s best practices workflows for somatic copy number variant discovery to version 4.2.5.0 in our Public Apps gallery:
GATK Somatic CNV Panel Workflow 4.2.5.0 - used for creating a panel of normals (PON) given a set of normal samples.
GATK Somatic CNV Pair Workflow 4.2.5.0 - used for detecting copy number variants (CNVs) from WES/WGS single sample data in tumor-only or matched-normal mode.
Release notes
Recently published apps
The pRESTO 0.7.1 toolkit is the latest new toolkit addition in our Public Apps gallery. It includes the following apps:
ParseLog - Parses pRESTO log records and outputs values in TAB-separated tables.
BuildConsensus - Builds consensus sequences.
ClusterSets - Clusters sequences into groups.
CollapseSeq - Removes duplicate sequences from input FASTA/FASTQ files.
PairSeq - Sorts and matches sequences across input files.
ConvertHeaders - Converts sequence headers to pRESTO format.
AlignSets - Aligns sequences using different methods.
FilterSeq - Filters input sequences.
ParseHeaders - Manipulates sequence headers.
SplitSeq - Splits and samples sequence files.
UnifyHeaders - Reassigns or deletes sequence header fields.
AssemblePairs - Assembles paired-end reads to a single sequence.
MaskPrimers - Removes primers and annotates sequences with primers and barcodes.
EstimateError - Estimates annotation set error rates.
We also published the following new tools:
ComBat-seq (sva 3.35.2), an R tool used for batch effect adjustment in bulk RNA-seq data. The tool wrapper adds some improvements, such as removing more than one batch per dataset and adapting outputs to be compatible with downstream analyses (DESeq2).
GffRead (0.12.7), a GFF/GTF utility tool providing format conversions, filtering, FASTA sequence extraction, and more.
Recently updated apps
We published the following updates in our Public Apps gallery:
RNA-seq alignment - STAR (2.7.10a), a workflow that performs the first step of RNA-seq analysis - alignment of the reads to a reference genome. It is used to generate aligned BAM files (in genome and transcriptome coordinates) from RNA-seq data, which can later be used in further RNA studies, like gene expression analysis.
Trim Galore! (0.6.10) is a wrapper around adapter trimming and quality control tools Cutadapt and FastQC with extra functionality for RRBS data.
Release notes
Recently published apps
We published Immcantation toolkit 4.4.0 in our Public Apps gallery. The toolkit consists of a set of pipeline scripts which are wrapped as the following tools:
preprocess-phix - removes reads which align to phiX174 from the input sequence file.
presto-abseq - runs pRESTO tools for pre-processing of NEBNext / ABSeq immune sequencing data.
presto-clontech - uses pRESTO tools for analyzing Takara Bio/Clontech SMARTer v1 immune sequencing kit data.
presto-clontech-umi - uses pRESTO tools for analyzing Takara Bio/Clontech SMARTer v2 (UMI) immune sequencing kit data.
changeo-10x - annotates and infers clonal relationships in Cell Ranger 10x Genomics single-cell V(D)J data.
changeo-igblast - does V(D)J alignment using IgBLAST.
tigger-genotype - does TIgGER polymorphism detection and genotyping.
shazam-threshold - calculates clonal assignment threshold.
changeo-clone - runs Change-O cloning and germline reconstruction.
We also published Nirvana 3.18.1. Nirvana annotates variants from VCF file input and generates a JSON file with results.
Release notes
Recently published apps
We published the following apps in our Public Apps gallery:
RNA-SeQC 2.4.2, a tool that computes post-alignment quality control metrics for RNA-Seq data. It takes aligned reads in BAM/SAM or CRAM format and an annotation file as inputs, and outputs different alignment metrics files.
scCODA 0.1.9, a Python-based tool that performs differential analysis of cell populations.
Release notes
Recently published apps
We have just published the following tools from the BBTools 39.01 toolkit:
BBDuk: used for trimming, filtering, and masking of input reads.
Reformat: used for generic read-processing tasks (changing ASCII quality encoding, interleaving, file format, compression).
BBMap: used for splice-aware read alignment.
Dedupe: used for removing duplicates from input sequences.
SplitNextera: used for splitting Nextera long-mate-pair reads.
CalcUniqueness: used for determining library complexity and the need for additional sequencing by generating kmer uniqueness histogram.
Taxonomy: used for printing taxonomy information for provided organism identifiers.
Repair: used to correct disordered reads and reads whose mates have been lost.
Seal: used for alignment-free sequence quantification.
BBMerge: used for merging overlapping paired end reads.
BBMask: used for masking low-complexity, tandem repeats or SAM mapped regions.
Tadpole: used as a kmer-based assembler.
Statistics: used for calculating assembly statistics.
BBNorm: used for normalizing read depth based on kmer counts.
Release notes
DRS notification improvements and the brand new Activity center
To provide you with more detailed information about each DRS import operation and enable you to resolve import issues independently, we have improved DRS-related notifications and implemented the Activity center, available by clicking Open activity center in the Activity feed.
Galaxy and OHIF Viewer now available in Data Studio on the CGC
Cancer Genomics Cloud just became more versatile by offering two new interactive tools as Data Studio environments, Galaxy and OHIF Viewer.
Galaxy is an open-source platform for FAIR data analysis that enables you to use tools from various domains and plug them into workflows through its graphical web interface.
The OHIF Viewer is a medical image viewer provided by the Open Health Imaging Foundation (OHIF). It is a web application designed to load large radiology studies as quickly as possible.
Release notes
Recently published apps
We published the Giraffe-DeepVariant workflow 1.0, the Cramino 0.9.7 and kyber 0.4.0 tools from the NanoPack2 toolkit, the Pisces 5.3.0.0 tool, the PureCN NormalDB workflow 2.6.4, the PureCN workflow 2.6.4, the zUMIs 2.9.7 tool, and the AlphaFold 2.3.2 tool.
Release notes
Recently published apps
We published the following apps in our Public Apps gallery:
RADx-rad v0.2 Workflow, which is used for metagenomic data analysis of SARS-CoV-2 from wastewater samples. The workflow was developed and ported to CWL as part of RADx (Rapid Acceleration of Diagnostics), an initiative launched by the US National Institutes of Health (NIH) to speed innovation in the development, commercialization, and implementation of technologies for COVID-19 testing.
CNVPanelizer 1.32.0, which generates a report table and visualization of detected CNVs from targeted sequencing data.
Control-FREEC 11.6, which can be used for somatic copy number analysis of WGS, WES and targeted data.