Divya Sain

Release notes

Recently published apps

We have published the following apps in our Public Apps gallery:

  • VEP Slivar Trios Rare Diseases Analysis, which uses VEP 109.3 and Slivar 0.3.0. This analysis is used for preprocessing and analyzing variants from related individuals (trios or families; WES or WGS).

  • STAR-Fusion (v1.12.0), an app that uses the STAR aligner to identify candidate fusion transcripts supported by Illumina reads.

  • STAR-Fusion Build FusionFilter Dataset (v1.12.0), which creates the CTAT genome lib archive required to run STAR-Fusion.

  • Cutadapt (v4.4), an app most commonly used for removing adapter sequences. It finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequences from high-throughput sequencing reads.

  • Seven tools from the kallisto 0.48.0 toolkit:

    • kallisto quant computes equivalence classes for reads and quantifies transcript abundances from RNA-Seq data.

    • kallisto quant-tcc runs the EM algorithm on a supplied TCC matrix file to make transcript-level estimates.

    • kallisto bus produces BUS (Barcode-UMI-Set) format output files from single-cell RNA-Seq datasets.

    • kallisto merge merges the results of several batches obtained by kallisto pseudo.

    • kallisto h5dump converts HDF5-formatted results to plaintext.

    • kallisto index builds an index from a FASTA-formatted transcriptome file of target sequences.

    • kallisto inspect outputs the target de Bruijn graph from the kallisto index file in different file formats.

Divya Sain

Release notes

Recently published apps

We published the following apps in our Public Apps gallery:

  • Parabricks fq2bam (4.0.0-1) - GPU-accelerated alignment, duplicate marking and, optionally, BQSR.

  • Parabricks haplotypecaller (4.0.0-1) - GPU-accelerated GATK HaplotypeCaller.

  • Parabricks deepvariant (4.0.0-1) - GPU-accelerated version of DeepVariant.

  • Parabricks Somatic Calling workflow - calls somatic variants from a matched tumor-normal sample pair. It runs accelerated Mutect2 on GPU instances, with or without a panel of normals.

Divya Sain

Release notes

Recently published apps

We published the NanoMod workflow version 1.1. NanoMod is a workflow for detecting RNA modifications using Oxford Nanopore direct long-read sequencing data.

Divya Sain

Release notes

Recently published apps

We published the following tools from the GATK 4.4.0.0 and ensembl-vep 109.3 toolkits:

  • GATK BaseRecalibrator, which generates a recalibration table based on various covariates for input mapped reads.

  • GATK ApplyBQSR, which recalibrates the base quality scores of an input BAM or CRAM file containing reads.

  • GATK GatherBQSRReports, which gathers scattered BQSR recalibration reports into a single file.

  • GATK HaplotypeCaller, which calls germline SNPs and indels from input BAM file(s) via local re-assembly of haplotypes.

  • GATK VariantFiltration, which is used for filtering variants in a VCF file based on INFO and/or FORMAT annotations.

  • Augmented Filter VEP, which is a customized wrapper of the filter_vep script from the ensembl-vep toolkit. The tool is modified to allow GNU parallel-scattered filtering of VEP-annotated VCFs split on chromosomes.

  • Variant Effect Predictor, which predicts functional effects of genomic variants and is used to annotate VCF files.

In addition, the VEP annotation workflow 109.3 is now available in the Public Apps gallery. It is used for preprocessing, annotating, and filtering VCF files using the vt toolkit and VEP.

We also published the PURPLE CNV Calling Workflow, used for somatic CNV calling and for purity and ploidy estimation on WGS data. It is based on PURPLE 3.7.2 and includes two additional tools, AMBER and COBALT. The workflow first calculates B-allele frequency (BAF) with AMBER and read depth ratios with COBALT; PURPLE then uses both to estimate the purity, ploidy and copy number profile of a tumor sample.
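The two quantities named above can be illustrated with a minimal sketch. It shows only the textbook definitions of BAF and a mean-normalized tumor/normal depth ratio; it is not the AMBER or COBALT implementation, and the function names and inputs are hypothetical.

```python
def b_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a heterozygous site that support the B (alt) allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


def depth_ratio(tumor_depth: float, normal_depth: float,
                tumor_mean: float, normal_mean: float) -> float:
    """Tumor/normal read depth ratio for one window.

    Each depth is first divided by its sample-wide mean so that a difference
    in overall sequencing depth does not look like a copy number change.
    """
    if normal_depth == 0 or tumor_mean == 0 or normal_mean == 0:
        raise ValueError("depths used as divisors must be non-zero")
    return (tumor_depth / tumor_mean) / (normal_depth / normal_mean)


if __name__ == "__main__":
    print(b_allele_frequency(ref_reads=30, alt_reads=10))
    print(depth_ratio(tumor_depth=60, normal_depth=30,
                      tumor_mean=40, normal_mean=30))
```

In this simplified picture, a ratio well above 1 hints at a gain in the tumor, and a BAF far from 0.5 at a heterozygous site hints at allelic imbalance.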

Divya Sain

Release notes

Recently published apps

We published the following tools from the STAARpipeline 0.9.6 and FAVORannotator 1.0.0 toolkits:

  • STAARpipeline tool, which performs phenotype-genotype association analyses using the STAAR procedure. The app is designed for analyzing whole-genome/whole-exome sequencing data.

  • STAARpipelineSummary VarSet tool, which summarizes results from the STAAR procedure for analyzing WGS and WES data.

  • STAARpipelineSummary IndVar tool, which extracts information of individual variants from a user-specified variant set.

  • FAVORannotator tool, which functionally annotates genotype data in GDS format using the FAVOR Database. The resulting file can then facilitate a wide range of functionally informed downstream analyses, for example, phenotype-genotype association analyses using the STAARpipeline toolkit.

Divya Sain

Release notes

New Public Projects view

The new Public Projects gallery view is now available on the CGC. The interface resembles our Public Apps gallery and summarizes the purpose and content of each project on a single page, making the projects more accessible and helping you judge how useful each one is for your specific use case.

Recently published apps

We have published the following new and updated apps in our Public Apps gallery:

  • ABySS 2.3.5 - a de novo sequence assembler intended for short paired-end reads and genomes of all sizes.

  • Minia 3.2.6 - a short-read assembler based on a de Bruijn graph.

  • IDBA 1.1.3 toolkit:

    • IDBA-Hybrid - a de novo assembler for hybrid sequencing data.

    • IDBA-UD - a short-read-data de novo assembler.

    • fq2fa - used for converting FASTQ format read data to FASTA format suitable for IDBA tools.

  • ABACAS 1.3.1 - used for contiguating reference-based assemblies.

  • Viralrecon Illumina De novo assembly workflow - designed for assembling amplicon and metagenomics short reads. It can analyze metagenomics data obtained from shotgun sequencing (e.g. directly from clinical samples) and from enrichment-based library preparation methods (e.g. amplicon-based or probe-capture-based data). It takes Illumina short reads from one or more samples and performs read trimming, host read removal, assembly with one of the five included assemblers, BLAST searches and calculation of various QC metrics.

Divya Sain

Release notes

Recently published apps

We have just published the following GATK 4.4.0.0 tools:

  • GATK IndexFeatureFile - used for indexing provided feature files.

  • GATK MergeVcfs - used for combining multiple variant files.

  • GATK VariantEval BETA - used for evaluating variant calls.

  • GATK FilterMutectCalls - used to filter somatic SNVs and indels called by Mutect2.

We have also published Minimac4 4.1.2, which is a tool for imputing genotypes.

Divya Sain

Release notes

Recently published apps

Metagenomics WGS analysis - Centrifuge 1.0.4

A workflow for analyzing metagenomic samples. It assigns taxonomic labels to DNA sequences, estimates the abundance of the taxonomic categories in the sample, makes visualizations that give insights into the taxonomic structure of the sample, and makes files that are suitable for downstream analysis. This allows researchers to assign reads from their samples to a likely species of origin and quantify each species’ abundance.

Reference Index Creation - Centrifuge 1.0.4

A workflow that builds an index from reference sequences downloaded from NCBI databases.

Five tools from the Centrifuge 1.0.4 toolkit:

  • Centrifuge Classifier is the main tool of the Centrifuge toolkit, used for classification of metagenomics reads.

  • Centrifuge Download is a part of the Centrifuge toolkit, used for downloading reference sequences from NCBI.

  • Centrifuge Build is a part of the Centrifuge toolkit, which makes a Centrifuge index from DNA sequences.

  • Centrifuge Kreport is used to make a Kraken-style report from the Centrifuge Classifier output.

  • Centrifuge Inspect is a part of the Centrifuge toolkit that inspects index files.

Divya Sain

Release notes

Recently published apps

We have just published HTSeq-count (2.0.2 in CWL 1.2). HTSeq-count is a Python tool for counting how many reads map to each feature. It takes aligned reads together with a list of genomic features as inputs, and outputs a TSV table with counts for each genomic feature.
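The counting idea can be sketched in a few lines. This is a simplified illustration, not HTSeq-count's implementation: it assumes half-open coordinates, ignores strand and alignment quality, and uses hypothetical in-memory tuples instead of BAM and GTF files. Reads that hit no feature or more than one feature go into the `__no_feature` and `__ambiguous` counters, in the spirit of HTSeq-count's special counters.

```python
import csv
import io


def count_reads(features, reads):
    """Count reads per feature.

    features: list of (name, chrom, start, end), half-open coordinates.
    reads:    list of (chrom, start, end), half-open coordinates.
    """
    counts = {name: 0 for name, _, _, _ in features}
    counts["__no_feature"] = 0
    counts["__ambiguous"] = 0
    for chrom, start, end in reads:
        # Names of all features that overlap this read on the same chromosome.
        hits = {
            name
            for name, f_chrom, f_start, f_end in features
            if f_chrom == chrom and start < f_end and f_start < end
        }
        if not hits:
            counts["__no_feature"] += 1
        elif len(hits) > 1:
            counts["__ambiguous"] += 1
        else:
            counts[hits.pop()] += 1
    return counts


def to_tsv(counts):
    """Render the counts as a two-column, tab-separated table."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for name, n in counts.items():
        writer.writerow([name, n])
    return buf.getvalue()


if __name__ == "__main__":
    features = [("geneA", "chr1", 100, 200), ("geneB", "chr1", 150, 300)]
    reads = [("chr1", 90, 120), ("chr1", 160, 180),
             ("chr1", 400, 450), ("chr1", 250, 280)]
    print(to_tsv(count_reads(features, reads)), end="")
```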

Divya Sain

Release notes

We have just published five tools from the GraphicsMagick 1.3.38 toolkit, the Swiss Army knife of image processing:

  • GraphicsMagick compare compares two images using statistics and/or visual differencing. The tool compares two images and reports difference statistics according to specified metrics, and/or outputs an image with a visual representation of the differences.

  • GraphicsMagick composite composites (combines) images to create a new image.

  • GraphicsMagick conjure interprets and executes scripts in the Magick Scripting Language (MSL). MSL primarily benefits those who want to accomplish custom image processing tasks but do not wish to program.

  • GraphicsMagick convert is used to convert an input image file using one image format to an output file with the same or different image format while applying an arbitrary number of image transformations.

  • GraphicsMagick montage creates a composite image by combining several separate images.

Divya Sain

Release notes

Recently published apps

We have just published two Bowtie2 2.4.5 (CWL 1.2) tools:

  • Bowtie2 Indexer, for building a Bowtie index from a set of DNA sequences.

  • Bowtie2 Aligner, for performing end-to-end read alignment.

On top of that, there are two more additions to our Public Apps gallery:

  • RSeQC - Junction Saturation 5.0.1 (CWL 1.2) tool for determining whether the sequencing depth is sufficient to perform alternative splicing analysis.

  • GATK IndexFeature 4.2.5.0 tool.

Divya Sain

Release notes

Recently published apps

We have just published the following three tools:

  • SPAdes 3.15.5 - an assembly tool containing various assembly pipelines. SPAdes can be used for reads produced by different sequencing technologies, such as Illumina, IonTorrent, PacBio, Oxford Nanopore and Sanger. SPAdes was tested on small genomes (e.g. bacterial, fungal) and is not intended for larger ones.

  • Unicycler 0.5.0 - a tool for bacterial genome assembly. It can assemble Illumina-sequenced reads, as well as PacBio or Nanopore long-read-only sets (for the best assemblies, it can conduct a hybrid assembly by taking both Illumina and long reads).

  • QUAST 5.2.0 - a tool for genome assembly evaluation. QUAST implements different methods for analyzing assemblies. By default, it utilizes Minimap2 for alignment. GeneMarkS, GeneMark-ES, Glimmer, Barrnap and BUSCO are used for gene prediction, while structural variations are found with BWA, Sambamba, and GRIDSS. Additionally, QUAST uses bedtools for calculating read coverage, which is presented in the Icarus contig alignment viewer.

Divya Sain

Release notes

Recently published apps

We have published six tools from the BEDTools 2.30.0 toolkit:

  • BEDTools Coverage - returns the depth and breadth of coverage of features from B on the intervals in A.

  • BEDTools Genomecov - computes histograms of feature coverage for a given genome.

  • BEDTools GetFasta - extracts sequences from a FASTA file for each of the intervals defined in a BED/GFF/VCF file.

  • BEDTools Intersect - screens for overlaps between two sets of genomic features.

  • BEDTools Merge - combines overlapping or “book-ended” features in an interval file into a single feature.

  • BEDTools Sort - sorts a feature file by chromosome and other criteria.
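As an illustration of what merging overlapping or book-ended features means, here is a minimal sketch. It is not the BEDTools implementation; it assumes BED-style half-open intervals given as hypothetical (chrom, start, end) tuples, sorts them first, and treats two features as mergeable when the next one starts at or before the current one ends.

```python
def merge_intervals(intervals):
    """Merge overlapping or book-ended (chrom, start, end) intervals.

    Intervals are half-open, so a feature ending at 250 and another starting
    at 250 are book-ended and collapse into a single feature.
    """
    merged = []
    # Sorting by chromosome and start plays the role of a prior sort step.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    features = [
        ("chr1", 100, 200),
        ("chr1", 150, 250),  # overlaps the first feature
        ("chr1", 250, 300),  # book-ended with the second feature
        ("chr2", 10, 20),
        ("chr1", 400, 500),
    ]
    for row in merge_intervals(features):
        print(*row, sep="\t")
```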

We have also published the Functional Equivalence Evaluation workflow for comparing the functional equivalence of different WGS/WES processing analyses. The workflow is used to establish whether the results can be used together (compared, merged) in downstream analyses, a common scenario in large, multi-center sequencing studies where different institutions use their own analysis protocols, or considered equally valid for drawing conclusions.

Divya Sain

Release notes

Recently published apps

We have just published the Nextstrain 2.4.0 toolkit:

  • Nextclade dataset list tool, which is used for listing available Nextclade datasets.

  • Nextclade dataset get tool, which is used for downloading Nextclade datasets.

  • Nextclade run tool, which is used for alignment, mutation calling, clade assignment, quality checks and phylogenetic placement of viral sequences.

  • Nextalign run tool, which is used for viral genome alignment and translation.

Divya Sain

Release notes

DRS export now available on the CGC

To further improve interoperability and let our users move their data seamlessly across platforms, we have introduced the DRS export option on the CGC. With the new functionality, users can export links to platform files (DRS URIs) and their metadata to a manifest file, which can then be used for importing the files and metadata on other platforms. Learn how to generate a DRS manifest file on the CGC.

Recently published apps

We have published the Bracken 2.7 toolkit:

  • Bracken (Bayesian Reestimation of Abundance with KrakEN) is used for abundance estimation at the species level, the genus level, or above.

  • Bracken Build is used to prepare the reference database for Bracken.

In addition, the Metagenomics Profiling - Kraken2 workflow has been published on the CGC. It is used for metagenomic classification, abundance estimation, and visualization.

Divya Sain

Release notes

Recently published apps

We have recently published the following apps:

  • SBG Single-Cell RNA Deep Learning - Training, a single-cell classifier pipeline for human data. It relies on the transfer learning approach, which uses pre-trained gene embeddings as the starting point for building a model adjusted to given single-cell datasets.

  • SBG Single-Cell RNA Deep Learning - Predict, a single-cell classifier pipeline for human data. This app uses the deep learning model generated by the SBG Single-Cell RNA Deep Learning - Training workflow to classify the input dataset.

Divya Sain

Release notes

Recently published apps

We have published the CNVkit 0.9.9 toolkit for inferring and visualizing copy number from high-throughput DNA sequencing data. The toolkit includes the following tools:

  • CNVkit breaks lists the targeted genes in which a segmentation breakpoint occurs.

  • CNVkit access calculates the sequence-accessible coordinates in chromosomes from the given reference genome.

  • CNVkit diagram draws copy number or segments on chromosomes as an ideogram.

  • CNVkit export bed converts segments to a BED file.

  • CNVkit export vcf converts segments to a VCF file.

  • CNVkit segmetrics calculates summary statistics of individual segments.

Divya Sain

Release notes

Recently published apps

We have published the following apps:

  • SBG Pair FASTQs by Metadata CWL1.2 tool, which accepts a list of FASTQ files and groups them into sub-lists based on the metadata. The sbg:draft-2 version of this tool will also remain available in the Public Apps gallery.

  • Upgraded version of the MultiQC (v1.13, CWL1.2) tool, which aggregates results from bioinformatics analyses across many samples into a single report. This wrapper version of MultiQC can also accept inputs from files that were produced by the Salmon Workflow (salmon_quant_archive.tar).
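The grouping that SBG Pair FASTQs by Metadata performs can be pictured with a minimal sketch. It is not the tool's code: the file records, the `metadata` dictionary layout, and the default `sample_id` key are assumptions made for illustration only.

```python
from collections import defaultdict


def group_fastqs(files, keys=("sample_id",)):
    """Group file names into sub-lists that share the same metadata values.

    files: list of dicts like {"name": ..., "metadata": {...}} (hypothetical layout).
    keys:  metadata fields that define a group (assumed field names).
    """
    groups = defaultdict(list)
    for f in files:
        # Files with identical values for every key fall into the same group.
        group_key = tuple(f["metadata"].get(k) for k in keys)
        groups[group_key].append(f["name"])
    # Groups keep first-seen order; names inside each group are sorted.
    return [sorted(names) for names in groups.values()]


if __name__ == "__main__":
    files = [
        {"name": "A_R1.fastq", "metadata": {"sample_id": "A", "paired_end": "1"}},
        {"name": "B_R1.fastq", "metadata": {"sample_id": "B", "paired_end": "1"}},
        {"name": "A_R2.fastq", "metadata": {"sample_id": "A", "paired_end": "2"}},
        {"name": "B_R2.fastq", "metadata": {"sample_id": "B", "paired_end": "2"}},
    ]
    print(group_fastqs(files))
```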

Divya Sain

Release notes

Prepare data for submission to dbGaP in a breeze

The dbGaP Submission Form Suite R Shiny app streamlines the way a CGC user can submit their data to dbGaP. The app allows you to import, map and prepare your data for import into the database of Genotypes and Phenotypes (dbGaP). You can easily export the metadata from a project into this tool and transform it in the visual interface to match the dbGaP submission guidelines. Once completed, all you have to do is download the produced Excel file and submit it to dbGaP.

The app is available under Interactive Browsers > Custom interactive apps on the CGC. Learn how to use it from the documentation.

Divya Sain

Release notes

Interactive Web App Gallery is now live

The Interactive Web Apps page is now available on the CGC. The new page contains all R Shiny apps that we publish and makes them more prominent and accessible in the CGC interface.

Interactive Web Apps are available under Public Apps > Interactive Web Apps on the top navigation bar. With this update, the Public Apps menu item on the CGC has changed from a tab to a dropdown menu which now contains the Workflows and Tools page, where the previous Public Apps page content is located.

OmicCircos plot generation app now available on the CGC

OmicCircos is now available as a custom interactive app on the CGC. The OmicCircos app is an R Shiny application created around the OmicCircos R package for more effective generation of high-quality circular plots for visualizing omics data. Its integration with the Cancer Genomics Cloud (CGC) makes it easy to launch the app from inside the CGC and visualize data that is already present in any of your CGC projects.

The OmicCircos R package that the interactive CGC app is based on was developed by Ying Hu, Chunhua Yan and Xiapeng Bian as a part of Daoud Meerzaman's Computational Genomics and Bioinformatics group at CBIIT/NCI, and it can also be installed via Bioconductor. The data can be gene or chromosome position-based values from mutation, copy number, expression, and methylation analyses.

Find out more about using OmicCircos on the CGC and the OmicCircos R package.
