12 Questions for our Colleagues at Georgetown University

Conducted on July 7, 2021 | Compiled & Edited by Dr. Dan Ventre

Speakers

Dr. Yuriy Gusev: Associate Professor of Bioinformatics at the Georgetown University Innovation Center for Biomedical Informatics, and Director of the Master's Degree Program in Health Informatics & Data Science (HIDS) at Georgetown.

Ms. Krithika Bhuvaneshwar: Research Faculty Instructor for the Georgetown University HIDS program, and Senior Bioinformatician at the Innovation Center for Biomedical Informatics at Georgetown University.


Interviewers

Dr. Manisha Ray: Senior Program Manager at Seven Bridges, primarily working on the Cancer Genomics Cloud (CGC). Has worked closely with Yuriy and Krithika on several courses for teaching bioinformatics using the CGC.

Ms. Sai Lakshmi Subramanian: Program Manager at Seven Bridges, co-managing the CGC. Handles some of the CGC outreach activities, including the CGC Monthly Webinar series, which has been running successfully since January 2021.


Question 1:

Can you give us a brief background of the Health Informatics & Data Science Program?

Yuriy Gusev:

The Health Informatics & Data Science Program is an accelerated Master's Degree Program at Georgetown University. We started it two years ago, in 2019, and so far it's been a successful program, attracting students with mostly computational backgrounds from around the globe. We focus on providing students with hands-on experience in all our courses. One of the ways we chose to provide this hands-on experience was to use the Seven Bridges platform to teach students how to handle big biomedical data in the cloud, and how to use different kinds of bioinformatics platforms to analyze the data.

Krithika Bhuvaneshwar:

For teaching, we've been working with the Seven Bridges platform since 2019. We initially started with RNA-seq and whole-genome sequencing. During the class, we teach the students how to run an end-to-end workflow for both RNA and whole-genome sequencing, and they also get homework assignments on it. It's been a great experience for them, because they are able to focus on learning the concepts of the workflow itself, which is much easier using the Seven Bridges platform.

Question 2:

What aspects of the platform, in particular, make it suited for teaching your students?

Yuriy Gusev:

There were really several reasons why we chose the Seven Bridges platform. When RNA-seq and whole-genome sequencing became very popular research applications, especially in cancer research, we were looking for cloud-based solutions for our research program. Later on, we realized that the same cloud-based approach for very large amounts of genomic data could be used for training. After considering different commercial and non-profit platforms, we chose the Seven Bridges platform because we found it the most user-friendly and convenient to use, and it also provides a great opportunity to do reproducible and shareable research online.

Krithika Bhuvaneshwar:

When you're teaching concepts of sequencing to students who don't know much about genomics, it's much easier to start from the basics. A simple end-to-end workflow, like those on the Seven Bridges platform, is much easier to teach than starting with a virtual machine, shell scripts, or any of the installations we would otherwise have to teach them first.

Question 3:

Can you help us understand which elements of the CGC platform you find most helpful when you structure your course curriculum, and why those elements are appropriate for teaching?

Yuriy Gusev:

We face very serious challenges when trying to teach modern genomic and bioinformatics techniques to graduate students. One of the major challenges is giving students real-life experience of how to handle massive amounts of genomic data, so-called big biomedical data. We spent quite a few years developing methodologies that make it easier for students to understand and get a feel, so to speak, for working with this massive amount of data.

For example, the TCGA (The Cancer Genome Atlas) collection is about 2.5 petabytes of data. Obviously, it's not possible to work with that on local machines, local computers, or local workstations. The cloud is the only solution we can use to provide access to this data. The TCGA collection, in particular, is not only a great research tool; we also found it very useful for training. Seven Bridges provides direct access to the TCGA collection as well as to many other very large publicly available collections, such as TCIA (The Cancer Imaging Archive), the imaging data collection.

[What] we also found very useful is the collection of tools and pipelines for analyzing the data, which are available through the Seven Bridges platform. Those tools are available on the cloud platform right where the collection of genomic data is located, which allows us to follow the major principle of bringing tools to the data, as opposed to bringing data to the tools, which is very impractical when you deal with genomic data. So, basically, the Seven Bridges [platform] gives us the opportunity to teach students how to handle big biomedical data, and how to use tools to analyze it on a cloud platform.

Krithika Bhuvaneshwar:

I think Seven Bridges has a nice framework for teaching, because of the way we set up the infrastructure: we create one project for each of our students, copy the pipeline into it, and teach them how to copy the data as well. So, in addition to the instructor demoing the pipeline, each student is able to run the same pipeline at the same time, in class, in their own project. And we, as instructors, are able to track it because we are also listed as collaborators on the project. So, it's very easy not only to teach, but also to check whether they're running the homework correctly for the hands-on portion of the class.

Question 4:

How have the students responded to being taught bioinformatics using the CGC? What do they say about using the platform?

Yuriy Gusev:

The pipelines we use in our research, and also for teaching, are quite complex and sophisticated. Each is a multi-step workflow that includes multiple software packages, and in general it's a tough topic to teach because it's very complex and requires students to have advanced knowledge of genomics and of computational approaches for genomics. So, to make it easier for students, we decided to use Seven Bridges' collection of tools, because they are presented through a very user-friendly graphical interface that gives students a low barrier to understanding and using [the pipelines], which otherwise would require a lot of additional training. In our experience, this makes it easier for us to train students and, from the students' perspective, much easier for them to get started.

Krithika Bhuvaneshwar:

Yeah, I completely agree. Before we do the Seven Bridges hands-on session in the classroom, we go over the concepts of what each step in the pipeline does, and that's actually a very complex and dense topic; we can see on their faces that it's quite difficult to understand. Then we come to the hands-on session, where they're able to see the pipeline, the workflow, on the Seven Bridges platform and run it quickly, and it's much easier for them to digest once they see it on the Seven Bridges platform.

Question 5:

Can you elaborate a bit more on the different kinds of analyses that you have taught?

Yuriy Gusev:

Right now we are using the Seven Bridges platform in two different courses. One is Precision Health Informatics. This is a course Krithika and I developed together, and it runs every fall semester. For this course, we use Seven Bridges to teach students how to process genomic sequencing data, specifically RNA sequencing pipelines and whole-genome sequencing pipelines, as well as exome sequencing pipelines. RNA sequencing is probably our major focus currently; it's where we spend the most time with students, because we also have additional modules for downstream analysis of the data generated on the Seven Bridges platform. So, the students learn not only how to process the data, but also how to do additional research after the genomic data has been processed.

For the imaging informatics course, we also started to use the Seven Bridges platform, because modern biomedical image analysis is now moving toward quantitative analysis, similar to what is happening in genomics. Basically, we're talking about a feature extraction workflow, which is implemented on the Seven Bridges platform and which allows us to provide continuity of teaching. First, in our genomics course, we teach how to do multi-omics analysis; then, in the imaging informatics course, we teach how to extract features and apply quantitative analysis, very similar to multi-omics analysis, to imaging data.

Question 6:

And, are you planning to expand into additional course types in the future?

Yuriy Gusev:

Of course. This platform is universally applicable to any kind of big biomedical data. At this point, we don't really have additional courses where we could implement it; however, we are creative people. We are constantly looking for new opportunities to provide our students with the best experience and training, so I wouldn't rule out the possibility that we will use it for more courses in the future.

Krithika Bhuvaneshwar:

Yeah, I think Dave's [Dave Roberson, Community Engagement Manager] hands-on session on the imaging pipeline was very well received by our students this semester, so next year we're going to offer the same imaging course again in the spring.

Yuriy Gusev:

I just want to add that our plan is to expand this imaging informatics course into a full-semester course. We are also definitely planning to expand the topics covered in the Seven Bridges imaging informatics workshops, to make the training more comprehensive.

Question 7:

What do you think will make it even easier to set up the course on the platform?

Krithika Bhuvaneshwar:

I had to set up the projects for each of the students, so all the projects are currently under my ownership. At the end of the semester, we would like to transfer ownership to the students, so they can keep a copy of the pipeline and the project under their own credits, in case they want to continue practicing or use it in job interviews.

Yuriy Gusev:

We see that many students are interested in continuing to use the platform after they finish our classes, and we have some examples of students who are already using it after graduating from the program.

Question 8:

Can you go into a little more detail about how the students are using the platform after they take the class?

Yuriy Gusev:

One success story I would like to share is about one of our recent graduates, who was actively searching for jobs after graduating from our program last year. He came across a great opportunity for a bioinformatician position at a very prestigious medical research organization in the US. During his interview, he gave a live demo using the Seven Bridges account he had used during training. He used the pipelines he learned in our Precision Health Informatics course to demonstrate how he can handle genomic data processing in the cloud on Seven Bridges. I know from his feedback that the interviewers were very surprised and reacted very positively; they were very impressed with his skills and his ability to show, on the fly, how he could use the platform. He landed the job and is now working as a clinical bioinformatician at this medical organization.

I also want to add that, as far as I know, a couple of other graduates used the platform during their Capstone project, which is a summer internship with one of the industry partners we have as part of our program. We give students opportunities to work on real-life projects with different industry partners. Two of our students used metagenomic pipelines on Seven Bridges for COVID-related research during their internship with the company.

Question 9:

Can you provide us some background on the Bioinformatics Core lab at Georgetown? What are the kinds of research you do there?

Yuriy Gusev:

As I mentioned before, we are not only teaching genomics, bioinformatics, and imaging; we are also practitioners. We're actively involved in large-scale research projects, mostly with Lombardi Cancer Center researchers here at Georgetown, where we provide collaborative research services and support (on the bioinformatics side) for various cancer-related, mostly translational, projects. I'm co-director and Krithika is a manager of the Bioinformatics Shared Resources for the Lombardi Cancer Center. We work closely with multiple investigators on campus. In many cases, we develop multi-year collaborations as part of our services, and we mostly work with genomic sequencing data: RNA sequencing and other types of sequencing data. [We] also provide a lot of so-called downstream analysis at the gene expression level, where we do systems-biology types of analysis. We have several focus areas in our research. We provide a lot of support for research on GI-tract cancers, because that is one of the specialties of our cancer center, as well as breast cancer, and we also work on brain cancer-related projects.

One area that is a success story is our work with the Georgetown Transplant Institute, where we help a group of researchers use medical profiling data to understand the mechanisms of transplant rejection in patients. For all of these projects, we actively use the Seven Bridges platform. So far, we have counted nearly 50 different projects in which we regularly use the Seven Bridges platform for RNA-seq and whole-genome sequencing data analysis. We also do our own research to develop novel methodologies, which we can offer as shared resources to our collaborators at Lombardi. We implemented on Seven Bridges a novel viral genomics pipeline that allows anyone to use human RNA sequencing data to investigate and quantify the presence of different kinds of viral RNA in tumor samples. This pipeline was developed by our team and published a couple of years ago in Frontiers in Microbiology. So, this is another example where we used the Seven Bridges platform not only for data processing and analysis, but also to develop novel tools.

Krithika Bhuvaneshwar:

I also want to add that, in addition to the RNA-seq and whole-genome sequencing Yuriy mentioned, we have used the Seven Bridges platform to run whole-exome sequencing pipelines and HLA typing for our collaborators. These are built-in pipelines on the Seven Bridges platform, and we have used them for our research projects.

Question 10:

Can you go into a little bit more detail about how the CGC has helped the core lab work on these pipelines and streamline the process for your collaborators?

Yuriy Gusev:

One of the major challenges we faced early on, when RNA-seq became very popular, was how to handle the enormous amount of data generated every time investigators extract RNA from tumor samples and run multiple samples on the sequencing machines. They generate terabytes of data, and initially it was very difficult for us to find a way to streamline the workflow so that we could actually move the data around, find an appropriate platform to land the data, and bring the tools to the same cloud. So, after several years of searching, we were happy to find the Seven Bridges platform. We compared it to others, and we found it very advanced and user-friendly. This platform helped us move the data, after it was generated in our local genomics core facilities, to the secure cloud environment at Seven Bridges. In addition, our collaborators found it very useful that they are all included in Seven Bridges projects as collaborators, so they can participate and see what's going on [and] how the analysis is done. Some of them actually wanted to learn how to use the platform after collaborating with us, and we provided training for several faculty members on campus on how to use the Seven Bridges platform. So, in a way, Seven Bridges really provided us [an] opportunity to simplify and streamline the handling of massive amounts of genomic data, to provide open access, and to make it shareable with our collaborators.

Krithika Bhuvaneshwar:

Back in 2015, before we started using Seven Bridges or the cloud at all, I remember it took me three months to analyze 75 whole-genome sequencing samples on a desktop with 2 GB of RAM. After that, there was a project in which we used Google Cloud. The run time itself was not that long, but I had to move the data in and out, set up the virtual environment, and set up all my scripts, and all of that took time: it took me about a week to process the samples, and those were clinical-depth samples. They were really big files. Then we started using the Seven Bridges platform, and once a pipeline is set up and tested, it's very easy to run it from a web browser. I can start the runs in the evening from my phone, even on a train, and when I come back to the office the next day, they will all be done.

Question 11:

You've shared a success story of how one of your students secured a job using the Seven Bridges platform, but do you have any success stories that you can share on the research front?

Yuriy Gusev:

Well, in research, success is measured by publications… and we have several examples, such as work on liver cancer that we published together with our collaborators. Also, as I already mentioned, we have published a series of papers, three at this point, with the Georgetown Transplant Institute, which is also connected to cancer-related research because the transplants are mostly done on cancer patients. We're really pleased with the transplant research and think of it as a success story: we run the RNA-seq analysis on the Seven Bridges platform, and this provides us with a basically endless source of novel insights into the mechanisms of transplant rejection. We continue to collaborate with this group, and they seem to be very pleased with the outcome.

Krithika Bhuvaneshwar:

I can also add a story from back in 2017, when Yuriy and I were working on our bioinformatics pipeline, the viGEN Pipeline. At the time, we came across a blog post in the Seven Bridges newsletter. I think it was by Raunaq, who had compared the Centrifuge pipeline with the Kraken pipeline on Seven Bridges, using, I think, HPV or some specific cancer type, and a number of viruses were detected in his example. So, we contacted Seven Bridges, they put us in touch with Raunaq, and he helped us create the viGEN pipeline on the Seven Bridges cloud and optimize it to run in scatter-gather mode on the Seven Bridges platform. We were able to compare the output and results from our pipeline with Centrifuge and Kraken; that comparison is published in our paper, where we have acknowledged Seven Bridges and cited the blog post by Raunaq. I think that gave our paper an additional dimension, so it was very helpful to be able to compare these pipelines using Seven Bridges.

Yuriy Gusev:

Yes, I just realized I want to mention one more publication. We just posted a preprint on bioRxiv of our paper, which is currently under consideration at a peer-reviewed journal. In it, we applied this viral RNA pipeline to COVID-19 data from lung biopsies of COVID-19 patients, and we also applied the metagenomic pipeline on the Seven Bridges platform to the same RNA-seq data. We think this is a very interesting new development, where we can use multiple pipelines on the same platform to provide multi-dimensional research on a specific disease domain, in this case COVID-19, which, of course, is a very important topic these days.

Question 12:

What advice would you give to other researchers that are just getting started with the CGC, or working in the cloud in general?

Yuriy Gusev:

That's a good question. We actually [give] this kind of advice regularly to our new incoming faculty. Just last week, I think, I connected a new faculty member at Lombardi [with] Dave Roberson, to make it easier for him to start using the Seven Bridges platform. I always recommend Seven Bridges at our cancer institution because of its simplicity (it's an easy-to-learn platform) and also [the] availability of very important datasets, such as TCGA and others. In cancer research, it has become customary to compare the results generated by researchers with the publicly available data in the TCGA collection. So, it's an important part of modern cancer research, and Seven Bridges provides easy access to the entire TCGA collection, as well as the TARGET collection and the TCIA collection for imaging. The quantitative use of imaging together with other -omics data is also emerging as a very important part of modern cancer research.

Krithika Bhuvaneshwar:

I think the barrier is very low for new researchers to learn to use the Seven Bridges platform. Even within our center, we have shared our RNA-seq pipeline with colleagues, and they have been able to apply the same pipeline to their own data. Yuriy also gives several trainings and seminars for researchers in the Lombardi Cancer Center and the Georgetown Medical Center. So, I think the barrier is very low because the data, the tools, and everything else are already in the cloud. All they need is an account, and with the free $300 in credits, it's very easy to get started.