UBC postdoc identifies over 100,000 new RNA viruses using the power of cloud computing

Map of global RNA sequencing data that Dr. Babaian and his team analyzed to identify new RNA viruses [Source: Serratus Project]

An international team led by a former UBC postdoctoral fellow in medical genetics has uncovered almost ten times more RNA viruses than were previously known—including several new species of coronaviruses.

The team made the discovery by re-analyzing all publicly available RNA sequencing data. The planetary-scale database of RNA viruses they developed could help rapidly identify virus spillover into humans, as well as those viruses that affect livestock, crops, and endangered species.

Dr. Artem Babaian is behind the Serratus Project collaboration, which published its results in the scientific journal Nature this week.

Working with the Cloud Innovation Centre (CIC), a public/private collaboration between UBC and Amazon Web Services (AWS), the Serratus Project was able to build a “ridiculously powerful” supercomputer equivalent in power to 22,500 CPUs, said Dr. Babaian.

Dr. Artem Babaian

The supercomputer analyzed 20 million gigabytes of publicly available gene sequence data from 5.7 million biological samples around the world, searching for a specific gene that indicated the presence of an RNA virus. The samples have been collected and freely shared within the world research community over 13 years and include everything from ice-core samples to animal dung.

Researchers with the Serratus Project found 132,000 RNA viruses, where just 15,000 were known previously. The discovery included nine new species of coronaviruses.

Dr. Babaian estimates that without the Cloud Innovation Centre, it would have taken a traditional supercomputer well over a year and hundreds of thousands of dollars to perform the 2,000 years of CPU time necessary for this analysis. Serratus accomplished it in 11 days for $24,000.

“We’re entering a new era of understanding the genetic and spatial diversity of viruses in nature, and how a wide variety of animals interface with these viruses. The hope is we’re not caught off guard if something like SARS-CoV-2, the novel coronavirus that causes COVID-19, emerges again. These viruses can be recognized more easily and their natural reservoirs can be found faster. The real goal is these infections are recognized so early that they never become pandemics,” said Dr. Babaian, who holds a PhD in medical genetics from UBC and is now a Banting Fellow at the University of Cambridge.

“If a patient presents with a fever of unknown origin, once that blood is sequenced, you can now connect that unknown virus in the human to a way bigger database of existing viruses. If a patient, for example, presents with a viral infection of unknown origin in St. Louis, you can now search through the database in about two minutes, and connect that virus to, say, a camel in sub-Saharan Africa sampled in 2012.”

Dr. Babaian, 32, had been conducting genetic research into cancer with BC Cancer when the COVID-19 pandemic hit and he switched gears.

The work, which the understated Dr. Babaian says started as a “fun side project,” began March 3, 2020, when he and his friend and climbing partner, UBC engineering student Jeff Taylor, sketched out the idea “on the back of a napkin.”

“I should have kept that napkin,” he noted.

Dr. Babaian approached UBC’s Cloud Innovation Centre for help shortly after, and the Serratus Project was born, named after Serratus Mountain in British Columbia, which he and Taylor viewed during a climb in 2020.

Dr. Babaian recalled he was sitting on his wife’s nursing chair when the first results started to flash up on his laptop, indicating that Serratus was not only working, but producing data almost incomprehensibly fast.

“It was probably the most exciting scientific period of my life,” he said. “There are two types of fun. Type 1 is smiling and fun. Type 2 is when you’re miserable while doing it but the memory shines, like rock climbing. In many ways Serratus is Type 2 fun. You just kind of have to believe it’s going to work out.”

Dr. Babaian said he would not have been able to do this work without the support of the UBC Cloud Innovation Centre.

“The Cloud Innovation Centre was really there unlocking the doors for us,” he said. “We had an idea and they brought in experts from their networks to make it come to life. Now the global community can benefit from all this previously untapped research.”

“Artem approached us with an innovative vision. The power of the Cloud Innovation Centre is that we pair our in-house innovation and technology teams from UBC with those from Amazon Web Services,” said Marianne Schroeder, director of the UBC Cloud Innovation Centre. “It was our great privilege to support the realization of this vision; helping to find a technology solution for complex problems is what we do.”

The Centre, which launched in January 2020, right before the pandemic, supports challenges that focus on community health and wellbeing. To date, the team has published more than 20 projects, including reference architectures and deployment guides, all available as open source.

“While the public cloud as we know it has been around for 15 years, the last few years of innovation at Amazon Web Services have really made genomics research possible in a new way,” said Coral Kennett, who heads up the Centre for Amazon Web Services. “We were able to give Artem access to compute power for pennies a query. We highly encourage the research community to submit their projects and ideas to the Cloud Innovation Centre so that more innovation comes to light, benefitting the community.”

Find out more about other Cloud Innovation Centre projects here.
