
Open source, open data as game changers in ending a pandemic

The fight against Covid-19 is ongoing, yet recent progress in treatments and vaccines points toward mankind’s eventual victory. It will be years before a complete accounting of every skirmish, battle, and innovation is documented in the history books. However, the unprecedented size of the global collaboration among the world’s scientists is readily evident now. What is not so evident is how they pulled off such intense and critical work in the deadliest of circumstances.


One blog post is not, of course, big enough to hold a conclusive list, much less the finer details, of the many tools adapted and invented for the war against a deadly virus. One day all those stories will be the stuff of legends. But if it is a more scientific accounting you’re looking for now, a good place to start is the paper titled “Computational strategies to combat COVID-19: useful tools to accelerate SARS-CoV-2 and coronavirus research,” recently published in Briefings in Bioinformatics, an Oxford University Press journal.


Meanwhile, the focus of this post is the pair of unifying factors at the very foundation of the collaborative effort: open source software and open data.


Open Source and why it works in large scientific collaborations


According to opensource.com, the term “open source refers to something people can modify and share because its design is publicly accessible.” The term “open source software,” in turn, refers to software with source code [computer code] that “anyone can inspect, modify, and enhance.”


And while the term was originally intended to apply to computer code in software, it has expanded to include many projects, products, or initiatives that “embrace and celebrate principles of open exchange, collaborative participation, rapid prototyping, transparency, meritocracy, and community-oriented development.”


In other words, open source as a concept was uniquely suited to a rapid and large-scale collaborative effort the likes of which could only be forged by a lethal pandemic. It should come as no surprise, then, that open source not only has ongoing roles in the fight against the current pathogen threat but will also play the same roles against future public health threats.


Example of open source in bioinformatics to track disease transmission


In the battle against infectious diseases, open source is a primary means of sharing resources among collaborators. When multiple people contribute to a tool’s maintenance and improvement and then share it openly with anyone who needs it, everyone benefits.


Take, for example, an open source tool called IDseq that is used in tracking infectious disease outbreaks. This tool enabled researchers in Cambodia to sequence and confirm the country’s first case of COVID-19 in a matter of days instead of weeks. Such valuable, real-time insight into transmission of the SARS-CoV-2 coronavirus also helps the entire global community see where the virus may be going next. IDseq was developed by the Chan Zuckerberg Initiative, an organization established and owned by Facebook founder Mark Zuckerberg and his wife, Priscilla Chan.




This is just one of many examples.


Another such tool was released just this month. It is an open source bioinformatics pipeline for Covid-19 variant detection in wastewater. Zymo Research released the VirSieve Bioinformatics Pipeline source code to the environmental microbiology community to rapidly find variants of Covid-19 in sewage. By checking for variants in an unobtrusive, large-scale way, researchers can detect their presence much faster than by testing patients individually.


But that is not to say that accurately detecting variants and other viruses in sewage is an easy task.


“Viral sequencing from wastewater is difficult due to fragmentation and degradation of the viral RNA, often resulting in sequencing errors that ultimately manifest as false mutations,” according to the company’s public statement.


VirSieve, say the researchers, can “identify these false variants and mark them as being low or no confidence, allowing researchers to filter mutations with a higher degree of support.” Zymo Research also produces a DNA/RNA Shield reagent to protect viral RNA from degradation after sample collection and during transportation to a lab. Further, the company provides a list of open source and free metagenomics, bioinformatics & epigenetics research tools.


Open source code is also used elsewhere, to develop treatments and vaccines. For more on those tools and open data, check out the Open Source Pharma website. There are many more open source resources in these areas, but again, it’s too much information for one blog post to do justice.


Open data fuels open source software in the fight against Covid-19


Having great tools is one thing, but having access to large amounts of useful data is quite another.


A key part of the collaboration among scientists all over the world, in both the private and public sectors, is sharing valuable data. Healthcare providers, vaccine researchers, pharmaceutical and medical researchers, government entities, and other institutions gathered as much data as they could find as rapidly as possible, given the newness of this particular disease. Once the data was collected, it was stored in repositories and made available to other researchers.


One example of a collection of data repositories made available to qualified researchers is the U.S. National Institutes of Health, Office of Data Science Strategy’s compilation “Open-Access Data and Computational Resources to Address COVID-19.” It’s educational to scroll through and see how many different data sets are gathered there for current and future research.


In Canada, the compilation of freely sharable Covid-19 data sets is called “Covid-19 Open Data.”


Other countries have done likewise. For example, the UK collection of data repositories is known simply as “All data related to coronavirus (covid-19).” In the EU, the official sharable Covid-19 data sets are found on data.europa.eu. In Japan, the sharable data is found on the COVID-19 Data Portal JAPAN, which is made available to the public in cooperation with the European COVID-19 Data Portal and with the support of many Japanese institutions.


And so it goes worldwide, including for countries under authoritarian rule or with extremely limited resources. One prime example of that is found in the Covid-19 Data Explorer: Global Humanitarian Operations portal of visualizations. Another example is the collection of raw data sets across 56 humanitarian operations working in many different countries.


What it all means


Mankind’s best defense against a lethal virus in a fast-arising pandemic proved to be open collaboration in thinking, bioinformatics tool designs, and data sharing. It’s a model for the ages in that this can serve as the pathway to cope with and eventually conquer even more lethal pathogens that may pop up in the future.


In a way, this is evolution through the blending of human and machine capabilities to discover new disciplines. Much has been learned and developed and now added to the public health arsenal. Here’s to a bright and healthy future for all of mankind!
