Efforts to study the early stages of the coronavirus pandemic have received help from a surprising source. A US biologist has “unearthed” partial SARS-CoV-2 genome sequences from the early days of the pandemic in its likely epicenter, Wuhan, China. The sequences had been deposited in a US government database but were later removed.
The partial genome sequences address an evolutionary puzzle surrounding the early genetic diversity of the SARS-CoV-2 coronavirus, although scientists stress that they do not shed light on its origins. It is also not entirely clear why researchers at Wuhan University asked for the sequences to be removed from the Sequence Read Archive (SRA), a repository of raw sequencing data maintained by the National Center for Biotechnology Information (NCBI), part of the US National Institutes of Health (NIH).
“These sequences are informative, they are not transformative,” says Jesse Bloom, a viral evolutionary geneticist at the Fred Hutchinson Cancer Research Center in Seattle, Washington, who describes in a June 22 preprint how he found the sequences.
Bloom discovered the sequences while looking for genomic data from the early stages of the pandemic. A research paper from May 2020 included a table of publicly available sequence data, and some of its entries were ones Bloom had not come across. Those sequences were associated with a study in which researchers used nanopore sequencing technology to detect the genetic material of SARS-CoV-2 in samples from people. This study was published in the journal Small in June 2020, after being posted on bioRxiv in March of the same year.
When Bloom searched for the sequences in the SRA using the information listed in the May 2020 paper, the database returned no entries. The SRA stores sequences on cloud storage managed by Google, and Bloom wondered if he could find archived versions of the sequences on cloud servers. This approach worked, and Bloom was able to recover data from 50 samples, 13 of which contained enough raw data to generate partial genomic sequences.
The sequences help solve an evolutionary puzzle about the early stages of the pandemic, says Bloom. The earliest virus sequences from Wuhan come from people linked to the city’s Huanan Seafood Market in December 2019, and the market was originally believed to be where the coronavirus first leapt from animals to humans. But the seafood market sequences are more distantly related to the closest relatives of SARS-CoV-2 in bats – the virus’ most likely ultimate origin – than later sequences, including one collected in the United States.
This is surprising, says Bloom, because one would expect viruses from the early stages of the Wuhan epidemic to be the ones most closely related to the relatives of SARS-CoV-2 that infect bats. The recovered sequences, which were likely collected in January and February 2020, fit that expectation: they are more closely related to the bat viruses than the sequences from people associated with the seafood market.
This adds to a growing body of evidence, including reports of likely cases from November 2019, that the first human cases of COVID-19 were not linked to the Huanan Seafood Market, say Bloom and other scientists.
“It seemed to me that the Wuhan market was one of the earliest super-spreading events,” says Sudhir Kumar, an evolutionary geneticist at Temple University in Philadelphia, Pennsylvania. The sequences unearthed by Bloom suggest that SARS-CoV-2 was already highly diverse in the early stages of the pandemic in China, including in Wuhan.
Stephen Goldstein, a virologist at the University of Utah in Salt Lake City, points out that the sequences Bloom found were not hidden: they are described in detail in the Small paper, with enough sequence information to show their evolutionary relationship with other early SARS-CoV-2 sequences. “I don’t think this preprint tells us much, but it brings to the fore sequence data that was publicly available, albeit under the radar,” says Goldstein.
Bloom says that although the sequences were made public, their removal from the SRA meant few scientists knew about them. A report commissioned by the World Health Organization on the origins of the pandemic did not include the sequences in an evolutionary analysis of early SARS-CoV-2 data. “Nobody noticed they existed,” says Bloom.
The corresponding authors of the Small paper did not respond to questions from Nature’s news team about why they asked for the sequences to be removed from the SRA, which happened before the paper was published. In a statement, the NIH said it removed the data at the request of the researchers, who said they wanted to transfer it to another database.
Bloom – who co-authored a letter calling for a re-examination of the origins of the pandemic, including the possibility that the virus escaped or leaked from a laboratory – says his study does not shed any light on the origins of the pandemic or on the reasons the sequences were removed. However, he hopes his efforts will encourage researchers to “think outside the box” and look to other sources, such as archival data, to get more information from the early days of the pandemic. “There’s probably more data out there,” he says.
This article is reproduced with permission and was first published on June 24, 2021.