
Updated: Mar 18, 2020

China COVID19 Mutation - On the origin and continuing evolution of SARS-CoV-2

Tuesday, March 10, 2020





RESEARCH ARTICLE
MICROBIOLOGY

On the origin and continuing evolution of SARS-CoV-2

Xiaolu Tang1,7, Changcheng Wu1,7, Xiang Li2,3,4,7, Yuhe Song2,5,7, Xinmin Yao1, Xinkai Wu1, Yuange Duan1, Hong Zhang1, Yirong Wang1, Zhaohui Qian6, Jie Cui2,3,*, and Jian Lu1,*

1. State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing, 100871, China
2. CAS Key Laboratory of Molecular Virology & Immunology, Institut Pasteur of Shanghai, Chinese Academy of Sciences, China
3. Center for Biosafety Mega-Science, Chinese Academy of Sciences, China
4. University of Chinese Academy of Sciences, China
5. School of Life Sciences, Shanghai University, China
6. NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing
7. These authors contributed equally to this work.

*Corresponding authors: Jian Lu, Email: LUJ@pku.edu.cn; Jie Cui, Email: jcui@ips.ac.cn

ABSTRACT

The SARS-CoV-2 epidemic started in late December 2019 in Wuhan, China, and has since impacted a large portion of China and raised major global concern. Herein, we investigated the extent of molecular divergence between SARS-CoV-2 and other related coronaviruses. Although we found only 4% variability in genomic nucleotides between SARS-CoV-2 and a bat SARS-related coronavirus (SARSr-CoV; RaTG13), the difference at neutral sites was 17%, suggesting the divergence between the two viruses is much larger than previously estimated. Our results suggest that the development of new variations in functional sites in the receptor-binding domain (RBD) of the spike seen in SARS-CoV-2 and viruses from pangolin SARSr-CoVs is likely caused by mutations and natural selection in addition to recombination.
Population genetic analyses of 103 SARS-CoV-2 genomes indicated that these viruses evolved into two major types (designated L and S) that are well defined by two different SNPs that show nearly complete linkage across the viral strains sequenced to date. Although the L type (~70%) is more prevalent than the S type (~30%), the S type was found to be the ancestral version. Whereas the L type was more prevalent in the early stages of the outbreak in Wuhan, the frequency of the L type decreased after early January 2020. Human intervention may have placed more severe selective pressure on the L type, which might be more aggressive and spread more quickly. On the other hand, the S type, which is evolutionarily older and less aggressive, might have increased in relative frequency due to relatively weaker selective pressure. These findings strongly support an urgent need for further immediate, comprehensive studies that combine genomic data, epidemiological data, and chart records of the clinical symptoms of patients with coronavirus disease 2019 (COVID-19).

Keywords: SARS-CoV-2, virus, molecular evolution, population genetics

Received: 25-Feb-2020; Revised: 28-Feb-2020; Accepted: 29-Feb-2020.

INTRODUCTION

The coronavirus disease 2019 (COVID-19) epidemic started in late December 2019 in Wuhan, the capital of Central China's Hubei Province. Since then, it has rapidly spread across China and to other countries, raising major global concerns. The etiological agent is a novel coronavirus, SARS-CoV-2, named for the similarity of its symptoms to those induced by the severe acute respiratory syndrome. As of February 28, 2020, 78,959 cases of SARS-CoV-2 infection have been confirmed in China, with 2,791 deaths.
Worryingly, there have also been more than 3,664 confirmed cases outside of China in 46 countries and areas (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/), raising significant doubts about the likelihood of successful containment. Further, the genomic sequences of SARS-CoV-2 viruses isolated from a number of patients share sequence identity higher than 99.9%, suggesting a very recent host shift into humans [1-3].

Coronaviruses are naturally hosted and evolutionarily shaped by bats [4, 5]. Indeed, it has been postulated that most of the coronaviruses in humans are derived from the bat reservoir [6, 7]. Unsurprisingly, several teams have recently confirmed the genetic similarity between SARS-CoV-2 and a bat betacoronavirus of the sub-genus Sarbecovirus [8-13]. The whole-genome sequence of the novel virus shares 96.2% identity with a bat SARS-related coronavirus (SARSr-CoV; RaTG13) collected in Yunnan province, China [2, 14], but is not very similar to the genomes of SARS-CoV (about 79%) or MERS-CoV (about 50%) [1, 15]. It has also been confirmed that SARS-CoV-2 uses the same receptor, the angiotensin converting enzyme II (ACE2), as SARS-CoV [11]. Although the specific route of transmission from natural reservoirs to humans remains unclear [5, 13], several studies have shown that pangolins may have provided a partial spike gene to SARS-CoV-2; the critical functional sites in the spike protein of SARS-CoV-2 are nearly identical to those identified in a virus isolated from a pangolin [16-18].

Despite these recent discoveries, several fundamental issues related to the evolutionary patterns and driving forces behind this outbreak of SARS-CoV-2 remain unexplored [19]. Herein, we investigated the extent of molecular divergence between SARS-CoV-2 and other related coronaviruses and carried out population genetic analyses of 103 sequenced genomes of SARS-CoV-2.
This work provides new insights into the factors driving the evolution of SARS-CoV-2 and its pattern of spread through the human population.

RESULTS

Molecular phylogeny and divergence between SARS-CoV-2 and related coronaviruses.

For each annotated ORF in the reference genome of SARS-CoV-2 (NC_045512), we extracted the orthologous sequences in human SARS-CoV, four bat SARS-related coronaviruses (SARSr-CoV: RaTG13, ZXC21, ZC45, and BM48-31), one Pangolin SARSr-CoV from Guangdong (GD) [17], and six Pangolin SARSr-CoV genomes from Guangxi (GX) [18] (Table S1). We aligned the coding sequences (CDSs) based on the protein alignments (see Materials and Methods). Most ORFs annotated from SARS-CoV-2 were found to be conserved in other viruses, except for ORF8 and ORF10 (Table 1). The protein sequence of SARS-CoV-2 ORF8 shared very low similarity with sequences in SARS-CoV and BM48-31, and ORF10 had a premature stop codon in both SARS-CoV and BM48-31 (Fig. S1). A one-base deletion caused a frame-shift mutation in ORF10 of ZXC21 (Fig. S1).

To investigate the phylogenetic relationships between these viruses at the genomic scale, we concatenated the coding regions (CDSs) of the nine conserved ORFs (orf1ab, E, M, N, S, ORF3a, ORF6, ORF7a, and ORF7b) and reconstructed the phylogenetic tree using the synonymous sites (Fig. 1A). We also used CODEML in the PAML package [20] to infer the ancestral sequence of each node and calculated the dN (nonsynonymous substitutions per nonsynonymous site), dS (synonymous substitutions per synonymous site), and dN/dS (ω) values for each branch (Fig. 1A). In parallel, we also calculated the pairwise dN, dS, and ω values between SARS-CoV-2 and each of the other viruses (Table 1).

The genome-wide phylogenetic tree indicated that SARS-CoV-2 was closest to RaTG13, followed by GD Pangolin SARSr-CoV, then by GX Pangolin SARSr-CoVs, then by ZC45 and ZXC21, then by human SARS-CoV, and finally by BM48-31 (Fig. 1A).
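To make the distinction between synonymous and nonsynonymous changes concrete, the following Python sketch classifies codon-level differences between two aligned coding sequences using the standard genetic code. It is a simplified illustration and not the CODEML/PAML procedure: it only counts differing codons and does not normalize by the number of synonymous or nonsynonymous sites, so it does not yield dN or dS. The function name and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_differences(seq_a, seq_b):
    """Count differing codons that keep (synonymous) or change
    (nonsynonymous) the encoded amino acid.

    Both inputs are assumed to be in-frame, aligned coding sequences of
    equal length. Codons containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguous base: not classifiable here
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Hypothetical toy alignment: TTT->TTC keeps Phe, AAA->AGA changes Lys to Arg.
syn, nonsyn = classify_codon_differences("TTTCTGAAA", "TTCCTGAGA")
print(syn, nonsyn)
```

Counting the two classes separately matters because nonsynonymous changes are more often removed by selection, so pooling all sites tends to understate the divergence at neutral sites.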
Notably, we found that the nucleotide divergence at synonymous sites between SARS-CoV-2 and other viruses was much higher than previously anticipated. For example, although the genomic nucleotides differ by only ~4% overall between SARS-CoV-2 and RaTG13, the genomic average dS was 0.17, which means the divergence at the neutral sites is 17% between these two viruses (Table 1). This is because nonsynonymous sites are usually under stronger negative selection than synonymous sites, and calculating sequence differences without separating these two classes of sites may underestimate the extent of molecular divergence severalfold.

Notably, the dS value varied considerably across genes in SARS-CoV-2 and the other viruses analyzed. In particular, the spike gene (S) consistently exhibited larger dS values than other genes (Table 1). This pattern became clear when we calculated the dS value for each branch in Fig. 1A for the spike gene versus the concatenated sequences of the remaining genes (Fig. S2). In each branch, the dS of spike was 2.22 ± 1.35 (mean ± SD) times as large as that of the other genes. This extremely elevated dS value of spike could be caused either by a high mutation rate or by natural selection that favors synonymous substitutions. Synonymous substitutions may serve as another layer of genetic regulation, guiding the efficiency of mRNA translation by changing codon usage [21]. If positive selection is the driving force for the higher synonymous substitution rate seen in spike, we expect the frequency of optimal codons (FOP) of spike to be different from that of other genes. However, our codon usage bias analysis (Table S2) suggests the FOP of spike was only slightly higher than the genomic average (0.717 versus 0.698, see Materials and Methods).
Thus, we believe that the elevated synonymous substitution rate measured in spike is more likely caused by higher mutational rates; however, the underlying molecular mechanism remains unclear.

Both SARS-CoV and SARS-CoV-2 bind to ACE2 through the RBD of the spike protein in order to initiate membrane fusion and enter human cells [1, 2, 22-26]. Five out of the six critical amino acid (AA) residues in the RBD were different between SARS-CoV-2 and SARS-CoV (Fig. 1B), and a 3D structural analysis indicated that the spike of SARS-CoV-2 has a higher binding affinity to ACE2 than SARS-CoV [23]. Intriguingly, these same six critical AAs are identical between GD Pangolin-CoV and SARS-CoV-2 [16]. In contrast, although the genomes of SARS-CoV-2 and RaTG13 are more similar overall, only one out of the six functional sites is identical between the two viruses (Fig. 1B). It has been proposed that the SARS-CoV-2 RBD region of the spike protein might have resulted from recent recombination events in pangolins [16-18]. Although several ancient recombination events have been described in spike [27, 28], it also seems likely that the identical functional sites in SARS-CoV-2 and GD Pangolin-CoV may actually be the result of coincidental convergent evolution [18].

If the functional AA residues in the SARS-CoV-2 RBD region were acquired from GD Pangolin-CoV in a very recent recombination event, we would expect the nucleotide sequences of this region to be nearly identical between the two viruses. However, for the CDS sequences that span five critical AA sites in the SARS-CoV-2 spike (ranging from codon 484 to 507, covering five adjacent functional sites: F486, Q493, S494, N501, and Y505; Fig. S3), we estimated dS = 0.411, dN = 0.019, and ω = 0.046 between SARS-CoV-2 and GD Pangolin-CoV.
Assuming a synonymous substitution rate (u) of 1.67-4.67 × 10⁻³ per site per year, as estimated in SARS-CoV [29], the recombination/introgression, if it occurred at all, would be estimated to have happened approximately 19.8-55.4 years ago. Here, the formula $t = dS / (2.22 \times 2u)$ was used to calculate divergence time; note that the increased mutational rate of spike (the 2.22-fold factor) was considered for this calculation. Thus, it seems very unlikely that SARS-CoV-2 originated from the GD Pangolin-CoV through a very recent recombination event. Instead, it seems more likely that a high mutation rate in spike, coupled with strong natural selection, has shaped the identical functional AA residues between these two viruses, as proposed previously [18]. Although these sites are maintained in SARS-CoV-2 and GD

PDF VERSION: https://academic.oup.com/nsr/advance-article-pdf/doi/10.1093/nsr/nwaa036/32757241/nwaa036.pdf
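As a quick arithmetic check of the divergence-time estimate for the RBD region, the following Python sketch assumes the form t = dS / (2.22 × 2u), where 2.22 is taken to be the spike-to-genome dS ratio reported in the Results and the factor of 2 reflects substitutions accumulating along both lineages. This form is an inference that reproduces the quoted range of 19.8-55.4 years; the function and parameter names are illustrative only.

```python
def divergence_time_years(d_s, u, spike_rate_factor=2.22):
    """Estimate the time since divergence, in years.

    d_s               -- synonymous divergence between the two sequences
    u                 -- synonymous substitution rate per site per year
    spike_rate_factor -- assumed fold-elevation of the spike rate
                         (2.22, the mean spike/other-genes dS ratio)
    """
    if u <= 0 or spike_rate_factor <= 0:
        raise ValueError("rate and rate factor must be positive")
    # Both lineages accumulate substitutions, hence the factor of 2.
    return d_s / (spike_rate_factor * 2 * u)


D_S_RBD = 0.411  # dS between SARS-CoV-2 and GD Pangolin-CoV (codons 484-507)

# The faster rate gives the more recent bound, the slower rate the older one.
recent = divergence_time_years(D_S_RBD, 4.67e-3)
ancient = divergence_time_years(D_S_RBD, 1.67e-3)
print(f"{recent:.1f}-{ancient:.1f} years ago")
```

Without the 2.22-fold correction the same inputs would give roughly 44-123 years, so the correction makes the estimate more recent, yet still far from a very recent recombination event.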




