Severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) is an RNA virus responsible for the current pandemic outbreak. In total, 200 genomes of SARS‐CoV‐2 strains from four host organisms were analyzed. To investigate the presence of new mutations in the RNA-directed RNA Polymerase (RdRp) of SARS-CoV-2, we analyzed sequences isolated from different hosts, with particular emphasis on human isolates. We searched for new mutations in the RdRp proteins and studied how the newly identified mutations could influence RdRp protein stability. Our results revealed 25 mutations in Rhinolophus sinicus, 1 in Mustela lutreola, 6 in Homo sapiens, and none in Mus musculus RdRp proteins of the SARS-CoV-2 isolates. We found that P323L is the most common stabilising radical mutation in human isolates. We also describe several unique mutations specific to the studied hosts. Our data therefore suggest that new and emerging variants of the SARS-CoV-2 RdRp have to be considered in the development of effective therapeutic agents and treatments.
The current pandemic outbreak is caused by the novel coronavirus called severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2). This virus is a global threat to mankind, the world economy, and ecology. Research suggests that a high mutation rate and the ability to adapt quickly to new conditions allow SARS‐CoV‐2 to cross interspecies barriers and spread from its natural bat reservoirs to other hosts (
1). The SARS-CoV-2 RdRp (also called nonstructural protein 12 - nsp12) is a key player in the multicomponent viral replication/transcription and proofreading complex. Many modern antiviral drugs are designed to specifically inactivate RdRp or to prevent its interaction with other parts of the replication machinery: co-factors nsp7, nsp8, and nsp14 – exonuclease with proofreading function (
2).
In this study, we focused on the identification of mutations in the RdRp domain. In total, we have examined 200 genomes of the SARS-CoV-2 and CoV-like viruses from the “natural” host
Rhinolophus sinicus (
3), secondary hosts
Homo sapiens and
Mustela lutreola (
4), and artificial host – a model organism
Mus musculus (
5). We also studied how those mutations would influence the stability of the RdRp domain of nsp12 protein.
Further research of the SARS-CoV-2 RdRp variants could lead to the development of more effective antiviral drugs and vaccines. Also, our data suggest that new and emerging variants of the SARS-CoV-2 RdRp have to be considered for the development of effective therapeutic agents and treatments.
MATERIAL AND METHODS
Sequence retrieval and analysis
In total, 200 complete genome sequences of SARS-CoV-2 and CoV-like viruses from different hosts were downloaded from the NCBI database for the analysis:
Rhinolophus sinicus – 18,
Mus musculus – 42,
Mustela lutreola – 13,
Homo sapiens – 127 (Supplementary
Table 1-4) (
6). Further, the protein sequences of the ORFs containing the coronavirus RNA-directed RNA Polymerase (cd21591) (RdRp domain) were retrieved with the NCBI ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder/). Conserved domains were checked with CD-search (NCBI). Complete translated ORFs were used for the multiple sequence alignments performed with MUSCLE (
7), implemented in Ugene 34 software (
8), and checked for mutations.
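The mutation check on the aligned sequences can be illustrated with a short Python sketch. It is not the procedure used in this study (the alignments were inspected in Ugene); the function name `call_substitutions` and the toy sequences are hypothetical, and the sketch assumes that the reference and query rows come from the same alignment and use `-` for gaps.

```python
from typing import List


def call_substitutions(ref_aln: str, qry_aln: str) -> List[str]:
    """List substitutions such as 'P323L' between two aligned protein rows.

    Positions are numbered along the ungapped reference sequence.
    Insertions, deletions, and unknown residues (X) are skipped.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    substitutions = []
    ref_pos = 0  # 1-based position in the ungapped reference
    for ref_aa, qry_aa in zip(ref_aln, qry_aln):
        if ref_aa != "-":
            ref_pos += 1
        # Skip insertions relative to the reference, deletions,
        # unknown residues, and identical columns.
        if ref_aa == "-" or qry_aa in ("-", "X") or ref_aa == qry_aa:
            continue
        substitutions.append(f"{ref_aa}{ref_pos}{qry_aa}")
    return substitutions


# Toy example (hypothetical sequences, not real RdRp data)
reference = "MK-PLA"
query = "MKTLLA"
print(call_substitutions(reference, query))  # ['P3L']
```

Numbering along the ungapped reference keeps the labels comparable between isolates even when an alignment contains insertions.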
Secondary structures (helix, sheet, and coil) were predicted with the PSIPRED server (PSI-blast based secondary structure PREDiction) (http://bioinf.cs.ucl.ac.uk/psipred/) (
9).
Effect of mutations on the protein stability
The effect of identified mutations on the protein stability, flexibility and motion was studied with MAESTRO on-line tool (
10), Dynamut server (
11), and DUET (
12). MAESTRO predictions are based on artificial neural networks (ANN), support vector machines (SVM) and multiple linear regression (MLR), with ΔΔG values as the output. In addition to Normal Mode Analysis (NMA) of the structures, Dynamut implements an algorithm to analyze the effect of point mutation(s), with a wide set of parameters describing the influence of vibrational entropy changes on protein dynamics and stability. The DUET server combines the predictions of two methods (SDM and mCSM) using Support Vector Machines (SVMs).
RESULTS
Identification of mutations
The SARS-CoV-like genome sequences from the natural host
Rhinolophus sinicus have been analyzed. In total, we have identified 25 mutations in the RdRp domain (
Table 1). Mutations were found in 16 out of 18 analyzed sequences: 6 RdRp domains had a single mutation, 5 had double mutations, 2 had triple mutations, and 3 had multiple mutations. In general, mutations were located throughout the entire domain. Only two positions, T118 and D125, stood out: they were mutated in several genomes and were substituted by several different amino acids (T118 to N and A; D125 to G, E and N).
Forty-two SARS-CoV-like genome sequences isolated from mice (
Mus musculus) were analyzed. Surprisingly, we found no mutations in the RdRp domain (data not shown).
Thirteen SARS-CoV-2 genome sequences isolated from mink (
Mustela lutreola) were analyzed. In 7 RdRp domains, we found only one mutation – P323L (
Table 2).
In total, 127 SARS-CoV-2 genomes from the human host were analyzed. In this analysis, the Wuhan RdRp domain sequence was taken as the original (wild type). Identified mutations are listed in
Table 3 and shown in Supplementary Fig. 1 and 2 (
6). Mutations in 6 positions were identified: G179S, E278D, P323L, L329I, A449V, A660S. Interestingly, the P323L single mutation was the most common and was detected in isolates from more than half of the analyzed countries. The G179S single mutation was identified in only one of the two analyzed isolates from Malaysia (MT372481), and A660S only in an isolate from Japan. The P323L and A449V double mutation was found only in the isolate from Greece. Two double mutations (E278D and P323L, P323L and L329I) were found only in isolates from India (
Table 3).
According to the predicted secondary structures, G179S, E278D, A449V and A660S were located in different helices, whereas P323L and L329I were located on the same coil (Supplementary Fig. 1 and 2) (
6).
Effect of mutations on the free energy and protein stability
To determine how the identified mutations could influence the tertiary dynamics and stability of the RNA-directed RNA Polymerase, we used 3 tools based on different algorithms. Mutations were examined individually and in combinations (Supplementary Table 6 and 7) (
6). Overall, the numbers of conservative and radical mutations were almost equal, although radical mutations had a larger effect on the free energy change. Single mutations with the highest change in free energy are highlighted in bold in Supplementary Table 6 (
6). The most common mutation in human and mink isolates, P323L, is the only stabilising mutation (consistent across at least two of the tools used). The human mutation with the second-highest change in free energy is A660S, a radical mutation that was predicted to have a destabilizing effect. Single mutations of the
Rhinolophus sinicus are almost evenly split by type (14 radical / 10 conservative).
Three of them are destabilizing (A128S, G711S and L278F) and two are stabilizing mutations (H772Q and conservative R138K). The L278F mutation is the only mutation confirmed by several tools. The dual, triple, and multiple mutations with the highest change in free energy, identified in the human and
Rhinolophus sinicus RdRp domain, are stabilizing types (Supplementary Table 7) (
6).
The only significant positive change of ΔΔS VibENCoM was detected for the R754C radical mutation of the
Rhinolophus sinicus, suggesting a gain in flexibility.
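The text does not state which rule was used to label a substitution as conservative or radical. The Python sketch below shows one possible rule, given here only as an assumption: amino acids are placed in physicochemical groups, and a substitution is called conservative when both residues fall in the same group. The grouping in `GROUPS` and the function names are illustrative; other groupings would give different labels.

```python
import re

# Assumed physicochemical groups (illustrative; not taken from the paper).
GROUPS = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "GPC",
}
AA_TO_GROUP = {aa: name for name, aas in GROUPS.items() for aa in aas}


def parse_mutation(label: str):
    """Split a label such as 'P323L' into (wild type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def mutation_type(label: str) -> str:
    """Return 'conservative' if both residues share a group, else 'radical'."""
    wild_type, _, mutant = parse_mutation(label)
    same_group = AA_TO_GROUP[wild_type] == AA_TO_GROUP[mutant]
    return "conservative" if same_group else "radical"


for label in ["G179S", "E278D", "P323L", "L329I", "A449V", "A660S"]:
    print(label, mutation_type(label))
```

With this particular grouping, the six human substitutions receive the same labels as in the text (G179S, P323L and A660S radical; E278D, L329I and A449V conservative), but that agreement does not show that the same rule was applied in the study.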
DISCUSSION
The high mutation rate is one of the main adaptation mechanisms, exploited by RNA viruses (
13). Also, RNA viruses can regulate their replication fidelity (
14). The combination of those two unique features provides coronaviruses with the ability for quick spreading and adaptation to the new hosts, in order to overcome natural and vaccine-induced immune response and to develop resistance to antiviral drugs (
15). Severe Acute Respiratory Syndrome coronavirus (SARS-CoV) and Middle East Respiratory Syndrome coronavirus are two animal coronaviruses that adapted to a human host and caused several local epidemics within the last 20 years. The current pandemic outbreak of the novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2), is a global threat, affecting people worldwide. RNA-directed RNA Polymerase is one of the key elements of the coronavirus replication machinery and a target for several modern antiviral drugs (
2). In this paper, we reported mutations in the RdRp domain in the “natural” coronavirus host, the bat
Rhinolophus sinicus, model organism (
Mus musculus) and “secondary” hosts -
Mustela lutreola and
Homo sapiens (human isolate data were collected from several countries). Our results identified the stabilizing P323L mutation as the most common in SARS‐CoV‐2 human isolates around the world and as the only mutation found in
Mustela.
Although many computational strategies have been proposed to predict the effect of a mutation on protein dynamics and thermostability, the problem is complex and still requires further research. To obtain the most accurate data possible, we used several tools based on different algorithms, which sometimes resulted in contradictory interpretations. MAESTRO applies artificial neural networks (ANN), support vector machines (SVM), and multiple linear regression (MLR), based on statistical scoring functions for distance-dependent residue pairs and the solvent exposure of protein residues (
10). DUET combines two methods: Site-Directed Mutator (SDM) – a statistical potential energy function and mCSM signatures – the graph-based concept of Cutoff Scanning Matrix (CSM) (
12). ENCoM method employs an Elastic Network Contact Model that is based on the coarse-grained normal mode analysis (NMA) (
16). Dynamut implements a machine-learning algorithm to analyze the effect of point mutation(s) and a non-redundant blind test set to validate it (
11). Mutations confirmed by at least 2 tools were counted as consistent.
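The "at least 2 tools" rule can be written down as a short Python sketch. It is an illustration under stated assumptions rather than the procedure used here: the tool names, the ΔΔG values, and the zero threshold are hypothetical, and because prediction servers differ in whether a positive or a negative ΔΔG denotes stabilization, the sign convention is passed explicitly for each tool instead of being assumed.

```python
from collections import Counter
from typing import Iterable


def classify(ddg: float, stabilizing_if_positive: bool,
             threshold: float = 0.0) -> str:
    """Map one tool's ddG value to a qualitative call.

    `stabilizing_if_positive` encodes that tool's sign convention and
    must be taken from the tool's own documentation.
    """
    signed = ddg if stabilizing_if_positive else -ddg
    if signed > threshold:
        return "stabilizing"
    if signed < -threshold:
        return "destabilizing"
    return "neutral"


def consensus(calls: Iterable[str], min_agree: int = 2) -> str:
    """Return the call shared by at least `min_agree` tools, if unique."""
    counts = Counter(call for call in calls if call != "neutral")
    winners = [label for label, n in counts.items() if n >= min_agree]
    return winners[0] if len(winners) == 1 else "inconsistent"


# Hypothetical values for one mutation: (ddG, stabilizing_if_positive)
predictions = {
    "tool_a": (-0.5, False),
    "tool_b": (0.3, True),
    "tool_c": (-0.1, True),
}
calls = [classify(ddg, sign) for ddg, sign in predictions.values()]
print(calls, consensus(calls))
```

Normalizing each prediction to a qualitative call before voting avoids comparing raw ΔΔG values across tools that use different scales and sign conventions.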
In our study, we found that bat isolates have 25 mutations, the majority of which are unique. These could represent a pool of potentially useful mutations that help the virus to adapt to a new environment or host, or to withstand the immune system or drugs. Minks, on the contrary, have been described as a target species for SARS-CoV-2 only recently and most probably acquired the virus from farm workers (in some cases with the P323L mutation) (
4). In our study, mice represented an unusual host for the SARS-CoV and SARS-CoV-2, because it was shown that viral replication could be reached only in inbred, knockout, or transgenic lines (
5,
17), whereas the wild-type line is resistant (
18). Based on the worldwide presence of the P323L mutation from the human isolates, it is tempting to speculate that this particular mutation has evolved as a result of adaptation to the human host, improving the interaction of the RdRp (nsp12) with nsp7, nsp8 and nsp14, achieving proper replication and proofreading (
19). The second consistent mutation with a high change in ΔΔG (A660S) was confirmed to be destabilizing but has been identified in a single isolate from Japan.
It is known that the RdRp protein is a target for many antiviral drugs, which can bind to it and prevent its normal functioning (20). It was shown that point mutations in the RdRp could lead to drug resistance (
21). Numerous point mutations in the RdRp protein have been described (G64, V173, F483, V560, M618, D868, L420, double K159/A239) causing resistance to the effective antiviral drugs (primarily, nucleoside analogs: ribavirin, 5-fluorouracil, remdesivir) (
22,
23). Thus, our newly identified mutations could have evolved as acquired resistance to antiviral drugs or as a host-specific antibody-escape mechanism. Further research is required to define how those mutations alter the replication/proofreading process and the efficiency of RdRp-targeted antiviral drugs. The described point mutations in the RNA-directed RNA Polymerase are associated with the efficiency of a drug against the SARS-CoV-2 isolates of a given country. This means that an antiviral drug has to be tested on several isolates specific to a particular region/population.
Recently, the structure of the RdRp (nsp12) protein has been identified (
24). Based on this structure, the P323L mutation is located in the interface domain (residues A250 to R365). The interface domain is known to connect a nidovirus-specific N-terminal extension (NiRAN) domain (residues D60 to R249) and a right-hand RdRp domain (residues S367 to F920). It was predicted that two effective nucleotide analog antiviral drugs, remdesivir and sofosbuvir, bind to nsp12 and disrupt the interaction between the right-hand RdRp and NiRAN domains, thus inhibiting elongation (
25). Further research is required to understand the effect of the P323L mutation of the interface domain on the efficiency of the RdRp RNA synthesis and the performance of these drugs.
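The domain boundaries quoted above can be turned into a small lookup that assigns a residue position to its nsp12 domain. The Python sketch below uses only the three residue ranges given in the text; the names `DOMAINS` and `domain_of` are illustrative, and positions outside the listed ranges return `None`.

```python
from typing import Optional

# Residue ranges as quoted in the text (inclusive, nsp12 numbering).
DOMAINS = [
    ("NiRAN", 60, 249),
    ("interface", 250, 365),
    ("right-hand RdRp", 367, 920),
]


def domain_of(position: int) -> Optional[str]:
    """Return the nsp12 domain containing `position`, or None if unlisted."""
    for name, start, end in DOMAINS:
        if start <= position <= end:
            return name
    return None


# Positions of the six substitutions identified in the human isolates
for position in (179, 278, 323, 329, 449, 660):
    print(position, domain_of(position))
```

Under these ranges, positions 278, 323 and 329 fall in the interface domain, 179 in the NiRAN domain, and 449 and 660 in the right-hand RdRp domain.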
Several recent studies have investigated mutations in the RNA-directed RNA Polymerase, with rather contradicting results. In the recent work, Pachetti et al. (
26) described the P323L mutation (designated as the “14408” mutation in the manuscript) as predominant in the European population. Another paper (
27) also defines P323L mutation as a cross-continent mutation, mostly specific for Europe, with only a minor presentation in the Asia region. On the contrary, the same mutation was identified as stabilizing in the Indian isolates (
28). However, Chand et al. analyzed only isolates from India, using the DynaMut software. Altogether, these papers (
26,
27,
28) support our conclusion that P323L is a worldwide, human-host-specific mutation in the RdRp domain of the nsp12 protein.
Our data suggest that the maximal change in vibrational entropy energy (ΔΔS VibENCoM) between wild type and mutant variants had a negative value, implying rigidification of the protein (Supplementary Table 6) (
6).
Our data suggest that
Rhinolophus sinicus, as a natural host for SARS-CoV, harbors a wide range of mutations (both conservative and radical) that mostly do not influence protein dynamics and stability. Multiple mutations, on the contrary, reduce the free energy and have a stabilizing effect on the protein.
Mus musculus, an artificial animal model to study SARS-CoV, has no mutations in the RdRp domain. Secondary hosts (
Mustela lutreola and
Homo sapiens) have one common and frequent mutation, P323L, that was predicted to have a stabilizing effect on the protein. In addition to several conservative mutations with minor effects on the free energy, the human RdRp domain also contains 2 radical mutations (G179S and A660S) that were predicted to cause protein destabilization (Supplementary Table 6) (
6).
CONCLUSION
We identified 25 mutations in the RdRp domain of the bat-derived SARS-CoV isolates. Those mutations represent a pool of neutral mutations with mostly minor effects on the protein free energy. Among the screened human isolates we found 6 mutations; one worldwide-present mutation (P323L) was predicted to have a stabilizing effect on the protein tertiary structure. Further research is necessary to understand the effect of the described mutations on the RdRp interaction with other proteins and on the emergence of antiviral drug-resistant isolates.
CONFLICT OF INTEREST
The authors declared that they have no potential conflict of interest with respect to the authorship and/or publication of this article.
ACKNOWLEDGEMENTS
This research was supported by the Vibstec State Academy of Veterinary Medicine.
AUTHORS’ CONTRIBUTION
SAD conceived and designed this research and performed the experiments. SAD and YKK carried out data analysis. SAD wrote the manuscript. YKK supervised the project. SAD and YKK reviewed and edited the manuscript.