Introduction
The coronavirus disease 2019 (COVID-19) pandemic, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has had a profound global impact. It has resulted in a significant loss of human life and has posed unprecedented threats to public health systems, food security, economies, and education [
1]. On May 5, 2023, the World Health Organization (WHO) declared that COVID-19 no longer constituted a Public Health Emergency of International Concern. Instead, it was recognized as an established and ongoing health challenge [
2]. As of May 18, 2023, there have been over 766 million confirmed cases of COVID-19, leading to 6.9 million reported deaths worldwide. In Bhutan, there have been 62,668 confirmed cases of COVID-19, resulting in 21 deaths [
3].
The SARS-CoV-2 virus is subject to frequent mutations, which result in the emergence of various new variants. The complete genomic sequence of SARS-CoV-2 was first published on January 3, 2020 [
4]. Since that time, over 151 thousand genomic sequences have been submitted to the Global Initiative on Sharing All Influenza Database (GISAID) [
5]. The SARS-CoV-2 genome is composed of a positive-sense, single-stranded RNA of approximately 30,000 nucleotides, with a guanine-cytosine (GC) content of 38% [
6]. It encodes and expresses a total of 12 proteins. The genome includes 13 to 15 open reading frames (ORFs), of which 12 are functional. Two-thirds of the genome is made up of 2 ORFs, ORF1a and ORF1b, which encode 2 polyproteins, pp1a and pp1ab, through a ribosomal frameshifting mechanism. These polyproteins are subsequently processed by viral proteases into 16 non-structural proteins (NSPs), which facilitate the replication and transcription of the viral RNA. The virus forms a membrane-associated replication and transcription complex that shields the viral RNA from the host’s innate immune sensors, thereby enhancing viral replication efficiency [
7,
8]. Notably, the ORF1ab and 4 structural proteins (spike, envelope, membrane, and nucleocapsid) are arranged in a 5′-3′ order and are potential targets for drugs and vaccines. In addition, the genome encodes accessory proteins and other NSPs via the ORF3a, ORF6, ORF7a, ORF7b, ORF8, and ORF10 genes [
8].
Nucleotide variations in genes can alter the structure and function of viral proteins. These mutations can influence viral characteristics, potentially triggering new outbreaks, reducing vaccine effectiveness, and impeding the development of new drugs and diagnostic kits [
9]. Initially, genetic lineages were identified based on the phylogenetic framework of GISAID [
5], Nextstrain [
10], or Pangolin [
11]. In response to the emergence of new variants and to avoid stigmatizing the countries where these key variants originated, the WHO established a working group of experts. This group is responsible for monitoring the virus's evolution and creating a nomenclature to differentiate these variants. The working group assigned these variants with letters from the Greek alphabet and classified them as either variants of interest (VOIs) or variants of concern (VOCs), depending on their transmissibility, severity, and ability to evade the immune system [
12].
While the overall number of COVID-19 cases is on the decline, thanks to increasing population immunity and decreasing mortality rates, the ongoing evolution of SARS-CoV-2 and the emergence of VOIs and VOCs remain major global concerns [
13]. These variants, which include Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1), Delta (B.1.617.2), Omicron BA.2 (B.1.1.529.2), BQ.1 (B.1.1.529.3), and XBB (B.1.1.529.5), have demonstrated increased transmissibility, immune evasion, or diminished vaccine efficacy in comparison to the original strain [
14]. It is essential to monitor the circulation and impact of these variants in order to adjust public health measures as necessary.
The primary objective of this study was to identify the variants and amino acid mutations of SARS-CoV-2 strains that circulated in Bhutan during the pandemic. We conducted a comparative analysis of the nucleotide and amino acid sequences of Bhutanese SARS-CoV-2 strains, using the Wuhan-Hu-1 strain as a reference. In addition, we explored the phylogenetic relationship among these strains to gain insight into their evolutionary links. Our focus was on pinpointing specific mutations and alterations in the amino acid sequence of functional proteins within the SARS-CoV-2 strains from Bhutan, including the spike, envelope, membrane, and nucleocapsid proteins. We analyzed samples collected from Bhutan over an extended period, from May 2020 to February 2023 (i.e., nearly 3 years), to track the evolutionary changes and trends of SARS-CoV-2 within the Bhutanese population. This extensive temporal distribution enabled us to observe changes over time in the virus's genetic composition and its potential influence on disease dynamics. This study represents the first report on the genomic characterization of SARS-CoV-2 in Bhutan.
Materials and Methods
Sequence Data Acquisition
Full-genome sequences of SARS-CoV-2 from Bhutan were retrieved from the GISAID database (
https://gisaid.org) for samples collected between May 2020 and February 2023. We only included high-quality whole-genome sequences that exceeded 29,000 base pairs (bp). All sequences were obtained in a multi-sequence FASTA format, and each was accompanied by metadata detailing patient status and sequencing technology. Our dataset consisted of 135 whole-genome sequences, distributed across the years as follows: 40 from 2020, 38 from 2021, 46 from 2022, and 11 from 2023. We downloaded the reference sequence NC_045512.2 as a separate FASTA file.
Supplementary Material 1 includes the EPI_SET ID and the respective digital object identifier (DOI) for each SARS-CoV-2 sequence used in this study.
Phylogenetic Analysis
Nextstrain (
https://nextstrain.org), an open-source bioinformatics visualization platform designed for the phylogenetic analysis of complete SARS-CoV-2 genomic sequences, was used. Nextclade ver. 2.13.0 [
15], a web-based tool integrated within Nextstrain, was employed for sequence quality assessment and clade assignment. GISAID clades and WHO labels were determined using the GISAID CoVsurver tool (
https://gisaid.org/) [
16] and the coronavirus typing tool [
17], respectively. The Pangolin webserver data ver. 1.19 (
https://pangolin.cog-uk.io/) was used to further classify lineage typology. The geographical distribution of clades was visualized using Microreact (
https://microreact.org).
Sequence Alignment and Recombination Analysis
To align the sequences, we obtained pairwise sequence alignment data in FASTA format from Nextclade and conducted multiple sequence alignment (MSA) using MAFFT ver. 7 (
https://mafft.cbrc.jp/alignment/software/) [
18]. This ensured optimal sequence congruence for the following procedures. To explore the presence of recombination events and pinpoint breakpoints within the aligned sequences, we utilized the Recombinant Detection Program ver. 4.5 (RDP4,
http://web.cbio.uct.ac.za/~darren/rdp.html) [
19]. This program incorporates various algorithms, including RDP, GENECONV, Maxchi, Chimaera, SiSscan, PhylPro, and 3Seq. We constructed a maximum likelihood phylogenetic tree directly using RDP4, in conjunction with the RAxML8 [
20] algorithm, which takes recombination events into consideration. The final tree annotation, which depicts the evolutionary relationships among the SARS-CoV-2 sequences, was produced using iTOL ver. 6.7.4 (
https://itol.embl.de/) [
21].
Estimation of Time to the Most Recent Common Ancestor
To calculate the time to the most recent common ancestor (TMRCA) and the 95% highest posterior density (HPD) of SARS-CoV-2 genomes from Bhutan, we performed a Bayesian Markov chain Monte Carlo (MCMC) [
22] analysis using BEAST ver. 1.10.4 (
https://beast.community/) [
23]. The MSA created by MAFFT was imported into BEAST. We used a Hasegawa-Kishino-Yano substitution model [
24] for nucleotide frequencies and kappa, along with a strict molecular clock with uniform rates across branches (rate set to 1.0). We also used a coalescent exponential growth model with a Laplace prior (scale 100), and a coalescent population size prior distribution uniform between 0 and 10. The analysis started with a random starting tree. We ran a total of 100 million MCMC steps, sampling and logging parameters every 10,000 generations. We verified the effective sample sizes using Tracer ver. 1.7.2 (
https://beast.community/tracer) [
25] to ensure a sample size greater than 2,000. We extracted the TMRCA and the 95% HPD from the BEAST output log files using Tracer ver. 1.7.2. The maximum clade credibility (MCC) tree was generated using TreeAnnotator ver. 1.4 (
https://beast.community/treeannotator), discarding the first 10% as burn-in. Finally, we visualized the final tree using FigTree v1.4 (
http://tree.bio.ed.ac.uk/software/figtree/).
Mutation Analysis
Both nucleotide and amino acid mutations were identified with the SARS-CoV-2 isolate Wuhan-Hu-1 (NC_045512.2) as a reference, using the GISAID CoVsurver tool and the coronavirus typing tool. We analyzed the frequency of mutations for all 135 sequences, and the frequency of mutations for each protein was calculated based on the total number of samples with point mutations.
IRB Approval
Ethical clearance and approval were not applicable since the data used in this study was obtained from a publicly available database, GISAID. No additional demographic or clinical information was used in this study.
Results
SARS-CoV-2 Genome Analysis
We conducted a comprehensive analysis of the genetic diversity in 135 SARS-CoV-2 genomes, which were gathered from Bhutan between May 2020 and February 2023. These genomes were classified using various nomenclature systems, such as GISAID, Nextstrain, WHO, and Pangolin, through phylogenetic analyses. Furthermore, we offered in-depth insights into the demographic characteristics of the sample set, the quality of the genomes, and their temporal distribution.
The distribution of the SARS-CoV-2 genome across Bhutan was as follows: both Paro and Thimphu had 38 genomes each, while Phuentsholing had 35 genomes. Other locations represented in the dataset included Dagana with 4 genomes, and Punakha, Tsirang, and Mongar, each with 3 genomes. Furthermore, both Samdrupjongkhar and Gelephu had 2 genomes each, while Trongsa and Haa each had a single genome. Our sample set comprised 80 males and 55 females, with ages ranging from 10 to 85 years and a median age of 36 years. The average length of the viral genome was 29,800 bp, with an average genome coverage of 98.8%. The average length of amino acids was found to be 9,691. According to Nextclade’s overall quality scoring system, of the 135 sequences analyzed, 131 were classified as good quality, while 4 were deemed mediocre quality. None of the sequences were flagged as bad quality.
Phylogenetic Relationships among Bhutanese SARS-CoV-2 Viral Strains
After conducting a phylogenetic analysis, we identified 6 distinct GISAID clades (G, GH, GK, GR, GRA, and GV) to which the Bhutanese SARS-CoV-2 strains were assigned. The GH clade was predominant in 2020, succeeded by the G clade in 2021, and then the GRA clade in 2022 and early 2023. We also identified 4 distinct WHO labels within the Bhutanese SARS-CoV-2 strains: Alpha, Beta, Delta, and Omicron. Notably, approximately 35.6% of the sequences did not match any WHO label and were instead assigned to international A_B diversity by the coronavirus typing tool. From May 2020 to February 2023, we detected a total of 29 Pangolin lineages among the strains. The B.1 variant was the most prevalent in 2020, accounting for 37.7% of the cases. In 2021, the dominant variant was B.1.617, representing 65% of the cases. By 2022, the BA.2.10 lineage became prominent, later transitioning to EP.1, which accounted for 81.8% of the cases in early 2023. The distribution of Pangolin lineages within the samples showed temporal variations, reflecting the dynamic nature of the viral strains circulating in Bhutan (
Figure 1).
Table 1 provides a detailed frequency distribution of each nomenclature system. We also identified 2 recombinant variants, classified as Pangolin lineages BA.2 and XBF, using Nextclade and the coronavirus typing tool. To further explore recombination events, we employed RDP ver. 4.5 software, which confirmed the occurrence of recombination in at least 3 Bhutanese SARS-CoV-2 strains. These findings strongly suggest genetic recombination, highlighting the significant genetic diversity of SARS-CoV-2 in Bhutan. Detailed information on the recombination events in Bhutanese SARS-CoV-2 genomes can be found in
Supplementary Material 2.
To infer the origin and evaluate potential divergence events of SARS-CoV-2, we created a maximum likelihood phylogenetic tree. This was based on 135 complete genomes from isolates in Bhutan and the reference virus from Wuhan, China (NC_045512.2), using RDP4 (with RAxML8). The tree demonstrated a swift diversification of SARS-CoV-2, spread across 14 distinct Nextstrain clades. These included 20A, 20B, 20C, 20E, 20H, 20I, 21A, 21I, 21J, 21L, 22B, 22D, and 22F (
Figures 2,
3).
Time to the Most Recent Common Ancestor of SARS-CoV-2 in Bhutan
The TMRCA for the SARS-CoV-2 strains circulating in Bhutan was estimated to be around February 15, 2020, with a 95% HPD interval ranging from January 3, 2020 to September 13, 2020. The mean substitution rate was calculated to be 0.97×10
–3 substitutions per site per year, with a 95% HPD of 0.85×10
–3 to 1.10×10
–3.
Figure 4 illustrates the MCC phylogenetic relationships among the SARS-CoV-2 genomes, as determined from the Bayesian coalescent framework using the coalescent exponential growth and tip-dating method.
Mutation Analysis
Single-nucleotide variations
The mutational revealed 9,413 single-nucleotide variations (SNVs) within the SARS-CoV-2 genomes. These SNVs comprised 6,568 substitutions and 2,845 deletions. As a result of these SNVs, 5,613 amino acid changes were observed, which included 4,762 amino acid substitutions and 851 deletions.
Figure 5 illustrates the locations and frequencies of the nucleotide variations plotted along the genomic sequence of the reference SARS-CoV-2 (NC_045512.2). Among the SNVs, the most common alteration was C>T, which accounted for 40% of the observed SNVs, followed by G>A, making up 12%. Importantly, no large deletions or insertions exceeding 50 nucleotides were detected. The longest deletion identified extended from position 29734 to 29759, resulting in the removal of a 26-nucleotide sequence (GAGGCCACGCGGAGTACGATCGAGTG).
Frequency of Mutations among Major Viral Proteins
The spike glycoprotein (S) displayed the highest mutation frequency among the major viral proteins, with 116 unique types identified. The most common amino acid substitutions in S were D614G (135/135), T478K (83/135), G142D (81/135), and N501Y (61/135). In contrast, NSP7 and NSP11 were highly conserved, with no mutations detected. NSP10 (2/135), NSP9 (1/135), NS6 (1/135), and NS7a (1/135) also exhibited low mutation rates. The specific frequency of amino acid mutations in the major proteins of Bhutanese SARS-CoV-2 viral strains is illustrated in
Figure 6 and
Table 2.
Discussion
In this study, we conducted a comprehensive analysis of 135 whole-genome sequences of SARS-CoV-2, collected from Bhutan between May 2020 and February 2023. These sequences, obtained from the GISAID database, were compared with the Wuhan-Hu-1 (NC_045512) reference strain. We characterized the genomic variation, as well as the phylogenetic and evolutionary dynamics, of SARS-CoV-2 from Bhutan. Our phylogenetic analysis revealed that multiple SARS-CoV-2 lineages were circulating in Bhutan in 2020, 2021, 2022, and early 2023. We discovered that the virus in Bhutan was distributed across 14 Nextstrain clades and 26 Pangolin lineages. In 2020, the predominant variant was B.1, accounting for 37.7% of the sequenced samples. This variant, which exhibited a global distribution in 2020, gave rise to more than 70 sub-lineages, indicating its extensive diversification [
11]. In 2021, the dominant lineage in Bhutan shifted to B.1.617, prevalent in 65% of the sequenced samples. This variant, B.1.617, carried 3 key mutations (L452R, E484Q, and P681R) in the viral spike protein and was first detected in India in December 2020 [
26]. Lineage B.1.617.2, a sub-lineage of B.1.617 also known as the Delta variant, gradually replaced the B.1.1.7 variant as the dominant global strain by March 2021 [
27].
In 2022, the most prevalent lineage of SARS-CoV-2 in Bhutan was BA.2.10 (Omicron), accounting for 48% of the sequenced samples. This variant first appeared in South Africa in November 2021 and led to a worldwide surge in COVID-19 cases due to its increased transmissibility compared to BA.1 [
28]. Bhutan also experienced a significant increase in COVID-19 cases that year, resulting in a series of lockdowns and widespread testing [
29]. Interestingly, a similar trend was observed in North India, where the Omicron BA.2 lineage became prominent during the third wave of COVID-19 cases, as documented by Zaman et al. [
30]. This concurrent emergence of the Omicron variant in both Bhutan and North India prompts inquiries and hypotheses about potential links between these events and the dynamics of the cross-border transmission of the variant.
The mutations observed in Bhutanese strains offer significant insights into the country's viral dynamics and their potential impact on global trends. Like many other regions, Bhutan has seen dynamic changes in the dominant SARS-CoV-2 lineages throughout the study period. This mirrors global trends and highlights the role of international travel and viral transmission networks. It further emphasizes the interconnectedness of Bhutan’s SARS-CoV-2 dynamics with the wider global landscape.
SARS-CoV-2 exhibits a high rate of genomic mutations, and our study identified both SNVs and the distribution of point mutations across the major viral proteins. The most frequently detected SNVs in SARS-CoV-2 genomes from Bhutan were C>T, followed by G>A. This aligns with previous studies, which have reported similar findings and suggested a potential explanation. This explanation posits that there is a selective mutation pressure to decrease the number of CpG sites in the virus’s genome. This is due to the abundance of human antiviral proteins, such as APOBEC3 and ZAP [
31,
32]. Among the proteins we analyzed, the spike glycoprotein exhibited the highest number of point mutations. This was followed by N, NSP3, NSP12, and NSP4, in that order. These proteins play crucial roles in the viral life cycle and are promising targets for vaccines and therapeutic interventions [
33]. The high frequency of point mutations in these proteins raises questions about the effectiveness of drug candidates being investigated as potential therapies or diagnostics that target these proteins. High mutation rates can undermine the efficacy of targeted treatments, potentially weakening their ability to disrupt viral replication or inhibit viral protein function.
One of the key findings of our study is the abundance of mutations in the spike glycoprotein, a vital component in viral infection and pathogenesis. We observed several noteworthy mutations in the spike protein of Bhutanese strains, including D614G, T478K, G142D, N501Y, K417N, P681H, A27S, D405N, D796Y, E484A, H655Y, L24del, N440K, N679K, N969K, P25del, P26del, Q498R, Q954H, R408S, S371F, S373P, S375F, S477N, T19I, T376A, Y505H, N764K, and V213G. These mutations were identified in over 50 samples. Among the functional proteins of the SARS-CoV-2 virus, the spike protein shows the highest mutation rate. Many studies have underscored the significance of mutated residues on the S-receptor binding domain (S-RBD) of the Omicron variant, as they enhance its binding affinity with ACE2 [
34].
The mutations observed in Bhutanese strains are of significant importance due to their potential impact on various aspects of the virus's behavior. For instance, the D614G mutation, which is commonly observed, has been reported in 99.03% of samples from 212 countries worldwide [
5]. Strains carrying this mutation have been found to display lower reverse transcription-polymerase chain reaction cycle thresholds, higher upper respiratory tract viral loads without an increase in disease severity, and enhanced infectivity [
14]. Another mutation, N501Y, found in Bhutanese strains, is known to modify the shape of the RBD, thereby increasing its binding affinity with human ACE2 receptors. This mutation is present in VOCs, such as those first identified in the UK (B.1.1.7), South Africa (B.1.351), and Brazil (P.1) [
35,
36]. These variants have been demonstrated to spread more rapidly than other lineages and may also affect vaccine efficacy to some extent. However, there is no evidence to suggest that they cause more severe disease or increase the risk of death. Other mutations, such as E484K, K417N, and S477N in the spike protein, have also been associated with changes in viral properties [
37] and were frequently found among strains from Bhutan. These findings highlight the potential implications of these mutations for viral pathogenesis, transmission, and vaccine effectiveness. Further research is required to gain a better understanding of their specific roles in disease dynamics.
In addition to the mutations in the spike protein, our analysis also investigated mutations in other key viral proteins, providing insights into potential therapeutic targets and diagnostic challenges. The discovery of mutations in N, NSP3, NSP12, and NSP4 is especially significant, as these proteins play crucial roles in the viral life cycle and are promising targets for vaccine development and therapeutic interventions. However, the high mutation rates observed in these proteins raise questions about the efficacy of drug candidates being considered for potential therapies or diagnostics. The ever-changing nature of the virus highlights the need for ongoing research and development of antiviral therapies that can effectively combat these mutations.
This study also analyzed non-mutated proteins, such as NSP7 and NSP11, emphasizing their conservation. Proteins that are highly conserved, including NSP10, NSP9, NS6, and NS7a, have crucial roles in the viral life cycle and interactions with the host. Previous studies have identified NSP10, NSP7, NSP8, NSP11, NSP16, ORF6, and NSP9 as highly conserved proteins in SARS-CoV-2 [
38,
39]. Mutations in these proteins could potentially impair their functions and decrease viral fitness. For example, NSP7 and NSP11 are co-factors of the viral polymerase NSP12 and are vital for its activity [
38]. NSP10 enhances the processivity and fidelity of NSP12, acting as its co-factor [
40]. NSP9, an RNA-binding protein, may aid in viral RNA synthesis. NS6 and NSP7a are accessory proteins that adjust the host immune response and disrupt interferon signaling [
7]. The preservation of these proteins underscores their significance in sustaining viral functions. Approaches that target conserved regions or utilize combination therapies may offer potential solutions to the problems created by the virus’s mutational landscape, thereby enhancing the effectiveness of therapeutic interventions.
The TMRCA of SARS-CoV-2 strains circulating in Bhutan was estimated in this study to have occurred around February 15, 2020, with a 95% HPD interval ranging from January 3, 2020 to September 13, 2020. This estimated TMRCA aligns with the timing of Bhutan’s first confirmed COVID-19 case, which was reported on March 5, 2020 [
41]. Additionally, we calculated a substitution rate of 0.97×10
–3 substitutions per site per year, consistent with findings from other studies [
42–
45].
In response to the initial detection of its first COVID-19 case, Bhutan implemented swift and comprehensive measures, including the closure of all international borders, to prevent the local transmission of SARS-CoV-2. Of particular note is the remarkable success of Bhutan’s vaccination efforts, which underscores the effectiveness of the country’s vaccination strategy and the readiness of its citizens to accept the vaccine. Bhutan procured and administered vaccines through a mix of bilateral and multilateral donations from countries such as India, the United States, Denmark, China, Croatia, among others. By April 9, 2021, a total of 472,139 individuals (consisting of 250,362 men and 221,777 women) had been successfully vaccinated with the first dose of COVISHIELD (Oxford-AstraZeneca, produced by the Serum Institute of India), achieving an impressive coverage of 94% among the eligible population [
46]. This success can be credited to effective mass communication and social engagement initiatives led by religious figures, De-Suup volunteers, and local leaders. These efforts were instrumental in dispelling myths and misinformation about the vaccine and motivating people to get vaccinated [
47].
The high rate of vaccination in Bhutan has not only significantly decreased the risk of severe illness, but it has also maintained COVID-19 morbidity at impressively low levels compared to neighboring countries in the region, such as India, Nepal, Bangladesh, and China. Throughout the entire pandemic, Bhutan has only reported 21 deaths attributed to SARS-CoV-2 [
41]. This highlights the effectiveness of Bhutan’s proactive and well-organized public health response in protecting the health and well-being of its citizens.