# Statistical Evaluation of Two Microbiological Diagnostic Methods of Pulmonary Tuberculosis After Implementation of a Directly Observed Treatment Short-course Program

## Abstract

### Objectives:

To evaluate the diagnostic accuracy of smear and culture tests of clinical samples of pulmonary tuberculosis after the introduction of the directly observed treatment short-course (DOTS) program.

### Methods:

Using sputum samples from 572 individuals as a self-selected population, both Ziehl–Neelsen staining and culturing on Lowenstein–Jensen medium were carried out as diagnostic procedures. Using Bayes’ rule, the obtained data set was analyzed.

### Results:

Of the 572 samples, 33 (0.05769) were true positive (results of both tests positive) cases; 22 samples (0.03846) were false positive (smear test positive and culture test negative) cases; 62 samples (0.10839) were false negative (smear test negative and culture test positive) cases; and 455 samples (0.79545) were true negative (results of both tests negative) cases. Values of the test statistics sensitivity and specificity were used to compute several other inherent Bayesian test statistics. The *a priori* probability or prevalence value of tuberculosis in the targeted population was 0.166. The *a posteriori* probability value computed arithmetically was 0.6614 and that obtained by the graphical method was 0.62.

### Conclusions:

The smear test was found to be dependable for 95.4% of samples without stable TB infections (specificity), but it detected only 34.7% of samples with stable TB infections (sensitivity). The culture test could be regarded as the gold standard for 96.15% of samples, as seen with the data set, which was obtained after the implementation of the DOTS program.

**Keywords:** *a priori* probability; *a posteriori* probability; Bayes rule; *Mycobacterium tuberculosis*; pulmonary tuberculosis; ROC curve; sensitivity; specificity

## 1. Introduction

Tuberculosis, caused by tubercle bacilli (TB), spreads by droplet infection, via an aerosol sneezed by a patient. In immune-compromised/aged individuals, those with poor health, those infected with HIV, and those living in poor hygienic conditions, TB attacks through the upper respiratory tract (URT), and after a dormant period, the disease is expressed [1]. Sometimes, a TB strain causes URT infections, and when consulting a clinician the infected individual is routinely advised to undergo a rapid diagnostic test: acid-fast bacillus (AFB) staining or Ziehl–Neelsen staining (ZN staining or smear test, cited herein). Typically, this method requires a critical population of bacilli (5–10 × 10^{3} bacilli/mL) in the clinical (sputum) sample from an infected person for a positive test result [2]. However, a smear test can yield a negative result if there is only a small number of bacilli in the sputum sample. Concomitant to diagnosis by a smear test, sputum samples are routinely sent for culturing in Lowenstein–Jensen (L–J) medium. Unfortunately, in TB it takes 3–4 weeks for colonies to develop, during which time the disease becomes stable in the infected individual. Indeed, this test (culture test, cited herein) is regarded as the gold standard, because viable bacilli in the sputum sample grow into colonies in the L–J medium [3].

The unfortunate situation is that *false negative* (FN) cases (results where the smear test is negative and the culture test is positive) lead to cryptic invasions of bacilli that progress toward the establishment of the disease, as a negative smear test result prompts the decision for nontreatment (to control the infection), and when culture test results become available later (after 1 month or so), the person has already contracted the disease [4]. Therefore, to prevent this, clinicians usually recommend that patients undergo empirical TB chemotherapy. When a patient is on chemotherapy, he/she may sometimes have a sufficient amount of dead bacilli to yield “smear test positivity along with culture test negativity”, giving rise to *false positive* (FP) cases. The other two obvious possibilities are the positivity of both tests, i.e., *true positive* (TP) cases (smear test positivity and culture test positivity) and the negative results of both tests, i.e., *true negative* (TN) cases (smear test negativity and culture test negativity), which can be suitably taken care of by the clinician. The ambivalence of FN and FP cases creates clinical ambiguity, i.e., persons with FN results are not given chemotherapy unless multiple comorbidities are evident, leading to the establishment of the disease. By contrast, FP patients, particularly those with a small number of bacilli, are unnecessarily subjected to a rigorous regimen of chemotherapy. In addition, FP cases may arise from infection with mycobacteria other than tuberculosis (MOTT).

This hospital was converted into a teaching hospital in 2007, and the Revised National TB Control Program was launched in the same year; however, the program only became effective from December 2009 onward. The samples were collected from suspected patients, who at times had been treated with the directly observed treatment short-course (DOTS) protocol, which was instituted by the World Health Organization [5]. The DOTS strategy involves the treatment of TB patients for the first 2 months with the first-line drugs of chemotherapy [6]. There were also provisions for intermediate dosing of drugs three times weekly, and at times twice weekly, although this was not recommended by the World Health Organization, because margins of error stemming from accidentally omitting one dose per week may result in once-weekly dosing, which would virtually render the treatment ineffective. It has been recorded that the implementation of DOTS has a success rate exceeding 95% and that it prevented the emergence of further MDR-TB strains [7]. It should be noted that the DOTS-plus program meant for MDR-TB was not introduced in this study, because it involves drug sensitivity testing as a routine procedure; thus, patients were treated under resource-limited settings. Data presented here were from a period of 19 months, as recorded from patients in areas where the DOTS program was implemented. This work is an extension of our previous work of 5 years [8], which was conducted in a community where the DOTS program was not used. This report describes the prevalence of tuberculosis after the program was implemented. Thus, this work substantiates, with a reasonable interval after the previous study, the use of the DOTS program in and around this TB center with a view toward examining its aftermath in a typical state of India.

### 1.1. Why Bayesian analysis?

In a population of suspected patients who donated sputum samples, four types of situations were noted. Obviously, a clinician would be eager to know numerically about the errors of each test, which gave rise to FN and FP cases. Therefore, the degrees of fallibility of both tests need to be assessed. To resolve the ambivalence (how specific and sensitive are these tests?), Bayesian analysis can be used [9]. As with any disease, for TB, an affirmative diagnostic procedure becomes essential in order to determine the presence/absence of a disease in a patient. Two types of false cases, FN and FP cases, arise as errors unbeknown to the clinician: the first type of error is the treatment of healthy people suspected of being infected, and the second is allowing infected patients to go untreated in a community of healthy individuals [10]. The first type of error results in morbidity linked to first-line and second-line drugs [6], whereas the impact of the second type is even more grave, in that the infection subtly spreads to the rest of the patient’s body as well as to the community. This situation could lead to serious consternation in issues of public health [6]. Thus, a suitable test to resolve the ambivalence of supporting smear test or culture test results is essential.

Data could be presented in a 2 × 2 generic format, for which only the Bayesian concept can be used, as the culture test is considered the gold standard [11]. The data set could then be used to assess the prevalence of tuberculosis in the targeted population as a *post hoc* trial from 572 samples; in addition, the digital assessment of the credibility of the smear test could be performed. As the culture test also has its degree of fallibility, i.e., unviable bacilli as discussed above may give rise to positivity in the smear test and negativity in the culture test, its quantification also remains an obvious quest.

Diagnostic tests are used to reveal the occurrence of a disease in a population consisting of randomly distributed diseased and disease-free individuals, and the accuracy of a diagnostic test can be measured by comparing the test results to the true condition of patients individually. Herein, the ambivalence of smear and culture tests could be resolved with the account of data as evidenced by an appropriate statistical analysis involving probability—as the extent of how dependable each test is. Obviously, an ideally based truth is required with which the second test can be compared—the smear test is to be assessed. Therefore, with care, the Bayesian analysis based on evidence could measure the degree of belief/assumption: first, at what percentage can the culture test be taken as the gold standard, and second, to what extent, numerically, can the smear test be considered dependable for the start of TB chemotherapy?

To evaluate the inherent probability of each test, the prior probability (*a priori probability* or *prevalence* or the prevalence of disease in the targeted population) is determined before using the data. Prevalence is computed as (TP + FN)/*N*, where *N* is the total number of samples. Additionally, several test statistics are associated in the analysis:

- The *sensitivity* (*TP rate*) is the proportion of people with the disease who will have positive smear test results, computed by [TP/(TP + FN)]. This value is the ability of the smear test to detect the infection status, when it is truly present, i.e., it is the probability of a positive test result, given that the samples were taken from sick individuals.
- The *specificity* (*TN rate*) is the proportion of people without the disease who will have negative smear test results, obtained by [TN/(FP + TN)]. This value is the ability of the smear test to yield a negative result with samples from disease-free individuals, i.e., it is the probability of a negative test result.
- The *FP rate* is the probability of errors in the culture test, computed as [FP/(FP + TN)].
- The *FN rate* is the probability of errors in the smear test, computed as FN/(TP + FN).
- The *positive predictivity* is the *posttest probability* of the disease that yielded a positive test result, or the probability of the portion of people with positive test results who actually had the disease, computed as [TP/(TP + FP)].
- The *negative predictivity* is the *posttest probability* of the disease that gave a negative test result, or the probability of the proportion of people with negative test results who actually did not have the disease, computed as [TN/(FN + TN)].
- The *diagnostic accuracy* (inherent validity or predictive validity) is the ability of the smear test to be correctly positive or negative, among the binary results of the culture test, computed as [(TP + TN)/*N*]. Additionally, this value estimates the accuracy of smear and culture tests together.
- The *positive likelihood ratio* (LR) is the ratio between the TP rate and the FP rate, computed as [sensitivity/(1 − specificity)], when the smear test result was positive.
- The *negative LR* is the ratio between the FN rate and the TN rate, computed as [(1 − sensitivity)/specificity], when the smear test result was negative. In fact, the larger is the positive LR value, the greater the likelihood of infection, and similarly, the lesser is the negative LR value, the lesser the likelihood of infection in a population.
- The *a posteriori probability* is the value from posttest arithmetic computation of the data set for the diagnostic efficiency, and it clarifies the dependability of each test independently, with a numerical probability value in arriving at the truth, i.e., the sought-after conclusions from both tests.
- The area under the receiver operating characteristic (ROC) curve, drawn with values of *sensitivity* and 1 − *specificity*, gives a graphical analysis for diagnostic efficiency. The graphical method additionally examines the predictive capability as another value of *a posteriori* probability, independent of the arithmetic computation.
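The formulas listed above can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the study: the function name `diagnostic_statistics` and the dictionary keys are hypothetical, and the counts passed in are the four cell counts reported in the abstract (TP = 33, FP = 22, FN = 62, TN = 455).

```python
def diagnostic_statistics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the 2 x 2 test statistics defined in Section 1.1.

    The culture test is treated as the reference, so TP + FN are the
    culture-positive samples and FP + TN are the culture-negative samples.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)          # TP rate
    specificity = tn / (fp + tn)          # TN rate
    return {
        "prevalence": (tp + fn) / n,      # a priori probability
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fp_rate": fp / (fp + tn),
        "fn_rate": fn / (tp + fn),
        "positive_predictivity": tp / (tp + fp),
        "negative_predictivity": tn / (fn + tn),
        "diagnostic_accuracy": (tp + tn) / n,
        "positive_lr": sensitivity / (1 - specificity),
        "negative_lr": (1 - sensitivity) / specificity,
    }


# Cell counts as reported for the 572 sputum samples.
stats = diagnostic_statistics(tp=33, fp=22, fn=62, tn=455)
for name, value in stats.items():
    print(f"{name:>22}: {value:.4f}")
```

Keeping the statistics in one function makes it easy to recompute them for any subgroup of samples, for example the fractional populations used for the ROC curve.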

## 2. Materials and Methods

Sum Hospital, Bhubaneswar, is a philanthropic clinical teaching hospital with a recognized TB center. During the 18 months of study, persons of all age groups suspected of having pulmonary tuberculosis contributed fresh sputum samples, which were subjected to AFB/smear test and culturing in the L–J medium on the same day. An aliquot of 5 mL of a sample was added to a volume of 10 mL of 4% NaOH in a centrifuge tube that was placed for 15 minutes in a water bath at 37° C, for the digestion of mucus. The tube was centrifuged at 3000× *g* for 20 minutes; the supernatant was discarded, and the residue was washed three times with sterile distilled water [12]. A smear was prepared using two droplets of the suspension on a glass slide, and this was air dried; drops of 1% carbol-fuchsin were poured onto the smear. Next, the slide was heated gently and was allowed to stand for 10 minutes for the coloration of the smear. The slide was gently washed with water and was decolorized with drops of 25% H_{2}SO_{4}. The smear was further counterstained with 0.1% methylene blue solution for 1 minute, and was gently washed before air-drying. At least 200–300 fields under an oil immersion objective were screened for red/pink AFB, and results were recorded as 0–1, 1–9, or 10–99 or more AFB per field (Figure 1). Results were reported, viewing under 100 fields, as follows: (1) negative with no red/pink bacteria, (2) scanty for 1–9 bacilli, (3) + for 10–99 bacilli, (4) ++ for more than 100 bacilli, or (5) +++ for bacilli more than 100 per field [13]. Furthermore, duplicate tubes of the L–J medium were inoculated from the prepared suspension and were incubated at 37° C for the growth of colonies that were checked later, in 6–8 weeks with weekly intervals.

## 3. Results

Diagnostic analyses of 572 sputum samples (*N* = 1.0) obtained in a period of 19 months (March 2010 to September 2012) were performed with a smear test and culture test, in a parallel manner. Of the 572 samples, 33 samples (0.05769) were TP cases; 22 samples (0.03846) were FP cases; 62 samples (0.10839) were FN cases; and 455 samples (0.79545) were TN cases. It was evident that there was a mismatch of results between the two tests, so FN and FP cases arose (Table 1). Applying the Bayesian concept with the recorded data (Table 1), several other test statistics described earlier could be computed for additional probability values, with 95% confidence interval (CI) values (Table 2).
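As a check on the fractions quoted above, the following sketch recomputes each cell as a proportion of the 572 samples and attaches a 95% interval. The interval method is an assumption: the text does not state how the CIs in Table 2 were obtained, so the Wilson score interval is used here purely for illustration, and `wilson_interval` is a hypothetical helper name.

```python
import math

# Cell counts reported in the Results section.
counts = {"TP": 33, "FP": 22, "FN": 62, "TN": 455}
n = sum(counts.values())


def wilson_interval(successes: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Fraction of all samples falling in each cell of the 2 x 2 table.
fractions = {cell: count / n for cell, count in counts.items()}
for cell, frac in fractions.items():
    low, high = wilson_interval(counts[cell], n)
    print(f"{cell}: {frac:.5f} (95% CI {low:.4f} to {high:.4f})")
```

The same helper can be applied to any of the ratios in Section 1.1, for example `wilson_interval(33, 95)` for the sensitivity.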

### 3.1. Computation of *a posteriori* probability mathematically and by ROC curve analysis

The *a posteriori* probability or *P*(*E*_{1}|*E*), the probability value of a sample to be truly positive, can be calculated using the Bayesian formula,

$$P(E_1 \mid E) = \frac{P(E \mid E_1)\,P(E_1)}{P(E \mid E_1)\,P(E_1) + P(E \mid E_1')\,P(E_1')}$$

where *E* is the event that the smear test result is positive; *E*_{1} is the event that the result of the culture test involving the same sample is positive; and *E*_{1}′ is the partition of the sample space for all clinical samples from noninfected individuals, and it is a hypothetical value. This yields several probability values:

Because we seek the mathematical value of *a posteriori* probability, substituting the above values in its formula, we obtain an *a posteriori* probability value of 0.6614.
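For readers who want to see the mechanics of Bayes' rule with the events *E* and *E*_{1} defined above, a minimal sketch follows. The inputs are this sketch's own choice, derived directly from the four cell counts (prior = prevalence, *P*(*E*|*E*_{1}) = sensitivity, *P*(*E*|*E*_{1}′) = FP rate); the values the authors substituted are not shown in the text, so the output should not be expected to reproduce the reported figure. The function names are hypothetical.

```python
def posterior_probability(prior: float,
                          p_pos_given_infected: float,
                          p_pos_given_noninfected: float) -> float:
    """Bayes' rule: P(E1 | E) from P(E1), P(E | E1) and P(E | E1')."""
    numerator = p_pos_given_infected * prior
    denominator = numerator + p_pos_given_noninfected * (1 - prior)
    return numerator / denominator


def posterior_from_likelihood_ratio(prior: float, positive_lr: float) -> float:
    """Equivalent odds form: posttest odds = pretest odds x positive LR."""
    pretest_odds = prior / (1 - prior)
    posttest_odds = pretest_odds * positive_lr
    return posttest_odds / (1 + posttest_odds)


# Assumed inputs, taken from the reported counts TP=33, FP=22, FN=62, TN=455.
prior = 95 / 572            # prevalence, P(E1)
sensitivity = 33 / 95       # P(E | E1)
fp_rate = 22 / 477          # P(E | E1')

post = posterior_probability(prior, sensitivity, fp_rate)
post_lr = posterior_from_likelihood_ratio(prior, sensitivity / fp_rate)
print(f"posterior via Bayes' rule:       {post:.4f}")
print(f"posterior via likelihood ratio:  {post_lr:.4f}")
```

With inputs taken straight from a single 2 × 2 table, both forms reduce algebraically to TP/(TP + FP), which is a useful sanity check on any implementation.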

The population of 572 samples was grouped into six fractional populations, and values of prevalence remained at the mean value of 0.23 ± 0.12 (the original prevalence value was 0.16608). Values of sensitivity, specificity, and *a posteriori* probability were determined before drawing the graph for the ROC curve, and these values gave an idea that for all possible values of population and prevalence, the sensitivity patterns changed with a mean present value of 0.30 ± 0.13 (the original sensitivity value was 0.347), but the specificity values remained unchanged at 0.99 throughout. Values of *a posteriori* probability also remained in the range at the mean value of 0.59 ± 0.05 (the original *a posteriori* probability value was 0.6614) (Table 3).

Values of both sensitivity and specificity were used to determine another value of *a posteriori* probability by the ROC curve (Figure 2), which was drawn by joining the cut-points represented by six values of each: sensitivity versus 1 − specificity; and the diagonal chance line, (45° line) through the coordinates (0, 0) and (1, 1), was drawn as the lower limit. The area of the upper triangle above the 45° diagonal line (called the chance line) was taken as the total value = 1.0, out of which the AUC (area under the ROC curve) was found to be 0.62 (95% CI, 0.473–0.767), determined by using the trapezoidal rule [14]. This means that the smear test has a 62% chance of correctly distinguishing a sample from an infected person and a sample from a noninfected individual. This is the second value of *a posteriori* probability, the first one being 0.6614 or 66.14%.
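The trapezoidal rule mentioned above can be sketched as follows. The six cut-points behind Figure 2 are not listed in the text, so this example uses only the single operating point of the full data set (1 − specificity = 22/477, sensitivity = 33/95) joined to the corners (0, 0) and (1, 1); it is therefore an illustration of the method and is not expected to reproduce the reported AUC of 0.62. The function name `auc_trapezoid` is hypothetical.

```python
def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve by the trapezoidal rule.

    `points` is an iterable of (1 - specificity, sensitivity) pairs and
    should include the end points (0, 0) and (1, 1).
    """
    ordered = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        # Each segment contributes a trapezoid of width (x1 - x0).
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Single operating point from the full data set (assumption: one cut-point).
fp_rate = 22 / 477       # 1 - specificity
sensitivity = 33 / 95

roc_points = [(0.0, 0.0), (fp_rate, sensitivity), (1.0, 1.0)]
auc = auc_trapezoid(roc_points)
chance = auc_trapezoid([(0.0, 0.0), (1.0, 1.0)])  # the 45-degree line

print(f"AUC with one operating point: {auc:.4f}")
print(f"AUC of the chance line:       {chance:.4f}")
```

For a curve with a single operating point, the trapezoidal area equals (sensitivity + specificity)/2; with more cut-points the same function applies unchanged.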

## 4. Discussion

The presence of the disease in this population of 572 suspected individuals or the prevalence or *a priori* probability value of the test was 0.16608 or 16.6%, computed according to Zhou et al [15]. There were 95 positives (TP and FN cases) out of the total sputum samples, based on culture test results. Moreover, from both types of false cases (FN and FP), it was clear that each test was insufficient for the prognosis. *Positive predictivity* is the conditional probability that a patient had the disease, given that the smear test result was positive. Similarly, *negative predictivity* is the conditional probability that the sample does not have the infection, given that the smear test result was negative. The *positive predictivity* value, 0.6, and the *negative predictivity* value, 0.88, computed herein fall short of the absolute value of 1.0 by 0.4 and 0.12, respectively. However, both are dependable in terms of determining the *prevalence* of the disease [16]. In other words, the *negative predictivity* value is dependable for the smear test.

Sensitivity and specificity are two important test statistics that are conditional on whether the sample donor has a stable TB infection, and neither value is affected by the prevalence of the disease. In this study, the sensitivity of the smear test was 0.347, a value that strongly undermines the effectiveness of the smear test for TB diagnosis in the presence of stable TB infections. However, the specificity value was 0.954, which suggests a high dependability of the smear test in the absence of an infection. The cumulative value of these two test statistics would be [(1 − 0.347) + (1 − 0.954)] = 0.696. Therefore, the smear test was dependable for a correct prognosis of the disease in either way, absence/presence of the disease, by 69.6% only, with or without a stable infection. This cumulative value of 69.6% was not on par with the diagnostic accuracy value of 0.853 (85.3%), which signifies how commonly dependable the smear test is, in cases where the result of the culture test was still unknown. The difference between the above values (69.6% vs. 85.3%) could be attributable to the inveterate advice of a clinician for the smear test, when copious cough is present in the URT with other doubtful symptoms, as a preemptive practice, or the limited fallibility of the culture test. However, the 455 (79.5%) TN cases from a total of 572 justify the habitual advice by clinicians for the smear test, as noted.

The 62 (10.84%) FN cases of the total smear test result could be attributable to samples from a healthy person without any bacilli, from newly infected individuals with a paucity of organisms, and unviable infection with *Mycobacterium paratuberculosis* [17], including MOTT or prior Bacillus Calmette–Guérin vaccination, as noted [18]. A high value of FN cases (10.84%) should actually induce progress of the infection that is present in the body, and it is a matter of concern because TB chemotherapy has not been initiated in FN cases. Obviously, error in TB prognosis would cause an individual to become an outcast, because of drug-resistant infections, especially due to FN cases. Nevertheless, samples are concentrated before diagnostic steps are undertaken by default. Indeed, at least 5–10 × 10^{3} bacilli/mL must be present in a sample for a smear test result to be considered positive [2,19]. Thus, the insufficiency of the smear test could be attributed to the small number of bacilli in the sample. The pragmatic approach to TB prognosis would definitely be the nucleic acid amplification test with isolation of DNA from bacilli, meant for drug-resistant bacilli, which is not usually followed in resource-limited settings. Thus, a smear test would be inadequate in distinguishing a sick from a nonsick person with latent TB, as the latter would promote evasive FN or FP cases. The dependability of the culture test is challenged by the 22 FP cases; in other words, this test is dependable as the gold standard for 96.15% only. Virtually, the probability of the culture test result being positive would never be zero, but the probability of the smear test being totally negative cannot be ruled out, when each sample contains an insufficient amount of bacilli. Moreover, the FP cases are 22 (3.85%), which suggests that the erroneous smear test results may be attributable to a patient undergoing chemotherapy, leading to unviable bacilli for the culture test, but the smear test would be positive because of the presence of dead bacilli. Thus, the FP rate is 0.046 or 4.6%.

With double-checking (arithmetic and graphical), the posttest analysis of the data yields two numerical values of *a posteriori* probability. The graphical value is 0.62 and the arithmetic value is 0.6648. Both values are in close proximity with a distance of 0.4% in derivation. Thus, statistically this signifies the dependability of the smear test with this binocular vision. Moreover, the values of associated test statistics generated in the Bayesian analysis cluster around the data set and facilitate a multiple evaluation of the ambivalence. Thus, this analysis provides a methodological framework for the quantitative assessment of two diagnostic test results for pulmonary tuberculosis.

The limitations of this analysis are numerous. First, an infected individual without any symptom of infection would have a positive smear test result. Second, both sensitivity and specificity are not affected by the prevalence of the disease, but they are well affected by the inherent fallibility of each test. For example, when the sensitivity value is higher, it would be easier to detect positivity in a population by the smear test; however, it could also be attributable to individuals with a more advanced stage of the disease [15]. Third, these two test statistics do not directly help in assessing the test results of individual patients as both are based on the data set of the population. Lastly, the habitual advice of clinicians to individuals with URT infections to undergo a smear test promotes ambiguous FN cases.

This Bayesian analysis on test results could represent an opportunity for the numerical assessment of two diagnostic methods by generation of a set of values of test statistics, which cumulatively qualify the smear test to be moderately dependable (69.6–85.3%), i.e., lesser dependability when the infection is present in the individual, and greater only when the infection is not present. The gold standard culture test was found to be almost exquisitely dependable for the prognosis of pulmonary tuberculosis, as known. The posttest or *post hoc* analysis of the data set generating two values of *a posteriori* probability, falling within 62.0% and 66.48%, however, neither strongly advocates for nor undermines either diagnostic method. It should be noted, however, that the recent outbreak of multidrug-resistant TB worldwide must be controlled with more rigorous measures, for which both these methods are insufficient.

## Acknowledgements

S. Rath is an SRF supported by CSIR, New Delhi. Part of the project was financed by a UGC-MRP in Botany (grant no. 39-388/2010/SR) to R.N. Padhy, the CSIR Scientist of the institute. We are grateful to Dr. D.K. Roy, Dean, IMS & Sum Hospital, Siksha ‘O’ Anusandhan University, Bhubaneswar. We are thankful to Somadatta Das, DEO, IMS & Sum Hospital, for the photomicrograph.

## References

*Mycobacterium tuberculosis* from patients smear-negative for acid-fast bacilli. Lancet 1999;353(9151):444–9. PMID 9989714.

*M. tuberculosis* isolated from a group of patients referred to PMRC, TB Research Center at Mayo Hospital, Lahore, Pakistan. Pak J Zool 1998;30:335–9.