Abstract

Bacterial concentration is an important indicator of the degree of water pollution. Rapid and accurate quantification of bacterial concentration in water is of great significance for ensuring water safety and protecting human health. This paper proposes a method for the rapid determination of bacterial concentration by multiwavelength transmission spectroscopy combined with partial least squares regression. Escherichia coli (E. coli) is selected because it is a common indicator microorganism for assessing water pollution and is easy to handle. First, we measure the transmission spectra of E. coli suspensions in the region from 200 to 900 nm and analyze the differences in spectral characteristics at different concentrations. Subsequently, considering that the calculated concentration is affected by instrument linearity and other factors, we analyze the sensitivity, correlation, and detection ability of the spectra at different wavelengths as the concentration changes, and select the optimal characteristic band according to how these quantities vary with wavelength. Finally, E. coli concentrations are determined from the spectra in the optimal characteristic band combined with partial least squares regression. Compared with plate counting, the calculated bacterial concentrations have a maximum relative error of 4.500% and an average relative error of 0.677%, both less than 5%, and their accuracy and stability are better than those obtained by the single-wavelength method. This study provides a reference for the rapid and accurate detection of bacterial concentration in water.

1. Introduction

Bacteria, as primary pollutants in water, play a crucial role in assessing water quality and safety. Consumption of water containing excessive bacterial concentrations can lead to various infectious diseases, such as hepatitis, influenza, SARS, pneumonia, gastric ulcers, and respiratory illnesses [1, 2]. E. coli is recognized as one of the principal contributors to water contamination. Therefore, rapid and accurate detection of bacterial concentration in water provides valuable insights for the effective prevention and control of water pollution.

Methods for measuring bacterial concentration in water include standard plate counting [3], fluorescence microscopy [4], and digital image analysis. Although these methods yield relatively accurate results, they require complex sample preparation, are time-consuming, and involve expensive biological reagents, making them unsuitable for real-time and on-site detection. To achieve rapid and automated detection of bacterial concentration in water, researchers at home and abroad have explored spectrophotometry [5, 6] and flow cytometry [7, 8]. Spectrophotometry determines bacterial concentration from the absorbance at a specific wavelength, which is fast and simple; however, the calculated concentration is influenced by the absorption capacity at the selected wavelength, leading to high detection limits or low accuracy. Flow cytometry quantifies bacterial concentration from the scattering spectra measured at different angles and the fluorescence spectra of exogenous labels. This method offers high measurement speed and accuracy but requires costly instrumentation and trained personnel for operation.

Multiwavelength transmission spectroscopy is a novel technique for rapidly determining bacterial concentration in water. It provides rich spectral information, which improves the accuracy of the calculated results and lowers the detection limit for bacterial concentration. However, directly employing full-spectrum data for quantitative analysis is challenging because of high computational requirements, slow processing speeds, and susceptibility to noise interference in certain spectral regions. Therefore, effective spectral bands for bacteria must be selected, and a quantitative model must be established between the concentration and the spectral data in those bands, to achieve rapid and accurate measurement of bacterial concentration.

Currently, several linear quantitative modeling algorithms are used in spectroscopy, including multiple linear regression (MLR) [9], principal component regression (PCR) [10], and partial least squares regression (PLSR) [11]. In spectroscopic analysis, MLR suffers from noise interference and from limits on the number of spectral bands, so modeling bands must be selected manually based on empirical knowledge or repeated trials, resulting in a substantial workload. PCR compresses the full-spectrum information into a small number of independent principal components to establish a regression model but fails to consider the correlation between the extracted principal components and the target substance to be measured. PLSR compresses the full-spectrum information and selects components that are highly correlated with the target substance, that is, components projected in the direction of the target concentration. Although PLSR is sensitive to anomalous data [12], it can effectively extract quantitative information when no abnormal data are present in the measured samples. Therefore, it is the most widely used and effective method in current spectroscopic linear quantitative modeling.

In this study, multiwavelength transmission spectroscopy combined with PLSR is employed to detect E. coli concentration in water. First, the spectral characteristics of E. coli suspensions at different concentrations are measured and analyzed. Then, we calculate the sensitivity, correlation, and detection limit of the bacterial spectra with respect to concentration at different wavelengths in order to select the optimal spectral band. Finally, we calculate the bacterial concentration with the PLSR algorithm and analyze the feasibility and accuracy of this method.

2. Materials and Methods

2.1. Theoretical Basis

In multiwavelength transmission spectroscopy, when light of different wavelengths passes through water containing bacterial cells, the transmitted light intensity is attenuated by the absorption and scattering of the bacteria, and the degree of attenuation is related to the concentration and size of the bacteria. Therefore, bacterial concentration can be determined by analyzing multiwavelength transmission spectra.

Assuming that the light scattering of each measured bacterial cell satisfies the condition of independent single scattering, according to Mie scattering theory, the optical density of the bacterial cell population at a given wavelength can be expressed as follows [13]:

$$\tau(\lambda) = N_p \, l \, \frac{\pi}{4} \int_0^{\infty} Q_{ext}\big(m(\lambda), D\big) \, D^2 f(D) \, dD, \tag{1}$$

where $f(D)$ represents the particle size distribution of bacteria, $l$ is the path length, $N_p$ is the number of cells per unit volume, and $Q_{ext}$ represents the total extinction coefficient, which is a function of the incident light wavelength $\lambda$, the bacteria diameter $D$, and the relative refractive index $m(\lambda)$. In order to better characterize the relationship between the measured spectral values and bacterial concentration, the bacterial suspension is assumed to consist of monodisperse cells. The transmission spectrum of bacterial microorganisms can then be expressed as follows:

$$\tau(\lambda) = N_p \, l \, \frac{\pi}{4} \, Q_{ext}\big(m(\lambda), \bar{D}\big) \, \bar{D}^2, \tag{2}$$

where $\bar{D}$ represents the average equivalent particle size of bacteria. As indicated by equation (2), the measured optical density at different wavelengths contains information about bacteria size and concentration. Furthermore, there exists a linear relationship between the measured optical density and bacterial concentration. These observations provide a foundation for establishing a quantitative model between multiwavelength transmission spectroscopy and the target bacteria concentration.

2.2. Bacterial Sample Preparation and Transmission Spectral Measurement

E. coli (CICC #10389) was obtained from the China Center of Industrial Culture Collection (CICC), stored on nutrient agar at 4°C, and transferred monthly. The bacterial suspension was prepared by activating the bacteria, culturing them at 37°C in beef extract peptone medium (pH 7.0) containing 0.3% beef extract, 1% peptone, 0.5% NaCl, and 2% agar, expanding the culture in liquid medium, centrifuging (H-1650, Jiangdong instrument), and washing in sterilized deionized water [14].

The bacterial suspension was divided into two parts. One part was used for bacterial counting, with the plate counting method employed to determine the bacterial concentration [3]; the other part was used for spectral measurements. A total of 57 E. coli suspensions with different concentrations were prepared with deionized water. The spectra of the E. coli suspensions were recorded using a UV-Vis spectrophotometer (UV2550) in the range of 200–900 nm with a step size of 1 nm. The measurements were performed in 1 cm path-length quartz cuvettes at room temperature, and deionized water was used as the reference. E. coli is rod-shaped, approximately 2.0–6.0 µm in length and 1.1–1.5 µm in width. Independent scattering requires an interparticle distance greater than three times the particle diameter [13]. To guarantee independent scattering, the particle concentration in the medium should not exceed the values provided in Table 1.

From the data in Table 1, the maximum particle concentration at a given particle size can be calculated by cubic Hermite interpolation; for D = 6.0 µm, $N_p = 1.162 \times 10^{9}$ cells/ml. To avoid multiple scattering, under the assumption that the E. coli diameter is 6.0 µm (the maximum length of E. coli), we chose a conservative $10 \times 10^{8}$ cells/ml as the upper limit to provide enough separation between the cells in water. In addition, the cell suspensions were pipetted up and down several times with a clean pipette before the measurement, and each spectrum was recorded as the average of 3 replicate measurements at each concentration. The 57 measured spectra are shown in Figure 1, with the wavelength on the x-axis and the optical density on the y-axis.

3. Results and Discussion

3.1. Acquisition of Multiwavelength Transmission Spectra of Escherichia coli

To demonstrate the applicability of multiwavelength transmission spectroscopy to quantitative concentration monitoring, 9 transmission spectra of E. coli suspensions ($1.3920 \times 10^{8}$, $6.960 \times 10^{7}$, $3.480 \times 10^{7}$, $1.740 \times 10^{7}$, $8.70 \times 10^{6}$, $4.35 \times 10^{6}$, $2.18 \times 10^{6}$, $1.09 \times 10^{6}$, and 0 cells/ml) are selected, with the concentration gradually decreasing to zero (Figure 2). At low concentrations, the E. coli spectra overlap, making it difficult to distinguish low-concentration bacterial suspensions from pure water. For clarity, these spectra are magnified in the inset of Figure 2.

It can be observed that there are significant differences among the spectra of E. coli suspensions with different concentrations: the spectral intensity increases as the E. coli concentration increases. According to the similarity of the spectral patterns, the concentrations can be divided into three groups: low concentration ($1.09 \times 10^{6}$ and $2.18 \times 10^{6}$ cells/ml), medium concentration ($4.35 \times 10^{6}$, $8.70 \times 10^{6}$, $1.740 \times 10^{7}$, $3.480 \times 10^{7}$, and $6.960 \times 10^{7}$ cells/ml), and high concentration ($1.3920 \times 10^{8}$ cells/ml). This indicates that bacterial concentration influences the spectral patterns. The main reason is that, in an E. coli suspension at low concentration, the scattering of light by the bacterial cells is weak, while the internal chromophores (proteins, nucleic acids, etc.) absorb light strongly, so the spectra primarily reflect the light absorption characteristics of the E. coli cells. As the concentration increases, the scattering of light by the E. coli cells becomes stronger, and the spectra predominantly show the scattering characteristics of the cells, overshadowing the absorption characteristics of the intracellular chromophores.

3.2. Analysis of Variations in Spectra for E. coli Suspensions at Different Concentrations

From the full spectrum in Figure 2, it can be observed that the spectra of the bacterial suspensions exhibit higher optical density in the 200–230 nm range than in the 230–900 nm range. This is attributed to the joint contributions of scattering by the E. coli cells and absorption by their internal chemical components (the twenty amino acids and the peptide bonds that constitute proteins). In the 230–320 nm range, the spectral characteristics of the E. coli suspensions vary significantly among concentrations, and the spectral absorption peak lies in this range. However, the location of the absorption peak differs distinctly among concentrations, as indicated in Table 2. Note that as the E. coli concentration increases, the spectral absorption peak moves toward shorter wavelengths, resulting in a “blue shift” phenomenon, which is generally attributed to absorption by nucleic acids and certain aromatic amino acids (such as tryptophan, tyrosine, and phenylalanine) that constitute proteins [15, 16]. As the concentration changes, the aromatic amino acid that dominates light absorption also changes. When the concentration is low, the absorption of tryptophan may overshadow that of the other amino acids. As the concentration increases, other amino acids (such as tyrosine and phenylalanine) begin to play a significant role in light absorption. Consequently, this “blue shift” distorts the relationship between spectral data and bacterial concentration, so this range is not suitable for measuring E. coli concentration.

In the wavelength range of 320–900 nm, the spectra of E. coli suspensions at different concentrations are remarkably similar in shape. As the concentration decreases, the optical density values decrease, indicating a strong correlation between spectral data and concentration. However, when the concentration reaches $1.09 \times 10^{6}$ cells/ml, the spectrum of E. coli overlaps with the spectrum of pure water at wavelengths greater than 480 nm. This may be attributed to the weakened scattering and absorption of E. coli cells at lower concentrations, which result in measured optical density values that approach the instrument detection limit. In the 480–900 nm range, the spectra at concentrations below $1 \times 10^{6}$ cells/ml are indistinguishable. Therefore, when multiwavelength spectroscopy is used to quantify E. coli concentration, it is crucial to consider the sensitivity and detection limit of the spectra at different wavelengths.

3.3. Sensitivity of E. coli Spectra to Concentration Variations

To accurately quantify concentration, it is necessary to analyze the sensitivity of the optical density at various wavelengths to concentration changes. According to equation (2), the optical density is proportional to the concentration. Based on the aforementioned wavelength divisions, the relationship between optical density and concentration is depicted for the boundary wavelengths (200, 230, 320, and 900 nm) and a commonly used wavelength (600 nm), as shown in Figure 3. The spectra of five E. coli concentrations ($1.3920 \times 10^{8}$, $6.960 \times 10^{7}$, $3.480 \times 10^{7}$, $1.740 \times 10^{7}$, and $8.70 \times 10^{6}$ cells/ml) are selected, and a least squares linear fit is performed on the five data points at each wavelength to obtain the linear function relating optical density to concentration:

$$\tau(\lambda) = m \, N_p + b, \tag{3}$$

where $\tau(\lambda)$ is the optical density at wavelength $\lambda$, $N_p$ is the bacterial concentration, $b$ is the intercept, and the parameter $m$ represents the slope of the fitted line, indicating the sensitivity of the individual spectral line to changes in concentration. Table 3 provides the optimal fitting slopes and linear correlation coefficients for the five spectral features plotted in Figure 3. It can be observed that the slopes at 200 nm and 230 nm are larger than those at 320 nm, 600 nm, and 900 nm, indicating that the optical density values at 200 nm and 230 nm are more sensitive to changes in E. coli concentration.
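The per-wavelength fit described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `fit_line`, the five concentrations, and the optical densities (generated from an assumed slope and offset) are all hypothetical stand-ins for the measured data.

```python
def fit_line(conc, od):
    """Ordinary least-squares fit od = m * conc + b; returns (m, b)."""
    n = len(conc)
    mean_c = sum(conc) / n
    mean_od = sum(od) / n
    sxx = sum((c - mean_c) ** 2 for c in conc)
    sxy = sum((c - mean_c) * (y - mean_od) for c, y in zip(conc, od))
    m = sxy / sxx              # slope: sensitivity of OD to concentration
    b = mean_od - m * mean_c   # intercept
    return m, b


# Illustrative concentrations in cells/ml (a halving series, not the measured set).
conc = [1.0e8, 5.0e7, 2.5e7, 1.25e7, 6.25e6]

# Hypothetical optical densities at one wavelength, generated from an
# assumed slope and offset so that the fit has a known answer.
assumed_slope = 1.6e-8
assumed_offset = 0.01
od = [assumed_slope * c + assumed_offset for c in conc]

m, b = fit_line(conc, od)
print(f"slope m = {m:.4e}, intercept b = {b:.4f}")
```

Repeating the fit at every wavelength point from 200 to 900 nm would give a slope spectrum of the kind discussed next.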

Figure 4 illustrates the slope spectrum of the E. coli suspension across the entire wavelength range. Note that in the region from 200 to 320 nm, the slope spectrum shows a sharp decrease, followed by a gradual increase and then another decrease. This behavior is attributed to the strong absorption of incident light by the bacterial chromophores within this wavelength range. In the 320–900 nm range, the slope decreases with increasing wavelength. Over the entire spectrum, the maximum slope is observed at 200 nm ($1.6188 \times 10^{-8}$), while the minimum slope is observed at 900 nm ($8.775 \times 10^{-10}$). According to equation (2), the slope spectrum reflects the absorption and scattering of incident light by E. coli cells at various wavelengths.

Given the slope of the best-fit line at each wavelength point, the detection limit of E. coli concentration at each wavelength can be calculated by the following equation [17, 18]:

$$D_L = \frac{K \, S_b}{m}, \tag{4}$$

where $D_L$ is the detection limit of the tested bacterial concentration; $K$ is set to 3; $S_b$ represents the standard deviation of the deionized water spectrum obtained from 10 measurements; and $m$ is the slope of the calibration curve for the tested bacteria.

Figure 5 shows the standard deviation spectrum of the UV-visible transmission measurements of deionized water over 10 consecutive measurements. Applying equation (4) gives the detection limit curve for E. coli concentration at different wavelengths, shown in Figure 6(a). Some of the calculated detection limits are outliers equal to zero. These outliers are removed, and the result is presented in Figure 6(b). For ease of analysis, the detection limit spectrum with the outliers removed is processed with 10-point data smoothing, as shown in Figure 6(c). Apart from slight fluctuations in localized spectral regions, the detection limit for E. coli concentration gradually increases with wavelength. The minimum detection limit ($1.266 \times 10^{5}$ cells/ml) is observed at 277 nm, while the maximum ($1.858 \times 10^{6}$ cells/ml) is observed at 897 nm. The detection limit of bacterial concentration is therefore wavelength-dependent, and selecting appropriate wavelengths can lower it. Additionally, for quantitative analysis of low-concentration bacterial samples, the concentration must exceed the detection limit at the wavelengths used.
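The three processing steps described above (computing the detection limit as K times the blank standard deviation divided by the slope, removing zero-valued outliers, and smoothing) can be sketched as follows. The smoothing algorithm is not specified in the text, so a centered moving average truncated at the edges is assumed here; the function names and all input values are hypothetical.

```python
def detection_limits(blank_sd, slopes, k=3.0):
    """Detection limit k * S_b / m at each wavelength point."""
    return [k * s / m for s, m in zip(blank_sd, slopes)]


def remove_zero_points(wavelengths, limits):
    """Drop wavelength points whose computed detection limit is exactly zero."""
    kept = [(w, d) for w, d in zip(wavelengths, limits) if d != 0]
    return [w for w, _ in kept], [d for _, d in kept]


def moving_average(values, window=10):
    """Assumed smoothing: centered moving average, truncated at both edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):min(len(values), i + half)]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical inputs for 20 wavelength points (not measured data).
wavelengths = list(range(300, 320))
blank_sd = [2e-4] * 20          # assumed standard deviation of the water blank
blank_sd[5] = 0.0               # a zero standard deviation yields a zero outlier
slopes = [4e-9] * 20            # assumed calibration slopes

limits = detection_limits(blank_sd, slopes)
wl_kept, limits_kept = remove_zero_points(wavelengths, limits)
smoothed = moving_average(limits_kept, window=10)
print(len(wl_kept), "points kept after outlier removal")
```

A zero detection limit arises when the blank standard deviation at a wavelength is zero, which is why such points are treated as outliers rather than as genuine limits.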

3.4. Correlation Characteristics between Spectra and Concentration

To achieve accurate quantification of bacterial concentration, it is not sufficient to consider only the sensitivity of spectral lines at different wavelengths to changes in concentration. It is also necessary to determine the correlation coefficient between the optical density ($\tau$) and the concentration ($N_p$). The correlation coefficient is given by the following equation [19]:

$$R^2 = \frac{\mathrm{Cov}(\tau, N_p)^2}{\mathrm{Var}(N_p) \, \mathrm{Var}(\tau)}, \tag{5}$$

where $\mathrm{Cov}(\tau, N_p)$ is the covariance between $\tau$ and $N_p$, $\mathrm{Var}(N_p)$ is the variance of the concentration, and $\mathrm{Var}(\tau)$ is the variance of the optical density.

The correlation coefficient ($R^2$) ranges from 0 to 1, where 0 indicates no linear correlation and 1 indicates perfect linear correlation. The calculated correlation coefficient spectrum for E. coli suspensions in the region from 200 to 900 nm is shown in Figure 7.
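As a minimal sketch of this calculation at a single wavelength, the squared correlation coefficient can be computed directly from the covariance and the two variances. The function name `r_squared` and the sample values are hypothetical; sample (n − 1) normalization is assumed, although the ratio does not depend on that choice.

```python
def r_squared(conc, od):
    """Squared correlation: Cov(od, conc)^2 / (Var(conc) * Var(od))."""
    n = len(conc)
    mean_c = sum(conc) / n
    mean_od = sum(od) / n
    cov = sum((c - mean_c) * (y - mean_od) for c, y in zip(conc, od)) / (n - 1)
    var_c = sum((c - mean_c) ** 2 for c in conc) / (n - 1)
    var_od = sum((y - mean_od) ** 2 for y in od) / (n - 1)
    if var_c == 0 or var_od == 0:
        return 0.0  # a constant series has no defined linear correlation
    return cov ** 2 / (var_c * var_od)


# Hypothetical values: a perfectly linear series and a slightly perturbed one.
conc = [1.0, 2.0, 3.0, 4.0]
od_linear = [0.1, 0.2, 0.3, 0.4]
od_perturbed = [1.0, 2.0, 3.0, 5.0]

print(r_squared(conc, od_linear))
print(r_squared(conc, od_perturbed))
```

Evaluating this function at every wavelength point would yield a correlation coefficient spectrum of the kind shown in Figure 7.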

As indicated in Figure 7, the correlation coefficient varies significantly within the 200–320 nm range. The reason is that some chemical components within the cells (nucleic acids, peptide bonds, and certain aromatic amino acids in proteins) absorb strongly in this region, so the spectra of bacterial suspensions at different concentrations differ in this band, for example through the blue shift of the spectral absorption peak (see Table 2), which affects the correlation between optical density and concentration. In the 320–550 nm range, the correlation coefficient increases with wavelength and ranges from 0.9988 to 0.9999. This indicates a strong linear relationship between optical density and concentration, and the sensitivity of the spectrum to concentration is moderate. Within the 550–900 nm range, the correlation coefficient spectrum is noticeably noisy and the slope values remain low ($8.775 \times 10^{-10}$ to $2.176 \times 10^{-9}$), indicating low sensitivity, so this range is not suitable for measuring E. coli concentration.

Accurate quantification of bacterial concentration therefore requires satisfying two conditions: a high slope value, which ensures that the selected spectral line is sensitive to changes in concentration, and a high correlation coefficient, which ensures high quantitative accuracy. In the region from 320 to 550 nm, with the correlation coefficient threshold set to 0.9998, the optimal wavelength range that meets these requirements is approximately 388–550 nm, consisting of approximately 162 wavelength points. In this range, the minimum $R^2$ value is 0.9998, and the slope m remains in a moderate range ($2.176 \times 10^{-9}$ to $3.922 \times 10^{-9}$), greater than the slope values in the region from 550 to 900 nm. In addition, the chemical components of the cells do not absorb in this region and so do not interfere with the linear correlation between optical density and concentration, and the $R^2$ spectrum has low noise. Taken together, the spectral data of this band are considered the best for quantifying bacterial concentration.
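The thresholding step can be illustrated with a short sketch that searches a wavelength window for the longest contiguous run of points whose correlation coefficient meets the threshold. The text does not state how the band edges were located, so the longest-run rule is an assumption, as are the function name `select_band` and the synthetic correlation values, which are constructed to cross the threshold at 388 nm.

```python
def select_band(wavelengths, r2, threshold=0.9998, lo=320, hi=550):
    """Longest contiguous run inside [lo, hi] with r2 >= threshold.

    Returns (start, end) wavelengths, or None if no point qualifies.
    """
    best = None
    run_start = run_end = None
    for w, r in zip(wavelengths, r2):
        if lo <= w <= hi and r >= threshold:
            if run_start is None:
                run_start = w
            run_end = w
        elif run_start is not None:
            # The run just ended; keep it if it is the longest so far.
            if best is None or run_end - run_start > best[1] - best[0]:
                best = (run_start, run_end)
            run_start = None
    if run_start is not None and (
        best is None or run_end - run_start > best[1] - best[0]
    ):
        best = (run_start, run_end)
    return best


# Synthetic correlation spectrum on a 1 nm grid (not measured data).
wavelengths = list(range(200, 901))
r2 = [0.9999 if 388 <= w <= 700 else 0.9990 for w in wavelengths]

band = select_band(wavelengths, r2)
print(band)
```

A fuller selection rule would also check the slope at each point, since the text requires both high sensitivity and high correlation.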

3.5. Calculation of E. coli Concentration Using Partial Least Squares Regression

Here, the optimal wavelength range of 388–550 nm is selected for the bacterial concentration measurement. The spectrum contains a significant amount of redundant information, since the correlation between adjacent wavelength points is higher than that between distant wavelengths, and a larger number of wavelengths would result in longer data processing times. Therefore, the partial least squares regression (PLSR) method is used to compress the spectra in this region (388–550 nm), and a quantitative regression model is established from a small number of independent components extracted from the spectral data.

To validate the proposed method, 49 spectra in the wavelength range from 388 to 550 nm and their corresponding bacterial concentrations are selected, and a mathematical model describing the relationship between the spectra and E. coli concentrations is established using the PLSR algorithm. Based on this model, 8 concentrations spanning three orders of magnitude ($10^{8}$, $10^{7}$, and $10^{6}$) are predicted. In addition, a standard curve is constructed from the relationship between the optical density at a single wavelength (600 nm) and concentration, and the concentrations are calculated using this standard curve. The two spectroscopic methods are compared with standard plate counting for determining E. coli concentration in water. The results are shown in Table 4.
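To make the modeling step concrete, the following is a minimal sketch of single-response PLS regression (PLS1) using the NIPALS deflation scheme, written in plain Python. It is not the authors' code, and the text does not state which PLSR variant or how many components were used. The spectra are synthetic and noise-free (optical density strictly proportional to concentration with hypothetical slopes at five wavelengths), so a single component suffices here.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def column(matrix, j):
    return [row[j] for row in matrix]


def pls1_fit(X, y, n_components=1):
    """Fit PLS1 by NIPALS; returns (x_mean, y_mean, components)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(column(X, j)) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered spectra
    f = [v - y_mean for v in y]                                # centered response
    components = []
    for _ in range(n_components):
        w = [dot(column(E, j), f) for j in range(p)]   # weights: X^T y
        norm = dot(w, w) ** 0.5
        if norm == 0:
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                 # scores
        tt = dot(t, t)
        load = [dot(column(E, j), t) / tt for j in range(p)]  # X loadings
        q = dot(f, t) / tt                             # regression on the score
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one spectrum by repeating the deflation."""
    x_mean, y_mean, components = model
    e = [xi - mi for xi, mi in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in components:
        t = dot(e, w)
        y_hat += q * t
        e = [ei - t * li for ei, li in zip(e, load)]
    return y_hat


# Synthetic training set: hypothetical slopes at five wavelengths.
slopes = [3.9e-9, 3.5e-9, 3.0e-9, 2.6e-9, 2.2e-9]
train_conc = [1.0e8, 5.0e7, 2.5e7, 1.0e7, 5.0e6, 1.0e6]
train_spectra = [[s * c for s in slopes] for c in train_conc]

model = pls1_fit(train_spectra, train_conc, n_components=1)
unknown = [s * 4.0e7 for s in slopes]   # spectrum of a hypothetical sample
predicted = pls1_predict(model, unknown)
print(f"predicted concentration: {predicted:.4e} cells/ml")
```

With real, noisy spectra, more than one component is usually retained, and the number is commonly chosen by cross-validation; an established library implementation would be preferable to this sketch in practice.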

As indicated in Table 4, compared with plate counting, the concentrations calculated by the proposed method have a maximum relative error of 4.500% and an average relative error of 0.677%, both below 5%. In contrast, the concentrations calculated by single-wavelength spectroscopy have a maximum relative error of 36.958% and an average relative error of 13.355%. This indicates that, for calculating bacterial concentration, the accuracy and stability of multiwavelength spectroscopy are much better than those of the single-wavelength method.
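The two error statistics quoted above can be computed as in the following sketch, which takes the plate count as the reference value. The function name `relative_errors` and the three sample pairs are hypothetical and are not the values in Table 4.

```python
def relative_errors(reference, predicted):
    """Relative error of each prediction against its reference, in percent."""
    return [abs(p - r) / r * 100.0 for r, p in zip(reference, predicted)]


# Hypothetical plate counts and spectroscopic estimates (cells/ml).
plate_count = [1.00e8, 5.00e7, 2.00e6]
estimate = [1.02e8, 4.90e7, 2.00e6]

errors = relative_errors(plate_count, estimate)
max_error = max(errors)
mean_error = sum(errors) / len(errors)
print(f"maximum relative error: {max_error:.3f}%")
print(f"average relative error: {mean_error:.3f}%")
```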

4. Conclusions

In this study, a rapid detection method for E. coli concentration in water is proposed, utilizing multiwavelength transmission spectroscopy combined with partial least squares regression. The results show that, with plate counting as the reference, the E. coli concentrations estimated from the multiwavelength transmission spectra are more accurate and stable than those from the single-wavelength method, with both the average and the maximum relative error below 5%. This method offers short detection time, simple operation, and accurate results, providing a new approach for the detection of bacterial concentration in water.

Data Availability

The relevant experimental data can be accessed through the following link: https://pan.baidu.com/s/1yz-GvRQmS2xOEmC5Dz4KaA?pwd=nw8t, password: nw8t.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Authors’ Contributions

Yuxia Hu and Dun Hu conceived and designed the research. Ruixiang Zhang conducted the experiments. Liye Guo collected the data. Dun Hu analyzed the data. Yuxia Hu wrote the manuscript. All authors contributed to the article and approved the submitted version.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 62105002) and the Key Research Project of Natural Science in Anhui Province (Grant nos. KJ2020A0471 and KJ2021A0974).