Deciphering the interaction of physio-chemical parameters of water and ARGs!

In this work, we build a machine learning model using five water quality parameters (Temp, pH, TDS, DO, TN), antibiotic resistance and antibiotic resistance genes (ARGs). The antibiotic resistance and ARG data are extracted from E. coli specimens taken from sewage. LGBMClassifier is used to predict the ARGs, and the predictions are explained using SHapley Additive exPlanations (SHAP).

Data

Our data was collected from 14 sewage sampling points in Islamabad, the ninth largest city of Pakistan. With 1.23 million inhabitants, the city comprises eight zones: administrative zone, commercial district, educational sector, industrial sector, diplomatic enclave, residential areas, rural areas and green area, each with ethnically diverse populations. The city is home to multiple universities, numerous busy hospitals, and pharmaceutical industries.

The total number of samples is 466. We have four ARG targets: TEM, OXA48, MCR-1 and CTX-M. There are three modelling scenarios. In the first scenario, both water quality and antibiotic parameters are used to predict the ARGs. The second and third scenarios incorporate only water quality parameters and only antibiotic parameters, respectively. A comprehensive analysis of the data is given in 1. Exploratory Data Analysis.
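The three scenarios and four targets give twelve model runs in total. A minimal sketch of how that experiment grid could be laid out is given below. The water quality names and ARG targets are taken from this README; the antibiotic column names and the helper `iter_experiments` are hypothetical placeholders, not the names used in the actual scripts.

```python
# Water quality parameters named in this README.
WQ_FEATURES = ["Temp", "pH", "TDS", "DO", "TN"]

# Hypothetical antibiotic columns (assumption); the real names come from the data files.
ANTIBIOTIC_FEATURES = ["antibiotic_A", "antibiotic_B"]

# ARG targets named in this README.
TARGETS = ["TEM", "OXA48", "MCR-1", "CTX-M"]

# One entry per modelling scenario: scenario name -> input columns.
SCENARIOS = {
    "wq_and_antibiotics": WQ_FEATURES + ANTIBIOTIC_FEATURES,
    "wq_only": WQ_FEATURES,
    "antibiotics_only": ANTIBIOTIC_FEATURES,
}


def iter_experiments(scenarios=SCENARIOS, targets=TARGETS):
    """Yield (scenario_name, target, feature_columns) for every combination."""
    for name, columns in scenarios.items():
        for target in targets:
            yield name, target, list(columns)


experiments = list(iter_experiments())
for name, target, columns in experiments:
    print(f"{name:>20} -> {target:<6} using {len(columns)} inputs")
```

Keeping the scenario definitions in one dictionary makes it easy to train one classifier per (scenario, target) pair and compare them on equal terms.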

Results

Prediction results for each target can be found in 3. Models. Using water quality (WQ) and antibiotic concentration parameters together gave a significant increase in performance over using either group alone, with a notable improvement in accuracy and F1 score. 4. Prediction Performance and 5. SHAP Plots contain additional performance plots and SHAP feature importance plots, respectively. The SHAP results show that the physicochemical properties play a more significant role in boosting model performance than the antibiotics.
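For readers unfamiliar with the two metrics mentioned above, the following sketch shows how accuracy and F1 score are computed for a single target. It assumes each ARG target is a binary label (1 = gene detected, 0 = not detected), which is an assumption and not stated in this README. The labels in the example are made up for illustration and are not results from this study.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Made-up labels: the gene is present in 3 of 8 samples, and only 1 is detected.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0]
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, f1={f1:.2f}")  # accuracy=0.75, f1=0.50
```

The example illustrates why both metrics are worth reporting: when a gene is rare, a model can reach a high accuracy while missing most positive samples, and the F1 score exposes that.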

Reproducibility

To replicate the experiments, install all requirements given in the requirements file. If your results differ substantially from those presented here, make sure that you are using the exact library versions that were used when these scripts were run. These versions are printed at the start of each script. Download all the .py files in the scripts folder, including the utils.py (utils) file. The data is expected to be in the data folder under the scripts folder.
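To compare your environment against the versions printed by the scripts, a small check like the one below can help. It is a sketch using only the Python standard library. The package list is an assumption based on the tools named in this README and should be adjusted to match the requirements file.

```python
import sys
from importlib import metadata

# Hypothetical package list (assumption); adjust to match the requirements file.
PACKAGES = ["numpy", "pandas", "scikit-learn", "lightgbm", "shap"]


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


versions = installed_versions(PACKAGES)
print("python", sys.version.split()[0])
for name, version in versions.items():
    print(name, version or "not installed")
```

If any printed version differs from the one shown at the start of a script, reinstalling that package at the listed version is the first thing to try.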