Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/22751
Title: AN INTEGRATED APPROACH TOWARDS THE IDENTIFICATION OF NOVEL BIOMARKERS IN RESPIRATORY DISORDERS
Authors: TANWAR, NAKUL
Hasija, Yasha (SUPERVISOR)
Keywords: RESPIRATORY DISORDERS
NOVEL BIOMARKERS
INTEGRATED APPROACH
COPD
Issue Date: Nov-2025
Series/Report no.: TD-8657
Abstract: Respiratory disorders such as COPD, ILD, CPFE, and lung cancer are primarily lung diseases, yet they do not operate within isolated physiological boundaries. These conditions share a deeply interconnected inflammatory landscape in which chronic immune activation, oxidative stress, epithelial injury, and aberrant tissue repair collectively drive both disease progression and coexistence. This interconnectedness is evident in clinical practice, where patients frequently present with overlapping respiratory conditions, such as COPD coexisting with ILD or lung cancer, because these conditions are shaped by the same underlying molecular and inflammatory pathways. Such overlap points to a broader biological principle: chronic inflammation exists along a continuum across the body rather than remaining confined to a single organ. It therefore emerges as a systemic process capable of linking diseases that traditionally appear unrelated. This becomes clearer when considering how circulating inflammatory mediators, dysregulated immune cells, and miRNA-driven signaling can influence tissues beyond the lungs. Within this continuum, some immune-mediated conditions, for example Multiple Sclerosis (MS), further demonstrate how shared inflammatory and immune-regulatory disturbances can bridge organ systems, reinforcing the idea that complex diseases are often unified by common immunological mechanisms rather than separated by anatomical boundaries. Traditional diagnostic tools, including imaging, pulmonary function tests, and histopathology, frequently detect disease only at advanced stages. In parallel, although omics technologies have generated large-scale genomic and transcriptomic datasets, their clinical translation is hindered by the complexity of multi-omics signals and by the “black-box” nature of most machine learning approaches.
The present research addresses these gaps by integrating multi-omics analysis, machine learning, explainable artificial intelligence (XAI), miRNA–mRNA regulatory network exploration, and large language model (LLM)-based interpretability to uncover shared biomarkers, elucidate mechanistic relationships across diseases, and develop an accessible, clinically interpretable decision-support system for lung disease indication.

The first component of the study investigates coexistence among COPD, ILD, and CPFE through integrative transcriptomic and regulatory network analyses. The GSE47460 microarray dataset (582 lung tissue samples) underwent rigorous preprocessing, quantile normalization, and class balancing with SMOTE, after which a Random Forest classifier was trained to distinguish COPD, ILD, and control samples. Explainable AI using SHAP revealed 20 key genes, including OCIAD2, IRS2, TRIM2, MUC20, and CCDC109B, that consistently contributed to model performance across all classes. Functional enrichment analysis demonstrated that these genes participate in oxidative stress, immune activation, epithelial repair, extracellular matrix remodeling, and calcium signaling pathways, all of which underpin the shared pathogenesis of COPD, ILD, and CPFE. Subsequent validation via heatmaps, gene co-expression networks, single-cell expression analysis, and miRNA–mRNA regulatory mapping confirmed the biological relevance of these markers and identified their involvement in fibroblast activation, inflammatory fibroblast signatures, and altered epithelial homeostasis. Collectively, these findings provide strong evidence of convergent mechanisms underlying respiratory disease coexistence and highlight candidate biomarkers with diagnostic and therapeutic utility.

The second component explores systemic inflammatory connections among COPD, lung cancer, and MS using the GSE61741 peripheral blood miRNA dataset (237 samples) along with an independent validation dataset. Machine learning models, supported by SMOTE-based class balancing and 5-fold cross-validation, achieved high predictive accuracy for all four classes. SHAP interpretability revealed 20 core miRNAs, including hsa-let-7c, hsa-miR-223, hsa-miR-92a, and hsa-miR-454, that serve as central regulators across these diseases. These miRNAs converged on six shared inflammatory genes (IL6, IL10, CCL2, CCL5, MYC, and ITGB3), forming a cross-disease regulatory axis linking neuroinflammation, chronic respiratory inflammation, fibrosis, and oncogenesis. Downstream enrichment analyses identified common signaling pathways such as NF-κB, JAK-STAT, PI3K-Akt, cytokine–cytokine receptor interactions, and immune cell activation cascades. Single-cell expression mapping further demonstrated that these genes and miRNAs are enriched in inflammatory fibroblasts, macrophages, T cells, and epithelial populations, suggesting a shared pathological microenvironment across lung and neurological diseases. This component provides a unified molecular explanation for the epidemiologically observed association between MS and COPD and for the heightened risk of lung cancer in COPD patients. It also identifies cross-disease miRNA signatures that hold promise as non-invasive biomarkers for early detection, risk stratification, and therapeutic targeting.

The third component translates these findings into a practical, interactive clinical tool through the development of a SHAP–LLM powered chatbot for lung disease indication. Using a structured dataset of 5,000 individuals with 17 clinical and behavioral features, an XGBoost classifier with monotonic constraints was trained to ensure biologically consistent predictions. The model achieved high accuracy, cross-validation stability, and strong performance on independent validation sets. SHAP-based interpretations were integrated into a conversational interface powered by an LLM, enabling users to query risk predictions, feature contributions, and disease mechanisms in natural language. The system automatically contextualizes SHAP explanations, interprets biomarker relevance, and supports free-text clinical queries, thereby bridging the gap between computational prediction and clinician/patient comprehension. This represents a novel fusion of clinical feature–based risk prediction, XAI-driven transparency, and LLM-powered interpretability, enabling real-time, user-friendly insights from questionnaire and physiological data. By integrating SHAP explanations with a conversational interface, the system transforms conventional tabular risk scores into intuitive, clinically meaningful guidance, with potential applications in telemedicine, early screening, patient counseling, and front-line clinical decision support.

Taken together, this thesis advances three major contributions: (i) the identification of shared multi-omics biomarkers and regulatory programs underlying the coexistence of COPD, ILD, CPFE, and related conditions; (ii) the discovery of cross-disease miRNA signatures and inflammatory axes connecting respiratory and neuroinflammatory disorders; and (iii) the development of an interpretable, LLM-augmented clinical decision-support system based on questionnaire-derived features rather than molecular biomarkers. The findings offer a foundation for integrated biomarker panels for early diagnosis, unified therapeutic strategies targeting shared pathways, and AI-driven decision-support tools capable of enhancing clinical workflows. Future research can expand these models to include proteomics, metabolomics, longitudinal patient monitoring, and real-time wearable sensor integration. Further refinement of the chatbot into a clinically validated decision-support system may facilitate adoption in primary care and personalized respiratory healthcare.
Ultimately, the study demonstrates how multi-omics analytics, explainable machine learning, and advanced language models can be combined to address longstanding challenges in understanding and managing complex respiratory disorders.
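Illustrative note on the class-balancing step: the abstract names SMOTE but does not describe its implementation, so the following is only a minimal standard-library sketch of the core SMOTE idea (a synthetic minority sample is interpolated between an existing minority sample and one of its nearest minority-class neighbours). The function name `smote_oversample`, its parameters, and the toy values are hypothetical and are not taken from the thesis, which may well have used an established library instead.

```python
import random
from math import dist


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority-class neighbours.

    Simplified illustration of SMOTE; not the thesis's actual pipeline.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy two-feature vectors for an under-represented class (made-up values).
minority_samples = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote_oversample(minority_samples, n_new=6)
```

Because each synthetic point lies on a segment between two real minority samples, it stays inside the range already covered by that class. In a cross-validated setting such as the 5-fold scheme mentioned above, oversampling of this kind is normally applied to the training folds only, so that synthetic points do not leak into the evaluation data.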
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/22751
Appears in Collections: Ph.D. Bio Tech

Files in This Item:
File                       Description   Size      Format
Nakul Tanwar pHJ.D..pdf                  5.84 MB   Adobe PDF
Nakul Tanwar PLAG.pdf                    6.55 MB   Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.