Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/19474
Title: DESIGNING AN EFFICIENT PUBLIC HEALTH SURVEILLANCE SYSTEM USING MACHINE LEARNING
Authors: GUPTA, AAKANSHA
Keywords: PUBLIC HEALTH SURVEILLANCE
MACHINE LEARNING TECHNIQUE
PAN-LDA
NLP
LSTM
Issue Date: Jul-2022
Series/Report no.: TD-6054;
Abstract: Public Health Surveillance (PHS) is considered to be an essential public health function. The primary functions of a public health system are health surveillance, population health assessment, disease and injury prevention, health protection, and health promotion. Surveillance is defined as “the close and continuous monitoring of one or more people for the purpose of direction, supervision, or control”. The World Health Organization (WHO) defines public health surveillance as “the ongoing, systematic collection, analysis, and interpretation of health-related data essential to the planning, implementation, and evaluation of public health practice”. Public health surveillance is regarded as the most effective tool for preventing epidemics. Health Surveillance can be used to track chronic diseases, infectious diseases, injuries, healthcare utilization, environmental concerns, and vector dispersal. Surveillance data are essential for influencing policy decisions, leading new program activities, refining public communications, and aiding agencies in evaluating research investments. Addressing case under-ascertainment is important in most surveillance systems, especially in pandemics of novel diseases with a wide range of clinical presentations, because it might impact public risk perception and policy implementation time. However, surveillance is never perfect, and diseases with a high proportion of mild, pauci-symptomatic, or subclinical cases can be difficult to identify and contain in most indicator-based monitoring systems. Effective public health surveillance systems can provide timely and reliable information allowing for the early detection of potential epidemics. A systematic approach is required to strengthen public health surveillance systems that can quickly detect and respond to the initial cases of disease outbreaks and other public health emergencies. 
One of the primary purposes of public health surveillance is to monitor diseases and trends in public health events so that atypical disease patterns, such as outbreaks, are discovered, examined, and responded to in a timely manner. Incentives are in place to encourage the development of public health surveillance systems, and employing machine learning technologies in public health events can help public health professionals speed up monitoring, evaluation, and decision-making. With the increase in the use of the internet, the digital world is generating data at a rapid and sustained rate. One approach is to leverage the online health mentions posted during an ongoing public health event, which generates unprecedented amounts of health-related data, and couple them with modern Machine Learning (ML) techniques for decision support. As appealing as it may sound, incorporating online data is associated with data-science challenges that limit the effective learning of ML models. The primary focus of this thesis is to incorporate public health related social media data along with historical data to improve prediction performance. The proposed models are empirically evaluated in the context of predicting health events such as novel COVID-19 disease cases. Overall, the research work is primarily useful for tracking and forecasting an ongoing outbreak, and it can give valuable advice to decision makers and epidemiologists, who will then be able to implement appropriate policies to prevent and manage the epidemic. This study therefore represents an organized and systematic effort to establish the power and applicability of public health surveillance using machine learning techniques.

Objectives: The objectives of the study fall into four parts:
- The first objective is to enhance the prediction performance of epidemiological models based on ML techniques.
- The second objective focuses on analyzing the impact of health determinant factors, such as demographic and environmental data, and their significance for public health surveillance.
- The third objective is to improve feature extraction to enhance the performance of ML algorithms.
- The last objective is to explore the applicability of simultaneously using multiple online platforms to improve prediction accuracy.

Methodology: To achieve these objectives, this study uses machine learning and deep learning techniques such as evolutionary algorithms, Neural Networks (NN), Natural Language Processing (NLP), and Topic Modeling approaches because of their broad applicability to real-world problems. The following strategies are used to achieve the targeted objectives:
- For the first objective, novel models based on mathematical statistics, machine learning, and deep learning were used to predict epidemic time series. NLP techniques, a Pandemic-Latent Dirichlet Allocation (PAN-LDA) based Long Short-Term Memory (LSTM) neural network, and an evolutionary algorithm were used to improve the prediction accuracy.
- For the second objective, we analyzed pandemic-related multi-source data and their impact on the spread of the pandemic. We then presented a prediction model that combines health determinants data with historical cases to predict the outbreak.
- For the third objective, we proposed two feature extraction algorithms. One algorithm employs semantically and morphologically similar word embedding clusters as features to improve clustering performance. The other uses Natural Language Processing and Topic Modeling approaches, incorporating historical cases and the corresponding news articles, to extract better features for time-series prediction.
- The last objective explored the relevance and applicability of multi-source internet data to enhance prediction performance. With the growing use of online platforms, the volume of data posted about ongoing events has increased tremendously. This large amount of data from multiple social media platforms can improve the performance of epidemic models.

Results: The outcomes of the study are as follows:
- An evolutionary algorithm and LSTM-based epidemic model is proposed to perform epidemic trend predictions.
- A study is conducted to geospatially analyze the demographic, health, socio-economic, and climatic factors associated with the pandemic distribution.
- A fixed-effect multiple regression prediction model is proposed to predict the daily confirmed cases during the early phases of the COVID-19 second wave and to determine the possibility of upcoming waves.
- A document representation method is proposed based on semantically and morphologically similar feature clusters to enhance clustering performance. A Latent Dirichlet Allocation (LDA) based PAN-LDA feature extraction model is developed that uses historical cases and the corresponding news articles to extract better features for prediction.
- A study is performed to analyze the trends in social media-based public health surveillance systems using ML algorithms.
- An epidemic model is proposed that incorporates features from data collected from multiple online sources, such as Twitter, Reddit, and Google News, to improve prediction performance.
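
As a rough illustration of the idea of combining historical cases with features extracted from news text, the sketch below builds sliding-window training samples in which each day's row holds the case count followed by that day's topic proportions. It is not the thesis's PAN-LDA or LSTM implementation: the function name `make_windows`, the `lookback` parameter, and all data values are hypothetical, and the topic proportions are assumed to have been produced beforehand by some topic model.

```python
from typing import List, Sequence, Tuple

Window = List[List[float]]


def make_windows(
    cases: Sequence[float],
    topics: Sequence[Sequence[float]],
    lookback: int,
) -> List[Tuple[Window, float]]:
    """Pair each day's case count (the target) with the preceding
    `lookback` days of [case_count, topic_1, ..., topic_k] rows."""
    if len(cases) != len(topics):
        raise ValueError("cases and topics must cover the same days")
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    samples = []
    for t in range(lookback, len(cases)):
        # One row per past day: case count followed by topic proportions.
        window = [[float(cases[i]), *topics[i]] for i in range(t - lookback, t)]
        samples.append((window, float(cases[t])))
    return samples


# Hypothetical daily case counts and per-day topic proportions (two topics).
daily_cases = [10, 12, 15, 21, 30]
topic_props = [[0.7, 0.3], [0.6, 0.4], [0.5, 0.5], [0.3, 0.7], [0.2, 0.8]]

samples = make_windows(daily_cases, topic_props, lookback=2)
```

Each `(window, target)` pair has the shape a sequence model such as an LSTM would typically consume; the choice of `lookback` and of which text features to append is an assumption of this sketch, not something the abstract specifies.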
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/19474
Appears in Collections:Ph.D. Computer Engineering

Files in This Item:
File                        Description   Size      Format
AAKANSHA GUPTA Ph.D..pdf                  4.74 MB   Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.