Please use this identifier to cite or link to this item:
http://dspace.dtu.ac.in:8080/jspui/handle/repository/22162
Title: | HEART FUNCTIONALITY TEST USING GAN ALGORITHM |
Authors: | KHAN, HARIS SERAJ |
Keywords: | HEART FUNCTIONALITY TEST; GAN ALGORITHM; CARDIOVASCULAR DISEASES; CTGAN |
Issue Date: | May-2025 |
Series/Report no.: | TD-8158 |
Abstract: | Cardiovascular diseases remain one of the top three causes of global mortality, and new tools for accurate, early diagnosis are needed. When traditional machine learning methods are used to predict heart disease from public datasets such as the Cleveland Heart Disease (CHD) dataset, many studies reach a performance "ceiling." Such ceilings often stem from the limited amount of data and from the associations among the biomedical properties examined. This thesis aimed to determine whether using Generative Adversarial Networks (GANs) to create synthetic data that augments the original dataset improves the accuracy of heart disease prediction models. Two types of GANs were analyzed: the Conditional Tabular Generative Adversarial Network (CTGAN) and the Medical Generative Adversarial Network (MedGAN). CTGAN was first used to generate synthetic tabular data from the original Cleveland dataset and to augment it. A Random Forest trained on the CTGAN-augmented dataset, with feature engineering and hyper-parameter optimization applied, achieved an accuracy of 90.16%. MedGAN was also used to create synthetic medical records. It is trained in two stages: an autoencoder is pre-trained to learn latent representations, and adversarial training then generates the synthetic data. The combined dataset of original and synthetic records was used to train several classifiers (Logistic Regression, Random Forest, Gradient Boosting, Support Vector Machine (SVM), a Multilayer Perceptron neural network, and XGBoost). Models trained with the additional MedGAN data improved in accuracy, notably Gradient Boosting, which achieved 91.8%, and Random Forest, which achieved 90.2%. 
Both MedGAN and CTGAN captured the overall variability and the local structure of the data well, although some clustering was noted in the distributions of the continuous variables in the synthetic data. Overall, the results provide strong evidence that GAN-based data augmentation is useful for small medical datasets. Predictive models built on the Cleveland Heart Disease dataset augmented with CTGAN and MedGAN outperformed several traditional methods. This thesis supports the use of advanced deep learning approaches, specifically GANs, to improve diagnostic accuracy in cardiovascular medicine and in other medical fields with small datasets. |
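The augmentation workflow summarized in the abstract (generate synthetic records, add them to the original data, train a classifier) can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the two-feature toy data stands in for the Cleveland dataset, a per-class Gaussian sampler stands in for a trained CTGAN or MedGAN, and a nearest-centroid classifier stands in for Random Forest or Gradient Boosting. The sketch also assumes that synthetic rows are added to the training split only and that accuracy is measured on held-out real records; the abstract does not state how the evaluation split was handled.

```python
import random
from statistics import mean, stdev


def make_toy_data(n, rng):
    """Hypothetical two-feature records; a placeholder, NOT the Cleveland data."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Class 1 is shifted upward on both features so the classes are separable.
        features = [rng.gauss(1.5 * label, 1.0), rng.gauss(1.0 * label, 1.0)]
        rows.append((features, label))
    return rows


def fit_placeholder_generator(train):
    """Stand-in for a trained GAN: per-class mean and std of each feature."""
    stats = {}
    for label in sorted({y for _, y in train}):
        columns = list(zip(*[x for x, y in train if y == label]))
        stats[label] = [(mean(c), stdev(c)) for c in columns]
    return stats


def sample_synthetic(stats, n_per_class, rng):
    """Draw n_per_class synthetic rows for every class seen during fitting."""
    return [([rng.gauss(m, s) for m, s in params], label)
            for label, params in stats.items()
            for _ in range(n_per_class)]


def fit_centroids(rows):
    """Nearest-centroid classifier, chosen only to keep the sketch stdlib-only."""
    centroids = {}
    for label in sorted({y for _, y in rows}):
        columns = list(zip(*[x for x, y in rows if y == label]))
        centroids[label] = [mean(c) for c in columns]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda lbl: sum((a - b) ** 2 for a, b in zip(x, centroids[lbl])))


def accuracy(centroids, rows):
    return sum(predict(centroids, x) == y for x, y in rows) / len(rows)


def run_experiment(seed=0, n_real=100, n_per_class=50):
    rng = random.Random(seed)
    data = make_toy_data(n_real, rng)
    rng.shuffle(data)
    cut = int(0.8 * len(data))
    train, test = data[:cut], data[cut:]  # the test split stays purely real

    # The generator sees only the training split, so no test rows leak into it.
    generator = fit_placeholder_generator(train)
    augmented = train + sample_synthetic(generator, n_per_class, rng)

    return {
        "n_train": len(train),
        "n_augmented": len(augmented),
        "n_test": len(test),
        "baseline_acc": accuracy(fit_centroids(train), test),
        "augmented_acc": accuracy(fit_centroids(augmented), test),
    }


result = run_experiment(seed=0)
print(result)
```

The toy generator only resamples the statistics it was fitted on, so no accuracy gain should be expected from this sketch; it shows the data flow only. In practice the placeholder generator and classifier would be replaced by the actual GAN and classifier implementations.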
URI: | http://dspace.dtu.ac.in:8080/jspui/handle/repository/22162 |
Appears in Collections: | M.E./M.Tech. Electronics & Communication Engineering |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
HARIS SERAJ KHAN M.Tech.pdf | | 10.26 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.