Predictive Modeling of High Throughput Bioassay Screening Datasets using Machine Learning Algorithms

Arora, Sonam

Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/14300

Title:	Predictive Modeling of High Throughput Bioassay Screening Datasets using Machine Learning Algorithms
Authors:	Arora, Sonam
Keywords:	COMPUTATIONAL MODELING
Series/Report no.:	TD-1305;
Abstract:	Dynamic and differential regulation and expression of genes form the basis of cellular identity and organisation. This regulation is largely governed by complex interactions among a subset of biomolecules in the cell, operating at multiple levels from genome organisation through protein post-translational regulation to the organellar level. The epigenetic regulatory layer, which has recently attracted considerable interest, largely comprises DNA modifications, histone modifications and noncoding RNA regulation, together with the interplay between these components. In addition, dysfunctional genes and proteins involved in mitochondrial dynamics have been shown to be central to the development of a number of disease processes and have been explored as potential targets for drug development. The availability of datasets from high-throughput screens of molecules for biological properties offers a new opportunity to develop computational methodologies that enable in-silico screening of large molecular libraries for potential biological activities, as a substitute for costly chemical biology approaches. In the present study, we used four high-throughput screens available for inhibitors of epigenetic modifiers and one assay for mitochondrial fusion inhibitors. Computational predictive models were constructed from molecular descriptors generated for the molecules, together with their activity. Two supervised machine learning algorithms, Naive Bayes and Random Forest, were used to generate predictive models for the available compounds. Random Forest, with an accuracy of 80%, was identified as the most accurate classifier. The study was complemented by a substructure search approach to filter out probable pharmacophores from the active molecules that could lead to drug molecules.
We show that appropriate computational algorithms can be used to learn molecular and structural correlates of the biological activities of small molecules. The computational models developed were used to screen large libraries of anticancer cell lines, demonstrating one application of the generated models.
URI:	http://dspace.dtu.ac.in:8080/jspui/handle/repository/14300
Appears in Collections:	M.E./M.Tech. Bio Tech
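
The supervised classification step described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's code: it assumes that each molecule is encoded as a binary descriptor vector (a fingerprint) and labeled active (1) or inactive (0), and it implements only a small Bernoulli Naive Bayes classifier using the Python standard library. The 4-bit fingerprints, the function names and the smoothing parameter alpha are hypothetical and chosen purely for illustration; the Random Forest model reported in the abstract is not reproduced here.

```python
import math


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on binary feature vectors.

    Returns a dict mapping each class label to a tuple of
    (log prior, list of per-bit probabilities that the bit is set).
    """
    n_features = len(X[0])
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, target in zip(X, y) if target == label]
        # Laplace-smoothed estimate of P(bit j is set | class)
        p_on = [
            (sum(row[j] for row in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[label] = (math.log(len(rows) / len(X)), p_on)
    return model


def predict(model, x):
    """Return the class label with the highest log posterior score."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, p_on) in model.items():
        score = log_prior + sum(
            math.log(p if bit else 1.0 - p) for bit, p in zip(x, p_on)
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, X, y):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(model, x) == target for x, target in zip(X, y)) / len(y)


# Hypothetical toy data: 4-bit fingerprints, 1 = active, 0 = inactive.
X_train = [
    [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0],   # actives
    [0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1],   # inactives
]
y_train = [1, 1, 1, 0, 0, 0]

# Hypothetical held-out molecules used to estimate accuracy.
X_test = [[1, 1, 0, 1], [0, 0, 1, 0]]
y_test = [1, 0]

model = train_bernoulli_nb(X_train, y_train)
test_accuracy = accuracy(model, X_test, y_test)
print("held-out accuracy:", test_accuracy)
```

In practice, accuracy on a toy set like this says nothing about real assay data; screening datasets are typically imbalanced toward inactives, so a held-out evaluation on the actual descriptors would be needed to compare classifiers.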

Files in This Item:

File	Description	Size	Format
sonam arora dissertation.pdf		2.61 MB	Adobe PDF