DEEP LEARNING TECHNIQUES FOR CATEGORIZING USER GENERATED TEXT ON THE INTERNET

MALHOTRA, ANSHU

Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/20900

Title:	DEEP LEARNING TECHNIQUES FOR CATEGORIZING USER GENERATED TEXT ON THE INTERNET
Authors:	MALHOTRA, ANSHU
Keywords:	DEEP LEARNING TECHNIQUES GENERATED TEXT CATEGORIZING USER GENERATED CONTENT (UGC) INTERNET
Issue Date:	Jul-2024
Series/Report no.:	TD-7429
Abstract:	The Internet of the present day, popularly known as Web 3.0, is phenomenally different from the Internet that was developed decades ago. It is no longer a one-way channel for disseminating information to users. Today, the Internet sustains and thrives on the content provided by its users. Internet users are no longer just information consumers; they are content producers as well. With the Internet becoming an indispensable part of our lives, people spend a significant amount of their time online, thereby creating a huge amount of User Generated Content (UGC) as a by-product, e.g., product reviews and social media posts. UGC can be multimodal and multilingual. In the last decade, various research and real-world applications of UGC have been proposed and developed using Artificial Intelligence and Machine Learning, e.g., opinion mining, trend prediction, sentiment analysis, and public health monitoring. The objective of the research presented in this thesis is to study the applications of Deep Learning Techniques for Categorizing User Generated Text on the Internet. The research presented in this thesis makes the following significant contributions. First, we have conducted an in-depth systematic literature review to understand the state of the art, highlight research gaps in existing work, and identify open challenges related to research applications of deep learning techniques for user generated content available on the Internet for various real-world social computing applications. Second, we have reviewed, compared, and empirically evaluated all popular supervised deep neural networks to benchmark their performance for a real-world application of user generated text categorization tasks.
Third, the primary contribution of our research work is that we have proposed an explainable and interpretable system for supervised and unsupervised categorization of user generated text from the Internet by using the latest breakthrough techniques in deep learning for the NLP domain, i.e., Transformer-based LLMs. We have conducted extensive and in-depth experiments with six LLMs (BERT, DistilBERT, RoBERTa, MentalBERT, PsychBERT, PHSBERT) and four datasets. For explainability and interpretability (XAI) of predictions from the above deep learning models, we have used the two most recent techniques: LIME and SHAP. Next, we have demonstrated the use of the Transformer-based unsupervised topic modeling technique BERTopic to analyze large-scale unlabeled UGC datasets for deriving insights. Fourth, we have performed Few Shot Learning and Active Learning experiments with pre-trained LLMs, which can be beneficial for low-resource research domains where good-quality, large annotated UGC datasets are unavailable. For these scenarios, pre-trained LLMs can be trained with only a few good-quality data samples annotated by experts using the above deep learning paradigms. Experiments were done with various LLMs on multiple datasets to analyze and compare their performance. We have demonstrated that it is possible to achieve high or comparable accuracy with fewer than 10% of the samples from the entire dataset. Finally, we have conducted preliminary work to extend our research to categorizing multimodal user generated content on the Internet by exploring the use of recent advancements in the field of deep learning for other modalities, i.e., images and videos. We have proposed a deep transfer learning framework for affective analysis of multimodal user generated content from the Internet.
The review, analysis, empirical evaluations, and experimental results demonstrate the applications of proposed explainable deep learning techniques for social computing applications using text from the Internet. This thesis successfully helps advance the research related to the applications of deep learning techniques for categorizing user generated content from the Internet.
URI:	http://dspace.dtu.ac.in:8080/jspui/handle/repository/20900
Appears in Collections:	Ph.D. Computer Engineering

Files in This Item:

File	Description	Size	Format
Anshu Malhotra Ph.D..pdf		7.68 MB	Adobe PDF