Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21257
Full metadata record
DC Field: Value [Language]
dc.contributor.author: PANDEY, ANANYA
dc.date.accessioned: 2024-12-13T05:12:49Z
dc.date.available: 2024-12-13T05:12:49Z
dc.date.issued: 2024-08
dc.identifier.uri: http://dspace.dtu.ac.in:8080/jspui/handle/repository/21257
dc.description.abstract [en_US]:
Sentiment analysis is a computational technique that analyses the subjective information conveyed within a given expression, encompassing appraisals, opinions, attitudes or emotions towards a particular subject, individual or entity. Conventional sentiment analysis considers only the text modality and derives sentiment by identifying the semantic relationships between the words of a sentence. However, certain expressions, such as exaggeration, sarcasm and humour, are difficult to detect automatically when conveyed through text alone. Multimodal sentiment analysis incorporates additional forms of data, such as visual and acoustic cues, alongside text; by fusing these modalities, it can determine the implied sentiment polarity (positive, neutral or negative) more precisely. Recent advancements in deep learning have propelled multimodal sentiment analysis to new heights, and the research community has shown significant interest in the topic owing to its potential for both practical application and academic research. Accordingly, this research presents a thorough analysis of recent ground-breaking studies in sentiment analysis across diverse modalities. The thesis also discusses the categories of multimodal data, the domains in which multimodal sentiment analysis can be applied and the challenges associated with it, and proposes several frameworks for analysing sentiment in visual-caption pairs and videos. The ultimate goal of this investigation is to demonstrate the effectiveness of deep learning architectures in tackling the complexities of multimodal data analysis. People increasingly post images, captions and audio clips on social media platforms to express their opinions. We therefore conducted a comprehensive assessment of the performance of several multimodal sentiment analysis models across a range of modalities. Most recent multimodal strategies, however, concatenate features from the visual, caption and audio modalities using pre-trained deep learning models with millions of trainable parameters but no dedicated attention module, which ultimately leads to less desirable results. Motivated by this observation, we propose the novel models VABDC-Net and VyAnG-Net, which integrate an attention module with conventional state-of-the-art models to extract the most relevant contextual information from these diverse modalities. Experimental results show that the proposed approaches attain markedly higher accuracy and efficiency than conventional approaches on the publicly available multimodal datasets Twitter-2015, Twitter-2017, MUStARD and MUStARD++. This investigation also employs Target-Dependent Multimodal Sentiment Analysis to identify the sentiment associated with each target (aspect) mentioned in a multimodal post consisting of a visual-caption pair. Despite recent advancements in multimodal sentiment recognition, emotional cues from the visual modality have rarely been incorporated explicitly.
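As a rough, hypothetical sketch of this kind of attention-based cross-modal fusion (assuming PyTorch, 256-dimensional features from pre-trained encoders and nn.MultiheadAttention as the attention module; this is not the actual VABDC-Net or VyAnG-Net implementation):

import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    """Fuses caption, visual and acoustic features with cross-modal attention
    rather than plain concatenation (illustrative sketch only)."""
    def __init__(self, dim=256, heads=4, num_classes=3):
        super().__init__()
        # Caption tokens attend over the visual and acoustic feature sequences.
        self.text_to_visual = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.text_to_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.LayerNorm(3 * dim),
            nn.Linear(3 * dim, num_classes),  # positive / neutral / negative
        )

    def forward(self, text_feats, visual_feats, audio_feats):
        # text_feats: (B, T_t, dim); visual_feats: (B, T_v, dim); audio_feats: (B, T_a, dim)
        vis_ctx, _ = self.text_to_visual(text_feats, visual_feats, visual_feats)
        aud_ctx, _ = self.text_to_audio(text_feats, audio_feats, audio_feats)
        fused = torch.cat([text_feats, vis_ctx, aud_ctx], dim=-1).mean(dim=1)
        return self.classifier(fused)

# Random tensors stand in for features from pre-trained text, image and audio encoders.
model = CrossModalAttentionFusion()
logits = model(torch.randn(2, 12, 256), torch.randn(2, 49, 256), torch.randn(2, 30, 256))
print(logits.shape)  # torch.Size([2, 3])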
The challenge at hand is to effectively extract visual and emotional cues and subsequently align them with the textual content. In light of this, the thesis also presents a novel approach, the Visual-to-Emotional-Caption Translation Network (VECT-Net), which acquires visual sentiment cues by analysing facial expressions and then aligns and fuses the obtained emotional cues with the target attribute of the caption modality. Furthermore, a novel contrastive-learning-based multimodal architecture is introduced to predict emoticons on the Multimodal-Twitter Emoticon dataset collected from Twitter. This model jointly trains a dual-branch encoder with a contrastive objective to map text and images into a common latent space; our key finding is that integrating the contrastive learning objective with the other two branches yields superior results. Experimental results demonstrate that the proposed methodology surpasses existing multimodal approaches in accuracy and robustness. In conclusion, this thesis presents substantial findings and identifies potential directions for future research on sentiment analysis with multimodal data.
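A similarly hedged sketch of the dual-branch contrastive idea: two projection heads map pooled text and image features into a shared latent space and are trained with a symmetric InfoNCE-style loss (the feature sizes, temperature and loss form below are assumptions, not details taken from the thesis):

import torch
import torch.nn as nn
import torch.nn.functional as F

class DualBranchContrastive(nn.Module):
    """Projects text and image features into a common latent space and trains
    them with a symmetric contrastive loss (illustrative sketch only)."""
    def __init__(self, text_dim=768, image_dim=2048, latent_dim=256, temperature=0.07):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, latent_dim)
        self.image_proj = nn.Linear(image_dim, latent_dim)
        self.temperature = temperature

    def forward(self, text_feats, image_feats):
        # Normalise both branches so the dot product is a cosine similarity.
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        v = F.normalize(self.image_proj(image_feats), dim=-1)
        logits = t @ v.t() / self.temperature  # (B, B) similarity matrix
        targets = torch.arange(t.size(0), device=t.device)
        # Matching text-image pairs lie on the diagonal: pull them together and push
        # mismatched pairs apart, in both text-to-image and image-to-text directions.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

# Example: a batch of 8 pooled text and image embeddings from frozen encoders.
loss = DualBranchContrastive()(torch.randn(8, 768), torch.randn(8, 2048))
print(loss.item())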
dc.language.iso: en [en_US]
dc.relation.ispartofseries: TD-7643;
dc.subject: SENTIMENT ANALYSIS [en_US]
dc.subject: DEEP LEARNING [en_US]
dc.subject: FRAMEWORK [en_US]
dc.subject: VECT-Net [en_US]
dc.title: DESIGN OF FRAMEWORK FOR SENTIMENT ANALYSIS USING DEEP LEARNING [en_US]
dc.type: Thesis [en_US]
Appears in Collections: Ph.D. Information Technology

Files in This Item:
File | Description | Size | Format
ANANYA PANDEY Ph.D..pdf | - | 3.81 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.