Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/18143
Title: SENTIMENT ANALYSIS WITH ML TECHNIQUES : HANDLING IMBALANCED DATASET
Authors: RAJ, VASUNDHARA
Keywords: SENTIMENT ANALYSIS
ML TECHNIQUES
IMBALANCED DATASET
Issue Date: Jun-2020
Series/Report no.: TD-4986
Abstract: Sentiment analysis is the process of analyzing and categorizing the emotion or sentiment expressed in a review or other piece of text, in order to determine whether the reviewer's opinion is positive, negative or neutral. Today, people who buy things online from an e-commerce site often search for product reviews to learn about the quality of a product and how others perceive it before buying. The same goes for someone who wants to download an app and reads the reviews in its review section to learn about it. They may choose the app with the highest rating, the most downloads or good reviews, depending on their interests. The product or app provider also learns users' opinions of the product, which can help the company improve its marketing strategy and product quality. Sentiment analysis applies various semantic approaches to these online reviews to extract as many features as possible and categorize the type of opinion. Some techniques also help in rating the product based on users' opinions. This project applies four ML techniques (Naïve Bayes, Decision Tree, Random Forest and AdaBoost), training models to classify sentiment. The main focus of the project is handling imbalanced datasets. In an imbalanced dataset the class sizes are skewed, with some classes in the majority and others in the minority. Such imbalance degrades the performance of the model and makes its predictions biased. To improve the performance of the classification model, various sampling techniques (Random Under Sampling, SMOTE, SMOTEENN, SMOTETomek) can be used. This project applies these techniques and reports model performance before and after sampling. The metrics used to compare the sampling techniques and the classification techniques are Precision, Recall and Accuracy. SMOTEENN is seen to outperform the other three techniques. The comparison results are shown in tables and graphs in Section 4.2.
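The effect of class imbalance described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the toy data, the 95/5 class split, and the helper names `random_under_sample` and `precision_recall_accuracy` are assumptions made for illustration, and only the simplest of the listed sampling techniques (random under-sampling) is shown, using the standard library alone.

```python
import random
from collections import Counter


def random_under_sample(texts, labels, seed=0):
    """Randomly drop samples so every class matches the smallest class size."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = min(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        # Keep `target` randomly chosen samples from each class.
        for text in rng.sample(items, target):
            balanced.append((text, label))
    rng.shuffle(balanced)
    out_texts = [t for t, _ in balanced]
    out_labels = [l for _, l in balanced]
    return out_texts, out_labels


def precision_recall_accuracy(y_true, y_pred, positive):
    """Precision and recall for one class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = correct / len(pairs)
    return precision, recall, accuracy


# Hypothetical toy data: 95 positive reviews and 5 negative reviews.
labels = ["positive"] * 95 + ["negative"] * 5
texts = [f"review {i}" for i in range(len(labels))]

# A biased model that always predicts the majority class.
majority_pred = ["positive"] * len(labels)
neg_precision, neg_recall, accuracy = precision_recall_accuracy(
    labels, majority_pred, positive="negative"
)

# Balance the classes before training by under-sampling the majority class.
balanced_texts, balanced_labels = random_under_sample(texts, labels)
balanced_counts = Counter(balanced_labels)
```

On this toy data the always-majority model scores high accuracy while its recall on the minority (negative) class is zero, which is why precision and recall are reported alongside accuracy. After under-sampling, both classes have the same number of samples, at the cost of discarding most majority-class reviews.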
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/18143
Appears in Collections: M.E./M.Tech. Computer Engineering

Files in This Item:
File: M.Tech VASUNDHARA RAJ.pdf
Size: 1.32 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.