OFFENSIVE LANGUAGE DETECTION FROM SOCIAL MEDIA TEXT USING MACHINE LEARNING CLASSIFICATION METHODS

PATRA, TANAYA

Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/20659

Full metadata record

DC Field	Value	Language
dc.contributor.author	PATRA, TANAYA	-
dc.date.accessioned	2024-08-05T08:21:03Z	-
dc.date.available	2024-08-05T08:21:03Z	-
dc.date.issued	2024-05	-
dc.identifier.uri	http://dspace.dtu.ac.in:8080/jspui/handle/repository/20659	-
dc.description.abstract	The use of social networking platforms has surged in recent years, leading to a substantial rise in user-generated content across the web. This content appears predominantly in unstructured and semi-structured forms. Many social media platforms face the problem of hate speech, which takes different forms, including aggressive language and visual content such as memes. This research uses Twitter data to identify offensive speech online. Machine learning (ML) and natural language processing (NLP) techniques are increasingly used to detect hateful content on the internet. This study specifically addresses offensive speech detection in textual data by applying machine learning techniques. Feature selection was conducted before the dataset was used with the machine learning models. Various machine learning algorithms were applied to an openly accessible Twitter dataset. Offensive speech can be defined as the use of text or words that are aggressive, violent, or abusive in nature and directed towards a group or individual sharing a gender, ethnicity, set of beliefs, or place of residence. The suggested model can automatically identify hateful content on Twitter. The method relies on term frequency-inverse document frequency (TF-IDF) weighting and a bag-of-words representation. Machine learning classifiers are trained using these features. Thorough tests are carried out on the available Twitter dataset, and a comparison of five models based on their performance shows that the Random Forest classifier performs best, with the highest accuracy of 95.22%.	en_US
dc.language.iso	en	en_US
dc.relation.ispartofseries	TD-7084;	-
dc.subject	OFFENSIVE LANGUAGE DETECTION	en_US
dc.subject	SOCIAL MEDIA TEXT	en_US
dc.subject	MACHINE LEARNING	en_US
dc.subject	CLASSIFICATION METHODS	en_US
dc.title	OFFENSIVE LANGUAGE DETECTION FROM SOCIAL MEDIA TEXT USING MACHINE LEARNING CLASSIFICATION METHODS	en_US
dc.type	Thesis	en_US
Appears in Collections:	M.E./M.Tech. Computer Engineering
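
As a rough illustration of the bag-of-words and TF-IDF features named in the abstract, the following minimal sketch computes both for a toy corpus. The example sentences, the function names, the whitespace tokenizer, and the particular IDF form (log of the number of documents divided by the document frequency) are all assumptions made for illustration; the thesis may use a different variant or a library implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    # A real tweet pipeline would also handle mentions, URLs, and punctuation.
    return text.lower().split()


def bag_of_words(docs):
    # One Counter of raw term counts per document.
    return [Counter(tokenize(d)) for d in docs]


def tf_idf(docs):
    counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    weights = []
    for c in counts:
        total = sum(c.values())
        # TF = term count / document length; IDF = log(N / df), an assumed variant.
        weights.append({
            term: (n / total) * math.log(n_docs / df[term])
            for term, n in c.items()
        })
    return weights


# Hypothetical toy corpus, not taken from the thesis dataset.
docs = ["you are great", "you are awful", "great game today"]
weights = tf_idf(docs)
```

In this sketch, a term that occurs in fewer documents (such as "awful") receives a higher weight than one that occurs in several (such as "you"). Vectors of this kind are what a classifier such as Random Forest would be trained on.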

Files in This Item:

File	Description	Size	Format
TANAYA PATRA M.Tech..pdf		1.64 MB	Adobe PDF
