Please use this identifier to cite or link to this item:
http://dspace.dtu.ac.in:8080/jspui/handle/repository/16701
Title: | SOFT COMPUTING FOR RUMOUR ANALYTICS ON BENCHMARK TWITTER DATA |
Authors: | SHARMA, HARSHITA |
Keywords: | SOFT COMPUTING; RUMOUR ANALYTICS; TWITTER DATA; PHEME |
Issue Date: | Jun-2019 |
Series/Report no.: | TD-4543 |
Abstract: | As social media is a fertile ground for the origin and spread of rumours, it is imperative to detect and deter them. Various computational models that incorporate learning have been studied on benchmark datasets for rumour resolution, which comprises four tasks: rumour detection, tracking, stance classification and veracity classification. Quick rumour detection during the initial propagation phase is desirable for subsequent veracity and stance assessment. This research uses adaptive and heuristic optimization to select a near-optimal set of input variables that minimizes variance and maximizes the generalizability of the learning model, both of which are needed for high rumour prediction accuracy. Hybrid filter-wrapper feature selection is evaluated empirically on the PHEME rumour dataset. Features are first extracted using the conventional term frequency-inverse document frequency (TF-IDF) statistical measure. To select an optimal feature subset, two filter methods (information gain and chi-square) are each combined with three swarm intelligence-based wrapper methods (cuckoo search, the bat algorithm and ant colony optimization (ACO)). The combinations are evaluated by training three classifiers (Naïve Bayes, Random Forest and J48 decision tree), and an average accuracy gain of approximately 7% is observed with the hybrid filter-wrapper feature selection approaches. The chi-square filter with cuckoo search and with ACO gives the same maximum accuracy of 61.19%, whereas chi-square with the bat algorithm gives the maximum feature reduction, selecting only 17.6% of the features. The model maximizes relevance and minimizes redundancy in the feature set to build an efficient rumour detection model for social data. Owing to netizens' ever-increasing use of and dependence on social media, it has become a fertile ground for breeding rumours.
This work proposes a model for Potential Rumour Origin Detection (PROD) that detects users who are likely rumour originators. It can help not only to find the original culprit who started a rumour but also to aid the veracity classification task of the rumour pipeline. The work extracts meta-data from features of the user’s account and tweet. This meta-data is encoded in an 8-tuple feature vector. A credibility quotient is calculated for each user by assigning weights to each parameter. The higher the credibility of a user, the less likely that user is to be a rumour originator. Based on the credibility, a label is assigned to each user indicating whether the user is a potential rumour source. Three supervised machine learning algorithms are trained, evaluated and compared with a baseline ZeroR classifier. The results are evaluated on the benchmark PHEME dataset, and the multi-layer perceptron classifier achieves the highest accuracy, an average of 97.26% across all five PHEME events, in detecting potential rumour sources. |
URI: | http://dspace.dtu.ac.in:8080/jspui/handle/repository/16701 |
Appears in Collections: | M.E./M.Tech. Computer Engineering |
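
The filter stage mentioned in the abstract (TF-IDF extraction followed by chi-square ranking) can be illustrated with a minimal sketch. This is not the thesis implementation: the toy tweets, the labels, the helper names (`tfidf`, `chi_square`, `select_top_k`) and the choice of `k` are all hypothetical, and chi-square is computed here on term presence rather than on the TF-IDF weights themselves. The swarm-based wrapper stage (cuckoo search, bat algorithm, ACO) is not shown.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, rows): one TF-IDF row per tokenised document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    rows = []
    for doc in docs:
        tf = Counter(doc)
        rows.append([(tf[t] / len(doc)) * math.log(n / df[t]) for t in vocab])
    return vocab, rows


def chi_square(docs, labels, term):
    """Chi-square statistic of a 2x2 table: term presence vs. binary label."""
    a = b = c = d = 0
    for doc, y in zip(docs, labels):
        present = term in doc
        if present and y == 1:
            a += 1  # term present, rumour
        elif present:
            b += 1  # term present, non-rumour
        elif y == 1:
            c += 1  # term absent, rumour
        else:
            d += 1  # term absent, non-rumour
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_top_k(docs, labels, vocab, k):
    """Filter step: keep the k terms with the highest chi-square score."""
    ranked = sorted(vocab, key=lambda t: (-chi_square(docs, labels, t), t))
    return ranked[:k]


# Hypothetical toy data: 1 = rumour, 0 = non-rumour.
docs = [
    ["breaking", "shooting", "unconfirmed"],
    ["unconfirmed", "reports", "hostage"],
    ["police", "confirm", "shooting"],
    ["police", "confirm", "reports"],
]
labels = [1, 1, 0, 0]

vocab, matrix = tfidf(docs)
selected = select_top_k(docs, labels, vocab, k=3)

# Reduced feature matrix: only the TF-IDF columns of the selected terms.
cols = [vocab.index(t) for t in selected]
reduced = [[row[j] for j in cols] for row in matrix]
print(selected)
```

In a hybrid filter-wrapper setup of the kind the abstract describes, a reduced matrix like `reduced` would then be passed to a wrapper search that scores candidate subsets with a classifier.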
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Harshita Sharma SWE Final Thesis .pdf | | 1.37 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.