Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/19004
Title: APPLYING STATISTICAL TECHNIQUES ON TRAFFIC FEATURES FOR INTRUSION DETECTION
Authors: SHARMA, SOMYA
SHARMA, YASH
Keywords: STATISTICAL TECHNIQUES
TRAFFIC FEATURES
INTRUSION DETECTION
NETWORK SECURITY
Issue Date: May-2021
Series/Report no.: TD-5582
Abstract: At a time when network attacks and malicious activity appear to be at their peak, cybersecurity plays a key role in detecting network intrusions and preventing illegitimate access to data. In this thesis, we emphasize the need for network security. Network attacks can cause immense financial and practical losses to companies, associations and individuals. Antivirus software and firewalls, which are commonly relied on for network security, are not adequate to protect an organization against these ever-changing attacks. These conventional tools have been unsuccessful in defending network systems satisfactorily against increasingly refined attacks and malware. This situation requires smart countermeasures to maintain the security of networks and important systems. Hence, in this work, we aim to build an effective system to detect intrusions based on network traffic. Chapter 1 of this thesis explains the meaning and role of an intrusion detection system. Inspecting network traffic and computer events to recognize malicious or unauthorized activity is a process called "intrusion detection". An Intrusion Detection System (IDS) can be defined as any device or software application whose purpose is to perform intrusion detection. IDSs can monitor activity inside the secured network and not only at its perimeter. In contrast to a firewall, an IDS has only an inspection role. Further, we classify the different types of IDS, namely signature-based, anomaly-based and hybrid detection. The main features that an effective Intrusion Detection System ought to have are effectiveness, adaptability and extensibility. We conclude Chapter 1 with some limitations and problems raised by existing systems. Chapter 2 surveys the work proposed in the literature on network traffic-based intrusion detection. 
We review more than 50 papers in this chapter. In Chapter 3, we introduce the 12 features used in our research, with their names and meanings. We then use statistical tests to rank these features by how effectively they detect network intrusion. The main objective of our work is to find a set of features that yields higher accuracy than any other feature, individually or in combination. For this research, we use two statistical tests, namely the ANOVA test and the chi-square test. ANOVA stands for analysis of variance. It is a statistical method that assesses the impact of one or more factors by comparing the means of different samples, whereas the chi-square statistic is used to determine whether categorical variables are independent of each other. We then move on to the machine learning classifiers and their purpose. For our research we use three machine learning classifiers, namely Decision Tree, SVM and Random Forest. The features are ranked so that the feature at the bottom is the least effective and the feature at the top is the most effective. In this way, we prepare three columns: one for ANOVA, a second for Chi-Square normal and a third for Chi-Square Malware. Further, we prepare another table by taking each feature one at a time and separately applying all three machine learning classifiers. To account for the possibility that combinations of these features might be even more effective, we form different combinations of features and, after applying all three machine learning classifiers again, obtain 3 different sets of columns. The above method is then repeated on these 3 columns, and we finally obtain a separate set of features that are best for detecting network intrusion. 
In Chapter 4, we present the tables, calculations and proofs leading to the conclusion that, according to our research, a particular combination of 5 features, namely “Bytes received”, “Bytes sent”, “Time interval between packets sent”, “Packet size sent” and “Packet size received”, gives the highest accuracy, 99.49%, with the Decision Tree classifier among all features and combinations considered. In Chapter 5, we conclude the thesis with directions for future work.
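The two statistical tests named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the toy feature values and the binning of a feature into "small" and "large" categories are all assumptions made for illustration. It computes the one-way ANOVA F-statistic and the Pearson chi-square statistic with the standard textbook formulas and ranks features by score.

```python
from collections import Counter
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of one numeric feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand = mean(values)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def chi_square(categories, labels):
    """Pearson chi-square statistic for a category-by-label contingency table."""
    n = len(labels)
    observed = Counter(zip(categories, labels))
    row_totals = Counter(categories)
    col_totals = Counter(labels)
    stat = 0.0
    for r, r_n in row_totals.items():
        for c, c_n in col_totals.items():
            expected = r_n * c_n / n  # count expected under independence
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


def rank_features(features, labels, score):
    """Return feature names sorted from highest to lowest score."""
    return sorted(features, key=lambda name: score(features[name], labels), reverse=True)


# Toy data (hypothetical values, not taken from the thesis).
labels = ["normal"] * 4 + ["malware"] * 4
toy = {
    "bytes_sent": [10, 12, 11, 13, 40, 42, 41, 43],  # class means differ
    "packet_count": [5, 9, 6, 8, 6, 8, 5, 9],        # class means are equal
}
ranking = rank_features(toy, labels, anova_f)
print(ranking)  # ['bytes_sent', 'packet_count']

# Chi-square needs categorical input, so a numeric feature is binned first
# (the threshold of 20 is an assumption chosen for this toy data).
size_bin = ["small" if v < 20 else "large" for v in toy["bytes_sent"]]
print(chi_square(size_bin, labels))  # 8.0
```

A higher F-statistic means the class means of a feature are further apart relative to the spread within each class, and a higher chi-square statistic means stronger evidence against independence between the binned feature and the label. Either score can be passed to `rank_features` to produce a ranked column of the kind the abstract describes.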
URI: http://dspace.dtu.ac.in:8080/jspui/handle/repository/19004
Appears in Collections: M Sc Applied Maths

Files in This Item:
File: Somya Sharma M.Sc..pdf
Size: 613.54 kB
Format: Adobe PDF

