TAXONOMY OF TREE BASED CLASSIFICATION ALGORITHM IN DATA MINING AND THEIR APPLICATIONS IN PREDICTING STUDENTS

KOHLI, DILPREET SINGH

Please use this identifier to cite or link to this item: http://dspace.dtu.ac.in:8080/jspui/handle/repository/14004

Keywords:	DECISION TREE ALGORITHMS
Issue Date:	28-Jun-2012
Series/Report no.:	TD 851;192
Abstract:	C4.5 is a well-known tree-based classification algorithm developed by Ross Quinlan. An extension of the ID3 algorithm, it generates a decision tree that is used for classification, a core task in data mining. It is a statistical classifier based on the concept of information entropy. Two factors are critical to this algorithm: prediction accuracy and time complexity. Both depend directly on the heuristic function that measures the importance of each attribute (the gain ratio, in C4.5), which in turn drives the construction of the decision tree. In this research I propose improvements to the existing C4.5 algorithm by introducing two new heuristic functions, each of which improves on the heuristic used by C4.5 in at least one of these two respects. The main focus is on two performance measures: (1) the time to build the tree and (2) prediction accuracy. To demonstrate these improvements, I apply the algorithms to several case studies, two of which I proposed in my minor project as part of this research. One of the biggest challenges facing higher education today is predicting the paths of students. Colleges would like to know, for example, which students will enroll in a particular course and which students will need assistance in order to graduate. Based on this research, I therefore developed two case studies:
– A student-evaluation scheme that can help universities judge, at the time of counseling, whether a student is a good match for the program offered. Student attributes are combined with branch attributes, and the student's satisfaction level with that branch is predicted from historical data.
– A grade-prediction scheme that computes a student's grade in a particular subject. The system combines student attributes with subject attributes and predicts the student's grade in that subject from historical data.
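For reference, the attribute-selection heuristic that standard C4.5 uses, and that the proposed heuristic functions aim to improve on, is the gain ratio: information gain divided by split information, both computed from entropy. The Python sketch below computes it for categorical attributes. It is a minimal illustration of standard C4.5 only, not the author's proposed heuristics; the function names and the list-of-dicts data layout are assumptions made for illustration. It also simplifies C4.5: the real algorithm additionally handles continuous attributes and missing values, and only considers attributes whose information gain is at least average.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Gain ratio of a categorical attribute, as used by standard C4.5.

    rows   -- list of dicts mapping attribute name -> value (assumed layout)
    labels -- class label for each row
    attr   -- name of the attribute to score
    """
    total = len(labels)
    # Partition the class labels by the value each row has for `attr`.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)

    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder

    # Split information penalises attributes with many distinct values.
    split_info = -sum((len(p) / total) * math.log2(len(p) / total)
                      for p in partitions.values())
    # An attribute with a single value cannot split the node.
    return info_gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels, attributes):
    """Attribute with the highest gain ratio (simplified C4.5 selection)."""
    return max(attributes, key=lambda a: gain_ratio(rows, labels, a))
```

For example, on a toy table where a hypothetical `rank` attribute perfectly separates the class labels and a hypothetical `stream` attribute does not, `best_attribute` returns `"rank"`.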
In another case study, I test the proposed algorithms on real AIEEE (All India Engineering Entrance Examination) data comprising thousands of records. Using more than one case study shows that the algorithms work well not only on one type of dataset but across varied datasets. The proposed improvements over C4.5 can have a significant impact on practical applications, allowing them to be solved more efficiently and effectively. In future, a generic tool for tree-based classification algorithms could be developed in which users select the algorithm best suited to their application, depending on whether they need higher prediction accuracy or lower time complexity. For example, a user working on an application where results must be generated quickly could select Algorithm 1, while a user working on a critical application where classification results must be more accurate could select Algorithm 2.
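The future tool described above would choose between algorithms according to the two performance measures, time to build the tree and prediction accuracy. A minimal Python sketch of that selection logic follows, assuming each algorithm is wrapped as a hypothetical "builder" function that trains on rows and labels and returns a prediction function. The names `evaluate`, `choose_algorithm`, and `majority_class_builder` are illustrative; `majority_class_builder` is a trivial stand-in so the harness can be exercised, not one of the thesis algorithms, which are not reproduced here.

```python
import time
from collections import Counter
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical interface: a builder trains on (rows, labels) and returns predict(row).
Builder = Callable[[Sequence[dict], Sequence[str]], Callable[[dict], str]]


def evaluate(builder: Builder, train_rows, train_labels,
             test_rows, test_labels) -> Tuple[float, float]:
    """Return (seconds taken to build the model, accuracy on the test set)."""
    start = time.perf_counter()
    predict = builder(train_rows, train_labels)
    build_seconds = time.perf_counter() - start
    correct = sum(predict(row) == label for row, label in zip(test_rows, test_labels))
    return build_seconds, correct / len(test_labels)


def choose_algorithm(results: Dict[str, Tuple[float, float]], priority: str) -> str:
    """Pick the algorithm name that best fits the user's priority.

    results  -- name -> (build_seconds, accuracy), e.g. collected with evaluate()
    priority -- "speed" (shortest build time) or "accuracy" (highest accuracy)
    """
    if priority == "speed":
        return min(results, key=lambda name: results[name][0])
    if priority == "accuracy":
        return max(results, key=lambda name: results[name][1])
    raise ValueError(f"unknown priority: {priority!r}")


def majority_class_builder(rows, labels):
    """Trivial stand-in builder: always predicts the most common training label."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda row: majority
```

In use, each candidate algorithm would be run through `evaluate` on the same train/test split, and the resulting dictionary passed to `choose_algorithm` with the user's priority. Keeping measurement and selection separate means the same measurements can serve users with different needs.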
Appears in Collections:	M.E./M.Tech. Computer Technology & Applications

Files in This Item:

File	Description	Size	Format
Dilpreet_cta.PDF		3.64 MB	Adobe PDF
