A Review of Decision Tree Algorithms for Predictive Analysis in Data Mining

Diana Moses*, B. Deepa**, Trilochan Patri***, M. Sowmya****
* Professor, Department of Computer Science and Engineering, St. Peter's Engineering College, Hyderabad, India.
**-**** UG Scholar, Department of Computer Science and Engineering, St. Peter's Engineering College, Hyderabad, India.
Periodicity: July - September 2017
DOI: https://doi.org/10.26634/jse.12.1.13923

Abstract

Business organizations archive a wealth of data. Analysing this data yields predictive information for making proactive decisions and for building statistical algorithms that improve knowledge of the engineering process and of the data itself. Data mining is a class of algorithms that analyses the relationships within data and identifies future trends from archived data. Decision tree learning helps create a predictive model that maps the items in a dataset to their target values so that the model is consistent with each element of the dataset. There are many strategies for constructing decision trees, and ID3 is one of the simplest and most widely used; however, ID3 has a disadvantage: when selecting the attribute on which to split, it favours attributes that have many distinct values. Hence, the objective of this paper is to show that the C4.5 algorithm works better than the ID3 algorithm. Quinlan's C4.5 system is one of the best classification algorithms and deserves special mention for several reasons. First, it represents the outcome of a line of machine learning research that traces back to the ID3 system, and for that reason it is taken as the point of reference for the development and analysis of new proposals. Second, the results on the datasets in this paper show that the C4.5 tree-induction algorithm provides good classification accuracy and is the fastest among the compared main-memory algorithms for machine learning and data mining.
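To make ID3's bias toward many-valued attributes concrete, the following Python sketch compares ID3's information-gain criterion with the gain-ratio criterion used by C4.5 on a hypothetical eight-row toy dataset. The attribute names `id`, `outlook` and `play`, the data, and all helper function names are illustrative assumptions, not taken from the paper. An identifier-like attribute puts every row in its own pure partition, so it maximizes information gain; dividing by the split information penalizes such many-valued splits. The sketch covers only attribute selection and omits other parts of C4.5, such as continuous attributes, missing values, pruning, and its extra requirement that a candidate test have at least average information gain.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def partition(rows, attr, target):
    """Group the target labels of `rows` by the value each row has for `attr`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    return groups


def information_gain(rows, attr, target):
    """ID3 criterion: reduction in label entropy from splitting on `attr`."""
    n = len(rows)
    remainder = sum(
        len(labels) / n * entropy(labels)
        for labels in partition(rows, attr, target).values()
    )
    return entropy([row[target] for row in rows]) - remainder


def split_info(rows, attr):
    """Entropy of the split itself; grows with the number of distinct values."""
    return entropy([row[attr] for row in rows])


def gain_ratio(rows, attr, target):
    """Gain-ratio criterion (as in C4.5): information gain / split information."""
    si = split_info(rows, attr)
    return information_gain(rows, attr, target) / si if si > 0 else 0.0


# Hypothetical toy data: `id` is unique per row; `outlook` is informative but imperfect.
samples = [("sunny", "yes")] * 4 + [("sunny", "no")] + [("rainy", "no")] * 3
rows = [{"id": i, "outlook": o, "play": p} for i, (o, p) in enumerate(samples)]

attrs = ["id", "outlook"]
id3_choice = max(attrs, key=lambda a: information_gain(rows, a, "play"))
c45_choice = max(attrs, key=lambda a: gain_ratio(rows, a, "play"))
print(id3_choice, c45_choice)  # prints: id outlook
```

Running it prints `id outlook`: information gain picks the meaningless identifier (gain of 1 bit versus about 0.549 for `outlook`), while gain ratio (1/3 for `id` versus about 0.575 for `outlook`) selects `outlook`.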

Keywords

Decision Tree Algorithms, ID3 Algorithm, C4.5 Algorithm, Data Mining.

How to Cite this Article?

Moses, D., Deepa, B., Patri, T., and Sowmya, M. (2017). A Review of Decision Tree Algorithms for Predictive Analysis in Data Mining. i-manager’s Journal on Software Engineering, 12(1), 38-45. https://doi.org/10.26634/jse.12.1.13923

References

[1]. Agrawal, R., Imielinski, T., & Swami, A. (1993). Database mining: A performance perspective. IEEE Transactions on Knowledge and Data Engineering, 5(6), 914-925.
[2]. Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. (1984). Classification and Regression Trees. CRC Press.
[3]. Chen, H. (1995). Machine learning for information retrieval: neural networks, symbolic learning, and genetic algorithms. Journal of the Association for Information Science and Technology, 46(3), 194-216.
[4]. Cheng, J., Fayyad, U. M., Irani, K. B., & Qian, Z. (1988). Improved decision trees: A generalized version of ID3. In Proc. Fifth Int. Conf. Machine Learning (pp. 100-107).
[5]. Deisy, C., Subbulakshmi, B., Baskar, S., & Ramaraj, N. (2007, December). Efficient dimensionality reduction approaches for feature selection. In International Conference on Computational Intelligence and Multimedia Applications, 2007 (Vol. 2, pp. 121-127). IEEE.
[6]. Gehrke, J., Ganti, V., Ramakrishnan, R., & Loh, W. Y. (1999, June). BOAT: Optimistic decision tree construction. In ACM SIGMOD Record (Vol. 28, No. 2, pp. 169-180). ACM.
[7]. Gehrke, J., Ramakrishnan, R., & Ganti, V. (1998, August). RainForest: A framework for fast decision tree construction of large datasets. In VLDB (Vol. 98, pp. 416-427).
[8]. Han, J., Cheng, H., Xin, D., & Yan, X. (2007). Frequent pattern mining: Current status and future directions. Data Mining and Knowledge Discovery, 15(1), 55-86.
[9]. Herling, T. J. (1995). Adoption of Computer Communication Technology by Communication Faculty: A Case Study. Information Development, 32(4), 986-1000.
[10]. Jin, R., & Agrawal, G. (2003, May). Communication and memory efficient parallel decision tree construction. In Proceedings of the 2003 SIAM International Conference on Data Mining (pp. 119-129). Society for Industrial and Applied Mathematics.
[11]. Joshi, M. V., Karypis, G., & Kumar, V. (1998, March). ScalParC: A new scalable and efficient parallel classification algorithm for mining large datasets. In Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing (IPPS/SPDP 1998) (pp. 573-579). IEEE.
[12]. Kass, G. V. (1980). An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29(2), 119-127.
[13]. Loh, W. Y., & Vanichsetakul, N. (1988). Tree-structured classification via generalized discriminant analysis. Journal of the American Statistical Association, 83(403), 715-725.
[14]. Mehta, M., Agrawal, R., & Rissanen, J. (1996). SLIQ: A fast scalable classifier for data mining. In Advances in Database Technology, EDBT'96 (pp. 18-32).
[15]. Mehta, M., Rissanen, J., & Agrawal, R. (1995, August). MDL-based decision tree pruning. In KDD (Vol. 21, No. 2, pp. 216-221).
[16]. Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.
[17]. Quinlan, J. R. (2014). C4.5: Programs for Machine Learning. Elsevier.
[18]. Quinlan, R. (2004). Data mining tools See5 and C5.0. RuleQuest Research.
[19]. Ruggieri, S. (2002). Efficient C4.5 [classification algorithm]. IEEE Transactions on Knowledge and Data Engineering, 14(2), 438-444.
[20]. Safavian, S. R., & Landgrebe, D. (1991). A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics, 21(3), 660-674.
[21]. Shafer, J., Agrawal, R., & Mehta, M. (1996, September). SPRINT: A scalable parallel classifier for data mining. In Proc. 1996 Int. Conf. Very Large Data Bases (pp. 544-555).