Comparative Analysis on a Predictive Model using Tree Based Machine Learning Techniques for Big Data Analytics

Lakshmi J. V. N. *
Assistant Professor, Department of MCA, Acharya Institution of Management and Sciences, Bangalore, Karnataka, India.
Periodicity: March-May 2018
DOI: https://doi.org/10.26634/jit.7.2.14647

Abstract

Internet of Things (IoT), Big Data (BD), Artificial Intelligence (AI) and Machine Learning (ML) are novel approaches in which communication takes place between man-made machines. Machines interact with one another and acquire knowledge by running learning algorithms. Data analytics, prediction and classification are machine learning approaches applied to Big Data to process a variety of unstructured data patterns. MapReduce is a widely used programming framework for parallelizing these machine learning algorithms. To achieve the best outcomes, the algorithms are fine-tuned in parallel: the MapReduce model processes the dataset multiple times, adjusting the parameters as required on each pass. However, the existing MapReduce model suffers from high disk I/O, which results in low throughput and long execution times. To minimize the time spent tuning these jobs, the Apache Spark framework is used in place of the MapReduce model. This is examined in this paper by evaluating predictions on the "Demand and Supply of India" dataset. The paper presents a comparative analytical study that forecasts demand by training tree-based machine learning techniques on the existing data. The prediction outcomes of the tree-structured ML methods are compared with respect to time and space utilization.
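The kind of comparison described in the abstract can be illustrated, very loosely, on a single machine. The sketch below is not the author's Spark implementation and does not use the "Demand and Supply of India" dataset. It is a stdlib-only Python stand-in that assumes a hypothetical synthetic demand dataset (month and price as features) and hand-rolled CART-style regression trees. It trains a single decision tree, a random forest (bootstrap samples plus per-split feature sampling) and least-squares gradient boosting, then reports wall-clock training time and held-out RMSE for each. All function names and parameter values (depth, number of trees, learning rate) are illustrative assumptions, not values from the paper.

```python
import random
import time
from statistics import mean

def build_tree(rows, max_depth, min_leaf=5, max_features=None, rng=None, depth=0):
    """Grow a CART-style regression tree by greedy squared-error reduction.

    rows: list of (feature_tuple, target) pairs.
    max_features: if set, each split considers only a random subset of
    features (the per-split sampling used by random forests).
    Returns a leaf value (float) or a dict describing an internal node.
    """
    ys = [y for _, y in rows]
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return mean(ys)
    n_features = len(rows[0][0])
    features = range(n_features)
    if max_features is not None:
        features = rng.sample(range(n_features), max_features)
    total_sum, total_sq = sum(ys), sum(y * y for y in ys)
    best = None  # (sse, feature index, threshold)
    for f in features:
        ordered = sorted(rows, key=lambda r: r[0][f])
        left_sum = left_sq = 0.0
        for i in range(len(ordered) - 1):
            y = ordered[i][1]
            left_sum += y
            left_sq += y * y
            n_left, n_right = i + 1, len(ordered) - i - 1
            if n_left < min_leaf or n_right < min_leaf:
                continue
            if ordered[i][0][f] == ordered[i + 1][0][f]:
                continue  # no threshold separates equal feature values
            right_sum, right_sq = total_sum - left_sum, total_sq - left_sq
            # SSE of each side = sum(y^2) - (sum y)^2 / n
            sse = (left_sq - left_sum ** 2 / n_left) + (right_sq - right_sum ** 2 / n_right)
            if best is None or sse < best[0]:
                best = (sse, f, ordered[i][0][f])
    if best is None:
        return mean(ys)
    _, f, t = best
    left = [r for r in rows if r[0][f] <= t]
    right = [r for r in rows if r[0][f] > t]
    kw = dict(min_leaf=min_leaf, max_features=max_features, rng=rng, depth=depth + 1)
    return {"f": f, "t": t,
            "l": build_tree(left, max_depth, **kw),
            "r": build_tree(right, max_depth, **kw)}

def predict(node, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, dict):
        node = node["l"] if x[node["f"]] <= node["t"] else node["r"]
    return node

def fit_tree(rows, max_depth):
    tree = build_tree(rows, max_depth)
    return lambda x: predict(tree, x)

def fit_forest(rows, n_trees, max_depth, max_features, rng):
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap resample
        trees.append(build_tree(sample, max_depth, max_features=max_features, rng=rng))
    return lambda x: mean(predict(t, x) for t in trees)  # average the trees

def fit_boosting(rows, n_trees, max_depth, learning_rate):
    base = mean(y for _, y in rows)
    preds = [base] * len(rows)
    trees = []
    for _ in range(n_trees):
        # For squared loss the negative gradient is simply the residual.
        residuals = [(x, y - p) for (x, y), p in zip(rows, preds)]
        tree = build_tree(residuals, max_depth)
        trees.append(tree)
        preds = [p + learning_rate * predict(tree, x) for (x, _), p in zip(rows, preds)]
    return lambda x: base + learning_rate * sum(predict(t, x) for t in trees)

def make_demand_data(n, rng):
    """Hypothetical synthetic data: (month, price) -> units demanded."""
    rows = []
    for _ in range(n):
        month, price = rng.randint(1, 12), rng.uniform(5, 25)
        demand = 80 + (20 if 6 <= month <= 8 else 0) - 2 * price + rng.gauss(0, 3)
        rows.append(((month, price), demand))
    return rows

def rmse(model, rows):
    return mean((model(x) - y) ** 2 for x, y in rows) ** 0.5

def compare_models(seed=0):
    """Return {model name: (training seconds, test RMSE)}."""
    rng = random.Random(seed)
    data = make_demand_data(400, rng)
    train, test = data[:300], data[300:]
    base = mean(y for _, y in train)
    results = {"baseline (mean)": (0.0, rmse(lambda x: base, test))}
    fitters = {
        "decision tree": lambda: fit_tree(train, 4),
        "random forest": lambda: fit_forest(train, 20, 6, 1, rng),
        "gradient boosting": lambda: fit_boosting(train, 30, 2, 0.3),
    }
    for name, fit in fitters.items():
        start = time.perf_counter()
        model = fit()
        results[name] = (time.perf_counter() - start, rmse(model, test))
    return results

if __name__ == "__main__":
    for name, (secs, err) in compare_models().items():
        print(f"{name:18s} train {secs:.3f}s  test RMSE {err:.2f}")
```

The timings printed by this toy script say nothing about the results reported in the paper. At cluster scale the same three model families are available in Spark MLlib as `DecisionTreeRegressor`, `RandomForestRegressor` and `GBTRegressor` in `pyspark.ml.regression`; the sketch only shows the trade-off being measured, namely that the ensembles train many trees and therefore cost more time and memory than a single tree.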

Keywords

Apache Spark, Big Data, Data Analysis, Decision Tree, Gradient Boosting Tree, Machine Learning, Prediction, Random Forest.

How to Cite this Article?

Lakshmi, J. V. N. (2018). Comparative Analysis on a Predictive Model Using Tree Based Machine Learning Techniques for Big Data Analytics. i-manager’s Journal on Information Technology, 7(2), 8-15. https://doi.org/10.26634/jit.7.2.14647
