jpr.4.3.13888
Understanding Hindsight, Insight and Foresight Data To Large-Scale Distributed Data Intelligence (Algorithms) Machine: A Scale-Out Review
Sabibullah Mohamed Hanifa
Journal on Pattern Recognition
2350-112X
4
3
32
43
10.26634/jpr.4.3.13888
Big Data, Analytics, Apache Spark, Flink, Hadoop, Recommender Engine, Machine Learning, Classification, Clustering, Algorithms, Collaborative Filtering, MLlib, Data Science, Scaling, Cloud Storage
This review surveys the foundations of data science and big data analytics: data quality, data processing and pre-processing, Big Data (BD) processing and analysis, the BD and analytics lifecycle, file storage, supporting platforms and technologies, Hadoop concepts, its ecosystem components, and its design principles. The principles and philosophy behind the computations are also explained through a flow diagram. The review then describes the various types of analytics according to the solutions they offer, cluster-computing platforms such as Apache Spark (its architecture: core, other components, and utilities), and the MLlib package, including its Machine Learning (ML) methods and tasks and the algorithms it supports. The material is intended to give a core understanding of large-scale ML algorithms and to help in building application solutions (for example predictions, classifications, segmentations, or recommendations) in the Apache Spark environment.
September - November 2017
Copyright © 2017 i-manager publications. All rights reserved.
i-manager Publications
http://www.imanagerpublications.com/Article.aspx?ArticleId=13888