Article ID: jpr.4.3.13888
Title: Understanding Hindsight, Insight and Foresight Data To Large-Scale Distributed Data Intelligence (Algorithms) Machine: A Scale-Out Review
Author: Sabibullah Mohamed Hanifa
Journal: Journal on Pattern Recognition
ISSN: 2350-112X
Volume: 4
Issue: 3
Pages: 32-43
DOI: 10.26634/jpr.4.3.13888
Keywords: Big Data, Analytics, Apache Spark, Flink, Hadoop, Recommender Engine, Machine Learning, Classification, Clustering, Algorithms, Collaborative Filtering, MLlib, Data Science, Scaling, Cloud Storage

Abstract: This review surveys the foundations of data science and big data analytics: data quality, data processing and pre-processing, big data (BD) processing and analysis, the BD and analytics lifecycle, file storage, supporting platforms and technologies, Hadoop concepts, ecosystem components, and design principles. The principle and philosophy behind the computations are explained through a flow diagram. The review then covers the kinds of analytics, grouped by the solutions they provide; cluster-computing platforms such as Apache Spark (its architecture, core, other components, and utilities); and the MLlib package, including its Machine Learning (ML) methods and tasks and the algorithms it supports. The material is intended to give a core understanding of large-scale ML algorithm processing, suitable for building application solutions (for example predictions, classifications, segmentations, or recommendations) in the Apache Spark environment.

Issue date: September - November 2017
Copyright © 2017 i-manager publications. All rights reserved.
Publisher: i-manager Publications
URL: http://www.imanagerpublications.com/Article.aspx?ArticleId=13888