i-manager Publications

Big Data Mining Platforms: Distributed Aggregation for Data-Parallel Computing

Bala Krishna Sapparam*, Janardhan**

*-** Associate Professor, Department of Computer Science Engineering, Yogananda Institute of Technology and Science, Tirupati, India.

Periodicity: June - August 2014
DOI : https://doi.org/10.26634/jit.3.3.2944

Abstract

Big Data concern large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development of networking, data storage, and data collection capacity, Big Data are now rapidly expanding in all science and engineering domains, including the physical, biological, and biomedical sciences. In typical data mining systems, the mining procedures require computationally intensive computing units for data analysis and comparison. A computing platform therefore needs efficient access to at least two types of resources: data and computing processors. For small-scale data mining tasks, a single desktop computer, which contains a hard disk and CPU processors, is sufficient to fulfill the data mining goals. Indeed, many data mining algorithms are designed for this type of problem setting. For medium-scale data mining tasks, the data are typically large (and possibly distributed) and cannot fit into main memory. Common solutions rely on parallel computing or collective mining to sample and aggregate data from different sources, and then use parallel programming to carry out the mining. In this paper we concentrate on Tier I, i.e., Big Data mining platforms, using MapReduce (MR). Within MapReduce we follow the Distributed Aggregation for Data-Parallel Computing approach, which reduces the traffic sent over the network.
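
The abstract does not spell out how distributed aggregation cuts network traffic, so the following is only a minimal sketch of the general idea, assuming a simple count-per-key job: each node partially aggregates its own map output before anything is sent to the reducer. The function names (`map_records`, `combine`, `reduce_all`, `run_job`) and the sample partitions are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def map_records(partition):
    """Map phase: emit one (key, 1) pair per record in a node's partition."""
    return [(key, 1) for key in partition]


def combine(pairs):
    """Local partial aggregation: sum values per key on the node itself."""
    partial = Counter()
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())


def reduce_all(shuffled_pairs):
    """Reduce phase: merge every pair received over the network."""
    totals = defaultdict(int)
    for key, value in shuffled_pairs:
        totals[key] += value
    return dict(totals)


def run_job(partitions, use_combiner):
    """Return (final totals, number of pairs sent across the network)."""
    shuffled = []
    for partition in partitions:
        pairs = map_records(partition)
        if use_combiner:
            pairs = combine(pairs)  # shrink the output before it leaves the node
        shuffled.extend(pairs)      # stands in for the network shuffle
    return reduce_all(shuffled), len(shuffled)


# Hypothetical data: three nodes, each holding a small list of keys.
partitions = [
    ["a", "b", "a", "a"],
    ["b", "b", "c"],
    ["a", "c", "c", "c"],
]

totals_plain, sent_plain = run_job(partitions, use_combiner=False)
totals_combined, sent_combined = run_job(partitions, use_combiner=True)
print(totals_plain == totals_combined, sent_plain, sent_combined)
```

Both runs produce the same totals, but the run with local aggregation sends fewer pairs to the reducer. This works because summation is associative and commutative; aggregations without that property would need a different decomposition.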

Keywords

Big Data, Big Data Platform, Data Mining, Distributed Aggregation, MapReduce (MR)

How to Cite this Article?

Bala Krishna Sapparam and Janardhan (2014). Big Data Mining Platforms: Distributed Aggregation For Data-Parallel Computing. i-manager’s Journal on Information Technology, 3(3), 20-31. https://doi.org/10.26634/jit.3.3.2944

