A Methodology for WebLog Data Analysis using Hadoop MapReduce and PIG

Durga Prasad P S*, T. Vivekanandan**, A. Srinivasan***
* PG Scholar, Department of Computer Science and Engineering, SITAMS, Chittoor, Andhra Pradesh, India.
**, *** Associate Professor, Department of Computer Science and Engineering, SITAMS, Chittoor, Andhra Pradesh, India.
Periodicity: November - January 2016

Abstract

In recent years, the world has faced serious problems with data storage and processing. In particular, the volume of weblog data is growing exponentially, reaching petabytes and even zettabytes. Because weblog data records users' actions on the web, it is an important source of information and is vital for improving businesses in many respects. Traditional data management systems are not adequate for handling data at this scale, and the MapReduce programming model was introduced to deal with large-scale data processing. In this paper, the authors propose a large-scale data processing system for analysing web log data through MapReduce programming in the Hadoop framework using Pig scripts. The experimental results show that the different status codes in the web log data are classified more efficiently, in terms of processing time, than with traditional techniques.
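To illustrate how the MapReduce model can classify web log records by status code, here is a minimal sketch that runs the map, shuffle and reduce phases in a single Python process. It is not the authors' Pig-based implementation; it assumes the logs follow the Apache Common Log Format (host, ident, user, timestamp, quoted request, status, bytes), and the function names and sample lines are hypothetical. On a real Hadoop cluster, the map and reduce functions would run as distributed tasks and the framework itself would perform the shuffle.

```python
import re
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumed input format (Common Log Format):
# host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)$')


def map_status(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase (hypothetical helper): emit (status_code, 1) for a well-formed line."""
    match = CLF_PATTERN.match(line.strip())
    if match:
        yield match.group(4), 1
    # Lines that do not match the assumed format are skipped in this sketch.


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group intermediate values by key, as Hadoop does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_status(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the partial counts for one status code."""
    return key, sum(values)


def count_status_codes(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in-process over an iterable of log lines."""
    intermediate = chain.from_iterable(map_status(line) for line in lines)
    grouped = shuffle(intermediate)
    return dict(reduce_status(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from any real log.
    sample = [
        'host1.example.com - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 6245',
        'host2.example.com - - [01/Jul/1995:00:00:06 -0400] "GET /news/ HTTP/1.0" 200 3985',
        'host3.example.com - - [01/Jul/1995:00:00:12 -0400] "GET /logo.gif HTTP/1.0" 304 0',
        'host4.example.com - - [01/Jul/1995:00:00:15 -0400] "GET /missing.html HTTP/1.0" 404 -',
        'this line is not in Common Log Format',
    ]
    print(count_status_codes(sample))  # → {'200': 2, '304': 1, '404': 1}
```

In Pig Latin, the same classification would typically be expressed as a LOAD of the log file, a GROUP ... BY on the status field, and a FOREACH ... GENERATE that applies COUNT to each group. Pig compiles such a script into MapReduce jobs, so the programmer does not write the map and reduce functions by hand.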

Keywords

Hadoop, Embedded Pig, MapReduce, Web Log Data

How to Cite this Article?

Prasad, P. S. D., Vivekanandan, T., and Srinivasan, A. (2016). A Methodology for WebLog Data Analysis using Hadoop MapReduce and PIG. i-manager’s Journal on Cloud Computing, 3(1), 13-17.
