Machine Learning Based Architecture for Rule Establishment of Web Proxy Server

G. Sahoo*, Partha Sarathy Banarjee**
* Professor & HOD, Department of Information Technology & MCA, B.I.T. Mesra, Ranchi, Jharkhand, India.
** Assistant Professor, Department of Computer Science & Engineering, Jaypee University of Engineering & Technology, Guna, (MP), India.
Periodicity: March - May 2013


Internet access has become an integral part of everyday life: services such as mail, news, and chat are widely available, along with huge amounts of information on almost any subject. In most cases, however, the bandwidth available to connect to the Internet is limited, so it needs to be used efficiently and, more importantly, productively. Bandwidth is generally distributed among groups of users according to policy constraints. In practice, users do not use their entire allocated bandwidth at all times, and sometimes they need more than they have been allocated. When bandwidth is abundant, any kind of use can be permitted, provided it is consistent with the policy. Because users' bandwidth usage patterns vary with the time of day and the time of year, a dynamic allocation of bandwidth that satisfies users' requirements is needed.

To maximize productive usage, access control policies have to be implemented that prevent unproductive use but, to the extent possible, do not impose censorship. The Squid proxy server is one tool for this purpose: it provides many mechanisms for setting access control policies. However, deciding which policies to implement requires experimentation and usage statistics that must be processed to obtain useful data. The architecture proposed in this paper uses machine learning to determine policies based on the content of the URLs currently being visited. Its main component is the Squid Traffic Analyzer, which classifies the traffic and generates URL lists. The paper also introduces the concept of delay priority, which gives system administrators more options in setting policies for bandwidth management.
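
As an illustration of the classification step described above, the following minimal sketch labels URLs and groups them into per-class URL lists. It is not the paper's implementation: the abstract does not name the learning method, so a naive Bayes classifier with add-one smoothing is assumed here, and the class labels ("productive", "unproductive"), the tokenizer, the training URLs, and all function and class names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (an assumed feature scheme)."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


class NaiveBayesUrlClassifier:
    """Multinomial naive Bayes over URL tokens with add-one (Laplace) smoothing."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # label -> token -> count
        self.url_counts = Counter()               # label -> number of training URLs
        self.vocabulary = set()

    def train(self, url, label):
        tokens = tokenize(url)
        self.token_counts[label].update(tokens)
        self.url_counts[label] += 1
        self.vocabulary.update(tokens)

    def classify(self, url):
        total_urls = sum(self.url_counts.values())
        vocab_size = len(self.vocabulary)
        tokens = tokenize(url)
        best_label, best_score = None, -math.inf
        for label in self.url_counts:
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(self.url_counts[label] / total_urls)
            label_total = sum(self.token_counts[label].values())
            for token in tokens:
                count = self.token_counts[label][token]
                score += math.log((count + 1) / (label_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def build_url_lists(classifier, urls):
    """Group URLs by predicted label, mimicking the generated URL lists."""
    lists = defaultdict(list)
    for url in urls:
        lists[classifier.classify(url)].append(url)
    return dict(lists)


# Hypothetical training data; a real deployment would derive it from proxy logs.
TRAINING = [
    ("http://docs.example.edu/networks/lecture1.pdf", "productive"),
    ("http://library.example.edu/journal/article", "productive"),
    ("http://video.example.com/stream/movie", "unproductive"),
    ("http://games.example.com/play/arcade", "unproductive"),
]

classifier = NaiveBayesUrlClassifier()
for training_url, training_label in TRAINING:
    classifier.train(training_url, training_label)

url_lists = build_url_lists(
    classifier,
    [
        "http://video.example.com/stream/clip",
        "http://library.example.edu/journal/issue",
    ],
)
```

In a setup like the one the abstract describes, lists of this kind could then be handed to the proxy's access control configuration; how the paper actually does this, and how delay priority is applied to each list, is not specified in the abstract.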


Web Proxy, Machine-learning, Network traffic, Meta data

How to Cite this Article?

Sahoo, G., and Banerjee, P.S. (2013). Machine Learning Based Architecture For Rule Establishment of Web Proxy Server. i-manager’s Journal on Computer Science, 1(1), 8-15.


