Web Information Extraction Using Deep Learning Algorithm

J. Sharmila*, Dr. A. Subramani**
* Research Scholar, Manonmaniam Sundaranar University, Tirunelveli, India.
** Research Supervisor, Professor & Head, Department of Computer Applications, K.S.R. College of Engineering, Thiruchengode, Tamilnadu, India.
Periodicity: October - December 2014
DOI: https://doi.org/10.26634/jse.9.2.3325

Abstract

Research on web mining is growing in importance because of the large amount of data managed through the internet. Web usage is increasing rapidly and is difficult to control, so a dedicated system is needed to manage such a large amount of data in the web space. Web mining is classified into three major divisions: web content mining, web usage mining, and web structure mining. Tak-Lam Wong and Wai Lam proposed a web content mining approach based on Bayesian networks, in which they discuss web information extraction and attribute discovery using a Bayesian approach. Inspired by their research, the authors propose a web content mining approach based on a deep learning algorithm. According to the authors, the deep learning algorithm has an advantage over Bayesian networks because a Bayesian network is not embedded in a learning architecture like that of the proposed technique. In the proposed approach, three features are considered for extracting web content: a concept feature, which deals with the semantic relations on the web; a format feature, which deals with the format of the content; and a title feature, which deals with the web title. These features produce model parameters, which are given as input to the deep learning algorithm. The process continues according to the deep learning algorithm and finally extracts content according to the input provided. Many approaches have been developed in the area of Web Information Extraction (IE), which is concerned with harvesting useful information from web pages for further analysis. Learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success. In this paper, a method is proposed for information extraction from the Web using a deep learning algorithm.
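
The pipeline outlined in the abstract (features in, learned representation out) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the concept, format, and title features are encoded, so the binary three-element vectors below are an assumption, as are the class name `TinyRBM`, the layer sizes, the learning rate, and the choice of one-step contrastive divergence (CD-1) to train a single Restricted Boltzmann Machine, the building block named in the keywords.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class TinyRBM:
    """Hypothetical minimal binary RBM trained with CD-1 (illustration only)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.n_visible = n_visible
        self.n_hidden = n_hidden
        # Small random weights; both bias vectors start at zero.
        self.w = [[self.rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_visible = [0.0] * n_visible
        self.b_hidden = [0.0] * n_hidden

    def hidden_probs(self, v):
        """P(h_j = 1 | v) for every hidden unit."""
        return [
            sigmoid(self.b_hidden[j]
                    + sum(v[i] * self.w[i][j] for i in range(self.n_visible)))
            for j in range(self.n_hidden)
        ]

    def visible_probs(self, h):
        """P(v_i = 1 | h) for every visible unit."""
        return [
            sigmoid(self.b_visible[i]
                    + sum(h[j] * self.w[i][j] for j in range(self.n_hidden)))
            for i in range(self.n_visible)
        ]

    def cd1_update(self, v0, lr=0.1):
        """One CD-1 step: positive phase, one Gibbs step, parameter update."""
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)  # reconstruction
        h1 = self.hidden_probs(v1)
        for i in range(self.n_visible):
            for j in range(self.n_hidden):
                self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        for i in range(self.n_visible):
            self.b_visible[i] += lr * (v0[i] - v1[i])
        for j in range(self.n_hidden):
            self.b_hidden[j] += lr * (h0[j] - h1[j])


# Assumed encoding: one binary vector per page fragment,
# ordered as [concept, format, title].
fragments = [
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
]

rbm = TinyRBM(n_visible=3, n_hidden=2, seed=0)
for _ in range(50):
    for v in fragments:
        rbm.cd1_update(v)

# Hidden-unit activations serve as the learned representation of a fragment.
print(rbm.hidden_probs(fragments[0]))
```

In a Deep Belief Network, several such layers would be stacked, with the hidden activations of one layer used as the visible input of the next; the single layer here only shows the basic training step.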

Keywords

Web Content Mining, Restricted Boltzmann Machine, Deep Learning Algorithm, Web Extraction.

How to Cite this Article?

Sharmila, J., and Subramani, A. (2014). Web Information Extraction Using Deep Learning Algorithm. i-manager’s Journal on Software Engineering, 9(2), 24-35. https://doi.org/10.26634/jse.9.2.3325

References

[1]. Jen-Yuan Yeh, Hao-Ren Ke and Wei-Pang Yang, (2008). "iSpreadRank: Ranking sentences for extraction-based summarization using feature weight propagation in the sentence similarity network", Expert Systems with Applications, Vol. 35, No. 3, pp. 1451-1462.
[2]. Ladda Suanmali, Naomie Salim and Mohammed Salem Binwahlan, (2009). "Automatic Text Summarization Using Feature Based Fuzzy Extraction", Vol. 20, No. 2, pp. 105-115.
[3]. Mohammed Salem Binwahlan, Naomie Salim and Ladda Suanmali, (2009). "Swarm Based Feature Selection for Text Summarization", IJCSNS International Journal of Computer Science and Network Security, Vol. 9, No. 1.
[4]. Zhang Pei-Ying, Li Cun-he, (2009). "Automatic text summarization based on sentences clustering and extraction," 2nd IEEE International Conference on Computer Science and Information Technology, pp. 167-170.
[5]. Chen Hong-ping; Fang Wei; Yang Zhou; Zhuo Lin; Cui Zhi-Ming, (2009). "Automatic Data Records Extraction from List Page in Deep Web Sources," Asia-Pacific Conference on Information Processing, Vol. 1, pp. 370-373.
[6]. Tak-Lam Wong and Wai Lam, (2010). "Learning to Adapt Web Information Extraction Knowledge and Discovering New Attributes via a Bayesian Approach", IEEE Transactions on Knowledge and Data Engineering, Vol. 22, No. 4, pp. 523-536.
[7]. Qingshui Li; Kai Wu, (2010). "Study of Web Page information topic extraction technology based on vision," IEEE International Conference on Computer Science and Information Technology (ICCSIT), Vol. 9, pp. 781-784.
[8]. Andrew Clearwater, (2010). "The new ontologies: the effect of copyright protection on public scientific data sharing using semantic web ontologies," Vol. 10, pp. 182-205.
[9]. Yan Liu, Sheng-hua Zhong, Wen-jie Li, (2012). "Query-Oriented Unsupervised Multi-document Summarization via Deep Learning", Elsevier, Copyright © 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. pp. 1699-1705.
[10]. A. Subramani, (2014). "A Method for Extracting Information from the Web using Deep Learning Algorithm", International Journal of Theoretical and Applied Information Technology, 20th October, Vol. 68, No. 2 (ISSN: 1992-8645, E-ISSN: 1817-3195), pp. 474-484. www.jatit.org.