Preparing Data Sets by Using Horizontal Aggregations in SQL for Data Mining Analysis

K. Sentamilselvan*, S. Vinoth Kumar**, A. Jeevanantham***
*, **, *** Assistant Professor, Department of Information Technology, Kongu Engineering College, Erode, Tamil Nadu, India.
Periodicity: September - November 2015
DOI: https://doi.org/10.26634/jit.4.4.3646

Abstract

Data mining is an emerging research field, and preparing a data set is one of its most important tasks. To analyze data efficiently, data mining systems widely use data sets whose columns are arranged in a horizontal tabular layout. Building such a data set is normally a very time-consuming task. Existing SQL aggregations are limited for this purpose because group functions return a single column per aggregated group. A method is developed to generate SQL code that returns aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row. This new class of functions is called horizontal aggregations, and the method is termed BY-LOGIC. A SQL code generator automatically produces the SQL code for a horizontal aggregation. A fundamental method for evaluating horizontal aggregations, called CASE (exploiting the CASE programming construct), is also used. Three parameters are used to create a horizontal aggregation: the grouping, sub-grouping, and aggregating fields. Query evaluation shows that the CASE method responds faster than the BY-LOGIC method.
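The difference between a vertical aggregation and a CASE-based horizontal aggregation can be illustrated with a minimal sketch. It is not the authors' code generator: the `sales` table, its columns, and the helper `build_case_query` are hypothetical names chosen for illustration, and SQLite (through Python's standard `sqlite3` module) stands in for whichever database system the article used. The helper takes the three parameters named in the abstract (grouping, sub-grouping, and aggregating fields) and generates one CASE expression per distinct sub-grouping value. The BY-LOGIC method is not specified in the abstract, so it is not shown.

```python
import sqlite3


def build_case_query(conn, table, group_col, subgroup_col, agg_col, func="SUM"):
    """Generate a CASE-based horizontal aggregation query.

    Hypothetical helper for illustration. Identifiers are interpolated
    directly into the SQL text, so pass only trusted table/column names.
    """
    # One output column is produced per distinct sub-grouping value.
    values = [row[0] for row in conn.execute(
        f"SELECT DISTINCT {subgroup_col} FROM {table} ORDER BY {subgroup_col}")]
    columns = []
    for value in values:
        literal = str(value).replace("'", "''")  # escape quotes in the literal
        columns.append(
            f"{func}(CASE WHEN {subgroup_col} = '{literal}' "
            f"THEN {agg_col} ELSE NULL END) AS \"{agg_col}_{value}\"")
    return (f"SELECT {group_col}, " + ", ".join(columns) +
            f" FROM {table} GROUP BY {group_col} ORDER BY {group_col}")


# Assumed sample data: sales amounts per store and quarter.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, quarter TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("A", "Q1", 10), ("A", "Q1", 5), ("A", "Q2", 7),
    ("B", "Q1", 3), ("B", "Q2", 4), ("B", "Q2", 6),
])

# Vertical aggregation: one row per (store, quarter), one number per row.
vertical = conn.execute(
    "SELECT store, quarter, SUM(amount) FROM sales "
    "GROUP BY store, quarter ORDER BY store, quarter").fetchall()

# Horizontal aggregation: one row per store, one column per quarter.
sql = build_case_query(conn, "sales", "store", "quarter", "amount")
horizontal = conn.execute(sql).fetchall()

print(vertical)    # four rows, one per (store, quarter) pair
print(horizontal)  # two rows, one per store, with a column per quarter
```

With this sample data the vertical query returns four rows, while the horizontal query returns two rows, each carrying the Q1 and Q2 totals side by side. That horizontal layout is the one the abstract says data mining systems expect.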

Keywords

Data Mining, Data Set, SQL, Horizontal Aggregation, BY-LOGIC, CASE, GROUP BY, Query Evaluation, Vertical Aggregation

How to Cite this Article?

Sentamilselvan, K., Kumar, S. V., and Jeevanantham, A. (2015). Preparing Data Sets by Using Horizontal Aggregations in SQL for Data Mining Analysis. i-manager’s Journal on Information Technology, 4(4), 33-41. https://doi.org/10.26634/jit.4.4.3646
