A Scalable and Cost-Effective Data Anonymization over Big Data using MapReduce on Cloud

Shalin Elizabeth S.*, S. Sarju**
* M.Tech Student, St. Joseph's College of Engineering and Technology, Kerala, India.
** Assistant Professor, Department of Computer Science and Engineering, St. Joseph's College of Engineering and Technology, Kerala, India.
Periodicity: February-April 2015
DOI : https://doi.org/10.26634/jcc.2.2.3449

Abstract

In big data applications, data privacy is one of the most important issues when processing large-scale privacy-sensitive data sets, a task that requires computation resources provisioned by public cloud services. Big data refers to the commercial "aggregation, mining, and analysis" of very large, complex and unstructured data sets. Because of its size, discovering knowledge or obtaining patterns from big data within a reasonable time is a complicated task. The cloud and advances in big data mining and analytics have expanded the scope of information available to businesses, governments, and individuals. Internet users also share private data such as health records and financial transaction records for data mining or analysis, so data anonymization is used to hide identities or sensitive information in such data sets. This paper investigates the problem of big data anonymization for privacy preservation from the perspectives of scalability and cost-effectiveness. Anonymizing large-scale data within a short span of time is a challenging task. To address this, an Enhanced Top-Down Specialization (ETDS) approach is proposed as an enhancement of the Two-Phase Top-Down Specialization (TPTDS) approach. Accordingly, a scalable and cost-effective privacy-preserving framework is developed to provide a holistic conceptual foundation for privacy preservation over big data, enabling users to realize the full potential of the high scalability, elasticity, and cost-effectiveness of the cloud. Performing multidimensional anonymization with the MapReduce framework will increase the efficiency of the big data processing system.
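To make the idea of top-down specialization more concrete, here is a minimal Python sketch. It is not the ETDS or TPTDS implementation described in this paper. It assumes two hypothetical quasi-identifiers (age and job) with small hand-made taxonomy trees and toy records. It also uses the number of resulting equivalence classes as a stand-in score, whereas the original TDS ranks candidate specializations by an information-gain versus privacy-loss trade-off. The sketch imitates one MapReduce round in-process: the map step emits each record's generalized quasi-identifier tuple with a count of 1, and the reduce step sums those counts so that k-anonymity can be checked on equivalence-class sizes.

```python
from collections import Counter

# Hypothetical taxonomy trees (child -> parent) for two illustrative
# quasi-identifiers; "ANY" is the root of each tree.
TAXONOMY = {
    "age": {"0-30": "ANY", "31-60": "ANY",
            "20": "0-30", "25": "0-30", "40": "31-60", "55": "31-60"},
    "job": {"Tech": "ANY", "Health": "ANY",
            "Engineer": "Tech", "Analyst": "Tech",
            "Nurse": "Health", "Doctor": "Health"},
}
QIDS = ["age", "job"]

# Hypothetical toy records holding raw (leaf-level) values.
RECORDS = [
    {"age": "20", "job": "Engineer"}, {"age": "25", "job": "Analyst"},
    {"age": "20", "job": "Nurse"},    {"age": "25", "job": "Doctor"},
    {"age": "40", "job": "Engineer"}, {"age": "55", "job": "Analyst"},
    {"age": "40", "job": "Nurse"},    {"age": "55", "job": "Doctor"},
]


def generalize(value, attr, cut):
    """Climb the taxonomy from a raw value until it reaches a value in the cut."""
    while value not in cut[attr]:
        value = TAXONOMY[attr][value]
    return value


def map_phase(records, cut):
    """Map step: emit (generalized quasi-identifier tuple, 1) per record."""
    for rec in records:
        yield tuple(generalize(rec[a], a, cut) for a in QIDS), 1


def reduce_phase(pairs):
    """Reduce step: sum counts per key, giving equivalence-class sizes."""
    counts = Counter()
    for key, count in pairs:
        counts[key] += count
    return counts


def children(attr, value):
    """Direct children of a taxonomy node (empty for a leaf)."""
    return [c for c, p in TAXONOMY[attr].items() if p == value]


def top_down_specialize(records, k):
    """Greedy top-down specialization that keeps every class size >= k."""
    cut = {a: {"ANY"} for a in QIDS}  # start from the most general values
    while True:
        best = None
        for attr in QIDS:
            for value in sorted(cut[attr]):
                kids = children(attr, value)
                if not kids:
                    continue  # leaf values cannot be specialized further
                trial = dict(cut)
                trial[attr] = (cut[attr] - {value}) | set(kids)
                groups = reduce_phase(map_phase(records, trial))
                if min(groups.values()) < k:
                    continue  # this specialization would violate k-anonymity
                score = len(groups)  # stand-in score: more classes = more detail
                if best is None or score > best[0]:
                    best = (score, trial)
        if best is None:
            return cut  # no valid specialization remains
        cut = best[1]


if __name__ == "__main__":
    final_cut = top_down_specialize(RECORDS, k=2)
    print({a: sorted(v) for a, v in final_cut.items()})
    # {'age': ['0-30', '31-60'], 'job': ['Health', 'Tech']}
```

With k = 2 the sketch first splits age, then job. It stops once any further split would leave a class with a single record. In a real deployment the map and reduce functions would run as distributed jobs on a cluster. This example does not model the two-phase data partitioning of TPTDS or the ETDS enhancements.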

Keywords

Data Anonymization, Privacy Preservation, Top Down Specialization, MapReduce, Big Data, Cloud Computing.

How to Cite this Article?

Elizabeth, S. S., and Sarju, S. (2015). A Scalable and Cost-Effective Data Anonymization over Big Data using MapReduce on Cloud. i-manager’s Journal on Cloud Computing, 2(2), 31-39. https://doi.org/10.26634/jcc.2.2.3449
