This paper presents a comparative study of secure indexes based on Context Based Fertility (CBF) for ranked search over encrypted data. The need for secure search over encrypted data in cloud computing is growing steadily. The paper reviews the methodologies proposed to date for overcoming the attacks and challenges that arise during encryption. As technology is updated frequently, a CBF-based secure index for ranked search over encrypted data faces many challenges in meeting its goals, which include multiple ranked keyword search, guaranteed security, data confidentiality, index privacy, keyword privacy and efficiency.
As more data is stored in the cloud, owners are relieved of the burden of local storage by using the storage facilities that the cloud provides. However, because the key parties (the data user, the data owner and the server) do not reside in the same trusted environment, the confidentiality of the data can be lost. To prevent this loss and protect the data, the data is encrypted before it is outsourced. Sensitive data should be encrypted to protect data privacy, but encryption adds heavy computational overhead, which in turn poses substantial challenges for resource-constrained devices (Song et al., 2000). Retrieving and making effective use of encrypted data is a challenging task; the first practical technique, introduced by Song et al. (2000), allows users to search securely over encrypted data through keywords (Pitchay et al., 2015). Later, many searchable schemes were proposed in both the symmetric-key and public-key settings to strengthen security and improve query efficiency. In the symmetric setting, the user's query privacy can leak to some internal entities because of limitations in how query trapdoors work. More and more organizations and users are attracted by the low cost and high quality of cloud services and outsource their data to the cloud server. Some of the most popular services today are data analysis and prediction, including medical prediction, risk assessment, image recognition and spam detection. The basic approach to protecting data privacy and preventing unauthorized access is to encrypt the sensitive data, but this makes effective use of the data difficult (Yin et al., 2017). The challenge intensifies when keyword search is performed on encrypted data. Meharwade and Patil (2016) proposed that retrieval comparable to traditional plaintext retrieval can be achieved by providing ranked search over the data.
Furthermore, to protect the data and preserve privacy during search, the cloud server should support a similar search function over the encrypted data. The major challenges in searching over encrypted data are security obstacles and strict requirements such as data privacy, index privacy and keyword privacy, among others. New developments in telecommunications, including the adoption of 4G and the development of 5G technologies, have opened up many further perspectives. Mobile and wireless devices have changed people's lifestyles enormously. Mobile applications, such as those handling health and location information, feed massive amounts of sensitive data to the cloud server every second. Searching over the encrypted data generated by mobile devices therefore imposes heavy computational requirements. A secure index-based search method thus has great potential to provide search over encrypted data while protecting data privacy in cloud computing environments that connect large numbers of mobile devices.
This section reviews the types of encryption techniques in use, their methodologies, the problems encountered, and the solution mechanisms proposed to reduce the impact of some of the challenges faced during encryption. The problems in focus fall into two categories: privacy-preserving training and privacy-preserving classification. Although encryption services secure the data, so many variants exist that developers asked to “encrypt our sensitive data please” must choose among them. Privacy-preserving training can be explained with the client-server model, where the problem is how to enable a learner to train a machine learning model on datasets collected from data providers without revealing the content of those datasets. For privacy-preserving classification, existing work considers a client-server model in which a classifier owner holds a classifier model that has been trained by some means, while a user holds the data to be classified. The challenges observed when providing security to the encrypted data while focusing on the above two categories are mainly,
The various encryption techniques and their methodologies for overcoming the challenges listed above, along with their complications, are elaborated below.
Figure 1 illustrates the retrieval service over encrypted cloud data. It involves three entities: the Data Owner (DO), the Cloud Server (CS) and the Authorized User (AU). The data owner is a cloud user who has a collection of documents D, containing private information, to be uploaded to the cloud server. Since the cloud server cannot be fully trusted, D must be encrypted into the ciphertext C before it is uploaded, which provides basic security in the cloud. To let the cloud server perform keyword retrieval without leaking information about the documents or allowing unauthorized access, the data owner creates an index, encrypts it, and uploads it to the cloud server together with the ciphertext C. After this, the authorized user can make use of the keyword retrieval service.
Figure 1. Search over Encrypted Data (Yao et al., 2016)
The authorized user generates a trapdoor from the keywords and sends it to the cloud server. The cloud server compares the trapdoor with the encrypted index and returns the matching encrypted documents, which are also ranked to improve retrieval accuracy. Finally, the authorized user decrypts the returned ciphertext C into D and uses it as required. In this process, the cloud server knows all the information about the encrypted data, while the confidentiality of the data itself is ensured by traditional cryptography. The requirements involved in the retrieval procedure make it a difficult task.
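The workflow above can be illustrated with a minimal sketch. It is not the scheme of Yao et al. (2016); it assumes a shared secret key between the data owner and the authorized user and uses an HMAC of each keyword as a stand-in for a trapdoor. The names `trapdoor`, `build_index` and `server_search` are hypothetical, and document encryption and ranking are omitted.

```python
import hashlib
import hmac


def trapdoor(key: bytes, keyword: str) -> str:
    """Keyed, deterministic token for a keyword (hypothetical helper)."""
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(key: bytes, docs: dict) -> dict:
    """Data owner: map each keyword token to the IDs of the documents containing it."""
    index = {}
    for doc_id, keywords in docs.items():
        for kw in keywords:
            index.setdefault(trapdoor(key, kw), set()).add(doc_id)
    return index


def server_search(index: dict, token: str) -> list:
    """Cloud server: match the token against the index without seeing the keyword."""
    return sorted(index.get(token, set()))


# Assumed setup: the key is shared by the data owner and the authorized user only.
key = b"shared-secret-between-DO-and-AU"
docs = {
    "doc1": ["cloud", "privacy"],
    "doc2": ["cloud", "index"],
    "doc3": ["ranking"],
}
index = build_index(key, docs)

# Authorized user: build a trapdoor for "cloud" and send it to the server.
result = server_search(index, trapdoor(key, "cloud"))
print(result)
```

In this sketch the server stores only keyed tokens, so the plaintext keyword never appears in the index. The tokens are deterministic, however, so repeated queries for the same keyword remain linkable, which is one of the leakage issues that the schemes reviewed here try to limit.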
The following are some of the different types of encryption.
These can also be termed symmetric or secret-key algorithms. In these algorithms, the same key is used for both encryption and decryption.
These are also known as asymmetric algorithms. Public-key algorithms use two different keys: one for encryption and one for decryption.
Like the Triple Data Encryption Standard (3DES), these algorithms encrypt the data one block at a time.
This is a symmetric algorithm that uses a keystream, a series of random or pseudorandom values, to encrypt the plaintext into ciphertext one character at a time.
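The keystream idea can be sketched as follows. This is a toy construction for illustration only, not a vetted cipher: it assumes a keystream derived by hashing the key, a nonce and a counter with SHA-256, and the names `keystream` and `xor_stream` are hypothetical. Because XOR is its own inverse, the same function and the same key both encrypt and decrypt, which also illustrates the symmetric-key property.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudorandom bytes from key, nonce and a running counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream, one byte at a time."""
    ks = keystream(key, nonce, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


key = b"demo key"        # illustrative values, not secure parameters
nonce = b"demo nonce"
plaintext = b"search over encrypted data"

ciphertext = xor_stream(key, nonce, plaintext)
recovered = xor_stream(key, nonce, ciphertext)   # same key, same operation
print(recovered)
```

A production system would use a reviewed stream cipher or authenticated encryption mode from an established library rather than a hand-built keystream.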
RSA is a form of public-key encryption that is practically unbreakable by conventional computers when sufficiently large keys are used (Rivest et al., 1978).
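A textbook-sized sketch of the public-key idea described by Rivest et al. (1978) is given below. The primes are deliberately tiny and no padding is applied, so this only shows how the two keys relate; it is in no way secure. The variable names are illustrative.

```python
# Toy RSA with tiny primes (illustration only; real keys use primes of 1024+ bits).
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)        # Euler's totient of n
e = 17                         # public exponent, coprime to phi
d = pow(e, -1, phi)            # private exponent: modular inverse of e (Python 3.8+)


def encrypt(message: int) -> int:
    """Encrypt with the public key (e, n)."""
    return pow(message, e, n)


def decrypt(cipher: int) -> int:
    """Decrypt with the private key (d, n)."""
    return pow(cipher, d, n)


message = 65                   # must be smaller than n
cipher = encrypt(message)
recovered = decrypt(cipher)
print(d, recovered)
```

Anyone holding `(e, n)` can encrypt, but only the holder of `d` can decrypt, which is the property that distinguishes this family from the symmetric algorithms.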
This is a distributed database that underlies blockchain technology, which is best known as the basis of Bitcoin and uses cryptography to store data about financial transactions. Bitcoin is a form of cryptocurrency that uses public-key cryptography (Rivest et al., 1978).
A few popular encryption algorithms and protocols are the Advanced Encryption Standard (AES), Rivest-Shamir-Adleman (RSA), the International Data Encryption Algorithm (IDEA), Blowfish and Twofish, the Signal Protocol, and Ring Learning with Errors (Song et al., 2000). Many encryption algorithms have been proposed, building on earlier computer communication systems. Encryption algorithms are classified according to their working principles. The standard encryption algorithms are AES, Twofish, DES and RSA, the last of which belongs to the public-key category of cryptographic implementations.
Table 1 compares the various encryption techniques so that users can easily choose a particular technique to secure their data. According to this comparison, AES can be used for flexible encryption on small devices, whereas IDEA can be used in worldwide banking and industrial applications. RSA helps to provide secure communication. DES and double DES were shown by analytical results to be insecure, whereas triple DES addressed the problems of DES and double DES and fixed their security issues. Twofish has a flexible design and no weak keys, whereas Blowfish has some weak keys, although no successful attack against it has been reported to date. Both Blowfish and Twofish are freely available to everyone.
This section discusses the existing solutions to the complications that arise when applying these encryption techniques. Some of the existing mechanisms for overcoming the problems encountered when securing encrypted data are listed below.
In this technique, the user can search over the encrypted data using keywords without decrypting it. Searchable encryption exists in a symmetric-key version and an asymmetric-key version. The symmetric-key searchable encryption proposed by Song et al. (2000), a cryptographic technique, failed to provide accurate results. To overcome this, Boneh et al. (2004) proposed Public key Encryption with Keyword Search (PEKS), which handles search over encrypted data in the asymmetric-key setting and is mainly applied in mail gateways to filter mails containing predefined keywords.
This technique improves system usability by ranking search results by relevance instead of returning undifferentiated results, which further ensures file retrieval accuracy. Files are ranked by the number of times the keyword appears in them; files with a low rank are not considered, even if they contain a large amount of keyword-related data (Wang et al., 2012; Meharwade and Patil, 2016).
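The ranking rule described above, where relevance is the number of times the keyword appears in a file, can be sketched on plaintext as follows. This shows only the scoring and top-k cut-off, not the encrypted scheme of Wang et al. (2012); the function name `rank_by_keyword` and the sample files are hypothetical.

```python
from collections import Counter


def rank_by_keyword(files: dict, keyword: str, top_k: int) -> list:
    """Return the top_k (file, count) pairs, ordered by keyword frequency."""
    scores = []
    for name, text in files.items():
        count = Counter(text.lower().split())[keyword.lower()]
        if count > 0:
            scores.append((name, count))
    # Highest count first; file name breaks ties so the order is deterministic.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top_k]


files = {
    "a.txt": "cloud storage and cloud search on cloud data",
    "b.txt": "encrypted search over cloud data",
    "c.txt": "ranking functions for retrieval",
    "d.txt": "cloud cloud index",
}
ranked = rank_by_keyword(files, "cloud", top_k=2)
print(ranked)
```

Here `b.txt` matches the keyword but falls outside the top two results, so it is not returned, which reflects the treatment of low-ranked files.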
This technique is used to provide secure image search and privacy protection. It is strong in terms of search accuracy, security strength and computational efficiency, but it is heavyweight in terms of computational complexity, communication cost and the user involvement needed for a practical index, which results in security issues (Lu et al., 2014).
This technique is a secure ranked keyword search scheme in which the inverted index is combined with order-preserving symmetric encryption (OPSE), as proposed by Wang et al. (2012).
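The property that makes order-preserving encryption useful for ranking is that the server can sort encrypted relevance scores without learning the scores themselves. The toy mapping below illustrates only that property; it is not the OPSE construction used by Wang et al. (2012). It assumes small integer scores and a keyed pseudorandom table, and `build_ope_table` is a hypothetical name.

```python
import random


def build_ope_table(key: int, max_score: int, max_gap: int = 1000) -> list:
    """Map each score 0..max_score to a strictly increasing keyed pseudorandom value."""
    rng = random.Random(key)          # the key seeds the generator (toy assumption)
    table = []
    current = 0
    for _ in range(max_score + 1):
        current += rng.randint(1, max_gap)   # a gap of at least 1 preserves order
        table.append(current)
    return table


table = build_ope_table(key=2024, max_score=10)

# Data owner: replace plaintext relevance scores with their encrypted values.
scores = {"doc1": 3, "doc2": 7, "doc3": 1}
encrypted = {doc: table[score] for doc, score in scores.items()}

# Cloud server: rank using the encrypted values only.
server_order = sorted(encrypted, key=encrypted.get, reverse=True)
print(server_order)
```

Because the mapping is strictly increasing, sorting the encrypted values yields the same order as sorting the plaintext scores.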
This scheme is based on secure inner product computation, the basic idea behind privacy-preserving multi-keyword ranked search over encrypted cloud data (MRSE), which was proposed by Cao et al. (2014).
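The inner-product idea can be sketched with a heavily simplified example. Each document and each query is a binary vector over a keyword dictionary, and the score is their inner product. The sketch assumes a single secret invertible matrix `M`: the index stores M^T p and the trapdoor is M^{-1} q, so their inner product equals p·q. The full scheme of Cao et al. (2014) adds further randomization that is omitted here, and the matrix, dictionary and vectors are illustrative.

```python
def mat_vec(matrix: list, vector: list) -> list:
    """Multiply a square matrix by a vector."""
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]


def transpose(matrix: list) -> list:
    return [list(col) for col in zip(*matrix)]


def dot(u: list, v: list) -> int:
    return sum(a * b for a, b in zip(u, v))


# Secret key: an invertible matrix and its inverse (small hand-picked example).
M = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
M_INV = [[1, -1, 1], [0, 1, -1], [0, 0, 1]]

# Keyword dictionary: ["cloud", "privacy", "index"]; 1 means the keyword is present.
doc_vectors = {"doc1": [1, 1, 0], "doc2": [1, 0, 1], "doc3": [0, 0, 1]}
query = [1, 1, 0]                      # search for "cloud" and "privacy"

# Data owner: encrypted index vectors M^T p.  Authorized user: trapdoor M^-1 q.
enc_index = {doc: mat_vec(transpose(M), p) for doc, p in doc_vectors.items()}
trapdoor_vec = mat_vec(M_INV, query)

# Cloud server: score each document from the transformed vectors only.
scores = {doc: dot(vec, trapdoor_vec) for doc, vec in enc_index.items()}
print(scores)
```

The server-side scores equal the plaintext counts of matched keywords, so documents can be ranked without the server handling the original vectors.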
This scheme, proposed by Yu et al. (2013), supports top-k multi-keyword retrieval. Because existing homomorphic encryption schemes have high algorithmic complexity, large key sizes and significant ciphertext expansion, TRSE is inefficient for practical use on large volumes of encrypted data (Yu et al., 2013).
Sahai and Waters (2005) proposed this technique, in which a set of attributes is treated as a user's identity and Boolean expressions over those attributes are combined to form an access policy. Although other searchable encryption schemes such as Inner Product Encryption (IPE) and Hidden Vector Encryption (HVE) have properties similar to Attribute-Based Encryption (ABE), they differ from it in that they use one-to-many properties, which is not safe and restricts their use in these variations (Koo et al., 2013).
This technique has become a viable tool for tackling the problem of fine-grained access control, because it achieves one-to-many encryption rather than one-to-one encryption when each user has a different access privilege. Devising a suitable access control mechanism over the same content remains a challenge (Koo et al., 2013; Miao et al., 2017).
In this technique, the access policy is associated with the keys, while the data is labeled with descriptive attributes chosen by the encryptor; access to the encrypted content is granted only when those attributes satisfy the policy in the key (Koo et al., 2013).
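The notion of an access policy built from Boolean expressions over attributes, which is common to the attribute-based techniques above, can be sketched as a plain policy check. No cryptography is involved here: in a real attribute-based scheme the check is enforced by the mathematics of decryption rather than by code. The nested-tuple policy format and the function name `satisfies` are hypothetical.

```python
def satisfies(policy, attributes: set) -> bool:
    """Evaluate a nested ("and" | "or", child, ...) policy against a set of attributes."""
    if isinstance(policy, str):
        return policy in attributes            # leaf: a single attribute
    op, *children = policy
    results = [satisfies(child, attributes) for child in children]
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unknown operator: {op}")


# Illustrative policy: a doctor who works in cardiology or in emergency care.
policy = ("and", "doctor", ("or", "cardiology", "emergency"))

print(satisfies(policy, {"doctor", "cardiology"}))
print(satisfies(policy, {"nurse", "emergency"}))
```

Where the policy lives distinguishes the two variants: in ciphertext-policy ABE the policy is attached to the ciphertext and the attributes to the user's key, whereas in key-policy ABE the policy is embedded in the key and the attributes label the ciphertext.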
Ranked Multiple Keywords Search, which is used to design an effective multiple-keyword search scheme, supports relevance ranking of results over encrypted data. It also helps to provide secure keyword search by addressing security guarantees, security requirements and efficiency (Sahai and Waters, 2005).
From the above discussion, searchable encryption in its original cryptographic form could not provide accurate results, a limitation that was eased by PEKS. RSSE played an effective role in solving keyword-related issues. OPSE succeeded in implementing a secure keyword search. MRSE is used for privacy-preserving ranked keyword search, whereas TRSE is inefficient in some cases because the homomorphic encryption it relies on, which also enables secure image search, involves a large and complicated algorithm. ABE, CP-ABE and KP-ABE each fall short in some functional aspects. It is concluded that RSSE based on ranked multiple keyword search fulfils its functions better than the other proposed techniques.
Some security-related advances, which are among the largest drivers of IT security today, are also mentioned as follows.
This paper provided a detailed description of the encryption process and of the challenges, security issues and attacks that encrypted data faces during encryption. Given the entities present in the existing system, the data to be encrypted may be exposed to insecurity. Developers should avoid leaving the key unprotected, fetching the key insecurely, using the same key for all the data, and keeping the same key for a long period of time. Because the cloud holds a large number of data users and documents, it is necessary to support keyword search requests and return the matching documents, so that traditional keyword-based data utilization remains practical over encrypted data. By comparing the various encryption techniques, the paper makes it easier for users to choose a particular technique to secure their data. The problems that arise when applying these encryption techniques, and the solutions proposed so far to resolve them, were also discussed.
The authors express their gratitude for the assistance provided by Accendere Knowledge Management Services Pvt. Ltd. in preparing the manuscript. The authors also thank their mentors and faculty members, who guided them throughout the research and helped them achieve the desired result.