Abstract

Cryptanalysis is one of the most challenging research areas in the field of information security. Often, this includes how to find the key which has been used for hiding the message and thus to arrive with the original information. In order to avoid others' attacks, one should first have enough knowledge and experience of the existing cryptanalytic attacks on various cryptographic systems. These attacks and their avoidance requirements can be described based on information available to opponent, computational time requisites, memory requirements, etc. Security analysis of the existing ciphers is very helpful to better understand the requirements for designing secure and efficient ciphers. This paper’s main objective is to propose a design for a general cryptanalysis platform for pedagogical purposes. Besides educational benefits expected on information security side, other benefits of practicing with certain software development methods will also be investigated. The whole work can be considered to be under the general title of ethical hacking. In order to make a solid ground for the research, the paper starts by surveying different cryptanalysis techniques for various cipher systems. The paper also reports on the progress of the authors’ ongoing work.

Cryptography (which means in Greek “secret writing”) is the practice and study of (mathematical) techniques for secure communication in the presence of opponents or adversaries. Modern cryptography can be considered to be at the intersection of mathematics, computer science, and electrical engineering disciplines. On the other hand, cryptanalysis (or code breaking) is based on the knowledge of the encryption algorithm in addition possibly to some knowledge of the plaintext structure. The ultimate goal of cryptanalysis is to infer the key for decryption of future messages. Using a specific method for cryptanalysis depends on several factors. These include whether the adversary has access to ciphertext only, or has pairs of plaintext and ciphertext, how much structure in the plaintext, and how much of that structure is known to the attacker (Stallings, 2014). Thus, cryptography is important to make information safe and immune to attacks where cryptanalysis is concerned with breaking of codes. Both cryptography and cryptanalysis constitute the boarder field of cryptology. Figure 1 represents the relation between cryptography and cryptanalysis (Kahate, 2009).

In addition to cryptanalysis approach, the brute-force attack can sometimes be applied also. In this latter case, the attacker tries every possible key on a piece of ciphertext in order to turn the cipher text into plaintext even albeit took years to achieve this goal(Toemer & Arumugam, 2007). Of course, this greatly depends on the importance of information and available computational power. Half of the key space must be tried for success on average. Based on the amount of information available to the cryptanalyst, various types of possible cryptanalytic attacks are overviewed in Table 1. Typically, the most difficult cryptanalysis challenge is presented when all that is available is only the ciphertext. This is the most complicated attack among the classes of attacks encountered in cr yptanalysis (Stallings, 2014; Vimalathithan & Valarmathi, 2009).

In this paper, the authors report their open project for developing a general cryptanalysis platform. Building such a platform is a big software project and real research challenge. The project is intended for pedagogical purposed and expected to continue for several years (at least). The project is planned to be done based on graduate (M.Sc. and Ph.D.) students' research projects. They are expecting one or two students to work in the project each year on average. Thus, suitable software development strategies and tools have to be adopted and investigated. This is the first paper to be published on the project. Thus, it is focused on the general project considerations, architectural platform design, and main project development techniques and phases.

The remaining of the paper is organized as follows: Section 1 is an overview of some related works. The most important cryptanalysis techniques that are expected to be considered in the project are reviewed in Section 2. Then, software development approaches are summarized in Section 3. Next, main project considerations, choices, and phases are presented in Section 4 before the general architectural design of the cryptanalysis platform is introduced in Section 5. Finally, the paper is concluded in Section 6 with some highlights on the future work.

There are many important research works related to cryptanalysis. However, as far as our knowledge, the authors are unaware of any previous published work on designing a general cryptanalysis platform, especially in accordance with the adopted approach. In this section, some sample previous works on cryptanalysis are reviewed.

Li, Zhang, and Chen (2008) studied of the security of an image encryption scheme based on the Hill cipher and presented a cryptanalysis approach for such a scheme. Their approach makes benefits of the defects inherent in that encryption scheme. Hence, they were able to break the cipher by using only one known/chosen-plaintext attack. Arroyo, Li, Li, Alvarez, and Halang (2009) considered the cryptanalysis of an image encryption scheme based on total shuffling algorithm. They exploited some defects of the scheme and show how to break it with a chosen-plaintext attack. Then, Arroyo, Alvarez, and Fernandez (2009) introduced a basic framework of cryptanalysis in the context of digital chaosbased cryptography. The framework included a series of mathematical tools. The dynamical properties were considered in this work to show how they can be used in the cryptanalysis.

Mokhov (2010) considered the algorithms of automated optimization heuristics for cryptanalysis of classical ciphers (e.g. genetic, simulated annealing, and tabu search). For verification and comparison goals, they imp lemented a Java-language open-source cryptanalysis framework called Cryptolysis. Another Javalanguage open-source project, MARF, had collected a number of framework classification algorithms (e.g. distance, neural network, similarity measure, etc.). MARF's implementation was improved by porting the cryptanalysis implementations of search algorithms for various classification tasks in natural language and others. The proposal had been developed by using the natural language word segmentation for the deciphered text corpora that lacks spacing and punctuation.

Pasalic (2009) presented a comparative study between the performance of probabilistic algebraic attacks and the classical algebraic attacks through the context of their application to certain linear feedback shift register (LFSR)-based stream ciphers. Also, Bechtsoudis and Sklavos (2010) studied side channel attacks cryptanalysis against block ciphers. They noticed that the secret information would not usually be manipulated in close and reliable computing environments as designers assumed. Real computing units and chips have implementation information leakage through their operation. Thus, the side channel cryptanalysis could use this implementation data. They also analyzed different categories of side channel attacks and studied how to concrete such attacks against FPGA based implementations.

Luthra and Pal (2011) studied the cryptanalysis of monoalphabetic substitution ciphers. Monoalphabetic ciphers exchange every letter in the plaintext with a different letter in accordance with some predefined rule. The identification of the ciphers cryptanalysis scheme was done by using known language statistical data. They discussed the integration of the operators of mutation and crossover that are commonly used in both of the genetic algorithms and the Firefly algorithm in order to cryptanalyze the monoalphabetic substitution cipher.

Vimalathithan and Valarmathi (2009) presented an approach based on genetic algorithm for the cryptanalysis of Simplified DES. When putting time difficulty of the proposed approach side by side to brute force attack, the time had been reduced drastically. In addition, Soleimany, Sharifi, Bahrak, and Aref (2009) studied a cryptanalysis technique of 7-round AES-128. The proposal was based on using particular possessions of Mix Column operation of AES. More seriously, Biryukov and Khovratovich (2009) presented two kinds of related-key attacks on the full AES cipher. Both attacks were boomerang attacks, which had been improved with the boomerang switching techniques to get free rounds in the middle.

As mentioned previously, classification of attacks can generally be done based on the amount of information available to attacker. It is also possible to classify attacks in general on the basis of their computational resource requirements. These resources are time, memory, and data. Note that it is not always easy to deduce the required quantity of plaintext or ciphertext precisely, particularly when the attack is not practical (Stallings, 2014). However, in this section, the authors’ review the most important cryptanalysis techniques that be considered for implementation in this project. These can be divided on the basis of type of cryptographic system they usually used to attack. Thus, they consider the most important attacks on symmetric (conventional) ciphers (both block and stream ciphers), asymmetric ciphers (public-key cryptography), and cryptographic hash functions, respectively.

2.1 Cryptanalysis of Symmetric Ciphers

There are various kinds of attacks that can be applied on symmetric (secret-key) systems. The most important of them are given below:

2.1.1 Differential Cryptanalysis

This is basically a chosen plaintext attack by which the relationship between the ciphertexts created by two related plaintexts is discovered. Hence, this kind of attack focuses on the statistical analysis of two inputs and two outputs. The attack searches the signs of nonrandomness in each pair of intermediate ciphertext pairs. More details on this important category of attack can be found in (Kahate, 2009; Led, Massey, & Murphy 1991; Nyberg & Knudsen, 1995; Wenling & Dengguo, 2009).

2.1.2 Linear Cryptanalysis

This is a known plaintext attack that focuses on statistical analysis in opposition to one round of decryption on big number of ciphertexts. It needs to access to a large quantity of plaintext and ciphertext pairs which have been encrypted with indefinite keys. The cryptanalyst studies the resulting intermediate ciphertexts to look for the smallest amount random result and using all probable sub-keys decrypts each ciphertext for one round of encryption. For example, the reader may refer to (Kahate, 2009; Wenling & Dengguo, 2009).

2.1.3 Boomerang Attack

This is a technique for cryptanalysis of block ciphers which is based on differential cryptanalysis. This type of attack provides different avenues of attack on various ciphers which are safe from the basic differential cryptanalysis. The attack permit “differentials” cover only part of cipher. In order to produce the so called "quartet”, this attack is applied at the point halfway through the cipher (Wenling & Dengguo, 2009).

2.1.4 Integral Cryptanalysis

This category of attack is appropriate on block ciphers based on substitution-permutation networks. In dissimilar differential cryptanalysis, the Square attack uses sets or multi sets of selected plaintext of which part is held stable and other part varies with all possibilities (Stallings, 2014; Wenling & Dengguo, 2009).

2.1.5 Davies' Attack

This is a known plaintext attack which is based on nonuniform division of output of pairs of adjacent S-boxes in DES. For attacking the DES, Davies' attack works by calculating empirical distribution of its characteristics and gathering various plaintext/ciphertext pairs (Stallings, 2014).

2.1.6 Man-in-the-Middle Attack

This attack is a well-known plaintext attack. In cases that many keys are used for encryption, this kind of attack can be applied. The attacker in this type has access to both the plaintext and resulting ciphertext. For example, this attack can be applied against Double DES and other similar double cipher constructions (Stallings, 2014) .

2.2 Cryptanalysis of Asymmetric Systems

In asymmetric (or public-key) cryptography, one key (the private key) is used for decryption and another key (the public key) is used for encryption (or vice versa). The major point of attack is to develop techniques or algorithms to solve the “assumed hardness” of the mathematical problem used to construct the public-key system. Hence, security of public-key cryptography depends on mathematical inquiries in a way that conventional cryptography does not. For example, RSA's security depends on complexity of integer factorization, while the security of Diffie-Hellman key exchange depends on calculating the separate logarithm. Another major characteristic of asymmetric ciphers is that cryptanalyst has a chance to make use of knowledge gained from the public key (Stallings, 2014).

2.3 Cryptanalysis of Hash Systems

The most representative attack on cryptographic hash functions could be the birthday attack. Birthday attack is an attack which can find out collisions in the hashing algorithm. Birthday paradox is a base for this attack. This kind of attack is generally used on hash algorithm such as SHA1 and MD5 (Kahate, 2009; Mao, 2004).

2.4 Side Channel Attacks

This kind of attack is done by using additional information obtained from the physical implementation of the cipher. This is mainly related to the hardware used to encrypt or decrypt the data. Attacker in other cryptographic attacks has access to plaintext or ciphertext pairs or cryptographic algorithm. Side channel attack analysis is based on algorithm realization. It uses the leakage of confidential information that may be found, like energy consumption, running time, running errors. It represents a mainstream of the recent block cipher analysis techniques (Kahate, 2009).

Organizations usually require their software be updated and meet all the business requirements due to the quick development in technology. Because of the change in the business process, organizations have to have their business software on time within the budget. As the organizational necessities change, this produces challenging problems for the development team since it becomes very complex and expensive to create changes in the middle of the development phase and so the software might become over budget and late (Awad, 2005; Czarnacka-Chrobot, 2010). There are two types for software development methodologies: traditional (or heavyweight), and agile (lightweight) methodologies.

The most representative examples of traditional software development methodologies include:

Unified Process:

An iterative and incremental method is used in all attempts, including modeling. It is ordered into workflows in the Unified Process (UP) (Devi, 2013) .

Waterfall:

Increase development and reuse oriented software engineering are traditional models for software development similar to waterfall model. The waterfall software was the initial model that was presented in the 1970s for software development to explain the software engineering practices. The sequential model for software engineering is similar to waterfall such that the software development stages are in a sequence (Royce, 1970).

Spiral Model:

This model combines elements of both design and prototyping-in-stages in order to obtain the advantages of both top-down and bottom-up approaches. The spiral model usually is applied to big software projects. For more details see(Boehm, 1998).

Many people believe that many problems in the software development can be defeated by using flexible software development ways that can afford any changes at any stage of the software development. Hence, agile development is very suitable for the organizations to obtain the benefit from the agile ways because it is very flexible for changes (Devi, 2013). The major purpose of an agile software engineering is to supply the customers' quality products with no faults and that can cover all the expectations of the final users. A number of Software Development Life Cycle (SDLC) methodologies have been proposed to get those goals (Manjunath, Jagadeesh, & Yogeesh, 2013). Some of the most important agile software development techniques are reviewed in the following subsections.

3.1 Extreme Programming (XP)

It is an interesting agile technique for software development known for its best practices. After a number of successful trails on the preceding XP practices, the idea of XP is introduced. XP is active, low risk, welcomes changes, and has strong oral communication, pair programming, automated test, group code ownership and established story telling culture (Beck, 1999; Juric, 2000; Nawrocki, Jasinski, Walter, & Wojciechowski, 2002).

3.2 Adaptive Software Development (ASD)

This kind of software development presents a lightweight and adaptive technique for software projects with highly changing requirements. It is not usually probable to prepare successful and complete plan in a rapidly changing and unpredictable environment. Thus, ASD replaces the traditional life cycle planning and design life through the use of a dynamic speculate-collaboratelearn life cycle (Highsmith, 2000).

3.3 Scrum

The software development environment has an enormous change in the modern years. In case that important changes are expected during the product life cycle, and in order to deliver the software product with high-quality, in a quicker way, and according to the required standards, it might be necessary to use a software development process like Scrum. For producing the required system flexibility in a changing environment, the basic concentration of Scrum is on how the team members should work together (Rising, & Janoff, 2000).

3.4 Dynamic System Development Method (DSDM)

DSDM is a mix of extension, quick application development and iterative development performs that was developed in the United Kingdom in the mid-1990s. The basic idea behind DSDM is to repair time and resources, and then regulate the amount of functionality accordingly rather than fixing the quantity of functionality in a product, and then adjusting time and resources to get to that functionality (Stapleton, 1997). It is possible to divide DSDM into five stages: feasibility study, business study, functional model iteration, design and build iteration, and implementation, as shown in Figure 2.

3.5 Feature-Driven Development (FDD)

It is usually reported that FDD is built around eight finest practices. These practices are: domain object modeling; developing by characteristic; regular builds; individual class ownership; configuration management; feature teams; inspections; and reporting/visibility of results. Indeed, UML models are used widely in FDD (Palmer & Felsing, 2002). The basic structure of FDD is composed of five incremental and iterative processes that are: developing the overall model, building the features list, planning by feature, designing by feature, and finally building by feature (See Figure 3). In order to limit the total amount of time spent over, guidelines are known for the quantity of time that should be exhausted in each process. Processes 1 through 3 are achieved at the beginning of a project and then updated during the development cycle. Processes 4 and 5 are implemented incrementally on short periods of time basis. Each of these processes has its relevant input and output criteria. Table 2 presents some general features and comparison of agile methodologies (Awad, 2005).

This is a non-traditional open-ended project for developing a general cryptanalysis platform for pedagogical purposes. The most important peculiarities of the project are:

For all these reasons, traditional software development methodologies are not suitable for such a project. The alternative is to build the project based on agile methods. The authors’ choice is to integrate both FDD and DSDM agile software development methods for development of this project. The reasoning behind this choice is as follows:

The project can be roughly divided into four main phases: general modeling and planning phase, pilot project phase, detailed design phase, and implementation and maintenance phase. Following are some details and expected timeline for these phases:

4.1 Phase I- General Modeling, Design, Planning Phase

In this initial phase, the general feature-driven based model and design is developed. A general project plan is set up and main project considerations are studied and analyzed. This paper actually is a work-in-progress reporting on this phase. The time period for this phase is six months.

4.2 Phase II- Pilot Project Phase

In order to reach better understanding of the details and requirements of the whole project, a relatively small pilot project will be considered at first. This pilot project is related to “cryptanalysis of classical ciphers”. It is wellknown that cryptanalysis of classical ciphers is much less challenging than the cryptanalysis of modern ciphers. However, systematic design procedure and suitable development techniques should also be followed in this pilot project. This would give them the opportunity to test their ideas and prepositions on a relatively smaller scale level. This pilot project is to be done on two stages. The first stage is to develop a simple working prototype for classical ciphers' cryptanalysis. This stage is expected to be finished within six months. The same student(s) who works on the first phase will also be responsible for the work required in the first stage of the second phase. In the second stage of this phase, a more sophisticated cryptanalysis platform for classical ciphers is to be built. The time period expected to this second stage is one year.

4.3 Phase III- Detailed Design Phase

After finishing the pilot project, more accurate knowledge and better experience are expected to be gained by them on the whole project challenges and requirements. This would enable them to consider the design of specific parts and details of the project in a more sophisticated manner. Proper integration of FDD and DCDM methods will be proposed in this phase and most relevant tasks of the project will be considered. The time period allocated for this phase is one year.

4.4 Phase IV- Implementation and Maintenance Phase

The time period required for this final phase will be larger than any other phase, as this phase might continue for several years. A modular approach will be followed in this phase so that modules with various functionalities are going to be added to the platform one by one. It is anticipated that this project will be open-ended and new experience would be gained after the development of each module or the implementation of each task. More details on the tasks to be implemented in this phase can be found in the next section.

This section is dedicated for presenting the basic parts and aspects of the proposed architecture of the general cryptanalysis platform. The most important design concepts leading this architectural design are:

5.1 Modularity

Different functionalities required must be efficiently divided among various modules. Maintenance and upgrade of any module should not affect the work of other modules. Thus, we can respond to any future changes in requirement with minimal change on related modules only. Doing major holistic changes on the platform is not a choice.

5.2 Parallelism

Parallelism must be enforced on various levels and components of the system. This is very necessary for this platform since launching successful cryptanalysis attacks usually requires massive computation. Therefore, parallelism is crucial for reducing the time requirements.

5.3 Expandability

It is necessary for such open-ended project to allow easy expansion and extension so that future functionalities can be added to the system. The modular design is a corner requirement in this direction.

5.4 Clear Interfaces

Since different people (students) are expected to work on the project during its evolution, it is important to have a clear interface design among various components of the system. Input and output formats of each component (module) should be clearly defined from the beginning.

5.5 Possibility of Hybrid Software/Hardware Implementation

In spite of the fact that the proposal is basically and mainly focused around a software platform, it is quite possible in the future to consider other implementation possibilities for some parts or modules of the platform. These possibilities might include choices like FPGAs, supercomputing facility, and cloud computing based implementations. Of course, this can only be considered during advanced project stages, however, we need to plan for such hybrid architecture possibilities right from the beginning.

Figure 4 shows a schematic for the architectural design of the proposed general cryptanalysis platform. The most important components or modules involved in this architecture are:

5.5.1 The Supervisory Module

This is the main system module that is controlling and supervising other modules and components. High reliability of the module should be assured via proper design procedure, safe implementation choices, and additional software/hardware redundancy. The security of the whole platform should be enforced starting from this module.

5.5.2 Level-1Crypto-Classifier

In this module, a first level classification of the considered ciphertext needs to be done so as to decide the general cryptographic category (e.g., classical cipher, block cipher, public-key cipher, etc.) of it. Information obtained from various resources need to be used and some intelligent classification techniques (like AI, genetics, and neural networks) have to be developed.

5.5.3 Level-2 Crypto-Classifier

In the second level of classification, specific algorithm(s) or cipher(s) should be assigned for the ciphertext in accordance with the classification done at the first level. For example, if the classifier of level-1 deduced that the ciphertext belongs to the category of block ciphers; level- 2 classifier job is to decide which specific block cipher has been used (e.g., DES, AES, Twofish, etc.). Besides the information deduced by different means, some distinguishing characteristics for different ciphers must be known. Note that in many cases, the cryptographic algorithm used is publically known. This might be helpful. However, this is not always the case and we need to keep this design as general as possible.

5.5.4 Parallel Cryptanalysis Modules

Here is the core of the cryptanalysis platform, where actual cryptanalysis attacks are implemented. Many of such modules need to be developed based on the number of cryptanalysis techniques (e.g., differential cryptanalysis, linear cryptanalysis, brute-force attack, etc.) considered. Parallelism must be highly enforced in these modules.

5.5.5 Feedback and Reporting Module

This module collects relevant data from other modules and sends feedback and report information to the supervisory module. Feedback is necessary to enhance the performance of the platform as a whole or in parts and fix any problems. Proper reporting is crucial for such projects.

5.5.6 Natural Language Module

It is important to decide the original language (e.g., Arabic, English, French, etc.) of the plaintext so that one accurately can check the successfulness of the launched attacks. In many cases, it is easy to guess the plaintext language. However, in other cases we need to develop some intelligent techniques for this purpose. Even after deciding the plaintext language, many other language issues related to styles, characters, and symbols used are still to be resolved.

5.5.7 Graphical Analyzer Module

It is believed that offering representative graphical interface for the partial/intermediate results for many cryptanalysis attacks can be helpful to understand the progress of these attacks at early stages. Thus, one can decide to continue the attack or abort it, fix it, and restart again. As these attacks usually are time consuming, this can save significant time. In addition, this module is very useful in the pedagogical settings.

5.5.8 The Steganography Module

In many information security systems, cryptography is integrated with steganography. This typically increases the difficulty of attacks. Attacking steganographic systems usually requires techniques that are different from those used for attacking crypto-systems. The responsibility of this module is to address such issues. However, as cryptanalysis is the main theme, the implementation of this module will be delayed to advanced stages of the platform. Indeed, it is expected that this module will need to be expended to several other modules during actual implementation.

This paper represents a work-in-progress report on the open-ended pedagogical project on the development of a general cryptanalysis platform. The paper is dedicated to report on the early phase of the project which is mainly concerned on the general design, modeling, and planning considerations. A variety of attacks and cryptanalysis techniques have been reviewed. This is crucial as the awareness of various types of attacks is crucial for making the system safe from them. The choice of using some agile software development techniques in their project has been justified through the paper. The next paper will focus on their pilot project on classical cipher cryptanalysis.

[16]. Manjunath, K., Jagadeesh, J., and Yogeesh, M., (2013). “Achieving quality product in a long term software product development in healthcare application using Lean and Agile principles”. International Multi-Conference on Automation, Computing, Communication, Control and Compressed Sensing (iMac4s), Kottayam, IEEE publication, pp. 26-34.

[18]. Mokhov, S., (2010). “Cryptolysis: A framework for verification of optimization heuristics for the automated cryptanalysis of classical ciphers and natural language word segmentation”. IEEE 8^th ACIS International Conference on Software Engineering Research, Management and Applications.

Architectural Design of General Cryptanalysis Platform for Pedagogical Purposes