GOST 28147 89 gamming mode. Easy replacement mode

All tips

). At the same time, in the Russian media and blogs of Russian users, the number of notes about this algorithm is growing: both covering the results of attacks on the Russian standard with varying degrees of reliability, and containing opinions about its operational characteristics. The authors (and, consequently, readers) of these notes often get the impression that the domestic encryption algorithm is outdated, slow and has vulnerabilities that make it significantly more susceptible to attacks than foreign encryption algorithms with a similar key length. With this series of notes we would like to tell you in an accessible form about the current state of affairs with the Russian standard. The first part will cover all attacks on GOST 28147-89 known to the international cryptographic community and current assessments of its strength. In future publications, we will also take a closer look at the properties of the standard from the point of view of the ability to build effective implementations.

Nicolas Courtois - "the great and terrible"

Let's start with a story about the activities of Nicolas Courtois, who is the author of a whole series of works devoted to the Russian block encryption standard ().

In October 2010, the process of considering the inclusion of the GOST 28147-89 algorithm in the international standard ISO/IEC 18033-3 began. Already in May 2011, an article by the famous cryptographer Nicolas Courtois appeared on the ePrint electronic archive, marked by a very ambiguous attitude towards him from the world cryptographic community. Courtois's publications are a sad example of the manipulation of concepts, which does not reveal any new properties of the object in question, but with a claim to sensation provokes the spread of erroneous opinions about its actual properties in an incompetent environment.

Algebraic method

Courtois's reasoning is built around two classes of cryptanalysis methods: algebraic methods and differential ones. Let's look at the first class of methods.

In a simplified way, the method of algebraic cryptanalysis can be described as the compilation and solution of a large system of equations, each of the solutions of which corresponds to the goal of the cryptanalyst (for example, if a system is compiled using one pair of plaintext and ciphertext, then all solutions of this system correspond to the keys at which this plaintext is converted into this one is encrypted). That is, in the case of the problem of cryptanalysis of a block cipher, the essence of the algebraic method of cryptanalysis is that the key is found as a result of solving a system of polynomial equations. The main difficulty is to be able to create as simple a system as possible, taking into account the characteristics of a particular cipher, so that the process of solving it takes as little time as possible. Here the key role is played by the features of each specific cipher being analyzed.

The algebraic method used by Courtois can be briefly described as follows. At the first stage, such properties of GOST 28147-89 are used as the existence of a fixed point for part of the encryption transformation, as well as the so-called reflection point. Thanks to these properties, several pairs are selected from a sufficiently large number of plaintext-ciphertext pairs, which make it possible to consider transformations not at 32, but only at 8 rounds. The second stage is that, based on the results of 8 round transformations obtained at the first stage, a system of nonlinear equations is constructed, in which the key bits are unknowns. Next, this system is solved (this sounds simple, but in reality is the most time-consuming part of the method, since the system consists of non-linear equations).

As noted above, nowhere in the work is there a detailed description and analysis of the complexity of the second and main stage of determining the key. It is the complexity of the second stage that determines the complexity of the entire method as a whole. Instead, the author provides the notorious “facts” on the basis of which he makes estimates of labor intensity. These "facts" are said to be based on experimental results. An analysis of the “facts” from Courtois’s work as a whole is given in the work of domestic authors. The authors of this work note that many of Courtois’ “facts” presented without any evidence were found to be false during experimental testing. The authors of the article went further and, following Courtois, carried out an analysis of the complexity of the second stage using well-founded algorithms and estimates. The resulting complexity estimates show the complete inapplicability of the presented attack. In addition to domestic authors, the big problems that Courtois has with assessments and justification of his methods were also noted, for example, in the work.

Differential method

Let's consider the second Courtois method, which is based on differential cryptanalysis.

The general method of differential cryptanalysis is based on the exploitation of the properties of nonlinear mappings used in cryptographic primitives, associated with the influence of the key value on the dependencies between the differences between pairs of input and pairs of output values of these mappings. Let us describe the main idea of the differential method of cryptographic analysis of a block cipher. Typically, block ciphers transform the input data in stages using a number of so-called round transformations, and each round transformation does not use the entire key, but only some part of it. Let's consider a slightly “truncated” cipher, which differs from the original one in that it does not have the last round. Let us assume that it has been possible to establish that encrypting two plaintexts that differ in some fixed positions using such a “truncated” cipher will most likely result in ciphertexts that also differ in some fixed positions. This property shows that a “truncated” cipher most likely leaves a dependency between some plaintexts and the results of their encryption. In order to recover part of the key using this obvious flaw, it is necessary to be able to encrypt pre-selected plaintexts on the key we want to recover (the so-called “chosen plaintext attack”). At the beginning of the “key opening” procedure, a number of pairs of plaintexts are randomly generated, differing in those same fixed positions. All texts are encrypted using a “full” cipher. The resulting ciphertext pairs are used to recover those key bits that were used in the last round transformation, as follows. Using some randomly selected value of the desired key bits, a transformation inverse to the last round transformation is applied to all ciphertexts. In fact, if we guessed the desired value of the key bits, we will get the result of a “truncated” cipher, and if we didn’t guess, we will actually “encrypt the data even more,” which will only reduce the dependence between blocks noted above (the difference is in some fixed positions). In other words, if among the results of such “additional processing” of ciphertexts there were quite a lot of pairs that differ in the fixed positions known to us, then this means that we have guessed the required key bits. Otherwise, there will be significantly fewer such pairs. Since only part of the key is used in each round, the searched bits (that is, the key bits used in the last round) are not as numerous as the bits in the full key and can simply be iterated over by repeating the steps above. In this case, we will definitely stumble upon the correct meaning someday.

From the above description it follows that the most important thing in the differential analysis method is the numbers of those very positions in plaintexts and ciphertexts, the differences in which play a key role in recovering the key bits. The fundamental presence of these positions, as well as the set of their numbers, directly depends on the properties of those nonlinear transformations that are used in any block cipher (usually all the “nonlinearity” is concentrated in the so-called S-boxes or replacement nodes).

Courtois uses a slightly modified version of the differential method. Let us immediately note that Courtois conducts his analysis for S-boxes that are different from the current ones and from those proposed in ISO. The work provides differential characteristics (those numbers in which the blocks should differ) for a small number of rounds. The justification for extending the characteristics for more rounds is, as usual, based on “facts”. Courtois expresses, again, with nothing other than his authority, an unsupported assumption that changing the S-boxes will not affect the resistance of GOST 28147-89 against his attack (while for unknown reasons, the S-boxes from the 1st working draft of the addition to the standard ISO/IEC 18033-3 was not considered). The analysis carried out by the authors of the article shows that even if we take Courtois’s unfounded “facts” on faith and analyze GOST 28147-89 with other S-blocks, the attack again turns out to be no better than a complete search.

A detailed analysis of Courtois’s works with a detailed substantiation of the groundlessness of all statements about the decrease in the resistance of the Russian standard was carried out in the works [,].

At the same time, even Courtois himself admits the absolute lack of accuracy in the calculations! The following slide is taken from Courtois' presentation at the FSE 2012 short notice section.

It should be noted that Courtois’s work has also been repeatedly criticized by foreign researchers. For example, his work on constructing attacks on the AES block encryption algorithm using the XSL method contained the same fundamental flaws as the work on the analysis of the Russian standard: most of the labor intensity estimates appear in the text completely unfounded and unsubstantiated - detailed criticism can be found, for example, in work. In addition, Courtois himself admits to widespread refusals to publish his work at major cryptography conferences and in established peer-reviewed journals, often leaving him only the opportunity to speak in the short announcement section. For example, you can read about this in section 3 of the work. Here are some quotes from Courtois himself relating to his work:

“I think that the audiences of Asiacrypt will not feel it is interesting.” Asiacrypt 2011 reviewer.
“… there is a big, big, big problem: this attack, which is the main contribution of the paper has already been published at FSE’11 (it was even the best paper), …”. Reviewer for Crypto 2011.

Thus, the professional part of the international cryptographic community regards the quality of Courtois’s work with no less doubt than, say, the statements of some Russian specialists about their ability to crack AES for 2,100, which are not confirmed by any consistent calculations, or the latest “evidence” of a two-page hypothesis on the inequality of complexity classes P and NP.

Attacks by Isobe and Dinur-Dankelman-Shamir

The general idea of the Isobe () and Dinur-Dankelman-Shamir attacks (hereinafter: DDS attack) () is to construct for a certain (key-dependent) narrow set of plaintexts an equivalent transformation on this set, which has a structure simpler than the encryption transformation itself . In the case of the Isobe method, this is the set of 64-bit blocks x such that F 8 -1 (Swap(F 8 (z))) = z, where z = F 16 (x), through F 8 (x) and F 16 ( x) indicate the first 8 and first 16 rounds of GOST 28147-89 encryption, respectively, through Swap - the operation of swapping halves of a 64-byte word. If the plaintext is included in this set, the result of the full 32-round transformation of GOST 28147-89 coincides with the result of the 16-round one, which is what the author of the attack exploits. In the case of the DDS method, this is the set of x such that F 8 (x) = x (fixed point of the transformation F 8). For any plaintext from this set, the GOST 28147-89 transformation works exactly the same as its last 8 rounds, which simplifies the analysis.

The complexity of the Isobe attack is 2,224 encryption operations, the DDS attack is 2,192. However, all questions about whether the Isobe and DDS attacks introduce new restrictions on the conditions for using our algorithm are removed by assessing the requirements for the volume of material required to carry out each of the attacks: the Isobe method requires 2 32 pairs of plaintext and ciphertext, and for the DDS method - 2 64. Processing such volumes of material without changing the key is a priori unacceptable for any block cipher with a block length of 64: on material of volume 2 32 , taking into account the problem of birthdays (see, for example, ), the probability of occurrence of repeated blocks is close to 1/2, which will provide the attacker is able to draw certain conclusions about the plaintexts from the ciphertexts without determining the key. The presence of 2 64 pairs of plain and encrypted texts obtained on one key actually allows the enemy to carry out encryption and decryption operations without knowing this key at all. This is due to a purely combinatorial property: the enemy in this case has the entire encryption conversion table. This situation is absolutely unacceptable under any reasonable operating conditions. For example, in CryptoPro CSP there is a technical limitation on the volume of encrypted material (without key conversion) of 4 MB (see). Thus, a strict prohibition on using a key on material of such volume is inherent in any block cipher with a block length of 64 bits, and therefore, Isobe and DDS attacks in no way narrow the scope of use of the GOST 28147-89 algorithm while maintaining the maximum possible strength of 2,256.

Of course, it should be noted that researchers (Isobe and Dinur-Dankelman-Shamir) have shown that some properties of the GOST 28147-89 algorithm make it possible to find analysis paths that were not taken into account by the creators of the algorithm. The simple form of the key schedule, which significantly simplifies the task of constructing effective implementations, also allows for some rare cases of keys and plaintexts to construct simpler descriptions of the transformations produced by the algorithm.

The work demonstrates that this negative property of the algorithm can be easily eliminated with full preservation of operational characteristics, but, unfortunately, it is an integral part of the algorithm in its commonly used form.

Note that certain negligence in the estimates of average labor intensity is also present in the work of Dinur, Dunkelman and Shamir. Thus, when constructing an attack, due attention is not paid to the following point: for a significant proportion of keys, the set of plaintexts x, such that F 8 (x) = x, is empty: there may simply be no fixed points for 8 rounds of transformation. The existence of fixed points also depends on the choice of replacement nodes. Thus, the attack is only applicable for certain replacement nodes and keys.

It is also worth mentioning one more work with an attack on GOST 28147-89. In February 2012, an updated version of the article (dated November 2011) appeared on the ePrint electronic archive of the international cryptographic association, which contained a new attack on GOST 28147-89. The characteristics of the presented attack are as follows: the volume of material is 2 32 (like Isobe), and the labor intensity is 2 192 (like DDS). Thus, this attack improved the time-record DDS attack in terms of material volume from 2 64 to 2 32. We note separately that the authors honestly presented all the calculations with justification for the complexity and volume of the material. After 9 months, a fundamental error was found in the above calculations, and since November 2012, the updated version of the article in the electronic archive no longer contains any results regarding the domestic algorithm.

Attacks assuming the attacker knows "something" about the keys

Finally, we note that in the literature there is also a number of works (see, for example, and ) devoted to attacks on GOST 28147-89 in the so-called linked key model. This model basically contains the assumption that an attacker can gain access for analysis not just to pairs of open texts and encrypted using the desired key, but also to pairs of open and encrypted texts obtained using (also unknown) keys that differ from the desired one by known regular way (for example, in fixed bit positions). In this model, it is indeed possible to obtain interesting results about GOST 28147-89, but in this model it is possible to obtain no less strong results about, for example, the AES standard, which is most widely used in modern public networks (see, for example,). Note that the conditions for carrying out this type of attack arise when using a cipher in a certain protocol. It should be noted that results of this kind, although they are of undoubted academic interest from the point of view of studying the properties of cryptographic transformations, are actually not relevant to practice. For example, all cryptographic information protection tools certified by the FSB of Russia meet the strictest requirements for encryption key generation schemes (see, for example,). As indicated in the results of the analysis, if there are 18 associated keys and 2 10 pairs of plaintext and ciphertext blocks, the complexity of completely opening the private key, with a probability of success of 1-10 -4, is actually 2 26. However, if the above-mentioned requirements for the development of key material are met, the probability of finding such keys is 2 -4352, that is, 24096 times less than if you simply try to guess the secret key on the first try.

Works related to the model with linked keys also include work, which in 2010 caused a lot of noise in Russian electronic publications, which do not suffer from the habit of carefully checking the material in the race for sensations. The results presented in it were not supported by any rigorous justification, but contained loud statements about the ability to hack the state standard of the Russian Federation on a weak laptop in a matter of seconds - in general, the article was written in the best traditions of Nicolas Courtois. But, despite the absolutely obvious groundlessness of the article to the reader who is more or less familiar with the basic principles of scientific publications, it was precisely to reassure the Russian public after the work that Rudsky wrote a detailed and thorough text containing a comprehensive analysis of this shortcoming. The article with the self-explanatory title “On the zero practical significance of the work “Key recovery attack on full GOST block cipher with zero time and memory”” provides justification that the average complexity of the method given in is no less than the complexity of a complete search.

Dry residue: what is durability in practice?

In conclusion, we present a table containing data on all the results of strictly described and justified attacks on GOST 28147-89 known to the international cryptographic community. Note that the complexity is given in the encryption operations of the GOST 28147-89 algorithm, and the memory and material are indicated in algorithm blocks (64 bits = 8 bytes).

Attack	Labor intensity	Memory	Required material
Isobe	2 224	2 64	2 32
Dinur-Dankelman-Shamir, FP, 2DMitM	2 192	2 36	2 64
Dinur-Dankelman-Shamir, FP, low-memory	2 204	2 19	2 64
	2 224	2 36	2 32
Dinur-Dankelman-Shamir, Reflection, 2DMitM	2 236	2 19	2 32
Complete search	2 256	1	4
Number of nanoseconds since the creation of the Universe	2 89

Despite a fairly large-scale cycle of research in the field of resistance of the GOST 28147-89 algorithm, at the moment there is not a single attack known, the conditions for the implementation of which would be achievable with the accompanying operational requirements for a block length of 64 bits. The restrictions on the volume of material that can be processed on one key resulting from the cipher parameters (key bit length, block bit length) are significantly stricter than the minimum volume required to carry out any currently known attack. Consequently, when meeting existing operational requirements, none of the cryptanalysis methods proposed to date GOST 28147-89 allows determining a key with a labor intensity less than exhaustive search.

Algorithm GOST 28147-89 and code “Magma” (GOST R 34.12-2015)

General scheme of the algorithm. The algorithm described by GOST 28147-89 “Information processing systems. Cryptographic protection. Cryptographic Transformation Algorithm” is a domestic symmetric encryption standard (until January 1, 2016) and is required for implementation in certified cryptographic information protection tools used in government information systems and, in some cases, in commercial systems. Certification of cryptographic information security means is required to protect information that constitutes a state secret of the Russian Federation, and information whose confidentiality must be ensured in accordance with current legislation. Also in the Russian Federation, the use of the GOST 28147-89 algorithm is recommended for protecting banking information systems.

The GOST 28147-89 algorithm (Fig. 2.21) is based on the Feistel scheme and encrypts information in blocks of 64 bits, which are divided into two subblocks of 32 bits (I, and R). Subblock R, is processed by the round transformation function, after which its value is added to the value of the subblock Lj, then the subblocks are swapped. The algorithm has 16 or 32 rounds depending on the encryption mode (simulation calculation or other encryption modes).

Rice. 2.21.

In each round of the algorithm, the following transformations are performed.

1. Key application. Subblock content R i folds modulo 2 32 with the round key K.Kj is the 32-bit part of the original key used as a round key. The GOST 28147-89 ns algorithm uses a key expansion procedure; the original 256-bit encryption key is represented as a concatenation (chaining) of eight 32-bit subkeys (Fig. 2.22): K 0, K (, K t K, K A, K 5, K 6, K 7.

The encryption process uses one of these subkeys TO

From the 1st to the 24th round - in direct sequence:

From the 25th to the 32nd round - in reverse order:

Rice. 2.22. Structure of the encryption key of the GOST 28147-89 algorithm

2. Table replacement. After applying the key, the subblock R i is divided into eight parts but 4 bits, the value of each of which is individually replaced in accordance with its replacement table (S-box). A total of eight S-blocks are used - S 0, S, S 2, S 3, S 4, S 5, S 6, S 7. Each S-box of the GOST 28147-89 algorithm is a vector (one-dimensional array) with ^elements numbered from 0 to 15. The values of the S-box are 4-bit numbers, i.e. integers from 0 to 15.

An element is taken from the S-box table whose serial number coincides with the value that came to the substitution input.

Example 2.6.

Let there be an S-block of the following form:

Let the value 0100 2 = 4 be applied to the input of this S-block. The output of the S-block will be the 4th element of the substitution table, i.e. 15 = 1111 2 (numbering of elements starts from zero).

Substitution persons are not defined by the standard, as is done, for example, in the DES cipher. Interchangeable values of replacement tables significantly complicate cryptanalysis of the algorithm. At the same time, the robustness of the algorithm significantly depends on their correct choice.

Unfortunately, the GOST 28147-89 algorithm has “weak” replacement tables, using which the algorithm can be quite easily cracked using cryptanalytic methods. Among the “weak” ones is, for example, a trivial substitution table in which the input is equal to the output (Table 2.16).

Table 2.16

Example of a weak S-box

It is believed that the specific values of the substitution tables should be kept secret and are a long-term key element, i.e. are valid for a much longer period than individual keys. However, the secret values of the replacement tables are not part of the key and cannot increase its effective length.

Indeed, secret replacement tables can be calculated using the following attack, which can be used in practice:

the zero key is set and a search for the “zero vector” is performed, i.e. values z = F( 0), where F- round transformation function of the algorithm. This requires about 2 32 encryption test operations;
using the zero vector, the values of the replacement tables are calculated, which takes no more than 2 11 operations.

However, even if the confidentiality of the replacement tables is violated, the strength of the cipher remains extremely high and does not fall below the permissible limit.

It is also assumed that the replacement tables are common to all encryption nodes within the same cryptographic security system.

Improving the structure of S-boxes is one of the most intensively researched problems in the field of symmetric block ciphers. Essentially, it requires that any changes to the S-box inputs result in random by type of change in output data. On the one hand, the larger the S-boxes, the more resistant the algorithm is to linear and differential cryptanalysis methods. On the other hand, a large substitution table is more difficult to design.

In modern algorithms, S-boxes are usually a vector (one-dimensional array) containing 2" t- bit elements. The input of the block determines the number of the element, the value of which serves as the output of the S-block.

A number of criteria have been put forward for the design of S-boxes. The replacement table must satisfy:

strict avalanche criteria;
bit independence criterion;
the requirement of nonlinearity from the input values.

To fulfill the last requirement, it was proposed to specify a linear combination i bits ( i = 1, ..., T) substitution table values bent functions(English, bent - deviating, in this case, from linear functions). Bent functions form a special class of Boolean functions characterized by the highest class of nonlinearity and compliance with the strict avalanche criterion.

Some works for S-boxes propose checking the implementation of a guaranteed avalanche effect of order y - when one input bit changes, at least the output bits of the S-box change. The property of a guaranteed avalanche effect of order y from 2 to 5 provides fairly good diffusion characteristics of S-boxes for any encryption algorithm.

When designing sufficiently large replacement tables, the following approaches can be used:

random selection (for small S-boxes may result in weak substitution tables);
random selection followed by testing for compliance with various criteria and rejection of weak S-boxes;
manual selection (too labor-intensive for large S-blocks);
mathematical approach, for example generation using bent functions (this approach is used in the CAST algorithm).

We can propose the following procedure for designing individual S-blocks of the GOST 28147-89 algorithm:

each S-box can be described by four logical functions, each of the functions must have four logical arguments;
it is necessary that these functions be sufficiently complex. This complexity requirement cannot be expressed formally, but as a necessary condition it can be required that the corresponding logical functions, written in minimal form (i.e., with the minimum possible expression length) using basic logical operations, must not be shorter than some required value;
individual functions, even those used in different replacement tables, must differ from each other to a sufficient extent.

In 2011, a new attack “reflexive meeting in the middle” was proposed, which slightly reduces the resistance of GOST 28147-89 (from 2256 to 2225). The best cryptanalysis result of the algorithm as of 2012 allows its strength to be reduced to 2,192, requiring a relatively large ciphertext size and volume of preformed data. Despite the proposed attacks, at the current level of development of computer technology, GOST 28147-89 remains practical.

Code "Magma" (GOST R 34.12-2015). The GOST 28147-89 standard has been in force in Russia for more than 25 years. During this time, it has shown sufficient durability and good efficiency of software and hardware implementations, including on low-resource devices. Although cryptanalytic attacks have been proposed that reduce its strength ratings (the best is to 2,192), they are far from being practical. Therefore, it was decided to include the GOST 28147-89 algorithm in the newly developed symmetric encryption standard.

In 2015, the shop adopted two new national cryptographic standards: GOST R 34.12-2015 “Information technology. Cryptographic information protection. Block ciphers" and GOST R 34.13-2015 "Information technology. Cryptographic information protection. Operating modes of block ciphers”, which come into effect on January 1, 2016.

The GOST R 34.12-2015 standard contains a description of two block ciphers with a block length of 128 and 64 bits. The GOST 28147-89 cipher with fixed nonlinear substitution blocks is included in the new GOST R 34.12-2015 as a 64-bit cipher called “Magma”.

Below are the replacement blocks specified in the standard:

The set of S-boxes given in the standard provides the best characteristics that determine the resistance of the cryptographic algorithm to differential and linear cryptanalysis.

According to the technical committee for standardization “Cryptographic information protection” (TK 26), fixing nonlinear substitution blocks will make the GOST 28147-89 algorithm more unified and help eliminate the use of “weak” nonlinear substitution blocks. In addition, the fixation of all long-term cipher parameters in the standard is consistent with accepted international practice. The new standard GOST R 34.12-2015 is terminologically and conceptually related to the international standards ISO/IEC 10116 “Information technologies. Security methods. Modes of operation for “-bit block ciphers” (ISO/IEC 10116:2006 Information technology - Security techniques - Modes of operation for an n-bit block cipher) and the ISO/IEC 18033 series “Information technologies. Methods and means of ensuring security. Encryption algorithms: ISO/IEC 18033-1:2005 “Part 1. General” (ISO/IEC 18033-1:2005 Information technology - Security techniques - Encryption algorithms - Part 1: General) and ISO/IEC 18033-3: 2010 “Part 3. Block ciphers” (ISO/IEC 18033-3:2010 (Information technology - Security techniques - Encryption algorithms - Part 3: Block ciphers)).

The GOST P 34.12-2015 standard also includes a new block cipher (“Grasshopper”) with a block size of 128 bits. This cipher is expected to be resistant to all currently known attacks on block ciphers.

The operating modes of block ciphers (simple replacement, gamma, gamma with output feedback, gamma with ciphertext feedback, simple replacement with chaining and generation of imitative insertion) are included in a separate standard GOST R 34.13-2015, which corresponds to the accepted international practice. These modes apply to both the Magma cipher and the new Grasshopper cipher.

A bitwise cyclic shift is performed to the left by 11 bits. Decryption is carried out according to the same scheme, but with a different schedule for using keys: from the 1st to the 8th round of decryption - in direct order: from the 9th to 32nd round of decryption - in the reverse order: Compared to the DES cipher in GOST 28147-89 has the following advantages: a significantly longer key (256 bits versus 56 for the DES cipher), an attack on which by exhaustive search of the key set seems impossible at the moment; a simple schedule for using the key, which simplifies the implementation of the algorithm and increases the speed of calculations. Design of S-blocks GOST 28147-89. It is obvious that the scheme of the GOST 28147-89 algorithm is very simple. This means that the greatest encryption load falls on the replacement tables. Table values
Panasepko S.P. Encryption algorithms: a special reference book. St. Petersburg: BHV-Petersburg, 2009.
Kara O. Reflection Attacks on Product Ciphers. URL: http://eprint.iacr.org/2007/043.pdf
Russian encryption standard: reduced strength. URL: http://cryptofaq.ru/index.php/2010-12-23-18-20-21/2010-12-23-18-22-09/90-2011-02-01-07-47- 27
Achekseev E.K., Smyshlyaev S.V. GOST 28147-89: “Don’t rush to bury him.”

1 Block diagram of the cryptographic transformation algorithm 1

2 Easy replacement mode 4

3 Gamma mode 8

4 Gamma mode with feedback 11

5 Simulation insert generation mode 14

Appendix 1 Terms used in this standard and their definitions 16

Appendix 2 Values of constants C1, C2 18

Appendix 3 Schemes for software implementation of the cryptographic algorithm

transformations. 19

Appendix 4 Rules for summation modulo 2 32 and modulo (2 32 -I) 25

STATE STANDARD

USSR UNION

INFORMATION PROCESSING SYSTEMS. CRYPTOGRAPHIC SECURITY

Cryptographic conversion algorithm

Date of introduction 07/01/90

This standard establishes a unified cryptographic transformation algorithm for information processing systems in networks of electronic computers (computers), individual computing systems and computers, which defines the rules for data encryption and the development of imitations.

The cryptographic transformation algorithm is designed for hardware or software implementation, satisfies cryptographic requirements and, in its capabilities, does not impose restrictions on the degree of secrecy of the protected information.

The standard is mandatory for organizations, enterprises and institutions that use cryptographic protection of data stored and transmitted in computer networks, in separate computer systems or in computers.

The terms used in this standard and their definitions are given in Appendix 1.

I. BLOCK DIAGRAM OF THE CRYPTOGRAPHIC TRANSFORMATION ALGORITHM

1.1. The block diagram of the cryptographic transformation algorithm (crypto-scheme) contains (see Figure 1):

Official publication ★

a 256-bit key storage device (KSD), consisting of eight 32-bit drives (X0, Xt. X2, A3 L4, X$, X6, Xu); four 32-bit drives (/V (, N 2, Nj, /V 4);

Reproduction is prohibited

two 32-bit drives L/$,) with permanent fillings C 2, C\\ recorded in them

two 32-bit modulo 2 32 adders (SM|, SL/3);

32-bit adder of bitwise summation modulo 2 (SL/2);

32-bit modulo adder (2 32 - 1) (SL/ 4);

modulo 2(SL/5) adder, there is no restriction on the capacity of the SL/$ adder;

substitution block (A);

cyclic shift register eleven steps towards the most significant digit (R).

1.2. The substitution block A" consists of eight substitution nodes A'j,

A 2, A“z, K 4, A5, A7, A 8 with 64-bit memory each. Post

A 32-bit vector applied to a substitution block is split into eight sequential 4-bit vectors, each of which is converted into a 4-bit vector by the corresponding substitution node, which is a table of sixteen rows containing four padding bits per row. The input vector determines the address of a row in the table, the filling of this row is the output vector. The 4-bit output vectors are then concatenated sequentially into a 32-bit vector.

1.3. When adding and cyclically shifting binary vectors, the most significant bits are considered to be the bits of the storage devices with large numbers.

1.4. When writing the key (I", W 2 ..., W q e (0,1), d = N256, in

The KZU value W\ is entered into the i-th digit of the drive Xq, the value W 2 is entered into the 2nd digit of the drive L#, ..., the value W^ 2 is entered into the 32nd digit of the drive Xq; the value W33 is entered into the 1st digit of the drive X\ y, the value is entered into the 2nd digit of the drive X\ y..., the value W M is entered into the 32nd digit of the drive X\\ the value W 6 5 is entered into the 1st digit of the drive X 2 etc., the value 1U 2 5b is entered into the 32nd bit of the Xy drive.

1.5. When rewriting information, the contents of the p-th digit of one drive (adder) are rewritten into the p-th digit of another drive (adder).

1.6. The values of constant fillings Cj, C 2 (constants) of storage devices /V 6, /V5 are given in Appendix 2.

1.7. The keys that determine the filling of the KZU and tables of the substitution block K are secret elements and are supplied in the prescribed manner.

Filling the tables of the substitution block K is a long-term key element common to the computer network.

The organization of various types of communication is achieved by building an appropriate key system. In this case, the possibility of generating keys (filling the KZU) in a simple replacement mode and encrypting them in a simple replacement mode while providing imitational protection for transmission over communication channels or storage in computer memory can be used.

1.8. The crypto scheme provides for four types of work: encryption (decryption) of data in simple replacement mode; encryption (decryption) of data in gamma mode;

encryption (decryption) of data in gamma mode with feedback;

simulation insert generation mode.

Schemes for software implementation of the cryptographic transformation algorithm are given in Appendix 3.

2. EASY CHANGE MODE

2.1. Encrypting clear data using simple replacement mode

2.1.1. The cryptoscheme that implements the encryption algorithm in simple replacement mode should have the form shown in Figure 2.

Open data to be encrypted is divided into blocks of 64 bits each. Input of any block T () = (D|(0), ^(O), ..., d 3 1(0), i 32 (0), £|(0), b 2 (0) y.. . , Z> 32 (0)) of binary information into drives N\ and N 2 is carried out so that the value D|(0) is entered into the 1st bit of N|, the value a 2 (0) is entered into the 2nd bit / Vj, etc., the value i 32 (0) is entered into the 32nd digit iVj; value />|(0) is entered into

1st digit L/ 2, the value b 2 (0) is entered into the 2nd digit N 2, etc., the value >> 32 (0) is entered into the 32nd digit N 2. As a result, the state is obtained (i 32 (0), i 3 |(0), ..., and 2 (0) y<7|(0)) накопителя yVj и состояние (/>32 (0), b 2 1(0), ... , />|(0)) storage unit N 2.

2.1.2. 256 bits of the key are entered into the RCU. The contents of eight 32-bit drives Aq, X\ t ..., Xj have the form:

^0 = (^32^3.....

*1 =(^64^63, . ^34^33)

*7 = (^56> ^255. ... , I/ 226, ^ 225)

2.1.3. The encryption algorithm for a 64-bit block of plain data in simple replacement mode consists of 32 cycles.

In the first cycle, the initial filling of the storage is summed modulo 2 32 in the adder CM\ with the filling of the storage Xq, while the filling of the storage Nj is maintained.

The result of the summation is converted in the substitution block K and the resulting vector is sent to the input of the register /?, where it is cyclically shifted by eleven steps towards the most significant digits. The result of the shift is summed bit by bit modulo 2 in the adder CM 2 with 32-bit filling of the drive yV 2 . The result obtained in CM 2 is written in N\%, with the old filling N| rewritten to N 2. The first cycle ends.

Subsequent cycles are carried out similarly, with

In the 2nd cycle, the filling X\ is read from the RCU, in the 3rd cycle from the RCU

the filling X2 is read, etc., in the 8th cycle the filling Xj is read from the RCU. In cycles from 9 to 16, as well as in cycles from 17 to 24, fillings from the RCU are read in the same order:

In the last eight cycles from the 25th to the 32nd, the order of reading the RCD fills is reversed:

hell, hell, hell, hell.

Thus, when encrypting in 32 cycles, the following order of selecting storage fills is carried out:

hell, ^2,^),^4>^5,^6"^7, hell, ^2,^3"^4,^5,-^6,^7, hell, hell, hell, hell, hell ,hell,hell,hell.

In the 32nd cycle, the result from the adder SL/ 2 is entered into the drive CU 2, and the old filling is stored in the drive N\.

Received after the 32nd cycle of encryption of filling the N| and N2 are the encrypted data block corresponding to the plain data block.

2.1 4 Encryption equations in simple replacement mode have the form:

J*Cr> "(

I b(/) = a(/~ I)

at y = I -24;

\bO) - a O - O at / 8* 25 -g 31; a(32) = a(31)

A (32) = (d (31) ffl X 0)KRG> b (31)

where d(0) = (a 32 (0), «з|(0), ... , Д|(0)) is the initial filling of N\ before the first encryption cycle;

6(0) = (632(0), 63j(0), ... , 6j(0)) - initial filling /U 2 before the first encryption cycle;

a(j) = (032(7), 0з|(/) e... , 0|(/)) - filling of the control unit, after the y-th encryption cycle;

b(j) = (6з 2 (/), 63j(/"), ... , 6|(/)) - filling /V 2 after the yth encryption cycle, y = 032.

The sign φ means the bitwise summation of 32-bit vectors modulo 2.

The sign Ш means the summation of 32-bit vectors modulo 2 32. The rules for summation modulo 2 32 are given in Appendix 4;

/? - the operation of a cyclic shift of eleven steps towards the highest digits, i.e.

^(g 32"O|>g 30>g 29>g 28>g 27>g 26"g 25>g 24>g 23'G 22"G 2b G 20>"g 2* g |)~

= (g 21" g 20> - "g 2* g 1 * G 32>G31 *GzO" g 29* g 28*, 27e"26e/"25>, 24>G23", 22)*

2.1.5. The 64-bit block of encrypted data Tsh is output from drives L^, VU 2 in the following order: from the 1st, 2nd, ..., 32nd bits of drive L7|, then from the 1st, 2nd , ... , 32nd bit of storage W 2, i.e.

t w - (a,<32),0 2 (32),0 32 (32), 6,(32), 6 2 <32),6 32 <32».

The remaining blocks of open data in simple replacement mode are encrypted in the same way.

2.2. Decrypting encrypted data using easy replacement mode

2.2.1. The cryptoscheme that implements the decryption algorithm in simple replacement mode has the same form (see Chsrt.2) as for encryption. 256 bits of the same key used for encryption are entered into the RCU. The encrypted data to be decrypted is divided into blocks of 64 bits each. Enter any block

T w - (0,(32),o 2 (32), ..., 0 32 (32), 6,(32), 6 2 (32), ..., 6 32 (32))

into accumulators L', and N 2 are produced so that the value dj(32) is entered into the 1st digit /V, the value o 2 (32) is entered into the 2nd digit /V, etc., the value a 32 (32) is entered into the 32nd digit /V,; the value 6,(32) is entered into the 1st digit of N2, etc., the value 6 32 (32) is entered into the 32nd digit of N2.

2.2.2. Decryption is carried out according to the same algorithm as the encryption of open data, with the difference that the fillings of the drives Xq, X\y..., Xj are read from the RCU in decryption cycles in the following order:

hell, hell 3, hell, hell, hell, hell, hell, hell 0,

hell 6, hell 4, hell 2, hell, hell, hell, hell 2, hell.

2.2.3. The decryption equations look like:

G d (32 - /) = (d (32 - / + 1) SHLG,.,) *LF6(32 - / + 1) b (32 - /) = d (32 - / + 1) at, / = 1+8;

I o(32- /) = (a(32-/M)SHDG (32 _ /)(tod8))KLFL(32./M) |6(32-/) = d (32 - / + 1)

at /= 9 + 31;

b(0) = (a (1) ШДГо) ОФй(1)

2.2.4. The W and N 2 drives obtained after 32 cycles of operation constitute a block of open data.

Then = (fli(O), a 2 (0), ... , Az 2 (0)" 6, (0), 6 2 (0), ... , 6 32 (0)), corresponding to the block of encrypted data, while the value o,(0) of block 7о corresponds to the contents of the 1st bit yV, the value 02(0) corresponds to

P. 8 GOST 28147-89

corresponds to the contents of the 2nd bit N\, etc., the value Dz2(0) corresponds to the contents of the 32nd bit N\; the value 6j(0) corresponds to the contents of the 1st bit; the value ^(0) corresponds to the contents of the 2nd bit N2, etc., the value £зг(0) corresponds to the contents of the 32nd bit N2-

The remaining blocks of encrypted data are decrypted in the same way.

2.3. The encryption algorithm in the mode of simple replacement of the 64-bit block G 0 is denoted by A y i.e.

A (T 0) = A (a (0), b (0)) = (a (32), b (32)) = T w.

2.4. The simple replacement mode can be used to encrypt (decrypt) data only in the cases given in clause 1.7.

3. GAMING MODE

3.1. Encrypting open data in gamma mode

3.1.1. The cryptoscheme that implements the encryption algorithm in gamma mode has the form shown in Figure 3.

Open data, divided into 64-bit blocks T\)\ 7), 2) ..., 7)) m “, 1 7[) M), is encrypted in gamma mode by bitwise summation modulo 2 in the adder SL/5 with the cipher gamma Гш, which is produced in blocks of 64 bits, i.e.

G _/L1) R2) Lm-1) LM)\

"ill V 1 w e * w * » " Sh » " * * * " 111 /»

where M is determined by the volume of encrypted data.

Tjj) - Yth 64-bit block, /“ the number of binary digits in block 7J) M) can be less than 64, while the part of the cipher gamut not used for encryption from block Г\^] is discarded.

3.1.2. 256 bits of the key are entered into the RCU. A 64-bit binary sequence (sync message) S = (5*1, S 2, ..., 5^4) is introduced into the drives iVj, N 2, which is the initial filling of these drives for the subsequent generation of M-blocks of the cipher gamma. The sync message is entered into jV| and L^so that the value 5[ is entered into the 1st digit of VU), the value S 2 is entered into the 2nd digit N\, etc., the value ^is entered into the 32nd digit 7V|; the value S33 is entered into the 1st digit N2, the value 4S34 is entered into the 2nd digit N2, etc., the value is entered into the 32nd digit N2.

3.1.3. The initial filling of drives /Vj and N 2 (sync message.5) is encrypted in simple replacement mode in accordance with

The popularly known term “processor performance” is an objective, calculated parameter that is measured in flops. However, most people measure it in gigahertz, naively believing that they are the same thing. Nobody knows the term “code performance”, and I’ll immediately explain why.

The reason is that I only recently came up with it and haven’t told anyone about it yet. However, code performance, like processor performance, has objective characteristics that can be measured. This article is specifically about the performance of code executed by the processor core.

How is code performance measured? Since I was the first to talk about this, by right of the discoverer I will measure it in RTT units;).

Now seriously. In modern processors, the main transformations are operations on 32-bit numbers; everything else is, by and large, exotic. Therefore, we will take into account the main thing - operations with 32-bit numbers. How many 32-bit operations do you think a modern processor core can perform simultaneously?

The student will answer - one, his teacher will think and say that there are four, the professional - that there are only twelve operations so far.

So, a program code that loads all processor execution units simultaneously throughout the entire execution time of the code will have a performance of 12 RTTs. Maximum! To be honest, I have never written such code before, but in this article I will try to make an effort.

I will prove that code with simultaneous execution of twelve 32-bit operations is possible

Program code that uses one actuator in the processor core will naturally have a performance of 1 RTT. Programs generated by high-level language compilers and virtual machine interpreters can boast of such code performance. There is no need to assume that the processor load indicator, which can be seen in the OS task manager, can serve as an objective criterion for the efficiency of the code. The processor core load can be 100%, but the program code will use one execution unit in it (performance 1 RTT). In this case, at 100% load, the processor core will operate at 1/12 of its maximum performance. In other words, when the Windows Task Manager shows the maximum processor load, its actual performance may vary from 1 to 12 RTT. If you see 100% load on any processor core in the performance window, it is wrong to assume that all actuators are working in this core, not at all!

The only criterion for indirectly assessing the operation of a processor core at maximum performance can be its power consumption and, as a consequence, the noise of the cooler. Now, if the cooler is noisy, then yes, the load has gone to maximum. However, it’s time to finish with general concepts and move on to harsh practice.

Traditional implementation of GOST 28147-89

I am not a professional in the field of information security, but I am still familiar with the topic of encryption. I was inspired to study symmetric stream encryption specifically by conversations with a professional cryptographer whom I deeply respect. And, having taken up this topic, I tried to do it well, and not just well, but also quickly, performing the maximum number of operations per unit of time. In other words, I was faced with the task of writing program code with the maximum RTT value.

Cryptographic transformation in accordance with GOST 28147-89 is used for stream encryption of information in communication channels and on disk drives.

Currently, software implementation of this GOST on the RON of the central processor is widely used. In known methods of implementing GOST, all secret information (encryption keys, replacement blocks) is located in RAM. This reduces the reliability of encryption, since having a RAM dump can completely reveal all the secret elements of the crypto-transformation. In addition, the method has performance limitations due to the location of the main crypto conversion objects in the OP and incomplete loading of the ALU executive units. Modern processors, implementing the crypto procedure using a well-known method, can provide encryption speeds of 40–60 megabytes per second. And if we really understand it to the end, the reason for the low performance and weak security of crypto conversion is the software implementation of the substitution block. For its description in GOST, see Fig. 1.

According to clause 1.2 of GOST, this block implements tetrad (four bits) permutations in a 32-bit word, but the x86/64 processor architecture and its instruction system are not capable of effectively manipulating tetrads.

For the software implementation of the substitution block, special tables in RAM are used, prepared at the stage of initialization of the cryptofunction. These tables combine the replacement nodes of adjacent tetrads into 8 × 8-bit byte tables, thus placing four 256-byte tables in RAM.

In more advanced implementations, these tables are 1024 bytes in size (256 words of four bytes). This was done in order to implement in the tables an additional cyclic shift by 11 positions of the 32-bit word obtained as a result of substitution (the next operation of the conversion algorithm according to GOST). An example of GOST implementation using this method is shown in Appendix 1 (on disk).

The information of the substitution block is a secret component of the cryptofunction (as formulated in GOST, see Fig. 2).

Placing these tables with the keys of the substitution block in the OP contradicts the requirements of GOST (clause 1.7), since secret information becomes available to third-party programs running on the computing installation. The FSB, which also certifies software implementations of encryption according to GOST, looks at this violation, to put it mildly, condescendingly. If, in order to place keys in the OP, the FSB still requires a “fig leaf” - masking the keys using the XOR operation, then nothing is required for replacement blocks in the OP; they are stored in clear form.

In short, the FSB allows such software implementations of cryptoprocedures to pass, despite the obvious decrease in the security of such a solution and a direct violation of its own requirements according to GOST (clause 1.7). And this despite the well-known methods of breaking ciphers by taking a memory dump...

We will return to the issue of storing keys and replacement blocks in internal registers of the processor a little later (there is a beautiful and fast solution), but for now we will only store encryption keys in MMX registers, this is more reliable.

But enough of the lyrics, what is important for the topic under consideration is that this program code has a performance of 1 RTT. Now let's write code with the performance of 2 RTTs.

Multithreaded implementation of GOST 28147-89

The only way to speed up crypto procedures in the known algorithm is to introduce multithreading. The point of this change in the implementation of the algorithm is to calculate several blocks of data in parallel.

Most programmers mean by parallel processing exclusively the work of several processor cores, synchronized through interrupts and semaphores in memory.

However, there is another option for parallel data processing on a single processor core. Let me explain this non-obvious idea.

Modern processors include at least two, and even three to six arithmetic-logical units. These ALUs (FPUs, address arithmetic units, and so on) can operate independently of each other; the only condition for their parallel operation is that the software objects they operate on are disjoint. In other words, in instructions that simultaneously execute an ALU, the memory addresses and register numbers must be different. Or, no writes should be performed to shared registers and memory addresses accessed by various processor execution units.

The workload of all ALUs is controlled by a special hardware unit inside the processor core - the scheduler, which scans the executable code forward, to a depth of 32–64 bytes. If the scheduler discovers instructions that can be run on the ALU without conflicts, then it runs them simultaneously on different execution devices. In this case, the counter of executed commands indicates the executing command (there are several of them in such a scheme), after which all commands have already been executed.

Most program sequences generated automatically (by compilers) cannot load all the ALUs and FPUs located in the processor core. In this case, the processor hardware is idle, which significantly reduces its resulting performance. Processor developers understand this and introduce modes to increase core frequency when the hardware is not fully used. This is also what hypertrading systems are designed for, and I will use this system to “press” the code to the maximum in the future.

Compilers, even the most optimized ones, and even more so virtual machine engines, cannot generate optimized code in terms of performance. Only a programmer with engineering knowledge can write such optimized code, and the tool for writing it is exclusively assembler.

A typical illustration of the possibility of executing several independent program threads on one processor core is the GOST implementation, executed in two threads on a single processor core. The idea of the code is simple: there are two blocks of data to encrypt/decrypt, but one processor core that will perform the conversion. It is possible to perform the conversion on these two data blocks sequentially, and this has been done to this day. In this case, the time required to complete the transformations doubles.

But you can do it differently: alternate commands related to the processing of different data blocks. These options are presented graphically in Fig. 3.

In the figure, the top example shows the usual order of processing two independent blocks of data. The first block is processed first, then the processor proceeds to process the second block. Naturally, the resulting time is equal to twice the time required to process one block, and the actuators of the processor core are not fully loaded.

The following is an example of interleaving commands from different processing threads. In this case, commands related to different data blocks are interleaved. The scheduler selects instructions that are independent of each other and transmits them for execution to ALU1 and ALU2. The grouping of commands of the first and second threads on these ALUs is carried out automatically, since the scheduler operating algorithm includes a grouping of commands linked to common data on the same executive device.

In order for such program code to work without ALU downtime, it is necessary that each program thread work with its own set of registers. The cache in this scheme becomes a bottleneck (it has only two data output ports), so we store the keys in MMX registers. Since in this case the replacement (and shift) nodes in memory are only read, they can be common to both program threads.

This, of course, is a very simplified explanation of the principle of parallel execution of program threads on a single core; in reality, everything is much more complicated. In practice, you need to take into account the pipeline architecture of actuators, restrictions on simultaneous access to the cache and the RON register block, the presence of address arithmetic nodes, switches and much more... So this is a topic for professionals, who can be counted on the fingers... of one hand.

The parallel encryption method is effectively implemented only for the 64-bit processor operating mode, since in this mode there is a sufficient number of RON (as many as 16 pieces!). An example of GOST implementation using this method is shown in Appendix 2 (on disk).

It is clear that this GOST implementation has a code performance of 2 RTT codes. Now let's see how this affects execution time.

The encryption cycle for one stream (Appendix 1) is 352 clock cycles, and during this time 8 bytes of data are calculated; for a two-threaded implementation of GOST (Appendix 2) 416 processor cycles are required, but 16 bytes are calculated. Thus, the resulting conversion speed increases from 80 to 144 megabytes for a 3.6 GHz processor.

An interesting picture emerges: the code contains exactly twice as many commands, and executes only 15% longer, but I think readers have already understood the reason for this phenomenon...

Theoretically, the code from the second example should be executed in the same number of clock cycles as the code from the first example, but the scheduler node is developed by Intel engineers, but they are also human, and we are all far from perfect. So it is possible to evaluate the effectiveness of their creation. This code will also run on an AMD processor, and you can compare their results.

If anyone doesn’t take my word for it, then for such non-believers, test programs with clock counters are included on the disk. The programs are in source code, of course in assembler, so there is an opportunity to check my words, and at the same time, see some of the tricks of professional coding.

Using SSE registers and AVX commands of modern processors to implement GOST 28147-89

Modern processors of the x86/64 architecture include a set of SSE registers of 16 bytes in size and specialized FPUs (at least two) to perform various operations on these registers. It is possible to implement GOST on this equipment, and in this case, replacement nodes can be placed not in the form of tables in RAM, but directly on dedicated SSE registers.

One SSE register can accommodate two tables of 16 rows at once. Thus, four SSE registers will completely accommodate all replacement tables. The only condition for such placement is the interleaving requirement, according to which tetrads of the same byte must be placed in different SSE registers. In addition, it is advisable to place the low and high tetrads of the input bytes, respectively, in the low and high tetrads of the bytes of the SSE registers.

These requirements are determined by optimization for the existing set of AVX commands. Thus, each byte of the SSE register will contain two tetrads corresponding to different bytes of the input register of the substitution block, while the position of the byte in the SSE register uniquely corresponds to the index in the substitution block substitution table.

A diagram of one of the possible placements of replacement nodes on SSE registers is shown in Fig. 4.

Placing secret information of replacement nodes on SSE registers increases the security of the crypto procedure, but complete isolation of this secret information is possible if the following conditions are met:

The processor core is switched to hypervisor host mode, and the interrupt block (APIC) is forcibly disabled in it. In this case, the processor core is completely isolated from the OS and applications running on the computing installation.
SSE registers are loaded and the computing core is isolated before the OS starts; it is optimal to perform these procedures from the trusted boot module (TLM).
Programs for cryptoprocedures in accordance with GOST are located in a non-modifiable memory area of the computing installation (either BIOS or in flash memory of the MDZ).

Compliance with these requirements will ensure complete isolation and immutability of the program code of cryptoprocedures and the secret information used in them.

For efficient sampling of tetrad SSE registers, the multi-input byte switches included in the FPU blocks are used. These switches allow transfers from any source byte to any destination byte, using indices located in a special SSE index register. Moreover, transfer is performed in parallel for all 16 bytes of the SSE receiver register.

Having substitution storage nodes on SSE registers and a multi-input switch in FPU blocks, it is possible to organize the following transformation in the substitution block (Fig. 5).

In this scheme, the input register in each tetrad specifies the address for the corresponding switch, which transmits information from the drives of the replacement nodes to the output register via the data bus. This scheme can be organized in three ways:

Create an appropriate chip design, but this is fantastic for us.
Reprogramming the microcode and creating your own processor command to implement this function on existing processors is no longer a fantasy, but, unfortunately, it is unrealistic in the current conditions.
Write a program using official AVX commands. The option may not be very effective, but it can be implemented “here and now.” So that's what we'll do next.

The operation of switches is controlled by a special three-address command AVX VPSHUFB. Its first operand is the receiver of information from the switches, the second is the source to which the switches' inputs are connected. The third operand is a control register for switches, each byte of which is associated with a corresponding switch; the value in it specifies the number of the direction from which the switch reads information. For a description of this command from the official Intel documentation, see Fig. 5. In Fig. Figure 6 shows a diagram of how this command works - only half of the SSE registers are shown, for the second half everything is similar.

The switch uses only the lowest four bits to determine the direction of commutation, the last bit in each byte is used to force the corresponding receiver byte to zero, but this switch function is not yet in demand in our case.

A program for sampling tetrads through FPU switches was written, but I didn’t even put it in the application - it’s too pathetic. Having a 128-bit register and only using 32 bits in it is unprofessional.

As they say, “Our finish line is the horizon,” so squeeze it out, squeeze it out... we’ll press it and put it in bags!

This is not a play on words, but a harsh FPU reality - SSE registers can be divided into equal parts and the same transformations can be performed on these parts with one command. In order for the processor to understand this, there is a magic letter “P” - a packet that is placed before the command mnemonics, and no less magical letters “Q”, “D”, “W”, “B”, which are placed at the end and declare What parts are the SSE registers divided into in this command?

We are interested in batch mode with the SSE register divided into four 32-bit blocks; accordingly, all commands will be prefixed with “P” and at the end with the symbol “D”. This makes it possible to process four 32-bit blocks in parallel with one processor command, that is, calculate four data blocks in parallel.

The program that implements this method is available in Appendix 3, with all the explanations there.

However, press so much! Modern processors have at least two FPUs, and two independent instruction threads can be used to fully load them. If you correctly alternate commands from independent threads, you can load both FPU units completely with work and get eight parallel processed data streams at once. Such a program was written, and it can be viewed in Appendix 4, but you need to watch it carefully - you can go crazy. This is what is called “the code is not for everyone...”.

Price issue

The use of SSE registers to store replacement nodes is understandable - it provides some guarantee of isolation of secret information, but the meaning of calculating the cryptofunction itself on the FPU is not obvious. Therefore, the execution time of standard procedures was measured using the direct replacement method in accordance with GOST for four and eight threads.

For four threads, an execution speed of 472 processor cycles was obtained. Thus, for a processor with a frequency of 3.6 GHz, one thread is calculated at a speed of 59 megabytes per second, and four threads, respectively, at a speed of 236 megabytes per second.

For eight threads, an execution speed of 580 processor cycles was obtained. Thus, for a 3.6 GHz processor, one thread counts at 49 megabytes per second, and eight threads at 392 megabytes per second.

As the reader can see, the code in example #3 has a performance of 4 RTT, and the code in example #4 has a performance of 8 RTT. In these examples on SSE registers, the patterns are the same as when using RON, only the scheduler has reduced its efficiency. It currently provides a 20% increase in duration while doubling the code length.

Moreover, these results were obtained using universal AVX commands available in both Intel and AMD processors. If you optimize for an AMD processor, the result will be much better. It sounds contrary to the trend, but nevertheless it is true, and here’s why: AMD processors have an additional set of instructions, the so-called XOP extension, and in this additional set of instructions there are those that significantly simplify the implementation of the GOST algorithm.

This refers to the commands for logical burst byte shift and burst cyclic shift of double words. In the examples given in Appendices 3 and 4, sequences of universal commands are used that implement the necessary transformation: in the first case, one “extra” command, and in the other case, four extra commands at once. So there are optimization reserves, and considerable ones.

When it comes to further optimization, it’s worth remembering the presence of 256-bit registers (YMM registers), using which you can theoretically double the speed of calculations. But for now this is only a prospect; at the moment, processors slow down very much when executing 256-bit instructions (FPUs have a path width of 128 bits). Experiments have shown that on modern processors, counting 16 threads on YMM registers does not give any benefit. But this is only for now; on new processor models, the performance of 256-bit instructions will undoubtedly be increased, and then the use of 16 parallel threads will become advisable and will lead to an even greater increase in the speed of the crypto procedure.

Theoretically, you can count on a speed of 600–700 megabytes per second if the processor has two FPUs with a working path width of 256 bits each. In this case, we can talk about writing code with an efficiency of 16 RTT, and this is not fantasy, but the near future.

Mixed mode

Again the question of the number of registers arises; there are not enough of them to promote such an algorithm. But the hypertrading mode will help us. The processor core has a second set of registers available in logical processor mode. Therefore, we will execute the same code on two logical processors at once. In this mode, of course, we will not have more actuators, but due to alternation we can get a full load of all actuators.

You can’t count on an increase of 50% here; the bottleneck is the cache memory where the technological masks are stored, but you can still get an increase of 100 additional megabytes. This option is not shown in the appendices (the macros are similar to those used in the 8 RTT code), but it is available in the program files. So if anyone does not believe in the possibility of encryption at a speed of 500 megabytes per second on a single processor core, let them run test files. There are also texts with comments so that no one thinks that I am lying.

This focus is only possible on Intel processors; AMD has only two FPU units per two processor modules (analogous to the hypertrading mode). But there are four more ALUs that would be a sin not to use.

You can put the Bulldozer processor modules into a mode similar to the hypertrading mode, but run the conversion to RON on different modules in one thread, and on SSE registers in another thread and get the same 12 RTTs. I haven’t tested this option, but I think 12 RTT code will work more efficiently on AMD. Those interested can try it; test programs can be adjusted to work on “Bulldozers” quite easily.

Who needs it?

A serious question, but with a simple answer - everyone needs this. Soon we will all be hooked on the clouds, we will store both data and programs there, and there, oh, how we want to create our own, private corner. To do this, you will have to encrypt the traffic, and the speed of crypto conversion will be the main determining factor of comfortable work in the cloud. Our choice of encryption algorithm is small - either GOST or AES.

Moreover, oddly enough, the AES encryption algorithm built into processors turns out to be much slower; tests show speeds of 100–150 megabytes per second, and this is with a hardware implementation of the algorithm! The problem lies in the single-threaded count and the replacement block, which operates on bytes (a table of 256 rows). So GOST turns out to be more effective when implemented on x86/64 architecture, who would have thought...

This is if we talk about the achieved level of encryption speed. And if we keep in mind theoretical refinements in the field of increasing code efficiency, then most likely no one needs this. There are practically no specialists at the 3–6 RTT level, compilers generally generate code at the 1–2.5 RTT level, and the majority of programmers do not know assembler, and even if they know its spelling, they do not understand the design of a modern processor. And without this knowledge, it doesn’t matter whether it’s an assembler or some kind of SI-sharp.

But not everything is so sad: the bottom line after a week of sleepless nights is a new algorithm for implementing GOST, which would be a sin not to patent. And applications for patents (three in total) have already been completed and filed, so, gentlemen, businessmen, line up - women and children get a discount.

Encryption algorithm GOST 28147-89. Simple replacement method. — Archive WASM.RU

“While you are alive, don’t die, look at this world.
Many here have a dead soul - they are dead inside.
But they walk and laugh, not knowing that they are not there,
Don’t rush your death hour,” she told me.
Aria, "Up There"
2.1 Feistel networks.
2.2 Block cipher GOST 28147-89

3.1 Key information
3.2 Basic step of crypto conversion
3.3 Basic cycles:“ 32-З” , “ 32-P” .

4.1 Implementation of the main crypto conversion step
4.2 Increasing the speed of the algorithm
5. Key information requirements
6. List of used literature
7. Acknowledgments

Introduction.

This document is my attempt to describe a method for simply replacing the GOST 28147-89 encryption algorithm with the simplest, but nevertheless technically competent language. The reader will tell his opinion about how well I succeeded after reading the first six points.

In order for my work to be more useful, I recommend arming yourself with the works of the authors indicated in the list of used literature. A calculator is also recommended so that it has a function for calculating the operation XOR, because reading the article assumes that the reader intends to study this encryption algorithm. Although it is also suitable as a reference guide, I wrote this article specifically as a training one.

Introduction to block ciphers.

Before we begin to look at the algorithm, we need to familiarize ourselves with the history of the creation of this kind of ciphers. The algorithm belongs to the category of block ciphers, in the architecture of which information is divided into a finite number of blocks, the final one naturally may not be complete. The encryption process occurs precisely over complete blocks, which form the ciphergram. The final block, if it is incomplete, is supplemented with something (I will talk about the nuances of its completion below) and is encrypted in the same way as complete blocks. By ciphergram I mean the result of the encryption function acting on a certain amount of data that the user submitted for encryption. In other words, a ciphergram is the final result of encryption.

The history of the development of block ciphers is associated with the early 70s, when the IBM company, realizing the need to protect information when transmitting data over computer communication channels, began to implement its own scientific research program devoted to the protection of information in electronic networks, including cryptography.

A group of researchers and developers from IBM, which began researching encryption systems with a symmetric key scheme, was led by Dr. Horst Feistel.

2.1 Feistel networks

The architecture of the new encryption method proposed by Feistel in classical literature was called “Feistel Architecture”, but at the moment in Russian and foreign literature a more established term is used - “Feistel’s network” or Feistel`s NetWork. Subsequently, the “Lucifer” cipher was built using this architecture - which was later published and caused a new wave of interest in cryptography in general.

The idea of the Feistel network architecture is as follows: the input information stream is divided into blocks of n bits in size, where n is an even number. Each block is divided into two parts - L and R, then these parts are fed into an iterative block cipher, in which the result of the j-th stage is determined by the result of the previous stage j-1! This can be illustrated with an example:

Rice. 1

Where, function A is the main action of a block cipher. It can be a simple action, such as the XOR operation, or it can have a more complex form and be a sequence of a number of simple actions - modulo addition, left shift, element replacement, etc., together these simple actions form the so-called main step of crypto conversion.

It should be noted that the key elements of the function’s operation are the supply of key elements and the XOR operation, and how well these operations are thought out indicates the cryptographic strength of the cipher as a whole.

In order for the idea of Feistel networks to be completely clear, let us consider the simplest case depicted in rice. 1, where in function A – the operations “mod 2” (“xor”) will appear, but this simplest case, in a more serious situation, for example, hiding information of national importance, function A can be more complex (as far as I have seen, function A can indeed be very complex):

Initial data:

L=1110b, R=0101, K=1111b

Get the encryption code

1. (R + K) mod 2 4 = Smod, Smod = 0100b

2. (Smod + L) mod 2 = Sxor, Sxor = 1010b

3. L=R, R=Sxor

L=0101b, R=1010b

Let's explain our actions:

1. This operation is addition mod 2 4. In practice, such an operation comes down to simple addition, where we must add two numbers and ignore the carry into the 5th digit. Since, if we put exponents above the digits of the binary representation of a number, there will be an exponent four above the fifth digit, let’s take a look at the figure below, which shows the actions of our operation:

Rice. 2

Here I pointed with an arrow to the exponents, as you can see, the result should have been 10100, but since the mod 2 4 operation ignores the carry, we get 0100.

2. This operation is called mod 2 in the literature; it is implemented in assembly language by the command XOR. But its more correct name is mod 2 1. Without this unique operation, it is hardly possible to build a fast, easily implemented encryption algorithm and, at the same time, have it be quite crypto-resistant. The uniqueness of this operation lies in the fact that it is the opposite of itself! For example, if the number A is XORED with the number B, the result is B. In the future, it is enough to XOR the numbers B and C with each other to get the previous value of A!

In this operation we got 1010 with the numbers 1110 and 0100, to get 1110 back, it’s enough to XOR the numbers 0100 and 1010 with each other! You can read more about this operation in the article attached to the website. www.wasm.ru, « Basic Guide toCRC_error detection algorithms» the author who Ross N. Williams. In this work there is a point - “ 5. Binary arithmetic without carries" It is in this article that the operation is described. xor! I exclaim because in this article this operation is so described that the reader not only understands how this operation works, he even begins it see, hear and feel!

3. This action is necessary so that when decrypting the ciphergram, the original values can be obtained.

2.2 Block cipher GOST 28147-89

The GOST 28147 – 89 encryption algorithm belongs to the category of block ciphers working according to the architecture of balanced Feistel networks, where two parts of the selected block of information are of equal size. The algorithm was developed in the bowels of the eighth department of the KGB, now transformed into FAPSI, and was established as the encryption standard of the Russian Federation back in 1989 under the USSR.

For this algorithm method to work, it is necessary to divide the information into blocks of 64 bits in size. Generate or enter into the encryption system the following key information: key and substitution table. The choice of key and replacement table when encrypting should be taken very seriously, because This is the foundation for the security of your information. For information on what requirements are imposed on the key and the replacement table, see paragraph “Requirements for key information.”

When considering the method, we will not focus on this, because this article, as I said above, was written with the goal of teaching the reader how to encrypt data using the method of simply replacing a given encryption algorithm, but we will definitely touch on this issue at the end of the article.

Theoretical minimum.

3.1 Key information

As I said above, the following take an active part in data encryption:

3.1.1. A key is a sequence of eight elements, each 32 bits in size. Further we will denote it by the symbol K, and the elements of which it consists are k1, k2, k3, k4, k5, k6, k7, k8.

3.1.2 Replacement table – a matrix of eight rows and sixteen columns, hereinafter referred to as Hij. Each element at the intersection of row i and column j occupies 4 bits.

3.2 Basic step of crypto conversion

The main action in the encryption process is the main step of crypto conversion. This is nothing more than the action of encrypting data using a certain algorithm, only the developers introduced a painfully cumbersome name.

Before encryption begins, the block is divided into two parts L and R, 32 bits each. They select a key element and only then feed these two parts of the block, the key element and the replacement table into the main step function. The result of the main step is one iteration of the basic cycle, which will be discussed in the next paragraph. The main step consists of the following:

The addition part of the block R is summed with the key element K mod 2 32. I described a similar operation above, here the same thing only the exponent is not “4”, but “32” - the result of this operation will be further denoted by Smod.

We divide the previously obtained result Smod into four-bit elements s7, s6, s5, s4, s3, s2, s1, s0 and feed it into the replacement function. The replacement occurs as follows: select the element Smod - s i , start from the beginning with the lowest element, and replace it with the value from the replacement table along the i - row and column indicated by the value of the element s i . We go to s i +1 element and proceed in the same way and continue until we replace the value of the last element Smod - the result of this operation will be denoted as, Ssimple.

In this operation, the value of Ssimple is shifted cyclically to the left by 11 bits and we obtain Srol.

We select the second part of block L and add mod 2 with Srol, as a result we have Sxor.

At this stage, the L part of the block becomes equal to the value of the R part, and the R part in turn is initialized by the result of Sxor and this is the end of the main step function!

3.3 Basic cycles: “32-Z”, “32-R”.

In order to encrypt information, it must be divided into blocks of 64 bits in size; naturally, the last block can be less than 64 bits. This fact is the Achilles heel of this “simple replacement” method. Since its addition to 64 bits is a very important task in increasing the cryptographic strength of the ciphergram, this sensitive place, if it is present in the array of information, and it may not be (for example, a file of 512 bytes in size!), should be treated with great care responsibility!

After you have divided the information into blocks, you should break the key into elements:

K = k1,k2,k3,k4,k5,k6,k7,k8

Encryption itself consists of using so-called basic cycles. Which in turn include the nth number of basic crypto conversion steps.

Basic cycles are, how can I put it, marked: n – m. Where n is the number of main crypto conversion steps in the base cycle, and m is the “type” of the base cycle, i.e. What are we talking about, “Z” encryption or “R” encryption of data.

The basic encryption cycle 32-3 consists of 32 main crypto-transformation steps. The function that implements the step actions is supplied with block N and key element K, where the first step occurs with k1, the second above the obtained result with element k2, etc. according to the following scheme:

k1,k2,k3,k4,k5,k6,k7,k8,k1,k2,k3,k4,k5,k6,k7,k8,k1,k2,k3,k4,k5,k6,k7,k8k8,k7, k6,k5,k4,k3,k2,k1

The decryption process for 32-P occurs in a similar way, but the key elements are supplied in the reverse order:

k1,k2,k3,k4,k5,k6,k7,k8,k8,k7,k6,k5,k4,k3,k2,k1,k8,k7,k6,k5,k4,k3,k2,k1,k8, k7,k6,k5,k4,k3,k2,k1

Practice.

4.1 Implementation of the main step of crypto conversion

After we got acquainted with the theory of how to encrypt information, it was time to see how encryption happens in practice.

Initial data:

Let's take the block of information N = 0102030405060708h, here parts L and R are equal:

L = 01020304h, R =05060708h, take the key:

K = ' as28 zw37 q839 7342ui23 8e2t wqm2 ewp1’ (these are ASCII codes, in order to view the hexadecimal representation, you can open this file in viewing mode in Total Commander by pressing the “ F3" and then the key " 3 "). In this key, the element values will be:

k1 = 'as28', k2 = 'zw37', k3 = 'q839', k4 = '7342'

k5 = 'ui23', k6 = '8e2t', k7 = 'wqm2', k8 = 'ewp1'

Let's also take the following replacement table:

Rice. 3

Here the rows are numbered from 0 to 7, columns from 0 to F.

Warning: All information, including the key with the replacement table, is taken as an example for considering the algorithm!

Using “Input Data”, it is necessary to obtain the result of the main step of crypto-conversion.

1. Select part R = 05060708h and key element k1 = ‘as28’, in hexadecimal form the key element will look like this: 61733238h. Now we perform the summation operation mod 2 32:

Rice. 4

As you can see in the figure, we did not have a transfer to 33 bits marked in red and with the exponent “ 32 " And if we had other values for R and the key element, this could very well happen, and then we would ignore it, and in the future we would only use the bits marked in yellow.

I perform this operation using the assembler command add:

; eax = R, ebx = ‘as28’

The result of this operation is Smod = 66793940h

2. Now this is the most tricky operation, but if you take a closer look, it is no longer as scary as it seems at first. Let's imagine Smod in the following form:

Rice. 5

I tried to visualize the Smod elements in the figure, but I’ll explain anyway:

s0 = 0, s1 = 4, s2 = 9, etc.

Now, starting from the lowest element s0, we make the replacement. Remembering the point “ 3.2 Basic step of crypto conversion» i – row, s i – column, look for the value in the zero row and zero column:

Fig.6

So the current value of Smod is not 6679394 0 h, a 6679394 5 h.

Let's start replacing s1, i.e. four. Using the first row and fourth column (s1= 4!). Let's look at the picture:

Rice. 7

Now the value is Smod, not 667939 4 5h, 667939 2 5h. I assume that the replacement algorithm is now clear to the reader, and I can say that after this the final result of Ssimple will have the following value - 11e10325h.

I will talk about how this can be most easily implemented in the form of assembler commands later in the next paragraph, after I talk about the extended table.

We must shift the resulting Ssimple value 11 bits to the left.

Rice. 8

As you can see, this action is quite simple, and is implemented with one assembly language command - rol and the result of this operation Srol is 0819288Fh.

4. Now it remains to XOR part L of our information block with the value Srol. I take a calculator from w2k sp4 and get Sxor = 091b2b8bh.

5. This action is final and we simply assign, clean R, the value of part L, and initialize part L with the value of Sxor.

Final result:

L = 091b2b8bh, R = 01020304h

4.2 Increase the speed of the algorithm

Now let's talk about optimizing the algorithm for speed. When implementing any project, you have to take into account that a program that works with registers more often than with memory works the fastest, and here this judgment is also very important, because over one block of information there are as many as 32 encryption actions!

When I implemented the encryption algorithm in my program, I did the following:

1. Selected part of the block L into the register eax, and R into edx.

2. The esi register was initialized with the address of the extended key, more on that below.

3. I assigned the value of the address of the extended replacement table to the ebx register, more on that below

4. Transferred the information of points 1,2, 3 to the function of the basic cycle 32 - Z or 32 - R, depending on the situation.

If you look at the supply diagram of the key elements in paragraph “ Basic cycles: “32-Z”, “32-R”", then our key for the basic cycle 32 - Z can be represented as follows:

K 32-Z =

'as28','zw37','q839','7342','ui23','8e2t','wqm2','ewp1',

'as28','zw37','q839','7342','ui23','8e2t','wqm2','ewp1',

'ewp1','wqm2','8e2t','ui23','7342','q839','zw37','as28'

Those. from the beginning there are k1,k2,k3,k4,k5,k6,k7,k8 - ‘ as28’, ‘zw37’, ‘q839', '7342', 'ui23', '8e2t', 'wqm2’, ‘ewp1' This sequence is repeated three times. Then the elements go in reverse order, i.e.: k8,k7,k6,k5,k4,k3,k2,k1 - 'ewp1', 'wqm2', '8e2t', 'ui23', '7342', 'q839', 'zw37', 'as28'.

I pre-arranged the elements in the array in the order they should be fed into 32 - Z. Thus, I increased the memory required per turn, but saved myself from some thinking processes that I did not need, and increased the speed of the algorithm, for by reducing memory access time! Here I described only the key for 32 - Z, for cycle 32 - P I did the same, but using a different scheme for supplying elements, which I also described in paragraph “ Basic cycles: “32-З”, “32-Р”».

It's time to describe the implementation of the replacement function, as I promised above. I couldn't describe it earlier, because... this requires the introduction of a new concept - an extended substitution table. I can't explain to you what it is. Instead, I’ll show it to you, and you can formulate for yourself what it is – an extended substitution table?

So, in order to understand what an extended replacement table is, we need a replacement table; for example, I’ll take the one shown in Fig. 3.

For example, we needed to replace the number 66793940h. I will present it in the following form:

Rice. 9

Now if we take the elements s1, s0, i.e. low byte, then the result of the replacement function will be equal to 25h! After reading the article by Andrei Vinokurov, which I cited in paragraph “ List of used literature", you will indeed find that if you take two strings you can get an array, allowing you to quickly find replacement elements using an assembler command xlat. They say there is another faster way, but Andrei Vinokurov spent about four years researching fast algorithms for implementing GOST! I think there is no need to reinvent the wheel when it already exists.

So, about the array:

Let's take the first two lines, zero and first, and create an array of 256 bytes. Now we observe one peculiarity: if we need to convert 00h, then the result will be 75h (we rely on Fig. 3) - we put this value in the array at offset 00h. We take the value 01h, the result of the replacement function 79h, put it in the array at offset 01 and so on until 0FFh, which will give us 0FCh, which we put in the array at offset 0FFh. So we got an extended table of replacements for the first group of rows: first and zero. But there are still three groups: second page 2, page 3, third page 4, page 5, fourth page 6, page 7. We deal with these three groups in the same way as with the first. The result is an extended substitution table!

Now you can implement an algorithm that will perform the replacement. To do this, we take the source codes that Andrey Vinokurov posted on his page, see “ Bibliography».

lea ebx,extended_table_simple

mov eax,[put the number to be replaced]

add ebx,100h ;jump to next two nodes

sub ebx,300h ; so that in the future ebx will point to the table

Now there’s one more feature: with the previous actions we not only replaced, but also shifted the number 8 bits to the left! All we have to do is shift the number another 3 bits to the left:

and we get the result of the operation rol eax,11!

I can’t add anything more regarding optimization, the only thing I can emphasize is what I said above - use registers more often than accessing memory. I think these words are only for beginners; experienced people understand this perfectly well without my words.

Requirements for key information.

As stated in the article by Andrey Vinokurov, the key is selected according to two criteria:

Criterion for the equiprobable distribution of bits between values 1 and 0. Typically, the Pearson criterion (“chi-square”) is used as a criterion for the equiprobable distribution of bits.

This means the key, in principle, can be any number. That is, when the next bit of the key is formed, the probability of its initialization to one or zero is 50/50!

Please note that the key consists of eight elements, each 32 bits, so in total there are 32 * 8 = 256 bits in the key and the number of possible keys is 2,256! Doesn't this amaze you?

Series criterion.

If we look at our key, which I provided in paragraph “ 4.1 Implementation of the main step of crypto conversion", then you will notice that the following notation is true:

Rice. 10

In one phrase, the value of k 1 should not be repeated in k 2 or in any other key element.

That is, the key that we chose to consider as an encryption algorithm fully meets the two criteria above.

Now about choosing a replacement table:

Now let's talk about how to choose the right substitution table. The main requirement for choosing replacement tables is the phenomenon of “non-repetition” of elements, each of which is 4 bits in size. As you have already seen above, each row of the replacement table consists of the values 0h, 1h, 2h, 3h, ..., 0fh. So the main requirement states that each line contains the values 0h, 1h, 2h, ..., 0fh and each such value in one copy. For example, the sequence:

1 2 3 4 5 6 7 8 9 A B C D E F

It fully meets this requirement, but still! It is not recommended to select such a sequence as a string. Because if you feed a value to the input of a function that relies on such a string, then you will get the same value as the output! Don't believe me? Then take the number 332DA43Fh and eight such rows as a table of replacements. Perform the replacement operation, and I assure you, the output you will get is 332DA43Fh! That is, the same as what you submitted as input to the operation! But this is not a sign of good manners when encrypting, and was it?

This was one requirement, the next criterion says that - every bit of the output block must be statistically independent of every bit of the input block!

How does this look simpler? And here is how, for example, we selected the element s0 = 0Fh, 01111b from the above number. The probability that we will now replace the first bit with one or zero is 0.5! The probability of replacing the second, third and fourth bits, each bit, we consider separately, with ones or zeros is also 0.5. When choosing s1 = 0Eh, the probability that we replace the zero bit, and this is “0”, with zero or one too equal to – 0.5! Thus, according to this criterion, there is no pattern between the replacement of zero bits of elements s0, s1! Yes, you could substitute ones, but you could also substitute zeros.

To evaluate the table according to this criterion, you can build a table of correlation coefficients calculated using the formula:

If p = 1, then the value of bit j at the output is equal to the value of bit i at the input for any combination of bits at the input;

If p = -1, then the value of output bit j is always the inverse of input bit i;

If p = 0, then the output bit j is equally likely to take the values 0 and 1 for any fixed value of the input bit i.

Let's take a single line example:

Let’s break it down into “components”:

Let's calculate one coefficient using the formula given above. To make it easier to understand how this is done, I will explain in more detail:

We take the 0th bit of the 0th number (0) at the input and the 0th bit of the 0th number at the output (1) and perform the operation 0 XOR 1 = 1.

We take the 0th bit of the 1st number (1) at the input and the 0th bit of the 1st number at the output (1) and perform the operation 1 XOR 1 = 0.

We take the 0th bit of the 2nd number (0) at the input and the 0th bit of the 2nd number at the output (0) and perform the operation 0 XOR 0 = 0.

We take the 0th bit of the 3rd number (1) at the input and the 0th bit of the 3rd number at the output (1) and perform the operation 1 XOR 1 = 0.

Having carried out sequential XOR operations in this sequence, we count the number of all non-zero values, we get the value 6. Hence P 00 = 1-(6/2 4-1) = 0.25. So, it turned out that the value of bit 0 at the output is equal to the value of bit 0 at the input in 4 cases out of 16;

Final table of odds:

As can be seen from the table of correlation coefficients, bit 3 at the input is inverted relative to bit 0 at the output in 14 cases out of 16, which is 87.5%. This is no longer acceptable for normal encryption systems. Let's take another example for variety:

The table of coefficients will be as follows (who is not too lazy can recalculate)

Well, in this table things are even worse - bits 1 and 2 of group remain unchanged! The cryptanalyst has room to turn around. Taking into account all these requirements, by simple search ("brute force"), permutation tables were found corresponding to the specified theory (to date - 1276 combinations). Here are some of them:

09 0D 03 0E-06 02 05 08-0A 07 00 04-0C 01 0F 0B

00 05 0A 07-03 08 0F 0C-0E 0B 04 09-0D 06 01 02

06 0B 0F 00-0C 01 02 0D-08 07 09 04-05 0A 03 0E

04 0E 00 09-0B 01 0F 06-03 0D 07 0A-0C 02 08 05

04 02 08 0E-05 0F 03 09-0B 01 0D 07-0A 0C 06 00

07 03 09 0C-08 00 06 0F-0E 04 01 0A-0D 0B 02 05

06 0F 03 08-0D 04 0A 01-09 02 05 0C-00 0B 0E 07

0C 06 08 01-03 09 07 0E-0B 05 0F 02-04 0A 00 0D

04 0B 09 06-0E 01 00 0F-0A 05 03 0C-0D 02 07 08

00 0E 0F 01-07 08 09 06-04 0B 0A 05-03 0D 0C 02

0F 09 01 07-04 0A 08 06-0E 00 02 0C-05 03 0B 0D

0A 03 04 01-05 0C 0B 0E-08 06 0F 0D-07 09 00 02

0B 06 0F 01-04 0A 08 05-00 0D 0C 02-07 09 03 0E

0C 03 02 08-0D 06 0B 05-07 09 04 0F-0A 00 01 0E

02 0B 0F 04-09 00 06 0D-05 0E 01 08-0C 07 0A 03

List of used literature.

Article by Andrey Vinokurov:

GOST 28147-89 encryption algorithm, its use and implementation

for computers on the Intel x86 platform.

Here are the source codes for implementing the encryption algorithm.

Article by Horst Feistel:

Cryptography and Computer Security.

(can be found at the same address as the previous article)

Ross N. Williams:

A Basic Guide to CRC Error Detection Algorithms
Posted on the website www.wasm.ru.

Acknowledgments

I would like to express my gratitude to all visitors to the forum www.wasm.ru. But I would especially like to thank ChS, who is currently known as SteelRat, he helped me understand things that I probably would never have understood, as well as help in writing the paragraph: “ Key information requirements", the main part of this paragraph was written by him. I am also deeply grateful to the employee of KSTU. A.N. Tupolev Anikin Igor Vyacheslavovich and it would be a sin not to mention Chris Kaspersky for what he is and Volodya / wasm.ru for his instructions. Oh, and I get it from him. I also want to mention Sega-Zero / Callipso for bringing some mathematical jungle to my mind.

That's probably all I would like to tell you.

I would be grateful for criticism or questions related to this article or just advice. My contact details: [email protected], ICQ – 337310594.

Best regards, Evil`s Interrupt.

P.S.: I didn’t try to outdo anyone with this article. It was written with the intention of making it easier to study GOST, and if you have difficulties, this does not mean that I am to blame for this. Be reasonable and be patient, all the best to you!

GOST 28147 89 gamming mode. Easy replacement mode

Nicolas Courtois - "the great and terrible"

Algebraic method

Differential method

Attacks by Isobe and Dinur-Dankelman-Shamir

Attacks assuming the attacker knows "something" about the keys

Dry residue: what is durability in practice?

Algorithm GOST 28147-89 and code “Magma” (GOST R 34.12-2015)

^0 = (^32^3.....

*1 =(^64^63, . ^34^33)

Traditional implementation of GOST 28147-89

Multithreaded implementation of GOST 28147-89

Using SSE registers and AVX commands of modern processors to implement GOST 28147-89

Price issue

Mixed mode

Who needs it?

Encryption algorithm GOST 28147-89. Simple replacement method. — Archive WASM.RU

Popular articles

Latest articles

Sections

Pages

Special projects

Contacts