Encryption according to GOST 28147 89. Non-standard use of the standard

Facebook

1 Block diagram of the cryptographic transformation algorithm 1

2 Easy replacement mode 4

3 Gamma mode 8

4 Gamma mode with feedback 11

5 Simulation insert generation mode 14

Appendix 1 Terms used in this standard and their definitions 16

Appendix 2 Values of constants C1, C2 18

Appendix 3 Schemes for software implementation of the cryptographic algorithm

transformations. 19

Appendix 4 Rules for summation modulo 2 32 and modulo (2 32 -I) 25

STATE STANDARD

USSR UNION

INFORMATION PROCESSING SYSTEMS. CRYPTOGRAPHIC SECURITY

Cryptographic conversion algorithm

Date of introduction 07/01/90

This standard establishes a unified cryptographic transformation algorithm for information processing systems in networks of electronic computers (computers), individual computing systems and computers, which defines the rules for data encryption and the development of imitations.

The cryptographic transformation algorithm is designed for hardware or software implementation, satisfies cryptographic requirements and, in its capabilities, does not impose restrictions on the degree of secrecy of the protected information.

The standard is mandatory for organizations, enterprises and institutions that use cryptographic protection of data stored and transmitted in computer networks, in separate computer systems or in computers.

The terms used in this standard and their definitions are given in Appendix 1.

I. BLOCK DIAGRAM OF THE CRYPTOGRAPHIC TRANSFORMATION ALGORITHM

1.1. The block diagram of the cryptographic transformation algorithm (crypto-scheme) contains (see Figure 1):

Official publication ★

a 256-bit key storage device (KSD), consisting of eight 32-bit drives (X0, Xt. X2, A3 L4, X$, X6, Xu); four 32-bit drives (/V (, N 2, Nj, /V 4);

Reproduction is prohibited

two 32-bit drives L/$,) with permanent fillings C 2, C\\ recorded in them

two 32-bit modulo 2 32 adders (SM|, SL/3);

32-bit adder of bitwise summation modulo 2 (SL/2);

32-bit modulo adder (2 32 - 1) (SL/ 4);

modulo 2(SL/5) adder, there is no restriction on the capacity of the SL/$ adder;

substitution block (A);

cyclic shift register eleven steps towards the most significant digit (R).

1.2. The substitution block A" consists of eight substitution nodes A'j,

A 2, A“z, K 4, A5, A7, A 8 with 64-bit memory each. Post

A 32-bit vector applied to a substitution block is split into eight sequential 4-bit vectors, each of which is converted into a 4-bit vector by the corresponding substitution node, which is a table of sixteen rows containing four padding bits per row. The input vector determines the address of a row in the table, the filling of this row is the output vector. The 4-bit output vectors are then concatenated sequentially into a 32-bit vector.

1.3. When adding and cyclically shifting binary vectors, the most significant bits are considered to be the bits of the storage devices with large numbers.

1.4. When writing the key (I", W 2 ..., W q e (0,1), d = N256, in

The KZU value W\ is entered into the i-th digit of the drive Xq, the value W 2 is entered into the 2nd digit of the drive L#, ..., the value W^ 2 is entered into the 32nd digit of the drive Xq; the value W33 is entered into the 1st digit of the drive X\ y, the value is entered into the 2nd digit of the drive X\ y..., the value W M is entered into the 32nd digit of the drive X\\ the value W 6 5 is entered into the 1st digit of the drive X 2 etc., the value 1U 2 5b is entered into the 32nd bit of the Xy drive.

1.5. When rewriting information, the contents of the p-th digit of one drive (adder) are rewritten into the p-th digit of another drive (adder).

1.6. The values of constant fillings Cj, C 2 (constants) of storage devices /V 6, /V5 are given in Appendix 2.

1.7. The keys that determine the filling of the KZU and tables of the substitution block K are secret elements and are supplied in the prescribed manner.

Filling the tables of the substitution block K is a long-term key element common to the computer network.

The organization of various types of communication is achieved by building an appropriate key system. In this case, the possibility of generating keys (filling the KZU) in a simple replacement mode and encrypting them in a simple replacement mode while providing imitational protection for transmission over communication channels or storage in computer memory can be used.

1.8. The crypto scheme provides for four types of work: encryption (decryption) of data in simple replacement mode; encryption (decryption) of data in gamma mode;

encryption (decryption) of data in gamma mode with feedback;

simulation insert generation mode.

Schemes for software implementation of the cryptographic transformation algorithm are given in Appendix 3.

2. EASY CHANGE MODE

2.1. Encrypting clear data using simple replacement mode

2.1.1. The cryptoscheme that implements the encryption algorithm in simple replacement mode should have the form shown in Figure 2.

Open data to be encrypted is divided into blocks of 64 bits each. Input of any block T () = (D|(0), ^(O), ..., d 3 1(0), i 32 (0), £|(0), b 2 (0) y.. . , Z> 32 (0)) of binary information into drives N\ and N 2 is carried out so that the value D|(0) is entered into the 1st bit of N|, the value a 2 (0) is entered into the 2nd bit / Vj, etc., the value i 32 (0) is entered into the 32nd digit iVj; value />|(0) is entered into

1st digit L/ 2, the value b 2 (0) is entered into the 2nd digit N 2, etc., the value >> 32 (0) is entered into the 32nd digit N 2. As a result, the state is obtained (i 32 (0), i 3 |(0), ..., and 2 (0) y<7|(0)) накопителя yVj и состояние (/>32 (0), b 2 1(0), ... , />|(0)) storage unit N 2.

2.1.2. 256 bits of the key are entered into the RCU. The contents of eight 32-bit drives Aq, X\ t ..., Xj have the form:

^0 = (^32^3.....

*1 =(^64^63, . ^34^33)

*7 = (^56> ^255. ... , I/ 226, ^ 225)

2.1.3. The encryption algorithm for a 64-bit block of plain data in simple replacement mode consists of 32 cycles.

In the first cycle, the initial filling of the storage is summed modulo 2 32 in the adder CM\ with the filling of the storage Xq, while the filling of the storage Nj is maintained.

The result of the summation is converted in the substitution block K and the resulting vector is sent to the input of the register /?, where it is cyclically shifted by eleven steps towards the most significant digits. The result of the shift is summed bit by bit modulo 2 in the adder CM 2 with 32-bit filling of the drive yV 2 . The result obtained in CM 2 is written in N\%, with the old filling N| rewritten to N 2. The first cycle ends.

Subsequent cycles are carried out similarly, with

In the 2nd cycle, the filling X\ is read from the RCU, in the 3rd cycle from the RCU

the filling X2 is read, etc., in the 8th cycle the filling Xj is read from the RCU. In cycles from 9 to 16, as well as in cycles from 17 to 24, fillings from the RCU are read in the same order:

In the last eight cycles from the 25th to the 32nd, the order of reading the RCD fills is reversed:

hell, hell, hell, hell.

Thus, when encrypting in 32 cycles, the following order of selecting storage fills is carried out:

hell, ^2,^),^4>^5,^6"^7, hell, ^2,^3"^4,^5,-^6,^7, hell, hell, hell, hell, hell ,hell,hell,hell.

In the 32nd cycle, the result from the adder SL/ 2 is entered into the drive CU 2, and the old filling is stored in the drive N\.

Received after the 32nd cycle of encryption of filling the N| and N2 are the encrypted data block corresponding to the plain data block.

2.1 4 Encryption equations in simple replacement mode have the form:

J*Cr> "(

I b(/) = a(/~ I)

at y = I -24;

\bO) - a O - O at / 8* 25 -g 31; a(32) = a(31)

A (32) = (d (31) ffl X 0)KRG> b (31)

where d(0) = (a 32 (0), «з|(0), ... , Д|(0)) is the initial filling of N\ before the first encryption cycle;

6(0) = (632(0), 63j(0), ... , 6j(0)) - initial filling /U 2 before the first encryption cycle;

a(j) = (032(7), 0з|(/) e... , 0|(/)) - filling of the control unit, after the y-th encryption cycle;

b(j) = (6з 2 (/), 63j(/"), ... , 6|(/)) - filling /V 2 after the yth encryption cycle, y = 032.

The sign φ means the bitwise summation of 32-bit vectors modulo 2.

The sign Ш means the summation of 32-bit vectors modulo 2 32. The rules for summation modulo 2 32 are given in Appendix 4;

/? - the operation of a cyclic shift of eleven steps towards the highest digits, i.e.

^(g 32"O|>g 30>g 29>g 28>g 27>g 26"g 25>g 24>g 23'G 22"G 2b G 20>"g 2* g |)~

= (g 21" g 20> - "g 2* g 1 * G 32>G31 *GzO" g 29* g 28*, 27e"26e/"25>, 24>G23", 22)*

2.1.5. The 64-bit block of encrypted data Tsh is output from drives L^, VU 2 in the following order: from the 1st, 2nd, ..., 32nd bits of drive L7|, then from the 1st, 2nd , ... , 32nd bit of storage W 2, i.e.

t w - (a,<32),0 2 (32),0 32 (32), 6,(32), 6 2 <32),6 32 <32».

The remaining blocks of open data in simple replacement mode are encrypted in the same way.

2.2. Decrypting encrypted data using easy replacement mode

2.2.1. The cryptoscheme that implements the decryption algorithm in simple replacement mode has the same form (see Chsrt.2) as for encryption. 256 bits of the same key used for encryption are entered into the RCU. The encrypted data to be decrypted is divided into blocks of 64 bits each. Enter any block

T w - (0,(32),o 2 (32), ..., 0 32 (32), 6,(32), 6 2 (32), ..., 6 32 (32))

into accumulators L', and N 2 are produced so that the value dj(32) is entered into the 1st digit /V, the value o 2 (32) is entered into the 2nd digit /V, etc., the value a 32 (32) is entered into the 32nd digit /V,; the value 6,(32) is entered into the 1st digit of N2, etc., the value 6 32 (32) is entered into the 32nd digit of N2.

2.2.2. Decryption is carried out according to the same algorithm as the encryption of open data, with the difference that the fillings of the drives Xq, X\y..., Xj are read from the RCU in decryption cycles in the following order:

hell, hell 3, hell, hell, hell, hell, hell, hell 0,

hell 6, hell 4, hell 2, hell, hell, hell, hell 2, hell.

2.2.3. The decryption equations look like:

G d (32 - /) = (d (32 - / + 1) SHLG,.,) *LF6(32 - / + 1) b (32 - /) = d (32 - / + 1) at, / = 1+8;

I o(32- /) = (a(32-/M)SHDG (32 _ /)(tod8))KLFL(32./M) |6(32-/) = d (32 - / + 1)

at /= 9 + 31;

b(0) = (a (1) ШДГо) ОФй(1)

2.2.4. The W and N 2 drives obtained after 32 cycles of operation constitute a block of open data.

Then = (fli(O), a 2 (0), ... , Az 2 (0)" 6, (0), 6 2 (0), ... , 6 32 (0)), corresponding to the block of encrypted data, while the value o,(0) of block 7о corresponds to the contents of the 1st bit yV, the value 02(0) corresponds to

P. 8 GOST 28147-89

corresponds to the contents of the 2nd bit N\, etc., the value Dz2(0) corresponds to the contents of the 32nd bit N\; the value 6j(0) corresponds to the contents of the 1st bit; the value ^(0) corresponds to the contents of the 2nd bit N2, etc., the value £зг(0) corresponds to the contents of the 32nd bit N2-

The remaining blocks of encrypted data are decrypted in the same way.

2.3. The encryption algorithm in the mode of simple replacement of the 64-bit block G 0 is denoted by A y i.e.

A (T 0) = A (a (0), b (0)) = (a (32), b (32)) = T w.

2.4. The simple replacement mode can be used to encrypt (decrypt) data only in the cases given in clause 1.7.

3. GAMING MODE

3.1. Encrypting open data in gamma mode

3.1.1. The cryptoscheme that implements the encryption algorithm in gamma mode has the form shown in Figure 3.

Open data, divided into 64-bit blocks T\)\ 7), 2) ..., 7)) m “, 1 7[) M), is encrypted in gamma mode by bitwise summation modulo 2 in the adder SL/5 with the cipher gamma Гш, which is produced in blocks of 64 bits, i.e.

G _/L1) R2) Lm-1) LM)\

"ill V 1 w e * w * » " Sh » " * * * " 111 /»

where M is determined by the volume of encrypted data.

Tjj) - Yth 64-bit block, /“ the number of binary digits in block 7J) M) can be less than 64, while the part of the cipher gamut not used for encryption from block Г\^] is discarded.

3.1.2. 256 bits of the key are entered into the RCU. A 64-bit binary sequence (sync message) S = (5*1, S 2, ..., 5^4) is introduced into the drives iVj, N 2, which is the initial filling of these drives for the subsequent generation of M-blocks of the cipher gamma. The sync message is entered into jV| and L^so that the value 5[ is entered into the 1st digit of VU), the value S 2 is entered into the 2nd digit N\, etc., the value ^is entered into the 32nd digit 7V|; the value S33 is entered into the 1st digit N2, the value 4S34 is entered into the 2nd digit N2, etc., the value is entered into the 32nd digit N2.

3.1.3. The initial filling of drives /Vj and N 2 (sync message.5) is encrypted in simple replacement mode in accordance with

The popularly known term “processor performance” is an objective, calculated parameter that is measured in flops. However, most people measure it in gigahertz, naively believing that they are the same thing. Nobody knows the term “code performance”, and I’ll immediately explain why.

The reason is that I only recently came up with it and haven’t told anyone about it yet. However, code performance, like processor performance, has objective characteristics that can be measured. This article is specifically about the performance of code executed by the processor core.

How is code performance measured? Since I was the first to talk about this, by right of the discoverer I will measure it in RTT units;).

Now seriously. In modern processors, the main transformations are operations on 32-bit numbers; everything else is, by and large, exotic. Therefore, we will take into account the main thing - operations with 32-bit numbers. How many 32-bit operations do you think a modern processor core can perform simultaneously?

The student will answer - one, his teacher will think and say that there are four, the professional - that there are only twelve operations so far.

So, a program code that loads all processor execution units simultaneously throughout the entire execution time of the code will have a performance of 12 RTTs. Maximum! To be honest, I have never written such code before, but in this article I will try to make an effort.

I will prove that code with simultaneous execution of twelve 32-bit operations is possible

Program code that uses one actuator in the processor core will naturally have a performance of 1 RTT. Programs generated by high-level language compilers and virtual machine interpreters can boast of such code performance. There is no need to assume that the processor load indicator, which can be seen in the OS task manager, can serve as an objective criterion for the efficiency of the code. The processor core load can be 100%, but the program code will use one execution unit in it (performance 1 RTT). In this case, at 100% load, the processor core will operate at 1/12 of its maximum performance. In other words, when the Windows Task Manager shows the maximum processor load, its actual performance may vary from 1 to 12 RTT. If you see 100% load on any processor core in the performance window, it is wrong to assume that all actuators are working in this core, not at all!

The only criterion for indirectly assessing the operation of a processor core at maximum performance can be its power consumption and, as a consequence, the noise of the cooler. Now, if the cooler is noisy, then yes, the load has gone to maximum. However, it’s time to finish with general concepts and move on to harsh practice.

Traditional implementation of GOST 28147-89

I am not a professional in the field of information security, but I am still familiar with the topic of encryption. I was inspired to study symmetric stream encryption specifically by conversations with a professional cryptographer whom I deeply respect. And, having taken up this topic, I tried to do it well, and not just well, but also quickly, performing the maximum number of operations per unit of time. In other words, I was faced with the task of writing program code with the maximum RTT value.

Cryptographic transformation in accordance with GOST 28147-89 is used for stream encryption of information in communication channels and on disk drives.

Currently, software implementation of this GOST on the RON of the central processor is widely used. In known methods of implementing GOST, all secret information (encryption keys, replacement blocks) is located in RAM. This reduces the reliability of encryption, since having a RAM dump can completely reveal all the secret elements of the crypto-transformation. In addition, the method has performance limitations due to the location of the main crypto conversion objects in the OP and incomplete loading of the ALU executive units. Modern processors, implementing the crypto procedure using a well-known method, can provide encryption speeds of 40–60 megabytes per second. And if we really understand it to the end, the reason for the low performance and weak security of crypto conversion is the software implementation of the substitution block. For its description in GOST, see Fig. 1.

According to clause 1.2 of GOST, this block implements tetrad (four bits) permutations in a 32-bit word, but the x86/64 processor architecture and its instruction system are not capable of effectively manipulating tetrads.

For the software implementation of the substitution block, special tables in RAM are used, prepared at the stage of initialization of the cryptofunction. These tables combine the replacement nodes of adjacent tetrads into 8 × 8-bit byte tables, thus placing four 256-byte tables in RAM.

In more advanced implementations, these tables are 1024 bytes in size (256 words of four bytes). This was done in order to implement in the tables an additional cyclic shift by 11 positions of the 32-bit word obtained as a result of substitution (the next operation of the conversion algorithm according to GOST). An example of GOST implementation using this method is shown in Appendix 1 (on disk).

The information of the substitution block is a secret component of the cryptofunction (as formulated in GOST, see Fig. 2).

Placing these tables with the keys of the substitution block in the OP contradicts the requirements of GOST (clause 1.7), since secret information becomes available to third-party programs running on the computing installation. The FSB, which also certifies software implementations of encryption according to GOST, looks at this violation, to put it mildly, condescendingly. If, in order to place keys in the OP, the FSB still requires a “fig leaf” - masking the keys using the XOR operation, then nothing is required for replacement blocks in the OP; they are stored in clear form.

In short, the FSB allows such software implementations of cryptoprocedures to pass, despite the obvious decrease in the security of such a solution and a direct violation of its own requirements according to GOST (clause 1.7). And this despite the well-known methods of breaking ciphers by taking a memory dump...

We will return to the issue of storing keys and replacement blocks in internal registers of the processor a little later (there is a beautiful and fast solution), but for now we will only store encryption keys in MMX registers, this is more reliable.

But enough of the lyrics, what is important within the framework of the topic under consideration is that this program code has a performance of 1 RTT. Now let's write code with the performance of 2 RTTs.

Multithreaded implementation of GOST 28147-89

The only way to speed up crypto procedures in the known algorithm is to introduce multithreading. The point of this change in the implementation of the algorithm is to calculate several blocks of data in parallel.

Most programmers mean by parallel processing exclusively the work of several processor cores, synchronized through interrupts and semaphores in memory.

However, there is another option for parallel data processing on a single processor core. Let me explain this non-obvious idea.

Modern processors include at least two, and even three to six arithmetic-logical units. These ALUs (FPUs, address arithmetic units, and so on) can operate independently of each other; the only condition for their parallel operation is that the software objects they operate on are disjoint. In other words, in instructions that simultaneously execute an ALU, the memory addresses and register numbers must be different. Or, no writes should be performed to shared registers and memory addresses accessed by various processor execution units.

The workload of all ALUs is controlled by a special hardware unit inside the processor core - the scheduler, which scans the executable code forward, to a depth of 32–64 bytes. If the scheduler discovers instructions that can be run on the ALU without conflicts, then it runs them simultaneously on different execution devices. In this case, the counter of executed commands indicates the executing command (there are several of them in such a scheme), after which all commands have already been executed.

Most program sequences generated automatically (by compilers) cannot load all the ALUs and FPUs located in the processor core. In this case, the processor hardware is idle, which significantly reduces its resulting performance. Processor developers understand this and introduce modes to increase core frequency when the hardware is not fully used. This is also what hypertrading systems are designed for, and I will use this system to “press” the code to the maximum in the future.

Compilers, even the most optimized ones, and even more so virtual machine engines, cannot generate optimized code in terms of performance. Only a programmer with engineering knowledge can write such optimized code, and the tool for writing it is exclusively assembler.

A typical illustration of the possibility of executing several independent program threads on one processor core is the GOST implementation, executed in two threads on a single processor core. The idea of the code is simple: there are two blocks of data to encrypt/decrypt, but one processor core that will perform the conversion. It is possible to perform the conversion on these two data blocks sequentially, and this has been done to this day. In this case, the time required to complete the transformations doubles.

But you can do it differently: alternate commands related to the processing of different data blocks. These options are presented graphically in Fig. 3.

In the figure, the top example shows the usual order of processing two independent blocks of data. The first block is processed first, then the processor proceeds to process the second block. Naturally, the resulting time is equal to twice the time required to process one block, and the actuators of the processor core are not fully loaded.

The following is an example of interleaving commands from different processing threads. In this case, commands related to different data blocks are interleaved. The scheduler selects instructions that are independent of each other and transmits them for execution to ALU1 and ALU2. The grouping of commands of the first and second threads on these ALUs is carried out automatically, since the scheduler operating algorithm includes a grouping of commands linked to common data on the same executive device.

In order for such program code to work without ALU downtime, it is necessary that each program thread work with its own set of registers. The cache in this scheme becomes a bottleneck (it has only two data output ports), so we store the keys in MMX registers. Since in this case the replacement (and shift) nodes in memory are only read, they can be common to both program threads.

This, of course, is a very simplified explanation of the principle of parallel execution of program threads on a single core; in reality, everything is much more complicated. In practice, you need to take into account the pipeline architecture of actuators, restrictions on simultaneous access to the cache and the RON register block, the presence of address arithmetic nodes, switches and much more... So this is a topic for professionals, who can be counted on the fingers... of one hand.

The parallel encryption method is effectively implemented only for the 64-bit processor operating mode, since in this mode there is a sufficient number of RON (as many as 16 pieces!). An example of GOST implementation using this method is shown in Appendix 2 (on disk).

It is clear that this GOST implementation has a code performance of 2 RTT codes. Now let's see how this affects execution time.

The encryption cycle for one stream (Appendix 1) is 352 clock cycles, and during this time 8 bytes of data are calculated; for a two-threaded implementation of GOST (Appendix 2) 416 processor cycles are required, but 16 bytes are calculated. Thus, the resulting conversion speed increases from 80 to 144 megabytes for a 3.6 GHz processor.

An interesting picture emerges: the code contains exactly twice as many commands, and executes only 15% longer, but I think readers have already understood the reason for this phenomenon...

Theoretically, the code from the second example should be executed in the same number of cycles as the code from the first example, but the scheduler node is developed by Intel engineers, but they are also human, and we are all far from perfect. So it is possible to evaluate the effectiveness of their creation. This code will also run on an AMD processor, and you can compare their results.

If anyone doesn’t take my word for it, then for such non-believers, test programs with clock counters are included on the disk. The programs are in source code, of course in assembler, so there is an opportunity to check my words, and at the same time, see some of the tricks of professional coding.

Using SSE registers and AVX commands of modern processors to implement GOST 28147-89

Modern processors of the x86/64 architecture include a set of SSE registers of 16 bytes in size and specialized FPUs (at least two) to perform various operations on these registers. It is possible to implement GOST on this equipment, and in this case, replacement nodes can be placed not in the form of tables in RAM, but directly on dedicated SSE registers.

One SSE register can accommodate two tables of 16 rows at once. Thus, four SSE registers will completely accommodate all replacement tables. The only condition for such placement is the interleaving requirement, according to which tetrads of the same byte must be placed in different SSE registers. In addition, it is advisable to place the low and high tetrads of the input bytes, respectively, in the low and high tetrads of the bytes of the SSE registers.

These requirements are determined by optimization for the existing set of AVX commands. Thus, each byte of the SSE register will contain two tetrads corresponding to different bytes of the input register of the substitution block, while the position of the byte in the SSE register uniquely corresponds to the index in the substitution block substitution table.

A diagram of one of the possible placements of replacement nodes on SSE registers is shown in Fig. 4.

Placing secret information of replacement nodes on SSE registers increases the security of the crypto procedure, but complete isolation of this secret information is possible if the following conditions are met:

The processor core is switched to hypervisor host mode, and the interrupt block (APIC) is forcibly disabled in it. In this case, the processor core is completely isolated from the OS and applications running on the computing installation.
SSE registers are loaded and the computing core is isolated before the OS starts; it is optimal to perform these procedures from the trusted boot module (TLM).
Programs for cryptoprocedures in accordance with GOST are located in a non-modifiable memory area of the computing installation (either BIOS or in flash memory of the MDZ).

Compliance with these requirements will ensure complete isolation and immutability of the program code of cryptoprocedures and the secret information used in them.

For efficient sampling of tetrad SSE registers, the multi-input byte switches included in the FPU blocks are used. These switches allow transfers from any source byte to any destination byte, using indices located in a special SSE index register. Moreover, transfer is performed in parallel for all 16 bytes of the SSE receiver register.

Having substitution storage nodes on SSE registers and a multi-input switch in FPU blocks, it is possible to organize the following transformation in the substitution block (Fig. 5).

In this scheme, the input register in each tetrad specifies the address for the corresponding switch, which transmits information from the drives of the replacement nodes to the output register via the data bus. This scheme can be organized in three ways:

Create an appropriate chip design, but this is fantastic for us.
Reprogramming the microcode and creating your own processor command to implement this function on existing processors is no longer a fantasy, but, unfortunately, it is unrealistic in the current conditions.
Write a program using official AVX commands. The option may not be very effective, but it can be implemented “here and now.” So that's what we'll do next.

The operation of switches is controlled by a special three-address command AVX VPSHUFB. Its first operand is the receiver of information from the switches, the second is the source to which the switches' inputs are connected. The third operand is a control register for switches, each byte of which is associated with a corresponding switch; the value in it specifies the number of the direction from which the switch reads information. For a description of this command from the official Intel documentation, see Fig. 5. In Fig. Figure 6 shows a diagram of how this command works - only half of the SSE registers are shown, for the second half everything is similar.

The switch uses only the lowest four bits to determine the direction of commutation, the last bit in each byte is used to force the corresponding receiver byte to zero, but this switch function is not yet in demand in our case.

A program for sampling tetrads through FPU switches was written, but I didn’t even put it in the application - it was too pathetic. Having a 128-bit register and only using 32 bits in it is unprofessional.

As they say, “Our finish line is the horizon,” so squeeze it out, squeeze it out... we’ll press it and put it in bags!

This is not a play on words, but a harsh FPU reality - SSE registers can be divided into equal parts and the same transformations can be performed on these parts with one command. In order for the processor to understand this, there is a magic letter “P” - a packet that is placed before the command mnemonics, and no less magical letters “Q”, “D”, “W”, “B”, which are placed at the end and declare What parts are the SSE registers divided into in this command?

We are interested in batch mode with the SSE register divided into four 32-bit blocks; accordingly, all commands will be prefixed with “P” and at the end with the symbol “D”. This makes it possible to process four 32-bit blocks in parallel with one processor command, that is, calculate four data blocks in parallel.

The program that implements this method is available in Appendix 3, with all the explanations there.

However, press so much! Modern processors have at least two FPUs, and two independent instruction threads can be used to fully load them. If you correctly alternate commands from independent threads, you can load both FPU units completely with work and get eight parallel processed data streams at once. Such a program was written, and it can be viewed in Appendix 4, but you need to watch it carefully - you can go crazy. This is what is called “the code is not for everyone...”.

Price issue

The use of SSE registers to store replacement nodes is understandable - it provides some guarantee of isolation of secret information, but the meaning of calculating the cryptofunction itself on the FPU is not obvious. Therefore, the execution time of standard procedures was measured using the direct replacement method in accordance with GOST for four and eight threads.

For four threads, an execution speed of 472 processor cycles was obtained. Thus, for a processor with a frequency of 3.6 GHz, one thread is calculated at a speed of 59 megabytes per second, and four threads, respectively, at a speed of 236 megabytes per second.

For eight threads, an execution speed of 580 processor cycles was obtained. Thus, for a 3.6 GHz processor, one thread counts at 49 megabytes per second, and eight threads at 392 megabytes per second.

As the reader can see, the code in example #3 has a performance of 4 RTT, and the code in example #4 has a performance of 8 RTT. In these examples on SSE registers, the patterns are the same as when using RON, only the scheduler has reduced its efficiency. It currently provides a 20% increase in duration while doubling the code length.

Moreover, these results were obtained using universal AVX commands available in both Intel and AMD processors. If you optimize for an AMD processor, the result will be much better. It sounds contrary to the trend, but nevertheless it is true, and here’s why: AMD processors have an additional set of instructions, the so-called XOP extension, and in this additional set of instructions there are those that significantly simplify the implementation of the GOST algorithm.

This refers to the commands for logical burst byte shift and burst cyclic shift of double words. In the examples given in Appendices 3 and 4, sequences of universal commands are used that implement the necessary transformation: in the first case, one “extra” command, and in the other case, four extra commands at once. So there are optimization reserves, and considerable ones.

When it comes to further optimization, it’s worth remembering the presence of 256-bit registers (YMM registers), using which you can theoretically double the speed of calculations. But for now this is only a prospect; at the moment, processors slow down very much when executing 256-bit instructions (FPUs have a path width of 128 bits). Experiments have shown that on modern processors, counting 16 threads on YMM registers does not give any benefit. But this is only for now; on new processor models, the performance of 256-bit instructions will undoubtedly be increased, and then the use of 16 parallel threads will become advisable and will lead to an even greater increase in the speed of the crypto procedure.

Theoretically, you can count on a speed of 600–700 megabytes per second if the processor has two FPUs with a working path width of 256 bits each. In this case, we can talk about writing code with an efficiency of 16 RTT, and this is not fantasy, but the near future.

Mixed mode

Again the question of the number of registers arises; there are not enough of them to promote such an algorithm. But the hypertrading mode will help us. The processor core has a second set of registers available in logical processor mode. Therefore, we will execute the same code on two logical processors at once. In this mode, of course, we will not have more actuators, but due to alternation we can get a full load of all actuators.

You can’t count on an increase of 50% here; the bottleneck is the cache memory where the technological masks are stored, but you can still get an increase of 100 additional megabytes. This option is not shown in the appendices (the macros are similar to those used in the 8 RTT code), but it is available in the program files. So if anyone does not believe in the possibility of encryption at a speed of 500 megabytes per second on a single processor core, let them run test files. There are also texts with comments so that no one thinks that I am lying.

This focus is only possible on Intel processors; AMD has only two FPU units per two processor modules (analogous to the hypertrading mode). But there are four more ALUs that would be a sin not to use.

You can put the Bulldozer processor modules in a mode similar to the hypertrading mode, but run the conversion to RON on different modules in one thread, and on SSE registers in another thread and get the same 12 RTTs. I haven’t tested this option, but I think 12 RTT code will work more efficiently on AMD. Those interested can try it; test programs can be adjusted to work on “Bulldozers” quite easily.

Who needs it?

A serious question, but with a simple answer - everyone needs this. Soon we will all be hooked on the clouds, we will store both data and programs there, and there, oh, how we want to create our own, private corner. To do this, you will have to encrypt the traffic, and the speed of crypto conversion will be the main determining factor of comfortable work in the cloud. Our choice of encryption algorithm is small - either GOST or AES.

Moreover, oddly enough, the AES encryption algorithm built into processors turns out to be much slower; tests show speeds of 100–150 megabytes per second, and this is with a hardware implementation of the algorithm! The problem lies in the single-threaded count and the replacement block, which operates on bytes (a table of 256 rows). So GOST turns out to be more effective when implemented on x86/64 architecture, who would have thought...

This is if we talk about the achieved level of encryption speed. And if we keep in mind theoretical refinements in the field of increasing code efficiency, then most likely no one needs this. There are practically no specialists at the 3–6 RTT level, compilers generally generate code at the 1–2.5 RTT level, and the majority of programmers do not know assembler, and even if they know its spelling, they do not understand the design of a modern processor. And without this knowledge, it doesn’t matter whether it’s an assembler or some kind of SI-sharp.

But not everything is so sad: the bottom line after a week of sleepless nights is a new algorithm for implementing GOST, which would be a sin not to patent. And applications for patents (as many as three) have already been completed and filed, so, gentlemen, businessmen, line up - women and children get a discount.

Algorithm GOST 28147-89

Rice. 4.

Block cipher algorithm. When using the gamma encryption method, it can perform the functions of a stream cipher algorithm. GOST 28147-89 -- block cipher with a 256-bit key and 32 conversion cycles, operating on 64-bit blocks. The basis of the cipher algorithm is the Feistel network. There are four operating modes according to GOST 28147-89: simple replacement, gamming, gamming with feedback, and simulation insert generation mode.

The advantages of the algorithm: the futility of a force attack, the efficiency of implementation and, accordingly, high performance on modern computers, the presence of protection against the imposition of false data (generation of imitative insertion) and the same encryption cycle in all four GOST algorithms, a larger key compared to the DESX algorithm.

Disadvantages of the algorithm: The main problems of GOST are related to the incompleteness of the standard in terms of generating keys and replacement tables. It is believed that GOST has “weak” keys and replacement tables, but the standard does not describe the criteria for selecting and eliminating “weak” ones. Also, the standard does not specify an algorithm for generating a substitution table (S-boxes). On the one hand, this may be additional secret information (in addition to the key), and on the other hand, it raises a number of problems: it is impossible to determine the cryptographic strength of the algorithm without knowing the substitution table in advance; implementations of the algorithm from different manufacturers may use different replacement tables and may be incompatible with each other; the possibility of deliberate provision of weak replacement tables by licensing authorities of the Russian Federation.

Advantages of IDEA over analogues

In software implementation on Intel486SX is twice as fast as DES IDEA, which is a significant increase in speed; IDEA's key length is 128 bits, versus 56 bits for DES, which is a good improvement against brute-force keys. The probability of using weak keys is very small and amounts to 1/2 64 . IDEA is faster than the GOST 28147-89 algorithm (in software implementation on Intel486SX). Using IDEA in parallel encryption modes on Pentium III and Pentium MMX processors allows obtaining high speeds. Compared to the AES finalists, 4-way IDEA is only slightly slower than RC6 and Rijndael on Pentium II, but faster than Twofish and MARS. On Pentium III 4-way IDEA is even faster than RC6 and Rijndael. Another advantage is that it is well studied and resistant to well-known cryptanalysis tools.

Algorithm GOST 28147-89 and code “Magma” (GOST R 34.12-2015)

General scheme of the algorithm. The algorithm described by GOST 28147-89 “Information processing systems. Cryptographic protection. Cryptographic Transformation Algorithm” is a domestic symmetric encryption standard (until January 1, 2016) and is required for implementation in certified cryptographic information protection tools used in government information systems and, in some cases, in commercial systems. Certification of cryptographic information security means is required to protect information that constitutes a state secret of the Russian Federation, and information whose confidentiality must be ensured in accordance with current legislation. Also in the Russian Federation, the use of the GOST 28147-89 algorithm is recommended for protecting banking information systems.

The GOST 28147-89 algorithm (Fig. 2.21) is based on the Feistel scheme and encrypts information in blocks of 64 bits, which are divided into two subblocks of 32 bits (I, and R). Subblock R, is processed by the round transformation function, after which its value is added to the value of the subblock Lj, then the subblocks are swapped. The algorithm has 16 or 32 rounds depending on the encryption mode (simulation calculation or other encryption modes).

Rice. 2.21.

In each round of the algorithm, the following transformations are performed.

1. Key application. Subblock content R i folds modulo 2 32 with the round key K. Kj is the 32-bit part of the original key used as a round key. The GOST 28147-89 ns algorithm uses a key expansion procedure; the original 256-bit encryption key is represented as a concatenation (chaining) of eight 32-bit subkeys (Fig. 2.22): K 0, K (, K t K, K A, K 5, K 6, K 7.

The encryption process uses one of these subkeys TO

From the 1st to the 24th round - in direct sequence:

From the 25th to the 32nd round - in reverse order:

Rice. 2.22. Structure of the encryption key of the GOST 28147-89 algorithm

2. Table replacement. After applying the key, the subblock R i is divided into eight parts but 4 bits, the value of each of which is individually replaced in accordance with its replacement table (S-box). A total of eight S-blocks are used - S 0, S, S 2, S 3, S 4, S 5, S 6, S 7. Each S-box of the GOST 28147-89 algorithm is a vector (one-dimensional array) with ^elements numbered from 0 to 15. The values of the S-box are 4-bit numbers, i.e. integers from 0 to 15.

An element is taken from the S-box table whose serial number coincides with the value that came to the substitution input.

Example 2.6.

Let there be an S-block of the following form:

Let the value 0100 2 = 4 be applied to the input of this S-block. The output of the S-block will be the 4th element of the substitution table, i.e. 15 = 1111 2 (numbering of elements starts from zero).

Substitution persons are not defined by the standard, as is done, for example, in the DES cipher. Interchangeable values of replacement tables significantly complicate cryptanalysis of the algorithm. At the same time, the robustness of the algorithm significantly depends on their correct choice.

Unfortunately, the GOST 28147-89 algorithm has “weak” replacement tables, using which the algorithm can be quite easily cracked using cryptanalytic methods. Among the “weak” ones is, for example, a trivial substitution table in which the input is equal to the output (Table 2.16).

Table 2.16

Example of a weak S-box

It is believed that the specific values of the substitution tables should be kept secret and are a long-term key element, i.e. are valid for a much longer period than individual keys. However, the secret values of the replacement tables are not part of the key and cannot increase its effective length.

Indeed, secret replacement tables can be calculated using the following attack, which can be used in practice:

the zero key is set and a search for the “zero vector” is performed, i.e. values z = F( 0), where F- round transformation function of the algorithm. This requires about 2 32 encryption test operations;
using the zero vector, the values of the replacement tables are calculated, which takes no more than 2 11 operations.

However, even if the confidentiality of the replacement tables is violated, the strength of the cipher remains extremely high and does not fall below the permissible limit.

It is also assumed that the replacement tables are common to all encryption nodes within the same cryptographic security system.

Improving the structure of S-boxes is one of the most intensively researched problems in the field of symmetric block ciphers. Essentially, it requires that any changes to the S-box inputs result in random by type of change in output data. On the one hand, the larger the S-boxes, the more resistant the algorithm is to linear and differential cryptanalysis methods. On the other hand, a large substitution table is more difficult to design.

In modern algorithms, S-boxes are usually a vector (one-dimensional array) containing 2" t- bit elements. The input of the block determines the number of the element, the value of which serves as the output of the S-block.

A number of criteria have been put forward for the design of S-boxes. The replacement table must satisfy:

strict avalanche criteria;
bit independence criterion;
the requirement of nonlinearity from the input values.

To fulfill the last requirement, it was proposed to specify a linear combination i bits ( i = 1, ..., T) substitution table values bent functions(English, bent - deviating, in this case, from linear functions). Bent functions form a special class of Boolean functions characterized by the highest class of nonlinearity and compliance with the strict avalanche criterion.

Some works for S-boxes propose checking the implementation of a guaranteed avalanche effect of order y - when one input bit changes, at least the output bits of the S-box change. The property of a guaranteed avalanche effect of order y from 2 to 5 provides fairly good diffusion characteristics of S-boxes for any encryption algorithm.

When designing sufficiently large replacement tables, the following approaches can be used:

random selection (for small S-boxes may result in weak substitution tables);
random selection followed by testing for compliance with various criteria and rejection of weak S-boxes;
manual selection (too labor-intensive for large S-blocks);
mathematical approach, for example generation using bent functions (this approach is used in the CAST algorithm).

We can propose the following procedure for designing individual S-blocks of the GOST 28147-89 algorithm:

each S-box can be described by four logical functions, each of the functions must have four logical arguments;
it is necessary that these functions be sufficiently complex. This complexity requirement cannot be expressed formally, but as a necessary condition it can be required that the corresponding logical functions, written in minimal form (i.e., with the minimum possible expression length) using basic logical operations, must not be shorter than some required value;
individual functions, even those used in different replacement tables, must differ from each other to a sufficient extent.

In 2011, a new attack “reflexive meeting in the middle” was proposed, which slightly reduces the resistance of GOST 28147-89 (from 2256 to 2225). The best cryptanalysis result of the algorithm as of 2012 allows its strength to be reduced to 2,192, requiring a relatively large ciphertext size and volume of preformed data. Despite the proposed attacks, at the current level of development of computer technology, GOST 28147-89 remains practical.

Code "Magma" (GOST R 34.12-2015). The GOST 28147-89 standard has been in force in Russia for more than 25 years. During this time, it has shown sufficient durability and good efficiency of software and hardware implementations, including on low-resource devices. Although cryptanalytic attacks have been proposed that reduce its strength ratings (the best is to 2,192), they are far from being practical. Therefore, it was decided to include the GOST 28147-89 algorithm in the newly developed symmetric encryption standard.

In 2015, the shop adopted two new national cryptographic standards: GOST R 34.12-2015 “Information technology. Cryptographic information protection. Block ciphers" and GOST R 34.13-2015 "Information technology. Cryptographic information protection. Operating modes of block ciphers”, which come into effect on January 1, 2016.

The GOST R 34.12-2015 standard contains a description of two block ciphers with a block length of 128 and 64 bits. The GOST 28147-89 cipher with fixed nonlinear substitution blocks is included in the new GOST R 34.12-2015 as a 64-bit cipher called “Magma”.

Below are the replacement blocks specified in the standard:

The set of S-boxes given in the standard provides the best characteristics that determine the resistance of the cryptographic algorithm to differential and linear cryptanalysis.

According to the technical committee for standardization “Cryptographic information protection” (TK 26), fixing nonlinear substitution blocks will make the GOST 28147-89 algorithm more unified and help eliminate the use of “weak” nonlinear substitution blocks. In addition, the fixation of all long-term cipher parameters in the standard is consistent with accepted international practice. The new standard GOST R 34.12-2015 is terminologically and conceptually related to the international standards ISO/IEC 10116 “Information technologies. Security methods. Modes of operation for “-bit block ciphers” (ISO/IEC 10116:2006 Information technology - Security techniques - Modes of operation for an n-bit block cipher) and the ISO/IEC 18033 series “Information technologies. Methods and means of ensuring security. Encryption algorithms: ISO/IEC 18033-1:2005 “Part 1. General” (ISO/IEC 18033-1:2005 Information technology - Security techniques - Encryption algorithms - Part 1: General) and ISO/IEC 18033-3: 2010 “Part 3. Block ciphers” (ISO/IEC 18033-3:2010 (Information technology - Security techniques - Encryption algorithms - Part 3: Block ciphers)).

The GOST P 34.12-2015 standard also includes a new block cipher (“Grasshopper”) with a block size of 128 bits. This cipher is expected to be resistant to all currently known attacks on block ciphers.

The operating modes of block ciphers (simple replacement, gamma, gamma with output feedback, gamma with ciphertext feedback, simple replacement with chaining and generation of imitative insertion) are included in a separate standard GOST R 34.13-2015, which corresponds to the accepted international practice. These modes apply to both the Magma cipher and the new Grasshopper cipher.

A bitwise cyclic shift is performed to the left by 11 bits. Decryption is carried out according to the same scheme, but with a different schedule for using keys: from the 1st to the 8th round of decryption - in direct order: from the 9th to 32nd round of decryption - in reverse order: Compared to the DES cipher in GOST 28147-89 has the following advantages: a significantly longer key (256 bits versus 56 for the DES cipher), an attack on which by exhaustive search of the key set seems impossible at the moment; a simple schedule for using the key, which simplifies the implementation of the algorithm and increases the speed of calculations. Design of S-blocks GOST 28147-89. It is obvious that the scheme of the GOST 28147-89 algorithm is very simple. This means that the greatest encryption load falls on the replacement tables. Table values
Panasepko S.P. Encryption algorithms: a special reference book. St. Petersburg: BHV-Petersburg, 2009.
Kara O. Reflection Attacks on Product Ciphers. URL: http://eprint.iacr.org/2007/043.pdf
Russian encryption standard: reduced strength. URL: http://cryptofaq.ru/index.php/2010-12-23-18-20-21/2010-12-23-18-22-09/90-2011-02-01-07-47- 27
Achekseev E.K., Smyshlyaev S.V. GOST 28147-89: “Don’t rush to bury him.”

Brief description of the cipher

GOST 28147-89 is a Soviet and Russian symmetric encryption standard introduced in 1990, also a CIS standard. Full name - “GOST 28147-89 Information processing systems. Cryptographic protection. Cryptographic conversion algorithm." Block cipher algorithm. When using the gamma encryption method, it can perform the functions of a stream cipher algorithm.

GOST 28147-89 is a block cipher with a 256-bit key and 32 conversion cycles, operating on 64-bit blocks. The basis of the cipher algorithm is the Feistel Network. The basic encryption mode according to GOST 28147-89 is the simple replacement mode (more complex gamma, gamma with feedback and simulated insertion modes are also defined).

How the algorithm works

The algorithm is not fundamentally different from DES. It also contains encryption cycles (32 of them) according to the Feistel scheme (Fig. 2.9.).

Rice. 2.9. Encryption rounds of the GOST 28147-89 algorithm.

To generate subkeys, the original 256-bit key is divided into eight 32-bit blocks: k 1 ...k 8 . Keys k 9 ...k 24 are a cyclic repetition of keys k 1 ...k 8 (numbered from least significant to highest bits). The keys k 25 …k 32 are the keys k 1 …k 8 in reverse order.

After completing all 32 rounds of the algorithm, blocks A 33 and B 33 are glued together (note that the most significant bit becomes A 33, and the least significant bit - B 33) - the result is the result of the algorithm.

Function f(A i ,K i) is calculated as follows: A i and K i are added modulo 2 32 , then the result is divided into eight 4-bit subsequences, each of which is input to its own substitution table node(in ascending order of bit precedence), called below S-block. The total number of GOST S-blocks is eight, i.e. the same number as subsequences. Every S-block is a permutation of numbers from 0 to 15. The first 4-bit subsequence goes to the input of the first S-box, the second - to the input of the second, etc. The outputs of all eight S-boxes are combined into a 32-bit word, then the entire word is cyclically shifted to the left (towards the most significant bits) by 11 bits. All eight S-boxes can be different. In fact, they can be additional key material, but more often they are a schema parameter common to a specific group of users. The text of the standard indicates that the supply of filling replacement units (S-blocks) is carried out in the prescribed manner, i.e. algorithm developer. The community of Russian CIPF developers has agreed on the replacement nodes used on the Internet.

Decryption is performed in the same way as encryption, but the order of the subkeys k i is inverted.

Operating modes of the GOST 28147-89 algorithm

The GOST 28147-89 algorithm has four operating modes.

1. Modeeasy replacement accepts as input data whose size is a multiple of 64 bits. The result of encryption is the input text, converted in blocks of 64 bits in the case of encryption with the “32-Z” cycle, and in the case of decryption with the “32-R” cycle.

2. Modegamming accepts data of any size as input, as well as an additional 64-bit parameter - sync message. During operation, the sync message is converted in the “32-Z” cycle, the result is divided into two parts. The first part is added modulo 2 32 with a constant value of 1010101 16. If the second part is equal to 2 32 -1, then its value does not change, otherwise it is added modulo 2 32 -1 with a constant value of 1010104 16. The value obtained by combining both transformed parts, called the cipher gamma, enters the “32-3” cycle, its result is added bit by bit modulo 2 with the 64-bit block of input data. If the latter is less than 64 bits, then the extra bits of the resulting value are discarded. The resulting value is sent to the output. If there is still incoming data, then the action is repeated: the block made up of 32-bit parts is converted in parts, and so on.

3. Modegamming with feedback also accepts data of any size and a synchronization message as input. The block of input data is added bit by bit modulo 2 with the result of the transformation in the “32-3” cycle of the synchronization message. The resulting value is sent to the output. The value of the sync message is replaced in the case of encryption by the output block, and in the case of decryption - by the input block, that is, the encrypted one. If the last block of incoming data is less than 64 bits, then the extra bits of the gamma (output of the “32-3” cycle) are discarded. If there is still incoming data, then the action is repeated: from the result of encrypting the replaced value, a cipher gamma is formed, etc.

4. Production modeimitations inserts takes as input data that is at least two full 64-bit blocks in size and returns a 64-bit block of data called a simulated insert. The temporary 64-bit value is set to 0, then, as long as there is input data, it is added bit by bit modulo 2 with the result of the “16-3” cycle, the input of which is a block of input data. After the end of the input data, the temporary value is returned as the result.

Cipher cryptanalysis

The GOST 28147-89 cipher uses a 256-bit key and the volume of the key space is 2,256. On none of the currently existing general purpose computers can a key be found in less than many hundreds of years. The Russian standard GOST 28147-89 was designed with a large margin and is many orders of magnitude stronger than the American DES standard with its actual key size of 56 bits and a key space volume of only 2 56 .

There are also attacks on full-round GOST 28147-89 without any modifications. One of the first public works to analyze the algorithm exploits weaknesses in the key expansion procedure of a number of well-known encryption algorithms. In particular, the full-round GOST 28147-89 algorithm can be broken using differential cryptanalysis on related keys, but only if weak replacement tables are used. The 24-round version of the algorithm (in which the first 8 rounds are missing) is opened in a similar way with any replacement tables, however, strong replacement tables make such an attack completely impractical.

Domestic scientists A.G. Rostovtsev and E.B. Makhovenko in 2001 proposed a fundamentally new method of cryptanalysis by forming an objective function from a known plaintext, the corresponding ciphertext and the desired key value and finding its extremum corresponding to the true value of the key. They also found a large class of weak keys of the GOST 28147-89 algorithm, which make it possible to open the algorithm using only 4 selected plaintexts and the corresponding ciphertexts with a fairly low complexity.

In 2004, a group of specialists from Korea proposed an attack that, using differential cryptanalysis on related keys, can obtain a 12-bit secret key with a probability of 91.7%. The attack requires 2 35 selected plaintexts and 2 36 encryption operations. As you can see, this attack is practically useless for actually breaking the algorithm.

The replacement table is a long-term key element, that is, it is valid for a much longer period than a single key. It is assumed that it is common to all encryption nodes within the same cryptographic protection system. The quality of the cipher depends on the quality of this table. With a “strong” substitution table, the strength of the cipher does not fall below a certain acceptable limit even if it is compromised. Conversely, using a "weak" table can reduce the strength of the cipher to an unacceptably low limit. No information on the quality of the replacement table has been published in the open press of Russia, but the existence of “weak” tables is beyond doubt - an example is a “trivial” replacement table, in which each value is replaced by itself. A number of works erroneously conclude that the secret substitution tables of the GOST 28147-89 algorithm can be part of the key and increase its effective length (which is not significant, since the algorithm has a very large 256-bit key).

Encryption according to GOST 28147 89. Non-standard use of the standard

^0 = (^32^3.....

*1 =(^64^63, . ^34^33)

Traditional implementation of GOST 28147-89

Multithreaded implementation of GOST 28147-89

Using SSE registers and AVX commands of modern processors to implement GOST 28147-89

Price issue

Mixed mode

Who needs it?

Algorithm GOST 28147-89

Advantages of IDEA over analogues

Algorithm GOST 28147-89 and code “Magma” (GOST R 34.12-2015)

Popular articles

Latest articles

Sections

Pages

Special projects

Contacts