The path to becoming a reverse engineer? Reverse engineering for beginners. Protecting Android applications from reverse engineering Who is Voytyuk reverse engineer

Sometimes you want to see what is inside a particular program. For that you need reverse engineering. What is it? How does it work, and what does the process look like? This article answers those questions.

What is software reverse engineering?

Reverse engineering is the process of taking an application apart to understand how it works, so that the same behavior can later be reproduced with whatever changes are needed. A debugger and a disassembler are the usual tools. The quality of the tools determines both the quality of the result and how much time is needed to turn it back into something readable. The best way to explain reverse engineering to beginners is with an example, and ours will be an application written for Android.

Working with Android applications

First, a few points need clarifying. On Android, the roles of the debugger and the assembly listing mentioned earlier are played by LogCat and bytecode. You also need to understand how the applications themselves are structured. Each program is a file with the .apk extension, packed as a zip archive. We are interested in its contents: the application resources, classes.dex and AndroidManifest.xml. If you program for Android, the first and the last need no explanation. classes.dex is the program's bytecode, compiled specifically for the virtual machine. The original Java source code cannot be extracted from it with the tools available on the Internet, but it is possible to obtain Dalvik opcodes, the instruction set that the virtual machine executes. By analogy, this is the platform's own assembly language. classes.dex can also be converted into a file with the .jar extension, and decompiling that file yields Java code that is more or less readable. This is the path we will take.
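Because an apk is an ordinary zip archive, its layout can be inspected with nothing but a zip library. The sketch below is only an illustration: it builds a tiny stand-in archive in memory (the entry contents and the helper names `build_demo_apk` and `inspect_apk` are made up for the example) and then lists the entries and checks that classes.dex starts with the `dex\n` signature.

```python
import io
import zipfile

DEX_MAGIC = b"dex\n"  # a Dalvik executable starts with these four bytes


def build_demo_apk() -> bytes:
    """Create a stand-in archive with the three kinds of entries named above."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("AndroidManifest.xml", b"<placeholder/>")
        archive.writestr("classes.dex", DEX_MAGIC + b"035\x00" + b"\x00" * 16)
        archive.writestr("res/layout/main.xml", b"<placeholder/>")
    return buffer.getvalue()


def inspect_apk(apk_bytes: bytes) -> dict:
    """List the archive entries and check whether classes.dex looks like bytecode."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as archive:
        names = archive.namelist()
        dex_header = archive.read("classes.dex")[:4]
    return {"entries": names, "has_dex_magic": dex_header == DEX_MAGIC}


if __name__ == "__main__":
    print(inspect_apk(build_demo_apk()))
```

For a real application, the same `inspect_apk` call would take the bytes read from the .apk file instead of the demo archive.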

Decompilation

Decompilation is carried out with the Apk Manager program. Before you begin, make sure that the drivers for the device are installed and that USB debugging mode works. First, move the file to be analyzed into the apk_manager\place-apk-here-for-modding directory. Then run Script.bat. If there are no problems, a console with green text will start. Select item number nine, “Decompile”. Do not close the console once the process has started. Next, open the apk file with an archiver and extract classes.dex, which needs to be processed by the dex2jar program. To do that, drag classes.dex onto the dex2jar file with the .bat extension. A file ending in .jar will appear. Do not close the console window yet.

Analyzing the data

To get information about an application, open its manifest and use it to determine which activity is the main one. That is what matters most to us right now. It is also worth looking at the very bottom of the file: if a license manager is mentioned there, reverse engineering becomes significantly harder. If we switch to jd-gui and expand the tree, we will see several namespaces. Let's say there are three of them. The first contains files related to advertising. The second contains the license manager classes. The third contains the data we need, so this is where we go. Here you need to find and delete the key, and then the remaining lines that check whether the running version is licensed. All of this needs to be cleaned out. Then, in the Apk Manager folder, find the place where the decompiled bytecode was put, and comment out the instructions that could potentially cause problems. After that, all that remains is to compile the program.
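Finding the main activity can also be done programmatically. The sketch below assumes the manifest has already been decoded to plain-text XML (inside the apk it is stored in a binary form); the sample manifest and the function name `find_main_activity` are invented for the illustration.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical manifest, already decoded to text; not taken from a real application.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
  <application>
    <activity android:name=".SettingsActivity"/>
    <activity android:name=".MainActivity">
      <intent-filter>
        <action android:name="android.intent.action.MAIN"/>
        <category android:name="android.intent.category.LAUNCHER"/>
      </intent-filter>
    </activity>
  </application>
</manifest>"""


def find_main_activity(manifest_xml: str):
    """Return the name of the first activity that declares the MAIN action, or None."""
    name_attr = f"{{{ANDROID_NS}}}name"  # attributes carry the android: namespace
    root = ET.fromstring(manifest_xml)
    for activity in root.iter("activity"):
        for action in activity.iter("action"):
            if action.get(name_attr) == "android.intent.action.MAIN":
                return activity.get(name_attr)
    return None


if __name__ == "__main__":
    print(find_main_activity(SAMPLE_MANIFEST))
```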

Building the application

The same Apk Manager will help us here. In the console window that we left open, select item No. 14. The rest is routine. If the application is fairly complex, it may partially or completely lose its functionality when launched. Don't be upset: this only means that we are halfway there and have further to go. We continue reverse engineering the application. Unfortunately, there is no general rule for what needs to be done in a particular case, so you will have to find the problem area yourself. For example, if the application window is blocked by a dialog, look through the code and delete the part responsible for that dialog. jd-gui can help with this. As you can see, reverse engineering is not easy and requires a significant amount of knowledge. Even if everything starts without problems, the application's functionality still has to be tested, so reverse engineering is also time-consuming. We keep working until all the problems have been identified.

Safety

What if we need to protect an Android application from reverse engineering? There are two options: use special programs, or structure the code so that it resists analysis. The second option suits only experienced specialists, so we will consider only the first. The specialized software we use is ProGuard, an application that shrinks, obfuscates and optimizes code. If we run a program through it, we get a file with the .apk extension that is smaller than before and much harder to take apart. A further advantage is that ProGuard was built into the Android application build system with the r9 update, so any developer who has the standard development tools can use it.

Conclusion

Reverse engineering cannot be called uniformly good or bad. From the point of view of the developers who created the application, it is certainly not a welcome event. On the other hand, experienced programmers can often write the code they need faster than they could recover it with such tools. For beginning developers, though, reverse engineering can be a great help: when you have no idea how to implement something, even rough and not entirely clear sketches can help you reach the goal.

The recipe is incredibly simple:
if you want to understand every feature, tinker with every new program, parse its file format, try to hack every new game, write a bot or a cheat for it, and so on, then this field is for you. Just keep doing what you are doing.

If not, no books will help. This business requires passion and great patience.

Nobody needs calculus for reversing. At most, you will have to solve systems of linear equations.
What matters more is out-of-the-box thinking and the ability to run through many options and approaches in your head. For that you need to know technology. Literally: you need to know as much as possible. The more you know, the faster the problem gets solved. The areas are completely different: operating systems, networks, methods of encryption, compression, hashing and serialization; databases and their query languages; compilers, in terms of how they generate code; the implementation of the standard library; how the same code is compiled by different compilers; how bytecode interpreters, virtual machines and so on work.

That covers general technologies. There are also architectural patterns. They are usually found in application software; malware rarely uses them. You need to be able to spot in the code, for example, the Event pattern, the various flavors of MVC, and so on. Suppose you are reversing a product built on Qt. To understand it, you need to know... Qt: be able to develop with it, read its source code, and know what metaobjects are and how they are stored, used and called. And if the product suddenly embeds something interpreted, such as Python or Lua, you need to know not only the languages themselves but also how their interpreters are implemented. And then there is JIT...

You also need to decide what you want to reverse. Malware and application software are somewhat different. For malware you need to know more non-standard things: the various anti-debugging tricks, ways of hiding activity, operating system bugs, antivirus behavior. Malware can be part of a botnet, for example. Botnets usually have a command server that is quite hard to locate: it changes dynamically and does its best not to be detected. To deal with that, you need to know how the Internet works, how DNS works, and understand network protocols.

In short, a reverser needs to learn everything. There is no point in filtering out particular technologies; you will need all of them without exception. Everything that has been created for computing systems is used in them, and so you will have to know it in order to reverse it.

By the way, I almost forgot.

The best book on reverse engineering in Russian.

There is also a classic series of articles by Ricardo Narvaja: “Introduction to cracking from scratch using OllyDbg.” Google it. If you master Yurichev’s book and this series, you can calmly go and interview at Kaspersky. Although, believe me, there are more interesting things than Kaspersky.

We have collected several excellent books on reverse engineering that are suitable for both beginners and those who want to try something new, be it iOS or Xbox.

Reverse Engineering for Beginners

“Now that Denis Yurichev has made this book free, it is a contribution to the world of free knowledge and free education.” - Richard Stallman, founder of GNU, free software activist.

Reverse Engineering for Beginners is not only a textbook on reverse engineering but also an excellent textbook on the basics of programming, suitable both for learning the depths of C++ and Java and for better understanding how a computer works.

BIOS DISASSEMBLY NINJUTSU UNCOVERED

For many years, there has been a myth among computer enthusiasts and practitioners that modifying the BIOS (Basic Input Output System) is a kind of black magic and only a few are capable of it or that only the motherboard manufacturer can perform such a task. This book shows that with the right tools and a systematic approach to reverse engineering, anyone can understand and modify the BIOS to suit their needs without having the source code.

iOS App Reverse Engineering

The book is written in “layers” - theory, practice, theory and practice again. It consists of 4 parts:

- Concepts
- Tools
- Theory
- Practice

The first part covers the basic concepts of iOS, the file system hierarchy, and file types that are hidden from application developers but necessary for system researchers. The second part covers the main tools for reverse engineering the system, such as Theos, Cycript, Reveal, IDA and LLDB. The third part discusses the theory of iOS reverse engineering using Objective-C and explains the methodologies. The last part works through 4 practical reverse engineering cases built on the theory and tools from the previous parts of the book.

Hacking the Xbox: An Introduction to Reverse Engineering

The Xbox console is a wonderful device, not only because it can play all sorts of new games. Powerful, but relatively cheap, the device has potential as a versatile multiplayer, PC and even web server. But the lack of literature providing knowledge and a practical basis for modifying the Xbox prevents it from revealing its full potential. This book was created to cover this shortcoming to some extent.

Tutorial

This post will be of interest to those who are just getting into the topic. For people with experience it will probably cause nothing but yawns. Except maybe...
Reverse engineering, in its less legal part, where it does not concern debugging and optimizing one's own product, also includes the following task: “find out how it works for them.” In other words, restoring the original algorithm of a program while having only its executable file in hand.
To stick to the basics and avoid certain problems, we will “hack” not just anything, but... a keygen. In 90% of cases it will not be packed, encrypted or otherwise protected, including by international law...

In the beginning was the word. A double word

So, we need a keygen and a disassembler. For the latter, let's assume it is IDA Pro. The test subject is a nameless keygen found on the Internet.

Having opened the keygen file in Ida, we see a list of functions.

Looking through this list, we see several standard functions (WinMain, start, DialogFunc) and a bunch of auxiliary system functions. These are all standard functions that make up the framework.
The disassembler does not recognize user functions, the ones that implement the program's actual tasks rather than wrapping API and system calls, and simply names them sub_ followed by digits. Since there is only one such function here, it deserves our attention: it most likely contains the algorithm we are interested in, or part of it.

Let's run the keygen. It asks for two strings of four characters each. Our hypothesis: all eight characters are passed to the key calculation function at once. Let's analyze the code of the function sub_401100. The first two lines settle the hypothesis:

var_4= dword ptr -4
arg_0= dword ptr 8

The second line clearly hints that the function receives its argument at offset 8. However, the argument is a double word, which is 4 bytes, not 8. Most likely, then, the function processes one four-character string per call and is called twice.
A natural question: why does the single argument sit at offset 8 rather than 4? As we remember, the stack grows downwards: when a value is pushed, the stack pointer is decremented by the corresponding number of bytes. So after the argument is pushed and before the function body starts working, something else lands on the stack. First, the call instruction pushes the return address. Then the function's own prologue (push ebp) saves the old frame pointer. Each takes 4 bytes, which puts the argument 8 bytes above ebp.
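The offset arithmetic can be replayed in a few lines. This is a toy model rather than an emulator: it assumes 4-byte stack slots and the prologue visible in the sub_401100 listing (push ebp, mov ebp, esp, push ecx), and the starting value of the stack pointer is arbitrary. Reading `push ecx` as the instruction that reserves the var_4 slot is an interpretation, not something the listing states.

```python
WORD = 4  # bytes per stack slot on 32-bit x86


def frame_offsets() -> dict:
    """Replay the pushes around the call and report each slot's offset from ebp."""
    esp = 0x1000  # arbitrary starting stack pointer (assumption)
    slots = {}

    def push(label: str) -> None:
        nonlocal esp
        esp -= WORD  # the stack grows toward lower addresses
        slots[label] = esp

    push("arg_0")           # caller: push eax
    push("return address")  # caller: call sub_401100
    push("saved ebp")       # callee: push ebp
    ebp = esp               # callee: mov ebp, esp
    push("var_4")           # callee: push ecx (taken here to reserve the local)
    return {label: address - ebp for label, address in slots.items()}


if __name__ == "__main__":
    for label, offset in frame_offsets().items():
        print(f"{label:>14}: ebp{offset:+d}")
```

The resulting offsets for arg_0 and var_4 agree with the `arg_0= dword ptr 8` and `var_4= dword ptr -4` lines that the disassembler prints.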

Let's find the places in the program where sub_401100 is called. There are indeed two of them: at DialogFunc+97 and DialogFunc+113. The instructions we are interested in start here:

Relatively long piece of code

loc_401196:
    mov     esi, [ebp+hDlg]
    mov     edi, ds:SendDlgItemMessageA
    lea     ecx, [ebp+lParam]
    push    ecx             ; lParam
    push    0Ah             ; wParam
    push    0Dh             ; Msg
    push    3E8h            ; nIDDlgItem
    push    esi             ; hDlg
    call    edi             ; SendDlgItemMessageA
    lea     edx, [ebp+var_1C]
    push    edx             ; lParam
    push    0Ah             ; wParam
    push    0Dh             ; Msg
    push    3E9h            ; nIDDlgItem
    push    esi             ; hDlg
    call    edi             ; SendDlgItemMessageA
    pusha
    movsx   ecx, byte ptr [ebp+lParam+2]
    movsx   edx, byte ptr [ebp+lParam+1]
    movsx   eax, byte ptr [ebp+lParam+3]
    shl     eax, 8
    or      eax, ecx
    movsx   ecx, byte ptr [ebp+lParam]
    shl     eax, 8
    or      eax, edx
    shl     eax, 8
    or      eax, ecx
    mov     [ebp+arg_4], eax
    popa
    mov     eax, [ebp+arg_4]
    push    eax
    call    sub_401100

First, SendDlgItemMessageA is called twice in a row. This function takes a dialog handle and the identifier of a control, and sends that control the message Msg. In both calls Msg is 0Dh, the hexadecimal value of the WM_GETTEXT constant. This retrieves the contents of the two text fields into which the user typed the "two 4-character strings". The letter A in the function name means that the ANSI version is used, with one byte per character.
The first string is written at offset lParam, the second, evidently, at offset var_1C.
After the SendDlgItemMessageA calls, the current state of the registers is saved on the stack with the pusha instruction, and then one byte of one of the strings is loaded into each of the ecx, edx and eax registers. Each register now has the form 000000##. Then:
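The listing pushes the arguments in reverse order, so it can help to line the pushed values up against the parameter names. The sketch below does only that; the parameter names come from the comments the disassembler printed, while the helper `call_arguments` and the string placeholders standing in for the handle and the buffer address are illustrative.

```python
WM_SETTEXT = 0x0C
WM_GETTEXT = 0x0D

# Parameter names as shown in the disassembler comments, in declaration order.
PARAMETERS = ("hDlg", "nIDDlgItem", "Msg", "wParam", "lParam")


def call_arguments(pushes: list) -> dict:
    """Map values pushed right-to-left back onto the parameters left-to-right."""
    return dict(zip(PARAMETERS, reversed(pushes)))


# Values in the order they are pushed before the first call.
first_call = call_arguments(["&lParam", 0x0A, 0x0D, 0x3E8, "hDlg"])

if __name__ == "__main__":
    print(first_call)
    print("reads text:", first_call["Msg"] == WM_GETTEXT)
```

Read this way, the first call asks control 1000 (3E8h) for up to 10 (0Ah) characters of text, which matches the decompiler output later in the article.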

The SHL instruction shifts the contents of the eax register left by 1 byte, or in other words multiplies the value by 100 in hexadecimal, which is 256 in decimal. As a result, eax takes the form 0000##00 (for example, 00001200).
An OR operation is performed between the resulting eax value and the ecx register, which has the form 000000## (let it be 00000034). As a result, eax looks like this: 00001234.
The last, fourth byte of the string is loaded into the now "freed" ecx.
The contents of eax are shifted by a byte again, making room in the low byte for the next OR instruction. Now eax looks like this: 00123400.
The OR instruction is executed, this time between eax and edx, which contains, say, 00000056. Now eax is 00123456.
The two steps, SHL eax, 8 and OR, are repeated, so that the new contents of ecx (00000078) are appended to the "end" of eax. As a result, eax holds the value 12345678.
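The shift-and-OR steps above can be traced in Python. This is a minimal sketch using the same example bytes (12, 34, 56, 78); the function name `pack_trace` is made up, and the 32-bit mask stands in for the fixed register width.

```python
MASK32 = 0xFFFFFFFF  # keep values inside a 32-bit register


def pack_trace(byte_values: list) -> list:
    """Shift eax left by one byte and OR in the next byte, recording each state."""
    eax = 0
    states = []
    for value in byte_values:
        eax = ((eax << 8) | value) & MASK32  # shl eax, 8 followed by or eax, reg
        states.append(f"{eax:08X}")
    return states


if __name__ == "__main__":
    for state in pack_trace([0x12, 0x34, 0x56, 0x78]):
        print(state)
```

The first iteration shifts a zero register, so it simply loads the first byte, which corresponds to the initial movsx into eax in the listing.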

This value is then stored in a "variable", a memory location at offset arg_4. The register values saved earlier are popped from the stack back into the registers. Then eax is loaded again from offset arg_4, and this value is pushed from the register onto the stack. This is followed by a call to the function sub_401100.

What is the point of these operations? It is easy to find out in practice, without any theory. Set a breakpoint in the debugger, for example on the push eax instruction just before the call to the subfunction, and run the program. The keygen starts and asks for the strings. Enter qwer and tyui, stop at the breakpoint, and look at the value of eax: 72657771. Decoded as text, that is rewq. So the practical effect of these operations is to pack the four characters into one double word, which reads as the string reversed.
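The debugger observation can be reproduced without running the keygen. The sketch below assumes plain ASCII input, so the sign extension done by movsx plays no role; the helper names are invented for the example.

```python
def pack_like_keygen(text: str) -> int:
    """Pack four ASCII characters into a dword, first character in the low byte."""
    raw = text.encode("ascii")
    if len(raw) != 4:
        raise ValueError("exactly four characters are expected")
    return int.from_bytes(raw, "little")


def read_register_as_text(value: int) -> str:
    """Decode a 32-bit value the way it is read off the screen: high byte first."""
    return value.to_bytes(4, "big").decode("ascii")


if __name__ == "__main__":
    packed = pack_like_keygen("qwer")
    print(f"{packed:08X}", read_register_as_text(packed))
```

Packing with the first character in the low byte is exactly what makes the hexadecimal value read backwards.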

Now we know that sub_401100 receives one of the original strings, reversed and packed into a double word that fits entirely into any general-purpose register. It is time to look at the instructions of sub_401100.

Another relatively long piece of code

sub_401100 proc near

var_4= dword ptr -4
arg_0= dword ptr  8

    push    ebp
    mov     ebp, esp
    push    ecx
    push    ebx
    push    esi
    push    edi
    pusha
    mov     ecx, [ebp+arg_0]
    mov     eax, ecx
    shl     eax, 10h
    not     eax
    add     ecx, eax
    mov     eax, ecx
    shr     eax, 5
    xor     eax, ecx
    lea     ecx, [eax+eax*8]
    mov     edx, ecx
    shr     edx, 0Dh
    xor     ecx, edx
    mov     eax, ecx
    shl     eax, 9
    not     eax
    add     ecx, eax
    mov     eax, ecx
    shr     eax, 11h
    xor     eax, ecx
    mov     [ebp+var_4], eax
    popa
    mov     eax, [ebp+var_4]
    pop     edi
    pop     esi
    pop     ebx
    mov     esp, ebp
    pop     ebp
    retn
sub_401100 endp

There is nothing interesting at the very beginning: the register states are carefully saved on the stack. The first instruction of interest is the one right after PUSHA. It loads the function argument stored at offset arg_0 into ecx. This value is then copied to eax, and half of it is cut off: as we remember, in our example 72657771 is passed to sub_401100, and a logical left shift by 10h (16 in decimal) turns the register value into 77710000.
After this, the register value is inverted with the NOT instruction. In the binary representation of the register, all zeros become ones and all ones become zeros. After this instruction the register contains 888EFFFF.
The ADD instruction adds the resulting value to the original argument, which is still held in ecx (now it is clear why it was loaded into ecx first and only then copied to eax). The result is saved in ecx. Let's check what ecx looks like after this operation: FAF47770.
This result is copied from ecx to eax, after which the SHR instruction is applied to eax. This operation is the opposite of SHL: the latter shifts bits to the left, the former shifts them to the right. Just as a logical left shift is equivalent to multiplying by a power of two, a logical right shift is equivalent to dividing by one. The result of this operation is 07D7A3BB.
Now let's do one more violence to the contents of eax and ecx: the XOR instruction is addition modulo 2, or "exclusive OR". Roughly speaking, its result is one (true) only if its operands differ. For 0 xor 1 the result is true, or one. For 0 xor 0 or 1 xor 1 the result is false, or zero. In our case, applying this instruction to eax (07D7A3BB) and ecx (FAF47770) writes FD23D4CB to eax.
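Python integers are unbounded, so each of these instructions needs a 32-bit mask to behave like a register. The helpers below are a small sketch of that idea (their names are invented), applied to the first five arithmetic instructions of the function with the example argument 72657771.

```python
MASK32 = 0xFFFFFFFF


def shl(value: int, bits: int) -> int:
    return (value << bits) & MASK32  # bits shifted past bit 31 are lost


def shr(value: int, bits: int) -> int:
    return (value & MASK32) >> bits  # logical shift, zeros come in from the left


def not32(value: int) -> int:
    return ~value & MASK32           # flip all 32 bits


def add32(a: int, b: int) -> int:
    return (a + b) & MASK32          # the carry out of bit 31 is discarded


def first_steps(argument: int) -> list:
    """Replay shl/not/add/shr/xor and return the value produced by each step."""
    ecx = argument
    trace = []
    eax = shl(ecx, 0x10); trace.append(eax)
    eax = not32(eax);     trace.append(eax)
    ecx = add32(ecx, eax); trace.append(ecx)
    eax = shr(ecx, 5);    trace.append(eax)
    eax ^= ecx;           trace.append(eax)
    return trace


if __name__ == "__main__":
    print([f"{value:08X}" for value in first_steps(0x72657771)])
```

The shift helpers also make the multiplication and division analogy concrete: `shr(x, 5)` equals `x // 32` for any 32-bit `x`.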

The next instruction, lea ecx, [eax+eax*8], elegantly and effortlessly multiplies eax by 9 and writes the result to ecx. This value is then copied to edx and shifted right by 13 bits: we get 73213 in edx and E6427B23 in ecx. Then we XOR ecx with edx, which leaves E6454930 in ecx. We copy this into eax and shift it left by 9 bits, giving 8A926000, then invert it, getting 756D9FFF. We add this value to ecx and have 5BB2E92F. We copy this into eax, shift it right by a full 17 bits, which gives 2DD9, and XOR it with ecx. We end up with 5BB2C4F6. Then... then... what is left? Is that all?..
So, we save this value to the memory location at offset var_4, restore the register states from the stack, load the final value from memory again, and finally remove from the stack the remaining register values saved at the beginning. We exit the function. Hurray!.. However, it is too early to rejoice: at the output of the first function call we have at most four semi-printable characters, there is still a whole unprocessed string left, and this result still has to be brought into presentable shape.
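Put together, the whole function fits in a few lines. The following is a reading of the listing rather than the author's own code: it assumes 32-bit unsigned arithmetic throughout, and it keeps the disassembler's name `sub_401100` only for easy cross-reference.

```python
MASK32 = 0xFFFFFFFF


def sub_401100(value: int) -> int:
    """Mix a 32-bit value the way the disassembled function appears to."""
    ecx = value & MASK32
    ecx = (ecx + (~(ecx << 16) & MASK32)) & MASK32  # shl 10h, not, add
    eax = (ecx >> 5) ^ ecx                          # shr 5, xor
    ecx = (eax * 9) & MASK32                        # lea ecx, [eax+eax*8]
    ecx ^= ecx >> 13                                # shr 0Dh, xor
    ecx = (ecx + (~(ecx << 9) & MASK32)) & MASK32   # shl 9, not, add
    return (ecx >> 17) ^ ecx                        # shr 11h, xor


if __name__ == "__main__":
    print(f"{sub_401100(0x72657771):08X}")
```

Running it on the packed form of qwer gives the same final value as the manual walk through the registers.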

Let's move on to a higher level of analysis, from the disassembler to the decompiler. Let's represent the entire DialogFunc function, which contains the calls to sub_401100, as C-like pseudocode. The disassembler calls it "pseudocode"; in practice it is C code, only ugly. Let's see:

Need more code. We need to build a ziggurat.

SendDlgItemMessageA(hDlg, 1000, 0xDu, 0xAu, (LPARAM)&lParam);
SendDlgItemMessageA(hDlg, 1001, 0xDu, 0xAu, (LPARAM)&v15);
v5 = sub_401100((char)lParam | ((SBYTE1(lParam) | ((SBYTE2(lParam) | (SBYTE3(lParam) << 8)) << 8)) << 8));
v6 = 0;
do
{
  v21[v6] = v5 % 0x24;
  v7 = v21[v6];
  v5 /= 0x24u;
  if ( v7 >= 10 )
    v8 = v7 + 55;
  else
    v8 = v7 + 48;
  v21[v6++] = v8;
}
while ( v6 < 4 );
v22 = 0;
v9 = sub_401100(v15 | ((v16 | ((v17 | (v18 << 8)) << 8)) << 8));
v10 = 0;
do
{
  v19[v10] = v9 % 0x24;
  v11 = v19[v10];
  v9 /= 0x24u;
  if ( v11 >= 10 )
    v12 = v11 + 55;
  else
    v12 = v11 + 48;
  v19[v10++] = v12;
}
while ( v10 < 4 );
v20 = 0;
wsprintfA(&v13, "%s-%s-%s-%s", &lParam, &v15, v21, v19);
SendDlgItemMessageA(hDlg, 1002, 0xCu, 0, (LPARAM)&v13);

This is already easier to read than the assembly listing. However, you cannot always rely on a decompiler: be prepared to spend hours following the thread of assembly logic and the state of the registers and the stack in the debugger... and then to give written explanations to FSB or FBI employees. My jokes get especially funny in the evening.
As I said, it is easier to read, but still far from perfect. Let's analyze the code and give the variables more readable names: clear, logical names for the key variables and simpler ones for counters and temporaries.

The same thing, only translated from Chinese to Indian.

SendDlgItemMessageA(hDlg, 1000, 0xDu, 0xAu, (LPARAM)&first_given_string);
SendDlgItemMessageA(hDlg, 1001, 0xDu, 0xAu, (LPARAM)&second_given_string);
first_string_encoded = sub_401100((char)first_given_string | ((SBYTE1(first_given_string) | ((SBYTE2(first_given_string) | (SBYTE3(first_given_string) << 8)) << 8)) << 8));
i = 0;
do
{
  first_result_string[i] = first_string_encoded % 0x24;
  temp_char = first_result_string[i];
  first_string_encoded /= 0x24u;
  if ( temp_char >= 10 )
    next_char = temp_char + 55;
  else
    next_char = temp_char + 48;
  first_result_string[i++] = next_char;
}
while ( i < 4 );
some_kind_of_data = 0;
second_string_encoded = sub_401100(byte1 | ((byte2 | ((byte3 | (byte4 << 8)) << 8)) << 8));
j = 0;
do
{
  second_result_string[j] = second_string_encoded % 0x24;
  temp_char2 = second_result_string[j];
  second_string_encoded /= 0x24u;
  if ( temp_char2 >= 10 )
    next_char2 = temp_char2 + 55;
  else
    next_char2 = temp_char2 + 48;
  second_result_string[j++] = next_char2;
}
while ( j < 4 );
yet_another_some_kind_of_data = 0;
wsprintfA(&buffer, "%s-%s-%s-%s", &first_given_string, &second_given_string, first_result_string, second_result_string);
SendDlgItemMessageA(hDlg, 1002, 0xCu, 0, (LPARAM)&buffer);
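Read this way, each loop appears to write the hashed value out as four base-36 digits, least significant digit first, using 0-9 and A-Z (48 and 55 are the offsets to the ASCII codes of '0' and 'A'). The sketch below strings the pieces together under several assumptions: the input is plain ASCII, all arithmetic is unsigned 32-bit, and the names `mix`, `encode36` and `make_key` are invented for the illustration.

```python
MASK32 = 0xFFFFFFFF


def mix(value: int) -> int:
    """The arithmetic of sub_401100, as read from the disassembly."""
    ecx = value & MASK32
    ecx = (ecx + (~(ecx << 16) & MASK32)) & MASK32
    eax = (ecx >> 5) ^ ecx
    ecx = (eax * 9) & MASK32
    ecx ^= ecx >> 13
    ecx = (ecx + (~(ecx << 9) & MASK32)) & MASK32
    return (ecx >> 17) ^ ecx


def encode36(value: int, length: int = 4) -> str:
    """Emit base-36 digits of value, least significant first, as in the do-while loop."""
    chars = []
    for _ in range(length):
        digit = value % 0x24
        value //= 0x24
        chars.append(chr(digit + 55) if digit >= 10 else chr(digit + 48))
    return "".join(chars)


def make_key(first: str, second: str) -> str:
    """Combine both inputs and their encoded hashes in the "%s-%s-%s-%s" layout."""
    parts = [first, second]
    for text in (first, second):
        packed = int.from_bytes(text.encode("ascii"), "little")
        parts.append(encode36(mix(packed)))
    return "-".join(parts)


if __name__ == "__main__":
    print(make_key("qwer", "tyui"))
```

Only four digits are kept, so the quotient left over after the fourth division is simply discarded, just as in the pseudocode.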
