Assembly School: Operating System Development

Let me say right away: do not close this article thinking “Damn, another Popov.” He only has a polished-up Ubuntu, whereas I have written everything from scratch, including the kernel and applications. The rest is below the cut.

OS group: here.
First, here is a single screenshot.

That is the only one. Now, in more detail, about why I am writing this OS.

It was a warm April evening, a Thursday. I had dreamed of writing an OS since childhood, and it suddenly occurred to me: “I know C++ and asm now, so why not make my dream come true?” I googled sites on the topic and found an article on Habr: "How to start and not stop writing OS". Thanks to its author for the OSDev Wiki link. I went there and got to work. A single article there had all the information needed for a minimal OS. I built cross-gcc and binutils and then retyped everything from that article. You should have seen my joy when I saw the message "Hello, kernel World!" I jumped right out of my chair and realized that I would not give up. I wrote a "console" (in quotes, because I had no keyboard access yet), and then decided to write a window system. In the end it worked, but still without keyboard access. Then I decided to come up with a name modeled on the X Window System. I googled "Y Window System": it already exists. So I called mine Z Window System 0.1, and it became part of OS365 pre-alpha 0.1. And yes, no one saw it except me. Then I figured out how to implement keyboard support. Here is a screenshot of the very first version, when there was nothing, not even a window system:

As you can see, it did not even move the text cursor. Then I wrote a couple of simple Z-based applications. And then came the 1.0.0 alpha release. It had many things, even a system menu, although the file manager and calculator simply did not work.

I was downright terrorized by a friend who cares only about looks (Mitrofan, sorry). He kept saying, “Implement VBE mode 1024*768*32, implement it, implement it!” Eventually I got tired of listening to him and implemented it. More on the implementation below.

I did everything through my bootloader, namely GRUB. With it, you can set the graphics mode without complications by adding a few magic lines to the Multiboot header.

.set ALIGN,    1<<0
.set MEMINFO,  1<<1
.set GRAPH,    1<<2
.set FLAGS,    ALIGN | MEMINFO | GRAPH
.set MAGIC,    0x1BADB002
.set CHECKSUM, -(MAGIC + FLAGS)
.align 4
.long MAGIC
.long FLAGS
.long CHECKSUM
.long 0, 0, 0, 0, 0
.long 0                 # 0 = set graphics mode
.long 1024, 768, 32     # Width, height, depth

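The CHECKSUM line works because the Multiboot specification requires the magic, flags and checksum fields to add up to zero modulo 2^32. The following Python sketch is not part of the OS; its names simply mirror the .set symbols above, and it only checks that arithmetic:

```python
# Mirror of the .set symbols from the Multiboot header above.
ALIGN = 1 << 0
MEMINFO = 1 << 1
GRAPH = 1 << 2
FLAGS = ALIGN | MEMINFO | GRAPH
MAGIC = 0x1BADB002

# -(MAGIC + FLAGS), truncated to 32 bits as a .long field would be.
CHECKSUM = -(MAGIC + FLAGS) & 0xFFFFFFFF


def header_is_valid(magic, flags, checksum):
    """The three 32-bit fields must sum to zero modulo 2**32."""
    return (magic + flags + checksum) & 0xFFFFFFFF == 0


print(hex(FLAGS), hex(CHECKSUM), header_is_valid(MAGIC, FLAGS, CHECKSUM))
```

A bootloader that finds the magic number but a non-zero sum treats the header as invalid, which is why the checksum has to be recomputed whenever a flag such as GRAPH is added.
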
Then I take the framebuffer address and screen resolution from the Multiboot information structure and write pixels there. VESA makes this rather confusing: the color components have to be written in reverse order (not R, G, B but B, G, R). For several days I could not understand why the pixels were not displayed. In the end, I realized that I had forgotten to change the values of the 16 color constants from 0...15 to their RGB equivalents. I then made the release, and while I was at it added a gradient background. After that I made a console and 2 applications, and released 1.2. Oh yes, I almost forgot - you can download the OS at
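
The reversed component order can be illustrated with a short Python sketch. It assumes a 32-bit-per-pixel linear framebuffer whose pitch (bytes per row) is exactly width × 4; real hardware reports the pitch in the Multiboot information structure, and it may be larger. The function names are hypothetical and do not come from OS365.

```python
WIDTH, HEIGHT, BYTES_PER_PIXEL = 1024, 768, 4
PITCH = WIDTH * BYTES_PER_PIXEL  # assumption: no padding at the end of a row


def pack_bgr(r, g, b):
    """Return the 4 bytes of one pixel as they lie in memory: B, G, R, unused."""
    return bytes([b, g, r, 0])


def pixel_offset(x, y, pitch=PITCH):
    """Byte offset of pixel (x, y) from the start of the framebuffer."""
    return y * pitch + x * BYTES_PER_PIXEL


def put_pixel(framebuffer, x, y, r, g, b):
    """Write one pixel into a bytearray that stands in for video memory."""
    off = pixel_offset(x, y)
    framebuffer[off:off + BYTES_PER_PIXEL] = pack_bgr(r, g, b)


fb = bytearray(PITCH * HEIGHT)
put_pixel(fb, 2, 1, 255, 128, 0)  # an orange pixel in row 1, column 2
start = pixel_offset(2, 1)
print(start, fb[start:start + BYTES_PER_PIXEL].hex())
```
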

Assembler

An assembler (from the English "to assemble") is a translator from assembly language into machine-language instructions.
Every processor architecture, and every OS or OS family, has its own assembler. There are also so-called "cross-assemblers", which run on a machine with one architecture (or in the environment of one OS) and assemble programs for another target architecture or OS, producing executable code in a format suitable for that target architecture or environment.

x86 architecture

Assemblers for DOS

The best known assemblers for the DOS operating system were Borland Turbo Assembler (TASM) and Microsoft Macro Assembler (MASM). Also at one time, the simple assembler A86 was popular.
Initially, they supported only 16-bit instructions (before the advent of the Intel 80386 processor). Later versions of TASM and MASM support 32-bit instructions as well as the instructions introduced in more modern processors and architecture-specific instruction set extensions (for example, MMX, SSE and 3DNow!).

Microsoft Windows

With the advent of the Microsoft Windows operating system, a TASM extension called TASM32 appeared, which made it possible to create programs that run in the Windows environment. The latest known version of TASM is 5.3, which supports MMX instructions and is currently included in Turbo C++ Explorer. Officially, however, development of the program has stopped completely.
Microsoft maintains a product called Microsoft Macro Assembler. It continues to evolve to this day, with the latest versions included in the DDKs. The version aimed at creating programs for DOS, however, is no longer developed. In addition, Steve Hutchesson created a MASM programming package called "MASM32".

GNU and GNU/Linux

The GNU operating system includes the gcc compiler, which relies on the gas assembler (GNU Assembler). Unlike most other popular assemblers, which use Intel syntax, gas uses AT&T syntax by default.

Portable assemblers

There is also an open-source assembler project that is available for various operating systems and can produce object files for those systems. This assembler is called NASM (Netwide Assembler).
YASM is a from-scratch rewrite of NASM under a different license (with some exceptions).
FASM (Flat Assembler) is a young assembler distributed under a BSD license modified to prohibit relicensing (including under the GNU GPL). Versions exist for KolibriOS, GNU/Linux, MS-DOS and Microsoft Windows; it uses Intel syntax and supports AMD64 instructions.

RISC architectures

MCS-51
AVR
There are currently 2 Atmel compilers (AVR Studio 3 and AVR Studio 4). The second is an attempt to correct the not very successful first one. WinAVR also includes an assembler.
ARM
AVR32
MSP430
PowerPC

Assembly and compilation

The process of translating an assembly language program into object code is called assembly. Unlike compilation, assembly is a more or less unambiguous and reversible process. In assembly language, each mnemonic corresponds to one machine instruction, while in high-level programming languages, a large number of different instructions can hide behind each expression. In principle, this division is rather arbitrary, so sometimes the translation of assembler programs is also called compilation.
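
The one-to-one, reversible nature of assembly can be illustrated with a toy Python sketch. It covers only four operand-less x86 instructions (nop, ret, cld, std), whose single-byte encodings are fixed; the table and function names are made up for this illustration, and a real assembler also handles operands, prefixes and labels.

```python
# Mnemonic -> opcode for a few operand-less x86 instructions.
OPCODES = {"nop": 0x90, "ret": 0xC3, "cld": 0xFC, "std": 0xFD}
# The inverse table exists only because the mapping is one-to-one.
MNEMONICS = {code: name for name, code in OPCODES.items()}


def assemble(lines):
    """Translate each mnemonic into exactly one machine-code byte."""
    return bytes(OPCODES[line.strip().lower()] for line in lines)


def disassemble(code):
    """Recover the mnemonics from the machine code."""
    return [MNEMONICS[byte] for byte in code]


source = ["cld", "nop", "ret"]
machine_code = assemble(source)
print(machine_code.hex())
print(disassemble(machine_code) == source)
```

A compiler for a high-level language offers no such inverse table: many different source expressions can produce the same instructions, so the original program text cannot be recovered from the output.
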

Assembly language

Assembly language is a type of low-level programming language: a notation for machine instructions that is convenient for humans to read. For brevity it is often simply called "assembler", which is not accurate.

Assembly language instructions correspond one to one to processor instructions and are, in effect, a convenient symbolic notation (mnemonic code) for instructions and their arguments. Assembly language also provides basic programming abstractions: linking parts of the program and data through labels with symbolic names (during assembly, an address is calculated for each label, after which each occurrence of the label is replaced by that address) and directives.
Assembler directives let you include blocks of data (described explicitly or read from a file) in the program; repeat a fragment a specified number of times; assemble a fragment conditionally; set a fragment's execution address to differ from the address at which it is stored; change label values during assembly; use macro definitions with parameters; and so on.
Each processor model, in principle, has its own set of instructions and the corresponding assembly language (or dialect).

Advantages and disadvantages

Advantages of assembly language

The minimum amount of redundant code, that is, the use of fewer instructions and memory accesses, allows you to increase the speed and reduce the size of the program.
Ensuring full compatibility and maximum use of the capabilities of the desired platform: the use of special instructions and technical features of this platform.
When programming in assembly language, special capabilities become available: direct access to hardware, I/O ports and special processor registers, as well as the ability to write self-modifying code (that is, metaprogramming without the need for a software interpreter).
The latest security technologies implemented in operating systems do not allow making self-modifying code, as they exclude the simultaneous possibility of executing instructions and writing in the same memory area (W^X technology in BSD systems, DEP in Windows).

Disadvantages of assembly language

Large amounts of code and a large number of additional small tasks make the code very difficult to read and understand, and therefore harder to debug and refine. It is also difficult to implement programming paradigms and other conventions, which complicates collaborative development.
Fewer available libraries, their low compatibility with each other.
Not portable to other platforms (other than binary compatible).

Application

The applications follow directly from the advantages and disadvantages.
Since it is extremely inconvenient to write large programs in assembly language, they are written in high-level languages. Assembly language is used for small fragments or modules where the following are critical:
performance (drivers);
code size (boot sectors, software for microcontrollers and processors with limited resources, viruses, software protection);
special capabilities: working directly with hardware or machine code, that is, operating system loaders, drivers, viruses, protection systems.

Linking assembly code to other languages

Since only fragments of the program are most often written in assembler, they must be linked with the rest of the parts in other languages. This is achieved in 2 main ways:
At compile time: inserting assembly fragments into the program (inline assembler) with special language directives, including writing whole procedures in assembly language. This method is convenient for simple data transformations, but it cannot express full-fledged assembly code with data and subroutines, including subroutines with multiple entry and exit points, which high-level languages do not support.
At the build stage, or separate compilation. For linking modules to interact, it is sufficient that the linking functions support the required calling conventions and data types. Separate modules can be written in any languages, including assembly language.

Syntax

There is no generally accepted standard for the syntax of assembly languages. However, there are standards that most assembly language developers adhere to. The main such standards are Intel syntax and AT&T syntax.

Instructions

The general format for writing instructions is the same for both standards:

[label:] opcode [operands] [;comment]

where the opcode is directly the mnemonic of the instruction to the processor. Prefixes can be added to it (repetitions, addressing type changes, etc.).
The operands can be constants, register names, RAM addresses, etc. The differences between the Intel and AT&T standards relate mainly to the order in which the operands are listed and their syntax for different addressing methods.
The mnemonics used are usually the same for all processors of the same architecture or family of architectures (among the widely known mnemonics are Motorola, ARM, x86 processors and controllers). They are described in the processor specification. Possible exceptions:
If the assembler uses cross-platform AT&T syntax (original mnemonics are converted to AT&T syntax)
If initially there were two standards for writing mnemonics (the instruction system was inherited from the processor of another manufacturer).
For example, the Zilog Z80 processor inherited the Intel i8080 instruction set, extended it, and changed the mnemonics (and register names) in its own way; for instance, it replaced Intel's mov with ld. Motorola Fireball processors inherited the Z80 instruction set, cutting it down a bit. However, Motorola officially returned to Intel mnemonics, and at the moment half of the Fireball assemblers work with Intel mnemonics and half with Zilog mnemonics.

Directives

In addition to instructions, the program may contain directives: commands that are not translated directly into machine instructions, but control the operation of the compiler. Their set and syntax vary significantly and depend not on the hardware platform, but on the compiler used (giving rise to dialects of languages within the same family of architectures). As a "gentleman's set" of directives, we can distinguish:
defining data (constants and variables)
managing the organization of the program in memory and the parameters of the output file
setting the compiler mode
all kinds of abstractions (i.e. elements of high-level languages) - from the design of procedures and functions (to simplify the implementation of the procedural programming paradigm) to conditional structures and loops (for the structured programming paradigm)
macros

Program example

An example of a Hello world program for MS-DOS for the x86 architecture in the TASM dialect:

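.MODEL TINY
CODE SEGMENT
    ASSUME CS:CODE, DS:CODE
    ORG 100h                ; COM programs are loaded at offset 100h
START:
    mov ah,9                ; DOS function 09h: print a "$"-terminated string
    mov dx,OFFSET Msg
    int 21h
    int 20h                 ; Terminate the program
Msg DB "Hello World",13,10,"$"
CODE ENDS
END START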

Origin and criticism of the term "assembly language"

This type of language got its name from the translator (compiler) for these languages: the assembler. The latter is so named because the first computers had no higher-level languages, and the only alternative to creating programs with an assembler was programming directly in machine codes.
In Russian, assembly language is often simply called "assembler" (and the same word serves as the adjective for things related to it), which is incorrect according to the English meaning of the word but fits the rules of the Russian language. However, the assembler (the program) itself is also called simply "assembler", not "assembly language compiler" or the like.
The use of the term "assembly language" can also lead to the misconception that there is a single low-level language, or at least a standard for such languages. When naming the language in which a particular program is written, it is desirable to specify for which architecture it is intended and in which dialect of the language it is written.

Original: AsmSchool: Make an operating system
Author: Mike Saunders
Publication date: April 15, 2016
Translation: A. Panin
Translation date: April 16, 2016

Part 4: With the skills you've learned from previous articles in the series, you can start developing your own operating system!

What is it for?

To understand how compilers work.
To understand the instructions of the central processing unit.
To optimize your code in terms of performance.

Over the course of a few months, we have come a long way, which started with the development of simple assembly language programs for Linux and ended in the last article in the series with the development of self-contained code that runs on a personal computer without an operating system. Well, now we will try to collect all the information together and create a real operating system. Yes, we will follow in the footsteps of Linus Torvalds, but first it is worth answering the following questions: "What is an operating system? Which of its functions will we have to recreate?".

In this article, we will focus only on the main functions of the operating system: loading and executing programs. Complex operating systems perform many more functions, such as managing virtual memory and processing network packets, but they require years of continuous work to be correctly implemented, so in this article we will only consider the main functions that are present in any operating system. Last month we developed a small program that fit in a 512-byte sector of a floppy disk (its first sector), and now we will modify it a little to add the function of loading additional data from the disk.

Bootloader Development

We could try to keep our operating system binary as small as possible in order to place it in the first 512-byte sector of the floppy disk, the one that is loaded by the BIOS, but in this case we will not be able to implement any interesting functions. Therefore, we will use these 512 bytes to place the binary code of a simple system loader, which will load the binary code of the OS kernel into RAM and execute it. (After that, we will develop the OS kernel itself, which will load the binary code of other programs from disk and also execute it, but we will talk about this a little later.)

You can download the source code for the examples discussed in this article at www.linuxvoice.com/code/lv015/asmschool.zip . And this is the code for our bootloader from a file called boot.asm:

BITS 16

    jmp short start         ; Jump to label, skipping disk description
    nop                     ; Padding before disk description

%include "bpb.asm"

start:
    mov ax, 07C0h           ; Load address
    mov ds, ax              ; Data segment

    mov ax, 9000h           ; Prepare stack
    mov ss, ax
    mov sp, 0FFFFh          ; The stack grows downwards!

    cld                     ; Clear direction flag

    mov si, kern_filename
    call load_file

    jmp 2000h:0000h         ; Jump to OS kernel binary loaded from file

kern_filename db "MYKERNELBIN"

%include "disk.asm"

    times 510-($-$$) db 0   ; Zero padding of binary code up to 510 bytes
    dw 0AA55h               ; Boot sector end marker

buffer:                     ; Start of buffer for disk contents

In this code, the first CPU instruction is jmp, which follows the BITS directive telling the NASM assembler that 16-bit mode is being used. As you probably remember from the previous article in the series, execution of the 512-byte binary code loaded by the BIOS from the disk starts at the very beginning, but we have to jump to a label to skip a special set of data. Last month we simply wrote the code to the start of the disk (using the dd utility) and left the rest of the disk empty.

Now we will have to use a floppy disk with a suitable MS-DOS (FAT12) file system, and in order to work correctly with this file system, we need to add a set of special data near the beginning of the sector. This set is called a "BIOS Parameter Block" (BPB) and contains data such as the disk label, number of sectors, and so on. It should not interest us at this stage, since such topics can be devoted to more than one series of articles, which is why we placed all the instructions and data related to it in a separate source code file called bpb.asm .

Based on the above, this directive from our code is extremely important:

%include "bpb.asm"

This is a NASM directive that includes the contents of the specified source file in the current source file during assembly. This lets us keep the code of our bootloader as short and understandable as possible by moving all the details of the BIOS parameter block into a separate file. The BIOS parameter block must be located three bytes after the start of the sector, and since the jmp instruction occupies only two bytes, we use the nop instruction (its name stands for "no operation"; it does nothing but consume CPU cycles) to fill the remaining byte.

Working with the stack

Next, we use instructions similar to those discussed in the previous article to prepare the registers and the stack, followed by the cld instruction (which stands for "clear direction"). It clears the direction flag, so that string instructions such as lodsb increment the value in the SI register when executed rather than decrement it.

After that, we put the address of the string in the SI register and call our load_file function. But think for a moment - we haven't developed this feature yet! Yes, that's true, but its implementation can be found in another source code file we include called disk.asm .

The FAT12 file system, used on floppy disks formatted in MS-DOS, is one of the simplest file systems in existence, but working with its contents still requires a fair amount of code. The load_file subroutine is about 200 lines long and will not be shown in this article: we are looking at the development of an operating system, not of a driver for a specific file system, so it would not be wise to spend magazine pages on it. In short, we include the disk.asm source code file near the end of the current source file and can forget about it. (If you are still interested in the structure of the FAT12 file system, you can read the excellent overview at http://tinyurl.com/fat12spec and then look into the disk.asm source code file, whose code is well commented.)

In any case, the load_file subroutine loads the binary code from the file with the name given in the SI register into segment 2000 with offset 0, after which we jump to its beginning for execution. And that's all - the kernel of the operating system is loaded and the system loader has completed its task!
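
The segment and offset values used by the bootloader translate to physical addresses by the usual real-mode rule: segment × 16 + offset. A minimal Python sketch of that arithmetic follows (the function name is hypothetical, and wrap-around at 1 MB is ignored):

```python
def physical_address(segment, offset):
    """Real-mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset


# 07C0h:0000h is where the BIOS places the boot sector.
print(hex(physical_address(0x07C0, 0x0000)))
# 2000h:0000h is where load_file places the kernel before the far jump.
print(hex(physical_address(0x2000, 0x0000)))
# 9000h:FFFFh is the initial stack pointer set up by the bootloader.
print(hex(physical_address(0x9000, 0xFFFF)))
```
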

You may have noticed that our code uses MYKERNELBIN rather than MYKERNEL.BIN as the kernel filename, even though the latter matches the 8+3 naming scheme used on DOS floppy disks. The reason is that the FAT12 file system stores filenames in an internal representation, and by using that representation directly we save space: our load_file subroutine does not need to look for the dot character and convert the filename into the file system's internal form.
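
The conversion that the bootloader avoids can be sketched in a few lines of Python. FAT directory entries store the name as 8 characters plus 3 characters of extension, upper-case and padded with spaces, without the dot. The function name below is hypothetical, and the sketch ignores long filenames and characters that FAT does not allow.

```python
def to_fat_name(filename):
    """Convert 'name.ext' to the 11-character, space-padded, upper-case FAT form."""
    name, _, ext = filename.partition(".")
    if not name or len(name) > 8 or len(ext) > 3:
        raise ValueError("not a valid 8.3 name: " + filename)
    return (name.ljust(8) + ext.ljust(3)).upper()


print(repr(to_fat_name("mykernel.bin")))
print(repr(to_fat_name("test.bin")))
```

Because "mykernel" is exactly eight characters long, its internal form contains no padding, which is why the string in boot.asm looks like the filename with the dot removed.
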

After the line with the directive for connecting the disk.asm source code file, there are two lines designed to pad the binary code of the system loader with zeros up to 512 bytes and include the end mark of its binary code (this was discussed in the last article). Finally, at the very end of the code is the "buffer" label, which is used by the load_file subroutine. In general, the load_file subroutine needs free space in RAM to perform some intermediate actions in the process of finding a file on disk, and we have enough free space after loading the boot loader, so we place the buffer here.
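
What the times and dw lines produce can be mimicked in Python: the code is padded with zeros to 510 bytes, and the word 0AA55h is appended in little-endian order, so the sector ends with the bytes 55h, AAh. This is only a sketch of the resulting layout; the function name is hypothetical, and the three-byte input is a stand-in for real boot code.

```python
import struct

SECTOR_SIZE = 512


def make_boot_sector(code):
    """Zero-pad code to 510 bytes and append the 0AA55h word (little-endian)."""
    if len(code) > SECTOR_SIZE - 2:
        raise ValueError("boot code does not fit in one sector")
    padding = bytes(SECTOR_SIZE - 2 - len(code))
    return code + padding + struct.pack("<H", 0xAA55)


# EB 00 90 = "jmp short" to the next instruction, then nop: a placeholder only.
sector = make_boot_sector(bytes([0xEB, 0x00, 0x90]))
print(len(sector), sector[-2:].hex())
```
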

To assemble the bootloader, use the following command:

nasm -f bin -o boot.bin boot.asm

Now we need to create an MS-DOS virtual floppy disk image and add our bootloader binary to its first 512 bytes using the following commands:

mkdosfs -C floppy.img 1440
dd conv=notrunc if=boot.bin of=floppy.img

This completes the bootloader development process! We now have a bootable floppy disk image that allows us to load the operating system kernel binary from a file called mykernel.bin and execute it. Next, we are waiting for a more interesting part of the work - the development of the operating system kernel itself.

Operating system kernel

We want our operating system kernel to perform many important tasks: displaying a greeting, accepting input from the user, determining whether the input is a supported command, and executing programs from disk after the user specifies their names. This is the operating system kernel code from the mykernel.asm file:

    mov ax, 2000h
    mov ds, ax
    mov es, ax

loop:
    mov si, prompt
    call lib_print_string

    mov si, user_input
    call lib_input_string

    cmp byte [si], 0
    je loop

    cmp word [si], "ls"
    je list_files

    mov ax, si
    mov cx, 32768
    call lib_load_file
    jc load_fail

    call 32768
    jmp loop

load_fail:
    mov si, load_fail_msg
    call lib_print_string
    jmp loop

list_files:
    mov si, file_list
    call lib_get_file_list
    call lib_print_string
    jmp loop

prompt        db 13, 10, "MyOS > ", 0
load_fail_msg db 13, 10, "Not found! ", 0
user_input    times 256 db 0
file_list     times 1024 db 0

%include "lib.asm"

Before looking at the code, pay attention to the last line with the directive to include the lib.asm source code file, which is also located in the asmschool.zip archive from our website. This is a library of useful subroutines for working with the screen, keyboard, strings and disks that you can also use. Here we include it at the very end of the main source code file of the operating system kernel in order to keep the latter as compact and tidy as possible. Refer to the "lib.asm library routines" section for more information on all of the available routines.

In the first three lines of the operating system kernel code, we fill the segment registers with data to point to the 2000 segment into which the binary code was loaded. This is important to ensure that instructions such as lodsb, which must read data from the current segment and not from any other, work correctly. After that, we will not perform any additional operations on the segments; our operating system will run with 64 KB of RAM!

Further in the code there is a label corresponding to the beginning of the loop. First of all, we use one of the routines from the lib.asm library, namely lib_print_string, to print out the prompt. Bytes 13 and 10 before the prompt string are newline characters, thanks to which the prompt is never displayed immediately after the output of a program, but always on a new line.

After that, we use another routine from the lib.asm library called lib_input_string, which takes the characters entered by the user using the keyboard and stores them in a buffer, the pointer to which is in the SI register. In our case, the buffer is declared towards the end of the operating system kernel code as follows:

user_input times 256 db 0

This declaration allows for a 256-character zero-filled buffer, which should be long enough to hold commands for a simple operating system like ours!

Next, we perform user input validation. If the first byte of the user_input buffer is null, then the user simply pressed the Enter key without entering any command; do not forget that all strings end with null characters. So in this case, we should just jump to the beginning of the loop and print out the greeting again. However, if the user enters any command, we will first have to check if he entered the ls command. Until now, you've only seen comparisons of single bytes in our assembly language programs, but don't forget that it's also possible to compare two-byte values or machine words. In this code, we compare the first machine word from the user_input buffer with the machine word corresponding to the ls line, and if they are identical, we move to the code block below. Within this block of code, we use another routine from the lib.asm library to get a comma-separated list of files on disk (which should be stored in the file_list buffer), print that list to the screen, and loop back to process user input.
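
The two-byte comparison against "ls" can be modelled in Python. x86 is little-endian, so the word read from the buffer has "l" (6Ch) in its low byte and "s" (73h) in its high byte; NASM builds the constant for the string "ls" the same way. The helper name below is hypothetical.

```python
import struct


def first_word(buffer):
    """Read the first two bytes of buffer as a little-endian 16-bit word."""
    return struct.unpack_from("<H", buffer)[0]


LS = first_word(b"ls")  # the 16-bit constant that the string "ls" stands for

user_input = b"ls\x00" + bytes(253)  # a 256-byte buffer, as in the kernel
print(hex(LS), first_word(user_input) == LS)
print(first_word(b"lsx\x00") == LS)  # only the first two bytes are compared
```

One consequence of comparing a single word is visible in the last line: any input that merely begins with "ls" is treated as the ls command.
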

Execution of third-party programs

If the user does not enter the ls command, we assume that they entered the name of a program on disk, so it makes sense to try to load it. Our lib.asm library contains an implementation of a useful subroutine, lib_load_file, which parses the tables of the FAT12 file system of a disk: it takes a pointer to the beginning of a string with a file name in the AX register, as well as the offset at which to load the binary code from the program file in the CX register. We are already using the SI register to store a pointer to the user input string, so we copy that pointer to the AX register and then put the value 32768, which is used as the offset for loading the binary code from the program file, into the CX register.

But why do we use this value as an offset for loading binary code from a program file? Well, it's just one of the memory map options for our operating system. Because we're working in a single 64KB segment and our kernel binary is loaded at offset 0, we have to use the first 32KB of memory for kernel data and the remaining 32KB for loadable program data. Thus, offset 32768 is the middle of our segment and allows us to provide a sufficient amount of RAM to both the operating system kernel and loaded programs.
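
The memory map described above can be written down as a short Python sketch. The constant and function names are hypothetical; only the values 64 KB and 32768 come from the article.

```python
SEGMENT_SIZE = 64 * 1024  # one real-mode segment
PROGRAM_OFFSET = 32768    # the value placed in CX before calling lib_load_file

kernel_area = range(0, PROGRAM_OFFSET)              # kernel code and data
program_area = range(PROGRAM_OFFSET, SEGMENT_SIZE)  # loaded program


def program_fits(size_in_bytes):
    """True if a program of this size fits in the upper half of the segment."""
    return 0 < size_in_bytes <= len(program_area)


print(len(kernel_area), len(program_area))
print(program_fits(5), program_fits(40000))
```
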

After that, the lib_load_file routine performs a very important operation: if it cannot find a file with the given name on disk, or for some reason cannot read it from disk, it simply exits and sets the carry flag. This is a CPU status flag that is set during some arithmetic operations and need not concern us at the moment, but we can test it to make quick decisions. If the lib_load_file subroutine sets the carry flag, we use the jc (jump if carry) instruction to jump to the block of code that prints the error message and returns to the start of the user input loop.

If, on the other hand, the carry flag is not set, we can conclude that the lib_load_file subroutine successfully loaded the binary code from the program file into RAM at offset 32768. All we need to do in this case is start executing the binary code loaded at this address, that is, start the program specified by the user! Once that program executes the ret instruction (to return to the calling code), we simply return to the user input loop. Thus we have created an operating system: it consists of the simplest mechanisms for parsing commands and loading programs, implemented in about 40 lines of assembly code, albeit with a lot of help from subroutines from the lib.asm library.

To assemble the operating system kernel code, use the following command:

nasm -f bin -o mykernel.bin mykernel.asm

After that, we have to add the mykernel.bin file to the floppy disk image file somehow. If you are familiar with the trick of mounting disk images with loopback devices, you can use it to access the contents of the floppy.img disk image, but there is an easier way using GNU Mtools (www.gnu.org/software/mtools). This is a set of programs for working with floppy disks that use MS-DOS/FAT12 file systems. It is available from the software package repositories of all popular Linux distributions, so you just need to use apt-get, yum, pacman or whatever utility is used to install software packages on your distribution.

After installing the appropriate software package, in order to add the mykernel.bin file to the floppy.img disk image file, you will have to run the following command:

mcopy -i floppy.img mykernel.bin ::/

Notice the funny characters at the end of the command: colon, colon, and slash. Now we are almost ready to launch our operating system, but what's the point of it until there are applications for it? Let's correct this misunderstanding by developing an extremely simple application. Yes, now you will be developing an application for your own operating system - just imagine how much your authority will rise in the ranks of geeks. Save the following code in a file named test.asm:

    org 32768

    mov ah, 0Eh
    mov al, "X"
    int 10h

    ret

This code simply uses the BIOS function to display the character "X" on the screen, after which it returns control to the code that called it - in our case, this code is the code of the operating system. The org line that starts the application source code is not a CPU instruction, but a NASM assembler directive that tells it that the binary code will be loaded into RAM at offset 32768, therefore, it is necessary to recalculate all offsets taking into account this circumstance.
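
The effect of the org directive can be sketched in Python. The byte values below are the standard encodings of the four instructions in test.asm (mov ah, 0Eh; mov al, "X"; int 10h; ret); the function name and the imagined label are hypothetical and serve only to show how an offset within the file becomes an address within the segment.

```python
ORG = 32768

# B4 0E = mov ah, 0Eh; B0 58 = mov al, "X"; CD 10 = int 10h; C3 = ret
PROGRAM = bytes([0xB4, 0x0E, 0xB0, ord("X"), 0xCD, 0x10, 0xC3])


def label_address(offset_in_file, org=ORG):
    """Address the assembler substitutes for a label at this file offset."""
    return org + offset_in_file


# A label placed right after ret would sit 7 bytes into the file.
print(len(PROGRAM), label_address(len(PROGRAM)))
# Without the org directive, the same label would be assembled as address 7.
print(label_address(len(PROGRAM), org=0))
```

This particular program contains no labels, so it would run even without org; the directive starts to matter as soon as the program refers to its own data, for example a string to print.
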

This code also needs to be assembled, and the resulting binary file needs to be added to the floppy disk image file:

nasm -f bin -o test.bin test.asm
mcopy -i floppy.img test.bin ::/

Now, take a deep breath, get ready to witness the unsurpassed results of your own work, and boot the floppy disk image using a PC emulator such as Qemu or VirtualBox. For example, the following command can be used for this purpose:

qemu-system-i386 -fda floppy.img

Voila: the boot.img bootloader that we integrated into the first sector of the disk image loads the mykernel.bin operating system kernel, which displays a greeting. Type the ls command to get the names of the two files on the disk (mykernel.bin and test.bin), then type the name of the second file to execute it and display the X character on the screen.

Cool, isn't it? Now you can start customizing your operating system's shell, implementing new commands, and adding more program files to the disk. If you want to run this operating system on a real PC, refer to the "Running the bootloader on a real hardware platform" section of the previous article in the series - you will need exactly the same commands. Next month we will make our operating system more powerful by allowing loaded programs to use system functions, thus implementing the concept of code sharing to reduce code duplication. Much of the work is still ahead.

lib.asm library routines

As mentioned earlier, the lib.asm library provides a large set of useful subroutines for use in your operating system kernel and in individual programs. Some of them use instructions and concepts that have not yet been covered in this series, and others (such as the routines for working with disks) are closely tied to the structure of file systems, but if you consider yourself competent in these matters, you can study their implementations and work out how they operate. However, it is more important to understand how to call them from your own code:

lib_print_string - Takes a pointer to a null-terminated string via the SI register and prints that string to the screen.
lib_input_string - Takes a pointer to a buffer via the SI register and fills this buffer with characters entered by the user using the keyboard. After the user presses the Enter key, the string in the buffer is null-terminated and control returns to the calling program's code.
lib_move_cursor - Moves the cursor on the screen to the position with the coordinates passed through the DH (row number) and DL (column number) registers.
lib_get_cursor_pos - Returns the current row and column numbers of the cursor in the DH and DL registers, respectively.
lib_string_uppercase - Takes a pointer to the beginning of a null-terminated string using the AX register and converts the characters in the string to uppercase.
lib_string_length - Takes a pointer to the beginning of a null-terminated string using the AX register and returns its length using the AX register.
lib_string_compare - Takes pointers to the beginning of two null-terminated strings via the SI and DI registers and compares those strings. Sets the carry flag if the strings are identical (so that the jc jump instruction can be used) or clears it if the strings differ (so that the jnc instruction can be used).
lib_get_file_list - Takes a pointer to the start of a buffer via the SI register and puts a null-terminated string containing a comma-separated list of filenames from disk into that buffer.
lib_load_file - Takes a pointer to the beginning of a string containing a filename using the AX register and loads the contents of the file at the offset given by the CX register. Returns the number of bytes copied into memory (that is, the size of the file) using the BX register, or sets the carry flag if no file with the given name is found.
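
When planning a command such as ls, it can help to model on the host the buffer that lib_get_file_list is described as filling. The following Python sketch does that. Only the comma-separated, null-terminated layout comes from the description above; the function name parse_file_list and the sample file names are assumptions.

```python
def parse_file_list(buffer):
    """Split a null-terminated, comma-separated name list into Python strings."""
    end = buffer.find(b"\x00")        # the string stops at the first zero byte
    if end == -1:
        raise ValueError("buffer is not null-terminated")
    text = buffer[:end].decode("ascii")
    return [name for name in text.split(",") if name]

# Hypothetical buffer contents; bytes after the terminator are stale data
# that a real kernel would have to ignore in the same way.
sample = b"MYKERNEL.BIN,TEST.BIN\x00leftover"
for name in parse_file_list(sample):
    print(name)
```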

Today's curious exhibit in our Kunstkamera is an operating system written in pure assembler. Together with drivers, a graphical shell, dozens of pre-installed programs and games, it takes up less than one and a half megabytes. Meet KolibriOS ("kolibri" is Russian for "hummingbird"), an exceptionally fast OS developed predominantly in Russia.

The development of Hummingbird went quite quickly until 2009. The bird learned to fly on a variety of hardware, requiring at least a first-generation Pentium and eight megabytes of RAM. The minimum system requirements for Hummingbird are:

CPU: Pentium, AMD 5x86 or Cyrix 5x86 without MMX at 100 MHz;
RAM: 8 MB;
Graphics Card: VESA-compatible with support for VGA mode (640 × 480 × 16).

The modern Hummingbird is a regularly updated "nightly build" of the latest official version, released at the end of 2009. We tested build 0.7.7.0+ dated August 20, 2017.

WARNING

In the default settings, KolibriOS does not have access to disks that are visible through the BIOS. Think carefully and make a backup before changing this setting.

Although the changes in individual nightly builds are small, plenty of them have accumulated over the years. The updated Hummingbird can write to FAT16/FAT32 and ext2 to ext4 partitions and supports other popular file systems (NTFS, XFS, ISO-9660) in read-only mode. Support for USB and network cards, a TCP/IP stack and sound codecs have been added. In general, you can already do something in it, and not just glance once at an ultra-light operating system with a GUI and be impressed by the startup speed.

Like previous versions, the latest Hummingbird is written in flat assembler (FASM) and occupies one floppy disk - 1.44 MB. Thanks to this, it can be completely placed in some specialized memory. For example, craftsmen wrote KolibriOS directly into Flash BIOS. During operation, it can be entirely located in the cache of some processors. Just imagine: the entire operating system, along with programs and drivers, is cached!

INFO

When visiting the site kolibrios.org, the browser may warn of danger. The reason, apparently, is the assembler programs in the distribution. Now VirusTotal defines the site as completely safe.

"Hummingbird" is easily loaded from a floppy disk, hard drive, flash drive, Live CD or in a virtual machine. For emulation, it is enough to specify the OS type “other”, allocate one processor core and some RAM to it. It is not necessary to connect a disk, and if there is a router with DHCP, Hummingbird will instantly connect to the Internet and local network. Immediately upon downloading, you will see a notification.

One problem - the HTTPS protocol is not supported by the built-in Hummingbird browser. Therefore, it was not possible to look at the site in it, as well as open the pages of Google, Yandex, Wikipedia, Sberbank ... in fact, no usual address. Everyone switched to a secure protocol a long time ago. The only site with old-school pure HTTP that I came across was the "portal of the Government of Russia", but it did not look the best in a text browser.

The appearance settings in Hummingbird have improved over the years, but are still far from ideal. A list of supported video modes is displayed on the Hummingbird boot screen when you press the A key.

The list of available options is small, and the desired resolution may not be in it. If you have a graphics card with an AMD (ATI) GPU, you can add custom settings right away. To do this, pass the ATIKMS bootloader the parameter -m<width>x<height>x<refresh rate>, for example:

/RD/1/DRIVERS/ATIKMS -m1280x800x60 -1

Here /RD/1/DRIVERS/ATIKMS is the path to the bootloader (RD - RAM Disk).

When the system is running, the selected video mode can be viewed with the vmode command and (theoretically) switched manually. If Hummingbird is running in a virtual machine, this window will remain empty, but when it boots on real hardware, Intel video drivers can be added for GPUs from i915 up to and including Skylake.

Surprisingly, a whole bunch of games fit into KolibriOS. Among them are logic and arcade games, the fifteen puzzle, a snake game, tanks (no, not WoT): a whole "Game Center"! Even Doom and Quake were ported to Hummingbird.

Another important thing was the FB2READ reader. It works correctly with Cyrillic and has text display settings.

I recommend storing all user files on a USB flash drive, but it must be connected via a USB 2.0 port. Our 16 GB USB 3.0 flash drive (in a USB 2.0 port) with the NTFS file system was detected immediately. If you need to write files, connect a USB flash drive with a FAT32 partition.

The Hummingbird distribution includes three file managers, utilities for viewing images and documents, audio and video players, and other user applications. However, the focus is on assembly language development.

The built-in text editor has ASM syntax highlighting and even allows you to immediately run typed programs.

Among the development tools there is the Oberon-07/11 compiler for i386 Windows, Linux and KolibriOS, as well as low-level emulators: E80 - ZX Spectrum emulator, FCE Ultra - one of the best NES emulators, DOSBox v.0.74 and others. All of them were specially ported to the Hummingbird.

If you leave KolibriOS for a few minutes, the screensaver will start. Lines of code will run on the screen, in which you can see a reference to MenuetOS.


Recently I decided to learn assembler, but I was not interested in wasting lines of code. I thought that as I study assembler, I will master some subject area. So my choice fell on writing a bootloader. The result of my findings is here in this blog.

I want to say right away that I love theory combined with practice, so let's start.

First I'll show you how to create a simple MBR so that we can enjoy the result as soon as possible. As we get more complex with practical examples, I will provide theoretical information.

Let's make a USB flash drive bootloader first!

Attention!!! Our first assembler program will work both for a flash drive and for other devices such as a Floppy disk or a Hard disk. Subsequently, in order for all the examples to work correctly, I will give a number of clarifications regarding the operation of the code on different devices.

We will write in FASM, since it is considered the best assembler for writing loaders such as the MBR. The second reason for choosing FASM is that it greatly simplifies the process of compiling files: no command-line directives or other nonsense that can completely discourage you from learning assembler and achieving your goals. So, at the initial stage, we need two programs and a small flash drive that you do not need. I dug up a 1 GB one (quickly formatted, and no great loss if anything goes wrong). After our bootloader has been written to it, the flash drive will stop functioning normally, and my Windows 7 refuses to format it. I advise you to bring the flash drive back to life with the HP USB Disk Storage Format Tool (HPUSBFW.EXE) or another utility for formatting flash drives.

Install them and throw the appropriate shortcuts on the desktop or wherever you like.

Preparation is over, let's move on to action

Open Fasmw.exe and type the following. We will sketch out the bare minimum of code in order to see the result. Later we will analyze what has been scribbled here. I will comment briefly.

FASM code: ============= boot.asm ===============
org 7C00h ; all addresses in our program are calculated relative to this origin
use16 ; 16-bit code is generated
cli ; disable interrupts while the segment registers and the stack are set up
mov ax, 0
mov ds, ax ; data segment = 0
mov es, ax ; extra segment = 0 (function 13h of int 10h reads the string from ES:BP)
mov ss, ax ; stack segment = 0
mov sp, 7C00h ; the stack grows downward from the start of the loader
sti ; enable interrupts (after changing addresses)
mov ax, 0003h ; set the 80x25 text video mode to display the string on screen
int 10h
mov ax, 1301h ; string output function 13h of int 10h (more on that later)
mov bp, stroka ; address of the output string
mov dx, 0000h ; DH = row and DL = column in which the text is displayed
mov cx, 15 ; number of characters in the output string
mov bx, 000Eh ; BH = 00h video page number (better not to touch), BL = 0Eh character attributes (color, background)
int 10h
jmp $ ; stall in place (loops the program at this point)
stroka db "Ok, MBR loaded!"
times 510 - ($ - $$) db 0 ; fill the gap between the previous byte and the last two with zeros
db 0x55, 0xAA ; the last two bytes

Compile this code in FASM (Ctrl + F9) and save the resulting binary file as boot.bin in some convenient place. Before writing our binary to a USB flash drive, a little theory.

When you plug the USB flash drive into the computer, it is not at all obvious to the BIOS that you want to boot from it, so you need to select the boot device in the BIOS settings. So, choose to boot from USB (you will have to figure out how to do this yourself, since BIOS interfaces vary; you can google the BIOS settings for your motherboard, and as a rule there is nothing complicated about it).

Now that the BIOS knows that you want to boot from the flash drive, it must make sure that the zero sector on the flash drive is bootable. To do this, the BIOS looks at the last two bytes of the zero sector and loads the sector into RAM only if they are equal to 0x55 0xAA. Otherwise, the BIOS will simply pass your flash drive by. Having found these two magic bytes, it loads the zero sector into RAM at the address 0000:7C00h, then forgets about the flash drive and transfers control to this address. Now all power over the computer belongs to your bootloader, and it, already running from RAM, can load additional code from the USB flash drive. Now we will see what this very sector looks like in the DMDE program.
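
The signature test described above is easy to reproduce on the host before touching any real device. The following Python sketch checks a 512-byte sector the same way; the function name has_boot_signature and the sample data are assumptions made for illustration.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xAA"

def has_boot_signature(sector):
    """Return True if a 512-byte sector ends with the bytes 55h AAh."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("a boot sector must be exactly 512 bytes")
    return sector[510:512] == BOOT_SIGNATURE

# Two in-memory sectors: an all-zero one and one carrying the signature.
blank = bytes(SECTOR_SIZE)
signed = bytes(510) + BOOT_SIGNATURE
print(has_boot_signature(blank))
print(has_boot_signature(signed))
```

To check a compiled loader, read the first 512 bytes of the binary file in "rb" mode and pass them to the same function.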

1. Insert your flash drive into the computer and make sure that it does not contain the information you need.

2. Open the DMDE program. Follow all further steps in the pictures:

After studying this comic strip, you will know how to write your MBR to a flash drive. And this is what the long-awaited result of our bootloader looks like:

By the way, if we talk about the minimum loader code, then it may look like this:

org 7C00h
jmp $
db 508 dup(0)
db 0x55, 0xAA

Such a loader, having received control, simply hangs the computer by executing the single meaningless instruction jmp $ in a loop. I call it stalling in place.
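
The number 508 is not arbitrary: jmp $ assembles to two bytes, the signature takes two more, and 512 - 2 - 2 = 508. A minimal Python sketch of the same arithmetic follows; it assembles the sector by hand and is not a replacement for FASM. The function name build_boot_sector is an assumption.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xAA"

def build_boot_sector(code):
    """Pad machine code with zeros to 510 bytes and append the signature."""
    room = SECTOR_SIZE - len(BOOT_SIGNATURE)   # 510 bytes available for code
    if len(code) > room:
        raise ValueError("code does not fit in one sector")
    return code + bytes(room - len(code)) + BOOT_SIGNATURE

JMP_SELF = b"\xEB\xFE"   # encoding of `jmp $`: short jump with displacement -2

sector = build_boot_sector(JMP_SELF)
print(len(sector))        # total size of the resulting sector
print(sector.count(0))    # number of zero padding bytes
```

The padding length computed here, 510 minus the length of the code, is the same quantity that a times 510 - ($ - $$) db 0 line asks the assembler to work out automatically.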

I posted a video on YouTube that might help you:

Finally, a few brief facts about the work of the bootloader:

1. The bootloader, also known as the boot sector or MBR, is 512 bytes in size. Historically, this condition must be met in order to support older media and devices.
2. The bootloader is always located in the zero sector of a flash drive, floppy disk, hard disk, from the point of view of the DMDE program or other hex editors that allow you to work with devices. To load a binary (our boot.bin) onto one of the listed devices, we don't need to think about their internal physical structure. The DMDE program just knows how to read sectors on these devices and displays them in LBA mode (simply numbers them from 0 to the last sector). You can read about LBA
3. The bootloader must always end with two bytes 0x55 0xAA.
4. The loader is always loaded into memory at 0000:7C00h.
5. The bootloader starts the operating system.
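
The LBA numbering from fact 2 translates directly into byte offsets: sector n begins at byte n * 512. As a hedged illustration, the Python sketch below writes a stand-in boot sector into sector 0 of a throwaway image file and reads it back. The helper names write_sector, read_sector and demo, and the file name disk.img, are assumptions; the sketch works on an ordinary file, not on a real flash drive.

```python
import os
import tempfile

SECTOR_SIZE = 512

def write_sector(image_path, lba, data):
    """Overwrite one sector of an existing image at the given LBA number."""
    if len(data) != SECTOR_SIZE:
        raise ValueError("data must be exactly one 512-byte sector")
    with open(image_path, "r+b") as image:   # r+b: modify without truncating
        image.seek(lba * SECTOR_SIZE)        # LBA n starts at byte n * 512
        image.write(data)

def read_sector(image_path, lba):
    """Return the 512 bytes stored at the given LBA number."""
    with open(image_path, "rb") as image:
        image.seek(lba * SECTOR_SIZE)
        return image.read(SECTOR_SIZE)

def demo():
    """Write a dummy boot sector into a blank 4-sector image and verify it."""
    boot = b"\xEB\xFE" + bytes(508) + b"\x55\xAA"   # stand-in for boot.bin
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "disk.img")     # hypothetical image name
        with open(path, "wb") as image:
            image.write(bytes(4 * SECTOR_SIZE))     # blank image
        write_sector(path, 0, boot)
        return read_sector(path, 0) == boot, os.path.getsize(path)

print(demo())
```

Writing the same 512 bytes to a real device would overwrite its zero sector, so it is worth testing on an image file like this first.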