Russian processors. What is known about Russian Elbrus microprocessors


There are many myths around Elbrus. You will find them in the comments to any post or article about it. The main categories of myths boil down to three questions:

1. Is Elbrus domestic? Does domestic mean safe?
2. What is the performance? How “modern” is a computer based on Elbrus?
3. How much does it cost?

Every question has two opposite answers: from “bought from the Chinese” to “all ours”, and from “my phone is faster” to “a little more and we’ll overtake Intel.”

I would like to clarify where all these myths come from. There is basically one reason: the MCST company itself, with its secrecy, its taciturnity and, in the worst traditions of Russian reality, its tendency to exaggerate and to lie casually. When preparing this article, I found that all the information on news and hardware sites revolves around MCST's meager press releases. It is very difficult to find new information on the surface. You have to dig, read between the lines and dig even deeper. MCST itself does not respond to emails or to orders. Try finding contact details on its website!

Bravura speeches about “a five-year plan in three days” and stories about “we’ll catch up and overtake” come from the same place. Just re-read the press releases for 2013-2015. By now we should have mass production of the latest computers based on Elbrus-16S. Do you see it? Me neither!

You can read about the tricks with manufacturing process technologies in this article on Habr.

To get away from abstraction and debunk the myths with something concrete, let’s take the Elbrus-401 automated workstation. This computer is produced in small quantities. It even seems to be available for order on the website. Formally.
The characteristics are taken from the official website.

Parameter | Value
Microprocessor | Elbrus-4S (1891VM8Ya)
Number of processors | 1
Processor clock frequency, MHz | 800
Peak performance, Gflops | 50
RAM, GB | 24 (up to 96), with error correction (ECC) support
Video subsystem (integrated) | Video card based on the Silicon Motion SM718 VLSI chip; 2D acceleration and video scaling; 16 MB video memory; PCI bus connection; VGA and DVI outputs; resolution up to 1920 x 1080
Video subsystem (3D) | AMD Radeon 6000 series video card; PCI Express bus connection
Disk subsystem | SATA 2.0 hard drive, 1000 GB, 3.5" (up to 2 drives); on-board CompactFlash card connector; on-board mSATA drive, 120 GB
Built-in drive | DVD-RW drive with dual-layer disc support
Network interfaces | Data rates of 10/100/1000 Mbit/s
Sound | Integrated AC-97 sound card (stereo)
I/O ports | USB 2.0: 4 connectors on the rear panel, 2 on the front panel, 2 internal ports on the motherboard; 1 Gigabit Ethernet connector (10/100/1000 Mbit/s); 1 combined DVI+VGA output (two monitors can be connected via the included adapter); 1 external and 1 internal RS-232 port; audio connectors (input/output, stereo)

Origin

So, how domestic and safe is the computer?

The processor turned out to be the most domestic part. Its architecture and the blocks that implement it are an entirely domestic development. It was modeled and emulated on Stratix V FPGAs, most likely using Quartus software.

Now one EP2S180 chip costs about $8K, so the cost of the FPGA chips alone in the prototype exceeds $50K.
Prototyping the Elbrus-4C+ processor required 21 Altera Stratix IV EP4SE820 chips with a total capacity of 100 million gates (although MCST itself gives a figure of 750 million) and a cost of about $200K. The prototype's operating frequency is 9 MHz.

The first nuance: which gigaflops are quoted? Theoretical ones, or those measured with the LINPACK test? No information is available.

Second, there is a little trick: if you look at the architecture, you will see a DSP processor in the core. The specifications of the previous processor version clearly stated that the total performance is the sum of the gigaflops of the main core and of the DSP core. For example, you can compare the descriptions on the MCST website of the Monocube, based on the Elbrus-2C+ processor, and of the processor itself.

But in everyday real-life applications the DSP cores are of little use. They are good at signal processing and encryption.

Here we again run into the problem of secrecy. Anyone who has such a workstation either does not run tests or does not publish the results.

But back to the main issue: performance in real applications and everyday work. The only tests I was able to find on this topic are from CNews. The tests and their results can be viewed in full by following the link.

For those too lazy to follow the link, the gist is this: an Intel Core i7-2600 (3.4 GHz) was compared with an Elbrus-4S. I'm interested in the following.

It turns out that the only “real” tests, those with the 7z archiver, show the workstation losing badly: not by a factor of two, as the gigaflops would suggest, but by 5.5 times for compression and almost 4 times for decompression (I calculated from the MIPS figures, because the memory configurations differ). By the way, the review's conclusions and its attempts to “pull an owl onto a globe” are puzzling. It feels as if the outlet was given the Elbrus on the condition that it write a positive review.

05/25/2017, Thu, 11:45, Moscow time, Text: Vladimir Bakhur

Rostec showed the first samples of PCs and servers based on domestic 8-core 64-bit Elbrus 8C processors. Pilot batches of the first servers on the new chips are expected by the end of 2017.

The first samples of working PCs and servers

The united holding Ruselectronics (part of Rostec) presented the first samples of personal computers and servers based on Elbrus-8S microprocessors at the CIPR 2017 conference in Innopolis (Tatarstan). According to the developers, the new domestic hardware offers increased performance and guarantees users a high level of information protection. The new servers are designed to process large volumes of information, including in real time.

Based on Elbrus-8S chips, it is planned to organize mass production of servers, workstations and other computer equipment for government agencies and business structures that place increased demands on information security, as well as for use in the field of high-performance computing, signal processing, telecommunications.

“This is a new generation of domestic computer technology. All stages of assembly are carried out at our production sites and at the enterprises of domestic partners. All this guarantees a high level of information security of the equipment,” said Arseniy Brykin, deputy general director of Ruselectronics. “We expect that the first pilot batch of personal computers based on the new processor will be ready by the end of the second quarter of 2017. We are presenting samples of the new equipment today at the CIPR conference in Innopolis.”

Elbrus-8C chips in a 4-processor server system

Within the united Ruselectronics holding, the Elbrus software and hardware platforms are developed and deployed by the Institute of Electronic Control Machines (INEUM) named after I. S. Bruk. The Elbrus-8S processor is developed and produced by the MCST company. The first samples of Elbrus-8C processors for laboratory experiments were received at the end of 2014. The processors will be mass-produced on a 28-nanometer process technology.

According to Ruselectronics, the pilot batch of 2- and 4-processor servers based on Elbrus-8S will be released by the end of 2017.

Technical details

Universal microprocessors "Elbrus-8S" are a completely Russian development. Each processor chip has 8 processor cores with an improved 64-bit Elbrus architecture of the third generation, L2 cache with a total capacity of 4 MB (8 x 512 KB) and L3 cache with a capacity of 16 MB.

The Elbrus architecture can perform up to 25 operations on each core in one machine cycle, which ensures high performance at a moderate clock frequency. The chips support dynamic binary translation, which allows them to execute applications and operating systems distributed as x86 binaries, including in multi-threaded mode.

Architecture of the Elbrus-8S processor

Elbrus-8S processors support a secure computing mode with special hardware control of the integrity of the memory structure, which allows for a high level of information security of the software systems using it.

The operating frequency of Elbrus-8S processors is 1.3 GHz, and the computing power is about 250 gigaflops per chip in single-precision operations (FP32).
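As a rough sanity check on the quoted peak figure, peak throughput can be modeled as cores x clock x floating-point operations per cycle per core. The sketch below inverts that relation to see what per-core rate the published numbers imply. The model and the function name are illustrative assumptions, not MCST's stated method; the Elbrus-4S inputs (50 Gflops, 4 cores, 800 MHz) come from the Elbrus-401 specification quoted earlier in this text, which does not state the precision.

```python
def implied_flops_per_cycle(peak_gflops: float, cores: int, clock_ghz: float) -> float:
    """Per-core operations per cycle implied by peak = cores * clock * ops_per_cycle."""
    if cores < 1 or clock_ghz <= 0:
        raise ValueError("cores must be >= 1 and clock_ghz must be positive")
    return peak_gflops / (cores * clock_ghz)


# Published figures quoted in the text (assumed to be theoretical peaks).
elbrus_8s = implied_flops_per_cycle(peak_gflops=250, cores=8, clock_ghz=1.3)
elbrus_4s = implied_flops_per_cycle(peak_gflops=50, cores=4, clock_ghz=0.8)

print(f"Elbrus-8S: ~{elbrus_8s:.1f} FP32 operations per cycle per core")
print(f"Elbrus-4S: ~{elbrus_4s:.1f} operations per cycle per core")
print(f"Ratio of quoted peaks, 8S to 4S: {250 / 50:.1f}x")
```

Under this model the Elbrus-8S figure works out to roughly 24 FP32 operations per cycle per core, which is in line with the "up to 25 operations per cycle" claim only if nearly every issue slot does floating-point work.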

Compared to 4-core Elbrus-4C processors, the peak performance of the new Elbrus-8C processor chips, according to the developers, is 3-5 times higher, and the throughput of I/O channels is 8 times higher.

Processor "Elbrus-8S"

Elbrus-8C processors are designed to work with DDR3-1600 memory with ECC support (up to 4 memory controllers). Multiprocessor systems with up to 4 processors can be built; snoop filtering is implemented to maintain cache coherence. For interprocessor exchange, 3 duplex channels are provided with a throughput of 16 GB/s each (8 GB/s in each direction).
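To put these interface figures side by side, here is a small back-of-the-envelope calculation. It assumes the standard 64-bit data path of a DDR3 channel (ECC bits excluded), which gives 12.8 GB/s per DDR3-1600 channel; the aggregate numbers are theoretical peaks derived from the quoted figures, not measurements.

```python
def ddr_channel_bandwidth_gb_s(transfers_mt_s: int, data_bus_bits: int = 64) -> float:
    """Theoretical peak bandwidth of one DDR channel in GB/s (ECC bits excluded)."""
    return transfers_mt_s * (data_bus_bits / 8) / 1000


MEMORY_CHANNELS = 4          # "up to 4 memory controllers"
LINKS = 3                    # interprocessor duplex channels
LINK_BANDWIDTH_GB_S = 16     # per link, both directions combined

per_channel = ddr_channel_bandwidth_gb_s(1600)
memory_total = MEMORY_CHANNELS * per_channel
links_total = LINKS * LINK_BANDWIDTH_GB_S

print(f"DDR3-1600, one channel:   {per_channel:.1f} GB/s")
print(f"DDR3-1600, four channels: {memory_total:.1f} GB/s")
print(f"Interprocessor links:     {links_total} GB/s in total")
```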

Elbrus-8S processors are compatible with the KPI-2 peripheral interface controller (“south bridge”). KPI-2 chips support the PCI-Express 2.0 bus (16 + 4 lanes), 3 Gigabit Ethernet ports, up to 8 SATA 3.0 devices, up to 8 USB 2.0 ports, up to 7 devices on the PCI 32/66 bus, as well as IDE, HDA audio, RS-232, IEEE 1284, SPI, I2C and GPIO interfaces.

The Elbrus-8S platform has a binary compatibility system with x86/x86-64 binary codes. It is also possible to develop application software and tests for self-diagnosis of equipment.

The basic operating system for the Elbrus platform is Elbrus OS, based on the Linux kernel. The platform's programming system supports C, C++, Java, Fortran-77 and Fortran-90.

The eight-core Elbrus-8S processor, produced on a 28 nm process technology, was presented at the fourth conference “IT in the service of the military-industrial complex.” This largest specialized event, which brings together developers and IT specialists of the military-industrial complex, began yesterday in Innopolis (Republic of Tatarstan) and will last until May 29.

Alexander Yakunin, CEO of the United Instrument-Making Corporation (part of Rostec), announced the final stage of work on a domestic microprocessor built on a process technology that is new for Russia.

“A breakthrough result was achieved within the framework of the Baikal project, which we are conducting jointly with the T-Platforms company,” explained Alexander Yakunin. “The first engineering sample of the Baikal-T processor on the 28 nm process technology, which is revolutionary for Russia, has just been released.

The next Russian development will be a new generation of Elbrus processors based on the same process technology. Its creation has reached the final stage, and the next engineering release is now undergoing testing.”

The development of Elbrus-8S is carried out by the Institute of Electronic Control Machines (INEUM) named after I. S. Bruk with the participation of the MCST company. Its characteristics look like this:

  • crystal area 350 sq. mm;
  • eight identical processor cores without hyperthreading;
  • 512 KB L2 cache per core;
  • third level cache – shared, 16 MB;
  • own architecture "Elbrus", developed at JSC "MCST";
  • an instruction set with vector accelerators and instructions that speed up mathematical calculations, encryption and signal processing. They are not split into separate extensions but are provided from the start;
  • an optimizing binary translation system that ensures compatibility with the x86 / x86-64 architectures while remaining independent of Intel licensing and achieving about 80% of native performance;
  • the ability to directly execute commands without binary translation in twenty OS distributions and over a thousand popular applications (the list is quickly growing);
  • built-in mechanisms that protect against the launch of malicious code: structured memory with access to objects through descriptors and contextual protection across language scopes; detection of object boundary violations (buffer overflows), use of uninitialized data, and dangerous deviations from programming standards;
  • support for four memory slots of the PC3‑12800 standard (DIMM DDR3-1600);
  • execution of 30 operations per cycle;
  • a clock frequency of 1.3 GHz is the planned frequency ceiling at which all eight cores can run at 100% load for an unlimited time under standard conditions. For unfavorable (and especially field) operating conditions, an automatic frequency reduction circuit (analogous to throttling) and (temporary) software shutdown of individual cores by the operating system will be implemented to protect against overheating;
  • 250 Gflops peak single-precision (FP32) floating-point performance with all FPU blocks fully loaded;
  • power dissipation at the level of 60 – 90 W (calculated figures);
  • the processor is soldered directly onto the board, which reduces chip packaging costs and the reject rate.

Elbrus-8S will work in tandem with a domestically developed peripheral interface controller - KPI-2.

This chip, currently produced on a 65 nm process technology, supports 20 PCI-Express 2.0 lanes (8+8+4), three Gigabit Ethernet network controllers, eight SATA 3.0 ports and eight USB 2.0 ports. The data exchange rate between KPI-2 and the processor is 16 GB/s.

In addition to supporting basic interfaces, it contains a built-in SPMC controller, which provides power-saving functions, as well as an interrupt controller.

The hardware interacts with the operating system through its own BIOS microcode. It can work with Linux distributions, FreeBSD, QNX and Windows XP, but for critical applications Elbrus OS, based on the Linux 2.6.33 kernel, is recommended. The MCST team has done a lot of work to create a real-time OS with its own mechanisms for handling interrupts, synchronization, memory management and support for tagged calculations. All this is aimed at unlocking the potential of the domestic processor architecture and protecting against common exploits.

Program code is optimized for the Elbrus architecture with specialized development tools: optimizing compilers for C, C++, Fortran and Java, debuggers, and tools and libraries for parallelizing calculations. Among the latter are the Message Passing Interface (MPI) for communication between processes and the open OpenMP standard.


Development of Elbrus processors.

Utilities and auxiliary components optimized for Elbrus processors are already being created: utilities, services, general-purpose libraries, database support, a graphics subsystem (based on Xorg, GTK+ and Qt), and tools for working with the network and peripheral devices.

The primary task is import substitution at key facilities of the military-industrial complex and at strategically important Russian infrastructure facilities. “Computerra” has already written about the technical feasibility of creating a hardware-level Trojan “bookmark” (backdoor) in Intel processors of the Ivy Bridge architecture that is extremely difficult to detect. The research was carried out at the University of Massachusetts and was positioned as a proof of concept: similar bookmarks can be created in other processors.

“The use of technology with foreign key components creates great threats in the areas of management and production that are critical for the country,” notes Alexander Yakunin, “first of all from the point of view of data protection and of hidden possibilities for influencing the operation of equipment from the outside.”

State tests of the Elbrus-8C processor are scheduled for the end of this year. If they are successful, serial production will begin in 2016. So far we are talking more about small-scale production at the level of about 50 thousand processors per year, but this is already a huge step for Russian microelectronics.

“At the end of this year or the beginning of next year, T-Platforms should complete work on the new Baikal-M processor, and in 2018 we plan to introduce Elbrus-16S on the same 28 nm technology, with a frequency of 1.5 GHz and a performance of over 512 GFlops,” says Alexander Yakunin, outlining his immediate plans. It is already known that the next Elbrus processor will perform 50 operations per clock cycle. Its estimated performance will be 2.5 times higher than that of Elbrus-8S.

The article uses materials from JSC United Instrument-Making Corporation.

Editor's response

"Elbrus" is a multi-core universal high-performance microprocessor with a unique technology, developed by JSC "MCST".

The microprocessor is highly secure and prevents the penetration, at the hardware and software level, of so-called “bookmark” programs that can disable or disrupt the operation of computer equipment.

Where is the processor used?

The main areas of application of the processor:

  • servers,
  • personal computers,
  • laptops,
  • computing systems, etc., used at facilities with increased requirements for information security.

What is the Elbrus-4S microprocessor?

Elbrus-4C is a quad-core processor operating at 800 MHz, which supports three memory channels. There is also a cache memory with a total capacity of 8 Megabytes. The processor is manufactured using 65 nanometer technology, its average power consumption is 45 watts.

“This is a universal microprocessor that is characterized by unique features of its architecture. Depending on the purpose of a particular piece of equipment, it can be used in harsh conditions. For example, some equipment can be immersed in water, some can be used at the North Pole or at temperatures below 40 degrees,” Vasily Vorobushkov, chief designer of the Elbrus 401-PC VK, told AiF.ru.

What is unique about the microprocessor?

The architecture of microprocessors based on Elbrus allows:

What operating system does the microprocessor run on?

The basic operating system for Elbrus is Elbrus OS, built on Linux. The platform's programming system supports C, C++, Java, Fortran 77 and Fortran 90.

Where are Elbrus microprocessors manufactured?

Microprocessors "Elbrus-2SM", "Elbrus-4S" and others are produced at the Zelenograd enterprises "Angstrem" and "Mikron". The latest Elbrus-8C microprocessor is produced in Taiwan, at the TSMC factory, since microelectronic production with 28 nanometer technology does not exist in Russia today.

Motherboard based on Elbrus 4c microprocessor. Frame youtube.com

Where can I purchase equipment based on the Elbrus processor?

According to Konstantin Trushkin, Assistant General Director for Marketing at MCST, the company does not yet see a way to sell equipment to individuals. Orders are accepted only from legal entities (companies).

How much does equipment based on the Elbrus processor cost?

AiF.ru does not have information about the price of the computer equipment developed by MCST. According to TJournal, an Elbrus-401 personal computer from the first test batch will cost customers about 200 thousand rubles.

The MCST company says that the cost may decrease if production volumes increase. “The key factor that determines the price is the production volume of the product. If the product is unique or is intended for some specific tasks, its cost will never be low, either with us or with anyone else. If you look on the Internet at the prices of industrial computing equipment produced abroad, for example by General Electric, you will be pleasantly surprised: an industrial video card, for example, can cost 7 thousand euros. And this is just one video card, please note. If we are talking about some kind of mass market for computer equipment, then everything depends on serial production. If it is possible to produce a series of over 10 thousand units, then we can achieve a very competitive price,” Vorobushkov said.

*Architecture - the basic layout of a computer's parts and the connections between them.

**Hardware bookmark - a device that can interfere with the operation of a computing system. Its operation can result either in the complete disabling of the system or in the disruption of its normal functioning, for example unauthorized access to information, or changing or blocking it. For example, military equipment using a foreign microprocessor may at some point shut down completely after receiving a command that triggers the corresponding “bookmark program”. According to documents published by Snowden, the US National Security Agency has a special unit, Tailored Access Operations (TAO), which deals with various methods of monitoring computers using “bookmarks”.

***Processor clock cycle, or processor core clock - the interval between two pulses of the clock generator, which synchronizes the execution of all processor operations.

****FORTRAN 77 and Fortran 90 (Formula Translator) - a programming language that has several standards, the main ones being 77 and 90. Fortran 77 was adopted in April 1978, Fortran 90 was approved in 1992.

In this article we will show how image recognition technologies work on the Elbrus-4C and the new Elbrus-8C: we will look at several computer vision problems, talk a little about the algorithms for solving them, present benchmarking results and finally show a video.



    Elbrus-8S is a new 8-core MCST processor with a VLIW architecture. We tested an engineering sample with a frequency of 1.3 GHz. The frequency may yet increase in serial production.



    Here is a comparison of the characteristics of Elbrus-4S and Elbrus-8S.


    Characteristic | Elbrus-4S | Elbrus-8S
    Clock frequency, MHz | 800 | 1300
    Number of cores | 4 | 8
    Number of operations per cycle (per core) | up to 23 | up to 25
    L1 cache, per core | 64 KB | 64 KB
    L2 cache, per core | 2 MB | 512 KB
    L3 cache, shared | - | 16 MB
    Organization of RAM | up to 3 channels of DDR3-1600 ECC | up to 4 channels of DDR3-1600 ECC
    Process technology | 65 nm | 28 nm
    Number of transistors | 986 million | 2730 million
    SIMD instruction width | 64 bits | 64 bits
    Multiprocessor support | up to 4 processors | up to 4 processors
    Production start year | 2014 | 2016
    Operating system | OS “Elbrus” 3.0-rc27 | OS “Elbrus” 3.0-rc26
    lcc compiler version | 1.21.18 | 1.21.14

    In Elbrus-8S, the clock frequency increased more than one and a half times, the number of cores doubled, and the architecture itself was improved.


    For example, Elbrus-8C can execute up to 25 instructions per clock cycle without taking into account SIMD (versus 23 for Elbrus-4C).


    Important: we did not carry out any special optimization for Elbrus-8S. The EML library was used, but the amount of Elbrus-specific optimization in our projects is currently clearly smaller than for other architectures: there it has been growing gradually over several years, whereas we have not been working on the Elbrus platform for as long or as actively. The main time-consuming functions have, of course, been optimized, but we have not yet gotten around to the rest.

    Russian passport recognition

    Of course, we decided to start mastering this new platform by running our product Smart IDReader 1.6, which recognizes passports, driver’s licenses, bank cards and other documents. It should be noted that the standard version of this application can effectively use no more than 4 threads when recognizing one document. For mobile devices this is more than enough, but when benchmarking desktop processors it can lead to the performance of multi-core systems being underestimated.


    The version of Elbrus OS and the lcc compiler provided to us did not require any special changes to the source code, and we built our project without any difficulties. Note that the new version brings full support for C++11 (it has also appeared in the latest versions of lcc for Elbrus-4C), which is good news.


    To begin with, we decided to check how Russian passport recognition, which we have already written about, works on Elbrus-8S. We tested two modes: finding and recognizing a passport in a single frame (anywhere mode) and in video shot with a webcam (webcam mode). In anywhere mode, the passport spread is recognized in a single frame, and the passport can be located in any part of the frame and oriented in any way. In webcam mode, only the passport page with the photo is recognized, and a series of frames is processed. The lines of the passport are assumed to be horizontal, and the passport is assumed to move only slightly between frames. Information obtained from different frames is integrated to improve recognition quality.


    For testing, we took 1000 images for each mode and measured the average operating time of recognition (that is, the time without taking into account loading the image) when running in 1 thread and running with parallelization. The resulting operating time is shown in the table below.



    The results for the single-threaded mode are quite consistent with what was expected: in addition to the acceleration due to increased frequency (and the frequency ratio of 4C and 8C is 1300 / 800 = 1.625), a slight acceleration due to improved architecture is noticeable.


    When running on the maximum number of threads, the acceleration for both modes was 1.7. It would seem that the number of cores in Elbrus-8C is twice as large as in 4C. So where is the speedup due to the additional 4 cores? The fact is that our recognition algorithm actively uses only 4 threads and weakly scales further, so the performance gain is quite insignificant.
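One way to reason about this kind of scaling is Amdahl's law with an additional cap on the number of threads the algorithm can use. The sketch below is an illustrative model only: the parallel fraction values are hypothetical, the `max_threads` parameter is an assumption introduced to mirror the "only 4 threads" limitation, and nothing here reproduces the authors' measurements.

```python
from typing import Optional


def amdahl_speedup(parallel_fraction: float, cores: int,
                   max_threads: Optional[int] = None) -> float:
    """Speedup over one core when only part of the work runs in parallel.

    max_threads caps how many cores the algorithm can actually keep busy.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    workers = cores if max_threads is None else min(cores, max_threads)
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


FREQ_RATIO = 1300 / 800  # clock ratio of Elbrus-8C to Elbrus-4C, 1.625

# Hypothetical parallel fractions; the real value is not given in the text.
for p in (0.5, 0.8, 0.95):
    on_4_cores = amdahl_speedup(p, cores=4, max_threads=4)
    on_8_cores = amdahl_speedup(p, cores=8, max_threads=4)
    print(f"p={p:.2f}: gain from the extra 4 cores = {on_8_cores / on_4_cores:.2f}x, "
          f"gain with the clock ratio included = {FREQ_RATIO * on_8_cores / on_4_cores:.3f}x")
```

With a hard cap of 4 threads the extra cores contribute nothing in this model, so the whole gain has to come from the clock frequency and architectural improvements, which is consistent with the explanation given above.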


    Next, we decided to make sure that all cores of both processors were fully loaded, and launched several passport recognition processes at once. Each recognition call was parallelized in the same way as in the previous experiment, but here the passport processing time included loading the image from a file. Time measurements were carried out on the same thousand passports. The results with Elbrus fully loaded are shown below:



    For the anywhere mode, the resulting speedup came close to the expected speedup of ~3.6 times, falling short because we included the time it took to load an image from a file. In webcam mode the impact of loading time is even greater, so the speedup turned out to be more modest: 2.5 times.

    Car detection

    Detection of objects of a given type is one of the classic problems of technical vision. This can be the detection of faces, people, abandoned objects, or any other type of objects that have obvious distinctive features.


    For our example, we decided to take the task of detecting cars moving in the same direction. Such a detector can be used in automatic vehicle control systems, license plate recognition systems, and so on. Without much deliberation, we shot a video for training and testing with a dashcam near our office. We used the Viola-Jones cascade classifier as a detector. Additionally, we applied exponential smoothing to the positions of cars that we observed for several frames in a row. It is worth noting that detection is performed only in the ROI (region of interest) rectangle, which does not occupy the entire frame, since it makes little sense to try to detect the interior of our own car or cars that are not completely inside the frame.


    Thus, our algorithm consisted of the following steps:

    1. Cutting out an ROI rectangle in the center of the frame.
    2. Converting a color ROI image to gray.
    3. Precomputing the Viola-Jones features.
      At this stage, the image is scaled, maps of auxiliary features (for example, oriented edges) are built, and cumulative sums are computed over all features so that Haar wavelets can be calculated quickly.
    4. Running the Viola-Jones classifier on multiple windows.
      Here rectangular windows are enumerated with a certain step, and the classifier is run on each of them. If the classifier gives a positive answer, an object has been detected, i.e. the image inside the window corresponds to a car. In that case, the image area containing the object is refined: windows of the same size, but with a smaller step, are selected in the vicinity of the primary detection and are also fed to the classifier. All found objects are saved for further processing. This procedure is repeated for several scales of the input image.
      This stage accounts for most of the computational cost of the problem, and it is the stage we parallelized. We used the tbb library to select an effective number of threads automatically.
    5. Processing the array of detections obtained after using the detector. Since a number of detections obtained can be very close and correspond to the same object, we combine detections that have a sufficiently large intersection area. The result is an array of rectangles that indicate the position of the detected cars.
    6. Matching detections between the previous and current frames. We consider the same object to have been detected if the intersection area of the rectangles is more than half the area of the current rectangle. We smooth the position of the object using the formulas:
      $x_i = \alpha x_i + (1-\alpha)\, x_{i-1}$
      $y_i = \alpha y_i + (1-\alpha)\, y_{i-1}$
      $w_i = \alpha w_i + (1-\alpha)\, w_{i-1}$
      $h_i = \alpha h_i + (1-\alpha)\, h_{i-1}$
      where $(x, y)$ are the coordinates of the upper left corner of the rectangle, $w$ and $h$ are its width and height, respectively, and $\alpha$ is a constant coefficient selected experimentally.
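Step 6 can be sketched as follows. This is a minimal illustration, not the authors' code: the `Rect` type, the function names and the value of `ALPHA` are hypothetical, and the smoothing is assumed to take the standard exponential form, with weight α on the new detection and 1 - α on the previous smoothed value.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    def area(self) -> float:
        return self.w * self.h


def intersection_area(a: Rect, b: Rect) -> float:
    """Area of the overlap of two axis-aligned rectangles (0 if they are disjoint)."""
    width = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    height = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(0.0, width) * max(0.0, height)


def same_object(current: Rect, previous: Rect) -> bool:
    """Match rule from the text: overlap exceeds half the area of the current rectangle."""
    return intersection_area(current, previous) > 0.5 * current.area()


def smooth(current: Rect, previous: Rect, alpha: float) -> Rect:
    """Exponential smoothing of position and size between consecutive frames."""
    def blend(new: float, old: float) -> float:
        return alpha * new + (1.0 - alpha) * old

    return Rect(blend(current.x, previous.x), blend(current.y, previous.y),
                blend(current.w, previous.w), blend(current.h, previous.h))


ALPHA = 0.5  # hypothetical value; the text only says it was selected experimentally

previous = Rect(100, 50, 40, 30)
detected = Rect(104, 52, 40, 30)
smoothed = smooth(detected, previous, ALPHA) if same_object(detected, previous) else detected
print(smoothed)
```

A larger α follows new detections more closely, while a smaller α gives a steadier but more sluggish rectangle.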


    Here and below, to estimate fps (frames per second), we used the average running time over 10 runs of the program. Only the image processing time was taken into account, since we were working with a recorded video and the images were simply loaded from a file, whereas in a real system they could, for example, come from a camera. It turned out that detection works at a very decent speed, producing 15.5 fps on Elbrus-4C and 35.6 fps on Elbrus-8C. On Elbrus-8C the processor load is far from full, although at peak all cores are used. Obviously, this is because not all calculations in this problem were parallelized. For example, before applying the Viola-Jones detector we perform rather heavyweight auxiliary transformations on each frame, and this part of the system runs sequentially.
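A measurement of this kind could be set up roughly as shown below. The harness is a sketch under stated assumptions: `measure_fps` and the dummy workload are hypothetical names, frames are preloaded so that file loading is excluded from the timing, and the real processing function would replace the stand-in.

```python
import time
from statistics import mean
from typing import Callable, Sequence


def measure_fps(process_frame: Callable[[bytes], object],
                frames: Sequence[bytes], runs: int = 10) -> float:
    """Average frames per second over several runs, timing only the processing."""
    run_times = []
    for _ in range(runs):
        start = time.perf_counter()
        for frame in frames:
            process_frame(frame)
        run_times.append(time.perf_counter() - start)
    return len(frames) / mean(run_times)


# Dummy stand-ins: frames already in memory and a trivial processing function.
preloaded_frames = [bytes(range(256)) * 64 for _ in range(200)]
fps = measure_fps(sum, preloaded_frames, runs=10)
print(f"{fps:.1f} fps")
```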


    Now it's time for the demonstration. The application interface and rendering are done with standard Qt5 tools. No additional optimization was performed.


    Elbrus-4S



    Elbrus-8S


    Visual localization

    In this application, we decided to demonstrate visual localization based on feature points. Using Google Street View panoramas with GPS coordinates, we taught our system to determine the location of a camera without using its GPS coordinates or other external information. Such a system can be used on drones and robots as a backup navigation system, to refine the current location, or to operate where GPS is unavailable.


    First, we processed the database of panoramas with GPS coordinates. We took 660 images covering approximately 0.4 km^2 of Moscow streets:




    We then created a description of the images using feature points. For each image we:

    1. Found feature points at 3 frame scales (the frame itself, the frame reduced by a factor of 4/3 and the frame reduced by half) using the YAPE (Yet Another Point Detector) algorithm and computed RFD descriptors for them.
    2. Saved its coordinates, the set of feature points, and their descriptors. Since the descriptors of the current frame's feature points will later be compared with the descriptors in our database, it is convenient to store the descriptors in a tree that uses the Hamming distance as its metric. The total size of the saved data turned out to be slightly more than 15 MB.
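The Hamming distance used as the metric here is simply the number of differing bits between two descriptors. The sketch below assumes that descriptors are fixed-length bit strings packed into bytes, and it uses a brute-force scan as a stand-in for the tree described above; the function names and sample data are illustrative.

```python
from typing import List, Tuple


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length descriptors differ."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def nearest_descriptor(query: bytes, database: List[bytes]) -> Tuple[int, int]:
    """Return (index, distance) of the closest stored descriptor by linear scan."""
    distances = [hamming_distance(query, stored) for stored in database]
    best = min(range(len(database)), key=distances.__getitem__)
    return best, distances[best]


# Tiny made-up 16-bit descriptors, for illustration only.
database = [
    bytes([0b11110000, 0b00001111]),
    bytes([0b10101010, 0b01010101]),
    bytes([0b11111111, 0b00000000]),
]
query = bytes([0b11110001, 0b00001111])

print(nearest_descriptor(query, database))  # prints (0, 1)
```

A linear scan costs one comparison per stored descriptor, which is why a tree or another index becomes attractive once the database holds the descriptors of hundreds of images.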

    With this the preparations are complete, now let’s move on to what happens directly during the program’s operation:

    1. Converting the color image to grayscale.
    2. Performing auto contrast.
    3. Searching for feature points at three frame scales (with the same coefficients 1, 0.75 and 0.5) using the YAPE algorithm and computing RFD descriptors for them. These algorithms are partially parallelized, but quite a large part of the calculations remains sequential. In addition, they have not yet been optimized for the Elbrus platform.
    4. Searching the tree for descriptors similar to those in the resulting set and determining several of the most similar frames. The tree search is parallelized across descriptors using tbb. For the first 5 frames of the video we select the 10 closest frames, and after that only 5.
    5. Additional filtering of the selected frames to remove “outliers”, because a vehicle's trajectory is usually continuous.
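Step 4 can be pictured as a voting scheme: every matched descriptor votes for the database frame it came from, and the frames with the most votes become candidates. The text does not say how the most similar frames are actually ranked, so the voting rule and the function name below are assumptions; only the candidate counts (10 for the first 5 video frames, then 5) come from the description above.

```python
from collections import Counter
from typing import Iterable, List


def top_candidate_frames(matched_frame_ids: Iterable[int], video_frame_index: int) -> List[int]:
    """Rank database frames by the number of matched descriptors and keep the best few.

    matched_frame_ids holds, for each matched descriptor, the id of the
    database frame that the nearest stored descriptor belongs to.
    """
    keep = 10 if video_frame_index < 5 else 5
    votes = Counter(matched_frame_ids)
    return [frame_id for frame_id, _ in votes.most_common(keep)]


# Made-up match results for one video frame.
matches = [3, 3, 3, 7, 7, 1]
print(top_candidate_frames(matches, video_frame_index=0))  # prints [3, 7, 1]
```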

    Input data: a sequence of color frames of size 800x600 pixels.


    Such a system produces 3.0 fps on Elbrus-4C and 7.2 fps on Elbrus-8C.


    Let's show how it works:


    Elbrus-4S



    Elbrus-8S


    Conclusion

    For convenience, the main characteristics of Elbrus and the results obtained from our programs are collected in the table:



    The results for passport recognition were quite modest, since our application in its current form cannot effectively use more than 4 threads. The situation is similar with car detection and visual localization: the algorithms have non-parallelized sections, so linear scaling cannot be expected as the number of cores increases. However, where nothing prevents an application from loading all processor cores, we see an increase of 3.2 times, which is close to the theoretical limit of 3.6 times. On average, the difference in performance between generations of MCST processors on our set of tasks is about 2-3 times, and this is very pleasing. Just from the higher frequency and the improved architecture, we see a gain of more than 1.7 times. MCST is quickly catching up with Intel's strategy of adding 5% per year.
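The per-task ratios behind this summary can be recomputed from the fps figures reported in the sections above. The snippet only restates those published numbers; the dictionary layout and names are illustrative.

```python
# (Elbrus-4C fps, Elbrus-8C fps) as reported in the text.
reported_fps = {
    "car detection": (15.5, 35.6),
    "visual localization": (3.0, 7.2),
}

FREQ_RATIO = 1300 / 800  # clock ratio alone, 1.625

speedups = {task: fps_8c / fps_4c for task, (fps_4c, fps_8c) in reported_fps.items()}

for task, speedup in speedups.items():
    print(f"{task}: {speedup:.2f}x overall, {speedup / FREQ_RATIO:.2f}x beyond the clock ratio")
```

Both demos land at roughly 2.3x to 2.4x, inside the "about 2-3 times" range quoted above and well short of the 3.6x limit, which matches the remark about non-parallelized sections.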


    During tests under full load, we did not experience any freezes or crashes, which indicates the maturity of the processor architecture. The VLIW approach developed in Elbrus-8S allows various computer vision algorithms to run in real time, and the EML library contains a very solid set of mathematical functions that save time for those who do not intend to optimize the code themselves. In conclusion, we conducted another experiment, running 3 demos at once (localization, car search and face search) on one Elbrus-8C processor and getting an average processor load of about 80%. No comments here.



    We would like to say a big thank you to the company and staff of MCST and INEUM named after I. S. Bruk for the opportunity to try Elbrus-8S, and to congratulate them: the eight is a more than worthy processor. We wish them success!





    

    2024 gtavrl.ru.