Deep neural network. Conversational systems and machine learning


There is a lot of talk and writing about artificial neural networks today, both in the context of big data and machine learning and outside it. In this article, we will recall what the concept means, outline its scope of application once again, and discuss an important approach associated with neural networks: deep learning. We will describe its concept as well as its advantages and disadvantages in specific use cases.

What is a neural network?

As you know, the concept of a neural network (NN) comes from biology and is a somewhat simplified model of the structure of the human brain. But let's not delve into the wilds of natural science: the easiest way is to imagine a neuron (including an artificial one) as a kind of black box with many inputs and one output.

Mathematically, an artificial neuron transforms a vector of input signals (impacts) X into a vector of output signals Y using a function called an activation function. Within such a network (an artificial neural network, ANN), three types of neurons operate: input neurons (receiving information from the outside world: the values of the variables we are interested in), output neurons (returning the desired variables, for example predictions or control signals), and intermediate neurons that perform certain internal ("hidden") functions. A classical ANN thus consists of three or more layers of neurons, and in the second and subsequent layers ("hidden" and output), each element is connected to all elements of the previous layer.
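As a minimal numerical sketch (the weights and inputs below are made up purely for illustration), a single artificial neuron can be written in Python as a weighted sum of its inputs passed through a sigmoid activation function:

```python
import numpy as np

def neuron(x, w, b):
    """A single artificial neuron: weighted sum of inputs passed through
    a sigmoid activation function."""
    z = np.dot(w, x) + b             # weighted sum of the input signals
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid activation

# Example with made-up numbers: three input signals, one output signal.
x = np.array([0.5, -1.2, 3.0])   # input vector X
w = np.array([0.4, 0.1, -0.6])   # connection weights
b = 0.2                          # bias term
print(neuron(x, w, b))           # output signal Y
```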

It is important to remember the concept of feedback, which determines the type of ANN structure: feedforward (signals go sequentially from the input layer through the hidden layers and arrive at the output layer) and recurrent, where the network contains connections going backwards, from more distant neurons to nearer ones. These concepts make up the minimum information required to move to the next level of understanding ANNs: training a neural network, classifying training methods, and understanding the principles behind each of them.

Neural network training

We should not forget why such categories are used in the first place, otherwise there is a risk of getting bogged down in abstract mathematics. In practice, artificial neural networks are understood as a class of methods for solving certain practical problems, chief among them pattern recognition, decision making, approximation and data compression, as well as the problems most interesting to us: cluster analysis and forecasting.

Without going to the other extreme and without going into the details of how ANN methods operate in each specific case, let us remind ourselves that under any circumstances it is the ability of a neural network to learn (with a teacher or "on its own") that is the key point in using it to solve practical problems.

In general, training an ANN is as follows:

  1. input neurons receive variables ("stimuli") from the external environment;
  2. in accordance with the information received, the free parameters of the neural network change (the intermediate layers of neurons do their work);
  3. as a result of these changes in the structure of the neural network, the network "reacts" to the same information in a different way.

This is the general algorithm for training a neural network (recall Pavlov's dog: yes, the internal mechanism for forming a conditioned reflex is exactly that; and let us immediately forget it, since our context involves technical concepts and examples).

It is clear that a universal learning algorithm does not exist and, most likely, cannot exist; conceptually, approaches to learning are divided into supervised and unsupervised learning. The first assumes that for each input ("learning") vector there is a required value of the output ("target") vector; these two values form a training pair, and the entire set of such pairs is the training set. In the case of unsupervised learning, the training set consists only of input vectors, and this situation is more plausible from the point of view of real life.
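As a rough illustration of the difference (using scikit-learn only as an example library; the data here is synthetic), supervised learning consumes training pairs, while unsupervised learning works with the input vectors alone:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))             # input ("learning") vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # target vectors -> training pairs (X, y)

# Supervised learning: the training set consists of (input, target) pairs.
clf = LogisticRegression().fit(X, y)

# Unsupervised learning: the training set consists of input vectors only.
clusters = KMeans(n_clusters=2, n_init=10).fit(X)

print(clf.predict(X[:3]), clusters.labels_[:3])
```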

Deep learning

The concept of deep learning belongs to a different classification and denotes an approach to training so-called deep structures, which include multi-level neural networks. A simple example from image recognition: the machine must be taught to identify increasingly abstract features in terms of other abstract features, that is, to determine the relationship between the expression of the whole face, the eyes and the mouth, and, ultimately, clusters of colored pixels, mathematically. Thus, in a deep neural network, each level of features has its own layer; it is clear that training such a "colossus" requires both the appropriate experience of researchers and a sufficient level of hardware. Conditions developed in favor of deep learning only by 2006, and eight years later we can talk about the revolution this approach has produced in machine learning.
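A minimal sketch of such a multi-layer ("deep") structure, written with the Keras API purely for illustration; the layer sizes and input dimensions are arbitrary assumptions, not taken from the article:

```python
import tensorflow as tf

# Each hidden layer can be thought of as one level of features:
# pixels -> edges -> face parts (eyes, mouth) -> whole-face expression.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64 * 64,)),         # flattened grayscale image
    tf.keras.layers.Dense(512, activation="relu"),   # low-level features
    tf.keras.layers.Dense(128, activation="relu"),   # mid-level features
    tf.keras.layers.Dense(32, activation="relu"),    # high-level features
    tf.keras.layers.Dense(10, activation="softmax")  # output classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```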

So, first of all, in the context of our article it is worth noting the following: deep learning in most cases is not supervised by a person. That is, this approach involves training a neural network without a teacher. This is the main advantage of the "deep" approach: supervised machine learning, especially in the case of deep structures, requires enormous time and labor costs. Deep learning is an approach that models human abstract thinking (or, at the very least, represents an attempt to approach it) rather than merely using it.

The idea, as usual, is wonderful, but quite natural problems arise in the way of the approach, first of all rooted in its claims to universality. While deep learning approaches have achieved significant success in image recognition, in natural language processing there are still many more questions than answers. It is obvious that in the next n years it is unlikely that anyone will manage to create an "artificial Leonardo da Vinci" or even, at the very least, an "artificial homo sapiens".

However, artificial intelligence researchers are already faced with questions of ethics: the fears expressed in every self-respecting science fiction film, from "Terminator" to "Transformers", no longer seem funny (modern sophisticated neural networks can already be considered a plausible model of an insect's brain!), but for now they are clearly premature.

The ideal technological future appears to us as an era when a person will be able to delegate most of his powers to a machine, or at least allow it to take over a significant part of his intellectual work. The concept of deep learning is one step towards this dream. The road ahead is long, but it is already clear that neural networks, and the ever-evolving approaches associated with them, are capable over time of realizing the aspirations of science fiction writers.

The robot developed under the auspices of DARPA failed to cope with the door. Source: IEEE Spectrum/DARPA.

Apparently, artificial intelligence is becoming an integral part of the high-tech industry. We constantly hear about how artificial intelligence has learned to respond to letters in the Gmail email client, or how it learns to sort vacation photos. Mark Zuckerberg has begun creating an artificial intelligence that will help us manage our homes. The problem is that the very concept of "artificial intelligence" promotes inflated expectations. It is easier for people to imagine powerful supercomputers that help our spaceships navigate the vastness of the Universe than effective spam filters. In addition, people tend to discuss the details and predict the timing of the demise of doomed humanity in the clutches of a soulless artificial intelligence.

The creation of the image of perfect artificial intelligence, as if straight out of science fiction films, is largely facilitated by the activities of information technology companies, which never cease to amaze us with new anthropomorphic digital assistants. Unfortunately, such ideas prevent us from realizing the new capabilities of computers and the ways in which they can change the world around us. Starting from these stereotypes, we will explain some of the terms that describe the most utilitarian applications of artificial intelligence. This article will also talk about the limitations of current technology and why we shouldn't worry about a robot uprising just yet.

So, what is the meaning behind the terms “neural network”, “machine learning” and “deep learning”?

These three phrases are on everyone's lips. Let's look at them layer by layer, to simplify perception. Neural networks are at the very base of this pyramid. They represent a special type of computer architecture needed to create artificial intelligence. The next level is machine learning, which acts as the software for neural networks: it structures the learning process so that the machine searches for the necessary answers in gigantic data sets. The pyramid is crowned by deep learning, a special type of machine learning that has gained incredible popularity over the past decade, largely due to two new capabilities: sharply cheaper computing power and the limitless expanse of information also known as the Internet.

The origins of the concept of neural networks date back to the fifties of the last century, when the study of artificial intelligence took shape as a separate area of ​​scientific research.

In general, the structure of neural networks vaguely resembles the structure of the human brain and is a network of nodes built like neural connections. Individually, these nodes do not represent anything outstanding; they can answer only the most primitive questions, but their joint activity is capable of solving the most complex problems. Much more importantly, with the right algorithms, neural networks can be trained!

NORMALLY, YOU JUST TELL COMPUTERS WHAT TO DO. WITH MACHINE LEARNING, YOU SHOW THEM EXACTLY HOW IT NEEDS TO BE DONE

"Let's say you want to tell a computer how to cross the road," says Ernest Davis, a professor at New York University. "Using traditional programming, you can give it a precise set of rules that will determine its behavior: make it look around, let cars pass, use the pedestrian crossing, and then just watch the result. In the case of machine learning, you instead show the system 10,000 videos of pedestrians crossing the road. After that, you show it another 10,000 videos of car-pedestrian collisions, and then just let the system do its thing."

Teaching a computer to correctly perceive information from videos is a basic and very non-trivial task. Over the past couple of decades, humanity has tried many ways to train computers. One such method is reinforcement learning, in which the computer receives a kind of "reward" when the task is completed correctly and gradually works its way towards the best solution. Training can also be based on genetic algorithms, which solve problems by randomly selecting, combining and varying the desired parameters using mechanisms similar to natural selection in nature.
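For illustration, here is a toy sketch of a genetic algorithm in the spirit just described; the objective function and all parameters are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(42)

def fitness(params):
    """Toy objective: the closer the parameters are to the target, the better."""
    target = np.array([1.0, -2.0, 0.5])
    return -np.sum((params - target) ** 2)

# Random initial population of candidate parameter vectors.
population = rng.normal(size=(50, 3))

for generation in range(100):
    scores = np.array([fitness(p) for p in population])
    # Selection: keep the fittest half of the population.
    parents = population[np.argsort(scores)[-25:]]
    # Combination (crossover) and variation (mutation) produce offspring.
    mothers = parents[rng.integers(0, 25, size=25)]
    fathers = parents[rng.integers(0, 25, size=25)]
    children = (mothers + fathers) / 2 + rng.normal(scale=0.1, size=(25, 3))
    population = np.vstack([parents, children])

best = population[np.argmax([fitness(p) for p in population])]
print(best)  # should end up close to [1.0, -2.0, 0.5]
```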

Deep learning has proven to be one of the most practical techniques in modern machine learning. This approach uses a significant number of neural network layers to analyze data at various levels of abstraction. When a picture is shown to a deep learning system, each layer of the network analyzes the image at a different scale. The bottom layer may analyze pixel grids as small as 5 × 5 pixels and give one of two answers, "yes" or "no", depending on whether a given feature appears in that grid. If the bottom layer responds in the affirmative, the next layer up analyzes how that grid fits into a larger pattern: is it the beginning of a straight line or a corner? Gradually this process becomes more complex, allowing the software to understand and process even very complex data by breaking it down into its component parts.
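A deliberately oversimplified sketch of this layered idea (not a trained network; the 5x5 template and thresholds are invented for the example): a bottom "layer" answers yes or no for every 5x5 patch, and a higher "layer" checks whether the positive answers line up:

```python
import numpy as np

def bottom_layer(image, template, threshold=2.0):
    """Slide a 5x5 template over the image and answer yes/no per patch."""
    h, w = image.shape
    responses = np.zeros((h - 4, w - 4), dtype=bool)
    for i in range(h - 4):
        for j in range(w - 4):
            patch = image[i:i + 5, j:j + 5]
            responses[i, j] = np.sum(patch * template) > threshold
    return responses

def higher_layer(responses):
    """Check whether the positive patch responses line up into a longer line."""
    return responses.any(axis=1).sum() >= responses.shape[0] // 2

# A toy image containing a vertical edge, and a vertical-edge template.
image = np.zeros((16, 16)); image[:, 8:] = 1.0
template = np.zeros((5, 5)); template[:, :2] = -1.0; template[:, 3:] = 1.0

print(higher_layer(bottom_layer(image, template)))  # True: an edge was found
```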

"The higher we move up the layers of the neural network, the more large-scale the things it can detect," explains Yann LeCun, head of the artificial intelligence laboratory at Facebook. "They become more abstract. At the level of the topmost layer there are detectors that can determine the type of object being examined: a person, a dog, a glider, and so on."

SUCCESSFUL OPERATION OF A NEURAL SYSTEM WITH DEEP LEARNING REQUIRES A LARGE AMOUNT OF DATA AND A SIGNIFICANT AMOUNT OF TIME

Now let's imagine that we want to recognize cats with deep learning. First, the various layers of the neural network need to be set up so that the system learns to independently distinguish the elements of a cat: claws, paws, whiskers, and so on. Each layer builds on the previous one, which allows it to recognize a more specific element; this is why the process is called "deep learning". Then we need to show the neural network a large number of images of cats and other animals and label them. "This is a cat," we explain to the computer while showing the corresponding image. "This is also a cat. But this is no longer quite a cat." As the neural network views the images, certain layers and groups of nodes begin to fire, which helps it identify and highlight the categories of claws, paws, whiskers and other attributes of a cat. Gradually the neural network remembers which of these layers matter most, strengthens the necessary connections, and simply ignores the weak ones. For example, the system may detect a significant correlation between the categories "paws" and "cats", but since cats are not the only animals with paws, the neural network will tend to look for the combination of "paws" and "whiskers".
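A hedged sketch of what such labeled training might look like in code, using the Keras API; the directory of cat and non-cat images is hypothetical, and the architecture is chosen only for illustration:

```python
import tensorflow as tf

# Hypothetical directory of labeled images: images/cat/..., images/not_cat/...
train_data = tf.keras.utils.image_dataset_from_directory(
    "images", image_size=(128, 128), batch_size=32)

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=(128, 128, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),   # low-level strokes
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),   # paws, whiskers, ...
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2, activation="softmax"),     # "cat" vs "not a cat"
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_data, epochs=5)   # "this is a cat", "this is not a cat", ...
```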

This is a very long, iterative process of training the system, built on the principle of feedback. And here there are two options: either a person corrects the computer's errors, nudging it towards the right choice, or a neural network with a sufficient amount of classified data performs the testing on its own. As a result of such a test it becomes apparent which weightings across all the layers lead to the most accurate answer. Now that we have a rough idea of how many steps need to be taken before the system can confidently call an object a "cat", think about the complexity of a system that would be able to identify any thing in the world. That is why Microsoft was so excited to announce an app that can tell dog breeds apart. At first glance the difference between a Doberman and a Schnauzer seems obvious to us, but there are a great number of subtle differences that must be identified before a computer can name the difference.

The image created by Google's Deep Dream project has become a kind of business card, a collective image representing artificial intelligence research to the general public.

So is this the same thing that Google, Facebook and the others use?

For the most part, yes.

Deep learning technologies are used to solve many everyday problems. Large information technology companies have long since established their own artificial intelligence research departments. Google and Facebook have joined forces to popularize this research and their software. Google recently launched free three-month online courses on artificial intelligence. And while the scientific work of researchers remains in relative obscurity, corporations are literally churning out innovative applications based on this technology, from a Microsoft web application to the surreal images of Deep Dream. Another reason for the popularity of deep learning lies in the fact that large consumer-oriented companies are increasingly involved in its development and periodically release rather strange creations onto the market.

ARE INTELLIGENCE AND COMMON SENSE DIFFERENT THINGS?

Despite the fact that deep learning technologies confidently cope with speech and image recognition tasks and have significant commercial potential, they have a considerable number of limitations: they require large amounts of input data and carefully tuned hardware. The problem is that their "intelligence" is highly specialized and rather brittle. As cognitive psychologist Gary Marcus subtly noted in his New Yorker article, today's popular techniques "lack ways of representing cause-and-effect relationships (such as between a disease and its symptoms) and are likely to encounter difficulties when trying to grasp abstract concepts such as 'sibling' or 'identical to'. They have no obvious way of performing logical inference, and they are still a long way from integrating abstract knowledge: it is not enough to obtain information about an object, it is also important to understand what it is for and how it is used."

In other words, deep learning technologies lack common sense.

An image of dumbbells augmented with phantom limbs, generated using Google's neural networks. Source: Google.

For example, in a Google research project a neural network was tasked with generating an image of a dumbbell after training on similar examples. The neural network coped with the task quite well: the pictures it created showed two gray disks connected by a horizontal bar. But in the middle of each dumbbell the outline of a muscular bodybuilder's arm was drawn. The researchers suggested that the reason lies in the fact that the system had been shown images of athletes holding dumbbells. Deep learning can memorize the common visual features of tens of thousands of dumbbells, but the system itself will never make the cognitive leap and realize that dumbbells do not have arms. And the list of problems is not limited to common sense. Because of the way these networks perceive and learn from data, deep learning neural networks can be confused by random combinations of pixels: we see only noise in an image, yet the computer is 95% sure it is looking at a cheetah.

However, such limitations can be skillfully hidden, and attempts can be made to work around them. As an example, consider the new generation of digital assistants such as Siri. They often pretend that they understand us: they answer the questions asked, set an alarm, and try to make us laugh with a few programmed jokes.

The well-known artificial intelligence researcher Hector Levesque is confident that such "frivolous behavior" once again emphasizes the gap between artificial intelligence and a living brain. Levesque claims that his colleagues have forgotten about the word "intelligence" in the term "artificial intelligence" and calls for remembering the famous Turing test. He emphasizes that machines taking this test resort to various kinds of tricks and make every effort to fool their interlocutor. Bots readily use jokes and quotes; they are capable of portraying violent outbursts of emotion and resorting to all sorts of verbal attacks in order to confuse and distract the person conducting the interview. Indeed, the machine that, according to some publications, successfully passed the Turing test did so by posing as a teenage boy who was not a native English speaker. This "legend" was chosen by the bot's creators in order to excuse its ignorance, clumsy wording and tendency towards illogical conclusions.

Levesque proposes a different type of test for artificial intelligence researchers, which he believes should consist of a survey with abstract, even surreal questions. These questions are logical, but they assume a lot of the background knowledge that Marcus describes. He suggests asking bots simple questions such as "Can a crocodile run a hundred-meter steeplechase?" or "Are baseball players allowed to put little wings on their caps?" Imagine what kind of knowledge a computer needs to have to answer questions like these.

So what is “real” artificial intelligence?

Therein lies the difficulty of using the term "artificial intelligence": it is too vague and hard to define. In fact, the industry has long accepted an axiom: as soon as a machine completes a task that previously only a human could solve, be it a game of chess or face recognition, that task ceases to count as a sign of intelligence.

Computer scientist Larry Tesler put it this way: “Anything can be called intelligence until machines get to it”. And even when solving problems that are inaccessible to humans, machines do not try to reproduce human intelligence.

"The metaphor comparing a neural network to the brain is not entirely correct," says Yann LeCun. "It is incorrect to the same extent as the statement that an airplane looks like a bird: it does not flap its wings, and it has neither feathers nor muscles."

“Even if we manage to create artificial intelligence,” the scientist notes, “it will not be similar to the human mind or the consciousness of an animal. For example, it would be very difficult for us to imagine an intelligent being that does not possess [the desire for] self-preservation.”

Most artificial intelligence researchers simply set aside the idea of creating a truly living, sentient artificial intelligence. "At the moment there is no scientific approach that would allow artificial intelligence to go beyond its programmed settings and become truly flexible in solving several problems," says Andrei Barbu of MIT's Center for Brains, Minds and Machines (CBMM). "It should be understood that artificial intelligence research is now at the stage of creating systems that solve specific, highly specialized problems."

Barbu notes that there have been attempts at unsupervised learning, in which the system must process unlabeled data, but such research is still in its infancy. A better-known example is a Google neural network that was fed 10 million random thumbnails from the YouTube video service. The neural network figured out on its own what cats look like, although its creators did not consider this skill to be anything outstanding.

As Yann LeCun said at last year's Orange Institute hackathon, “We don't yet know how to do unsupervised learning. This is the main problem."

A striking demonstration of the power of artificial intelligence: IBM's Watson system wins the TV quiz show Jeopardy! However, these impressive capabilities have very limited applications.

What is deep learning? March 3rd, 2016

Nowadays people talk about fashionable deep learning technologies as if they were manna from heaven. But do the speakers understand what it really is? The concept has no formal definition, and it combines a whole stack of technologies. In this post I want to explain, as simply and essentially as possible, what is behind this term, why it is so popular, and what these technologies give us.


In short, this newfangled term (deep learning) refers to assembling a more complex and deeper abstraction (representation) from simpler abstractions, with the caveat that even the simplest abstractions must be assembled by the computer itself, not by a person. That is, it is no longer just about learning, but about meta-learning. Figuratively speaking, the computer itself must learn how best to learn. And, in fact, this is exactly what the term "deep" implies. The term is almost always applied to artificial neural networks with more than one hidden layer, so formally "deep" also means a deeper neural network architecture.

In the development slide here you can clearly see how deep learning differs from ordinary learning. To repeat, what is unique about deep learning is that the machine finds the features itself (the key features of something by which it is easiest to separate one class of objects from another) and structures these features hierarchically: simpler ones are combined into more complex ones. Below we will look at this with an example.

Let's look at the example of an image recognition task. Previously, a huge picture (1024×768, about 800,000 numerical values) was stuffed into a regular neural network with one layer, and you could watch the computer slowly die, suffocating from lack of memory and from its inability to understand which pixels are important for recognition and which are not, to say nothing of the effectiveness of the method. This is the architecture of such a regular (shallow) neural network.
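A quick back-of-the-envelope calculation shows why the shallow, fully connected approach chokes (the hidden-layer size here is an arbitrary assumption):

```python
# Back-of-the-envelope cost of a fully connected ("shallow") approach.
inputs = 1024 * 768            # one value per pixel, ~800,000 inputs
hidden = 1000                  # a modest hidden layer, chosen for illustration
weights = inputs * hidden      # every input connects to every hidden neuron
print(weights)                 # 786,432,000 weights
print(weights * 4 / 1e9, "GB") # ~3.1 GB just for the 32-bit weights of one layer
```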

Then researchers looked at how the brain distinguishes features, and it does so in a strictly hierarchical manner, so they decided to extract a hierarchical structure from pictures as well. To do this, more hidden layers (layers between the input and output; roughly speaking, stages of information transformation) had to be added to the neural network. Although this was attempted almost immediately after neurons were invented, only networks with a single hidden layer could be trained successfully. That is, deep networks have in principle been around for about as long as regular ones; we just couldn't train them. What changed?

In 2006, several independent researchers solved this problem at once (and by then hardware capabilities had developed enough: sufficiently powerful video cards had appeared). These researchers are Geoffrey Hinton (and his colleague Ruslan Salakhutdinov) with the technique of pre-training each layer of a neural network with a restricted Boltzmann machine (forgive me these terms...), Yann LeCun with convolutional neural networks, and Yoshua Bengio with stacked autoencoders. The first two were recruited by Google and Facebook, respectively. Here are two lectures, one by Hinton and the other by LeCun, in which they explain what deep learning is; no one can tell you about it better than they can. There is also a great lecture by Schmidhuber, another of the pillars of this field, on the development of deep learning. And Hinton has an excellent course on neural networks as well.

What can deep neural networks do now? They are able to recognize and describe objects; one might say they "understand" what they are. It is about recognizing meanings.

Just watch this video of real-time recognition of what the camera sees.

As I already said, deep learning is a whole group of technologies and solutions. I have already listed several of them in the paragraph above; another example is recurrent networks, which are precisely what is used in the video above to describe what the network sees. But the most popular representative of this class of technologies is still LeCun's convolutional neural networks. They are built by analogy with the principles of operation of the visual cortex of the cat brain, in which so-called simple cells were discovered that respond to straight lines at different angles, along with complex cells whose reaction is associated with the activation of a certain set of simple cells. Although, to be honest, LeCun himself was not focused on biology; he was solving a specific problem (see his lectures), and then it all coincided.

To put it simply, convolutional networks are networks in which the main structural element being learned is a group (combination) of neurons (usually a 3×3 or 10×10 square, and so on), rather than a single neuron. And at each level of the network, dozens of such groups are trained. The network finds the combinations of neurons that maximize the information about the image. At the first level, the network extracts the most basic, structurally simple elements of the picture, its building blocks, so to speak: boundaries, strokes, segments, contrasts. Higher up are stable combinations of the first-level elements, and so on up the chain. I want to emphasize once again the main feature of deep learning: the networks form these elements themselves and decide which of them are more important and which are not. This matters because in machine learning, feature engineering is key, and we are now moving to a stage where the computer itself learns to create and select features. The machine itself identifies a hierarchy of informative features.
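To make the idea of such a "group of neurons" concrete, here is a small numpy sketch of a single convolution with a hand-made 3x3 vertical-edge filter; in a real network such filters are learned, not written by hand:

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid 2D convolution: slide a small group of weights over the image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 3x3 vertical-edge filter: one of the "groups of neurons" a first
# convolutional layer might learn on its own during training.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

image = np.zeros((8, 8)); image[:, 4:] = 1.0   # a toy image with a vertical edge
feature_map = convolve2d(image, kernel)
print(feature_map)   # strongest responses where the edge is located
```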

So, during the learning process (viewing hundreds of pictures), the convolutional network forms a hierarchy of features at different levels of depth. At the first level, it might highlight, for example, elements like these (reflecting contrast, angles, boundaries, and so on).


At the second level, these will be elements built from the elements of the first level; at the third, from those of the second. Keep in mind that this picture is just a demonstration: in industrial use, such networks now have from 10 to 30 layers (levels).

After such a network has been trained, we can use it for classification. Given some image as input, the groups of neurons in the first layer run across the image, activating in those places where the picture contains their particular element. In other words, the network parses the picture into parts: first into lines, strokes and angles, then into more complex parts, and in the end it comes to the conclusion that a picture made up of this kind of combination of basic elements is a face.

More about convolutional networks -

Today, a graph is one of the most acceptable ways to describe models created in a machine learning system. These computational graphs are composed of neuron vertices connected by synapse edges that describe the connections between the vertices.

Unlike a scalar CPU or a vector GPU, the IPU, a new type of processor designed for machine learning, allows such graphs to be built. A computer designed to manipulate graphs is an ideal machine for computing the graph models created by machine learning.

One of the simplest ways to describe the process of machine intelligence is to visualize it. The Graphcore development team has created a collection of such images, which are displayed on the IPU. The work is based on the Poplar software, which visualizes the work of artificial intelligence. Researchers from the company have also looked into why deep networks require so much memory and what solutions exist for the problem.

Poplar includes a graph compiler that was built from the ground up to translate standard machine learning operations into highly optimized IPU application code. It allows these graphs to be assembled together in the same way that POPNNs are assembled. The library contains a set of various vertex types for generalized primitives.

Graphs are the paradigm on which all of the software is based. In Poplar, graphs allow you to define a computational process in which vertices perform operations and edges describe the relationships between them. For example, if you want to add two numbers together, you can define a vertex with two inputs (the numbers you would like to add), some computation (a function that adds two numbers), and an output (the result).
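This is not the Poplar API, only a generic Python sketch of the same vertices-and-edges idea, using the add-two-numbers example from the text:

```python
class Vertex:
    """A graph vertex: inputs (edges), a small computation, and an output."""
    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs      # edges to the vertices this one depends on

    def value(self):
        # Evaluate the inputs recursively, then apply this vertex's computation.
        return self.fn(*(v.value() for v in self.inputs))

class Constant(Vertex):
    """A vertex with no inputs that simply holds a value."""
    def __init__(self, x):
        self._x = x
    def value(self):
        return self._x

# A vertex with two inputs, an "add" computation, and one output.
a, b = Constant(2), Constant(3)
add = Vertex(lambda x, y: x + y, a, b)
print(add.value())   # 5
```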

Typically, vertex operations are much more complex than in the example described above. They are often defined by small programs called codelets. The graph abstraction is attractive because it makes no assumptions about the structure of the computation and breaks the computation down into components that the IPU can use to operate.

Poplar uses this simple abstraction to build very large graphs that are represented as images. Software generation of the graph means we can tailor it to the specific calculations needed to ensure the most efficient use of IPU resources.

The compiler translates standard operations used in machine learning systems into highly optimized application code for the IPU. The graph compiler creates an intermediate image of the computational graph, which is deployed on one or more IPU devices. The compiler can display this computational graph, so an application written at the neural network framework level displays an image of the computational graph that is running on the IPU.


Graph of the full AlexNet training cycle in forward and backward directions

The Poplar graph compiler turned the AlexNet description into a computational graph of 18.7 million vertices and 115.8 million edges. The clearly visible clustering is the result of strong communication between processes within each layer of the network, with lighter communication between the layers.

Another example is a simple fully connected network trained on MNIST, a simple computer vision dataset and a kind of "Hello, world" of machine learning. A simple network for this dataset helps in understanding the graphs driven by Poplar applications. By integrating its graph libraries with frameworks such as TensorFlow, the company provides one of the simplest ways to use IPUs in machine learning applications.
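A minimal "Hello, world" sketch of such a fully connected MNIST network, written against the standard TensorFlow/Keras API rather than the IPU toolchain (hyperparameters are arbitrary):

```python
import tensorflow as tf

# MNIST: 28x28 grayscale digits, the "Hello, world" dataset of machine learning.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A simple fully connected network over the flattened pixels.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2)
print(model.evaluate(x_test, y_test))
```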

After the graph has been constructed using the compiler, it needs to be executed. This is possible using the Graph Engine. The example of ResNet-50 demonstrates its operation.


ResNet-50 graph

The ResNet-50 architecture allows deep networks to be created from repeating sections. The processor only has to define these sections once and can then call them again. For example, the conv4 level cluster is executed six times but mapped only once to the graph. The image also demonstrates the variety of shapes of the convolutional layers, as each has a graph built according to the natural form of its computation.

The engine creates and manages the execution of a machine learning model using a graph generated by the compiler. Once deployed, the Graph Engine monitors and responds to the IPUs, or devices, used by applications.

The ResNet-50 image shows the entire model. At this level it is difficult to identify connections between individual vertices, so it is worth looking at enlarged images. Below are some examples of sections within neural network layers.

Why do deep networks need so much memory?

Large amounts of occupied memory are one of the biggest problems of deep neural networks. Researchers are trying to combat the limited bandwidth of the DRAM devices that modern systems must use to store the huge numbers of weights and activations in a deep neural network.

These architectures were built around processor chips designed for sequential processing and DRAM optimized for high-density storage. The interface between these two devices is a bottleneck that introduces bandwidth limitations and adds significant overhead in power consumption.

Although we do not yet have a complete understanding of the human brain and how it works, it is generally clear that there is no large separate memory store. The function of long-term and short-term memory in the human brain is believed to be embedded in the structure of neurons and synapses. Even simple organisms like worms, with a neural brain structure of just over 300 neurons, have some memory function.

Building memory into conventional processors is one way to circumvent the memory bottleneck, unlocking enormous bandwidth while consuming much less power. However, on-chip memory is expensive and is not designed for the truly large amounts of memory attached to the CPUs and GPUs currently used to train and deploy deep neural networks.

So it is useful to look at how memory is used today in CPU- and GPU-based deep learning systems and to ask why they need such large memory storage devices when the human brain seems to work fine without them.

Neural networks need memory in order to store input data, weights, and activations as the input propagates through the network. In training, the activations at the input must be retained until they can be used to compute the error gradients at the output.

For example, a 50-layer ResNet network has about 26 million weight parameters and computes 16 million forward activations. If a 32-bit float is used to store each weight and activation, this requires about 168 MB of space. By using lower-precision values to store these weights and activations, we could cut this storage requirement by half or even by a factor of four.
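The arithmetic behind that figure, as a small sanity check:

```python
weights = 26e6          # weight parameters in a 50-layer ResNet
activations = 16e6      # forward activations computed per pass
bytes_fp32 = 4          # 32-bit float
bytes_fp16 = 2          # 16-bit (half-precision) float

print((weights + activations) * bytes_fp32 / 1e6, "MB")  # ~168 MB at fp32
print((weights + activations) * bytes_fp16 / 1e6, "MB")  # ~84 MB at fp16
```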

A major memory problem arises from the fact that GPUs rely on data being represented as dense vectors. Therefore, they can use single instruction, multiple data (SIMD) execution to achieve high compute density. The CPU uses similar vector units for high-performance computing.

GPUs have a vector width of 1024 bits and use 32-bit floating point data, so they often split the work into a parallel mini-batch of 32 samples to create 1024-bit vectors of data. This approach to vector parallelism increases the number of activations held by a factor of 32, and the local storage requirement to more than 2 GB.

GPUs and other machines designed for matrix algebra are also subject to memory load from the weights and activations of the neural network. GPUs cannot efficiently perform the small convolutions used in deep neural networks. Therefore, a transformation often called "lowering" (im2col) is used to convert these convolutions into matrix-matrix multiplications (GEMMs), which GPUs can handle efficiently.
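A small numpy sketch of this lowering idea, showing how patches unrolled into a matrix turn many small convolutions into one GEMM (the image and filter sizes are arbitrary):

```python
import numpy as np

def im2col(image, k):
    """Unroll every k x k patch of the image into one row of a matrix."""
    h, w = image.shape
    rows = [image[i:i + k, j:j + k].ravel()
            for i in range(h - k + 1) for j in range(w - k + 1)]
    return np.array(rows)

# Convolution expressed as a matrix-matrix multiplication (GEMM).
image = np.arange(36, dtype=float).reshape(6, 6)
kernels = np.random.rand(4, 9)          # four 3x3 filters, flattened
patches = im2col(image, 3)              # 16 patches x 9 values each
feature_maps = patches @ kernels.T      # one GEMM instead of many small convolutions
print(feature_maps.shape)               # (16, 4): a 4x4 output map per filter
```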

Additional memory is also required to store input data, temporary values, and program instructions. Measuring memory usage when training ResNet-50 on a high-end GPU showed that it required more than 7.5 GB of local DRAM.

Some might think that lower computational precision would reduce the amount of memory required, but this is not the case. By switching the data values to half precision for weights and activations, you would only fill half the SIMD vector width, wasting half the available compute resources. To compensate, when you switch from full precision to half precision on the GPU, you then have to double the size of the mini-batch to generate enough data parallelism to use all the available compute. So moving to lower-precision weights and activations on the GPU still requires more than 7.5 GB of dynamic random-access memory.

With so much data to store, it is simply impossible to fit it all into the GPU. Each convolutional layer of the network needs to save its state to external DRAM, load the next network layer, and then load the data back into the system. As a result, the external memory interface, already limited by bandwidth and latency, suffers the additional burden of constantly reloading the weights and storing and retrieving the activations. This significantly slows down training time and significantly increases power consumption.

There are several ways to mitigate this problem. First, operations such as activation functions can be performed "in place", allowing the input data to be rewritten directly with the output, so that existing memory is reused. Second, opportunities for memory reuse can be found by analyzing the data dependencies between operations in the network and allocating the same memory to operations that do not use it at the same time.
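A tiny numpy illustration of the in-place idea; real frameworks implement this inside their memory planners, so this is only a conceptual sketch:

```python
import numpy as np

activations = np.random.randn(1024, 1024).astype(np.float32)  # a layer's output

# "In-place" ReLU: the input buffer is rewritten directly with the output,
# so no second activation buffer has to be allocated.
np.maximum(activations, 0.0, out=activations)

# Memory reuse: once a buffer is no longer needed by later operations,
# the same storage can be handed to the next operation instead of
# allocating a fresh one.
scratch = activations          # reuse the same underlying array
scratch *= 0.5                 # the next operation writes into the reused buffer
```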

The second approach is especially effective when the entire neural network can be analyzed at compile time to create a fixed memory allocation, since memory management overhead is then reduced to almost zero. It turns out that the combination of these methods can reduce the memory use of a neural network by a factor of two to three.
A third significant approach was recently demonstrated by the Baidu Deep Speech team. They applied various memory-saving methods to achieve a 16x reduction in the memory consumption of activation functions, which allowed them to train networks with 100 layers. Previously, with the same amount of memory, they could only train networks with nine layers.

Combining memory and processing resources into a single device has significant potential to improve the performance and efficiency of convolutional neural networks, as well as other forms of machine learning. Trade-offs can be made between memory and compute resources to achieve a balance of features and performance in the system.

Neural networks, and the knowledge models of other machine learning methods, can be thought of as mathematical graphs. A huge amount of parallelism is concentrated in these graphs. A parallel processor designed to exploit the parallelism in graphs does not have to rely on mini-batches and can significantly reduce the amount of local storage required.

Current research results have shown that all these methods can significantly improve the performance of neural networks. Modern GPUs and CPUs have very limited on-chip memory, only a few megabytes in total. New processor architectures designed specifically for machine learning balance memory and compute on chip, delivering significant performance and efficiency improvements over today's CPUs and graphics accelerators.






