Deep learning and neural networks: what is deep learning and why is everyone talking about it?


— The laboratory is young: there are only five people on our team so far, and the work is an unplowed field, but we are serious. Our main focus is the development and study of dialogue systems: online consultants and assistants that competently answer users' questions. Many companies already offer such services, but they either work poorly, constantly making mistakes, or there is a live person on the other side of the monitor who cannot be online 24/7 and, besides, has to be paid. We want to develop an algorithm for creating robots capable of full-fledged conversation. Such a robot will be able to buy you a plane ticket in a matter of minutes or advise you on any pressing issue. Systems of this level do not exist yet.

Neural networks and artificial intelligence

The idea of neural networks was born in the middle of the 20th century in the USA, along with the advent of the first computers. Neuroscientists studying the theoretical aspects of how the brain works believed that organizing a computer's work in the image and likeness of the human brain would make it possible to create the first artificial intelligence in the near future.

What distinguishes artificial intelligence from all previous generations of algorithms is that a trained neural network does not follow a predetermined path but independently searches for the most effective way to achieve its goal. The operation of a single computer "neuron" looks like this: for training, it is given objects belonging to two types, A and B, each carrying some numeric value. Based on the training data, the program learns which ranges of this value correspond to objects of type A and which to type B, and can subsequently distinguish them on its own. In real problems, the system must distinguish between many types, each of which can in turn have dozens of properties. Solving them requires a more complex structure of layers of neurons, serious computing power, and a large number of training examples. The 21st century has marked the beginning of an era in which these technologies can already be used to solve everyday problems.
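To make this concrete, here is a minimal sketch of such a single "neuron" in Python. It learns a threshold separating type A from type B with the classic perceptron rule; the one-dimensional values, labels, and learning rate are invented for illustration.

```python
# A minimal sketch of the single "neuron" described above: it learns a
# threshold that separates objects of type A from objects of type B.
# The data here is made up purely for illustration.
import numpy as np

# Training set: numeric values and their labels (0 = type A, 1 = type B).
values = np.array([1.0, 1.5, 2.0, 2.2, 6.5, 7.0, 8.1, 9.0])
labels = np.array([0,   0,   0,   0,   1,   1,   1,   1])

# One neuron: output = step(w * x + b). Train with the perceptron rule.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(100):
    for x, y in zip(values, labels):
        prediction = 1 if w * x + b > 0 else 0
        # Nudge the weights whenever the prediction is wrong.
        w += lr * (y - prediction) * x
        b += lr * (y - prediction)

print([1 if w * x + b > 0 else 0 for x in [1.8, 7.5]])  # -> [0, 1]
```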

Mikhail Burtsev, head of the laboratory:

— The concept of how neural networks work is quite simple: we give the machine a large amount of text, and it remembers how the words fit together. Based on this information, it can reproduce similar texts; the machine does not need to know the rules of syntax, declension, and conjugation. There are already neural networks that, having learned from Pushkin's works, try to write in his style. This is another feature of neural networks: they pick up the "style" of whatever material they are given to learn from. If you feed them Wikipedia, the program will sprinkle in terms and write in a predominantly journalistic style. Since our laboratory works on question-answering systems, we use ready-made dialogues to train the network. In one experiment, we used film subtitles and let our network study an entire saga about vampires. Having analyzed this array of data, the neural network can already hold up a conversation.

Dialogues between laboratory staff and a neural network
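The idea of "remembering how the words fit together" can be illustrated in a vastly simplified, non-neural form with a bigram model. Real dialogue systems train neural language models on millions of sentences, but the principle of reproducing text in the style of the training material is the same. The toy corpus below is invented.

```python
# A vastly simplified, non-neural illustration of the idea described above:
# "remember how the words fit together", then reproduce similar text.
# Real systems use neural language models; this is just a bigram chain.
import random
from collections import defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Remember which words follow which.
following = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    following[current].append(nxt)

# Generate new text in the "style" of the corpus.
word, output = "the", ["the"]
for _ in range(8):
    word = random.choice(following[word])
    output.append(word)
print(" ".join(output))
```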

Team: today and tomorrow

The laboratory cooperates with large research centers based at the National Research Nuclear University MEPhI and the Kurchatov Institute. Foreign experts in machine learning and neuroinformatics also take part in its activities, for example Sergey Plis from The Mind Research Network. In addition, events are regularly held to popularize the laboratory's work and to search for young talent. Winning a hackathon or successfully completing a course gives you a good chance of getting into the laboratory.

Valentin Malykh, laboratory employee:

“My path to the laboratory was quite difficult. Just four years ago, I had barely touched machine learning. Then I got into computational linguistics, and away we went... I changed jobs several times: I tried my hand at robotics and developed software related to computer vision, which is where I became acquainted with machine learning and realized I wanted to do serious research.
Along the way, I managed to attend several hackathons organized by the laboratory - perhaps the most interesting thing that happened to me during that period. Then I came to the team and said that I wanted to work with them. They took me.

Philosophy of DeepHack

Hackathons, despite the name (from the English "hack"), have nothing to do with breaking into software. They are team programming competitions in which participants spend several days, sometimes weeks, struggling to solve one specific problem. The theme of a hackathon is announced in advance, and usually several hundred people take part. Such events are organized not only by institutes but also by large companies looking for talented specialists. The Laboratory of Neural Networks and Deep Learning has already organized two hackathons at MIPT: participants listened to lectures on question-answering and dialogue systems and wrote code.

Vladislav Belyaev, laboratory employee:

— This year and last year we organized hackathons on machine learning. There were a lot of applications, not only from Russia and the CIS but also from Europe and the States. During the hackathon, lectures were given by scientists from Oxford and Stanford, Google DeepMind and OpenAI, and, of course, by Russian colleagues. Now we are preparing a course on neural networks in which we will cover everything from beginning to end: from the biological concept and basic programming models to real applied uses and concrete implementations.

Free time

There are still few employees in the laboratory, so each person handles a wide variety of work: studying algorithms, writing code, preparing scientific publications.

Mikhail Burtsev, head of the laboratory:

“You have to work a lot; I don't think I even remember what free time is anymore. No joke, there is practically no time to relax: over the past six months we have managed to go out for a barbecue as a group exactly once. Although, in a sense, work can be relaxation. Hackathons and seminars provide an opportunity to communicate with colleagues in a less formal setting and make new acquaintances. We haven't yet managed to start any traditions of spending time together after work; we're too young. In the summer we plan to get out into nature with the whole laboratory, rent a cottage, and spend two weeks solving the most difficult and interesting problems together: we will organize our own mini-hackathon. We'll see how effective this approach turns out to be. Perhaps it will become our first good tradition.

Employment

The laboratory is expanding and is already looking for new employees. The easiest way to get a place is to complete a two-month internship, for which candidates are selected through an interview. A necessary condition for passing the interview is completing part of the assignments from the Deep Learning course. During the internship, interns can take part in paid contract projects. Funding for the laboratory has not yet been arranged, but, according to the staff, this problem will be solved in the near future. “To come to us now means getting a chance to become a 'founding father' of a laboratory in the most promising area of information technology,” says Mikhail Burtsev.

Images and photographs were provided by the MIPT Laboratory of Neural Networks and Deep Learning. Photographer: Evgeny Pelevin.

From this article you will learn what deep learning is. The article also lists many resources that you can use to master this area.

In the modern world, deep learning is used everywhere, from healthcare to manufacturing. Companies turn to this technology to solve complex problems such as speech recognition, object recognition, machine translation, and so on.

One of the most impressive achievements this year was AlphaGo beating the world's best Go player. Besides Go, machines have beaten humans in other games: checkers, chess, reversi, and Jeopardy.

Winning a board game may not seem applicable to real-life problems, but that is not true at all. Go was long considered impossible for artificial intelligence to beat: to win, a machine would need to learn something essential to this game, human intuition. Now, with the help of this development, it is possible to solve many problems that were previously inaccessible to computers.

Obviously, deep learning is still far from perfect, but it is already close to being commercially useful. Take self-driving cars: well-known companies like Google, Tesla, and Uber are already trying to put autonomous cars on city streets.

Ford predicts a significant increase in the share of self-driving vehicles by 2021. The US government has also managed to develop a set of safety rules for them.

What is deep learning?

To answer this question, you need to understand how deep learning relates to machine learning, neural networks, and artificial intelligence. To do this, let's use a visualization with concentric circles:

The outer circle is artificial intelligence in general (for example, computers). A little further in is machine learning, and right at the center are deep learning and artificial neural networks.

Roughly speaking, deep learning is simply a more convenient name for artificial neural networks. "Deep" refers to the degree of complexity (depth) of the network, whereas ordinary neural networks are often quite shallow.

The creators of the first neural network were inspired by the structure of the cerebral cortex. The network's basic unit, the perceptron, is essentially the mathematical analogue of a biological neuron. And, as in the brain, interconnected perceptrons make up a neural network.

The first layer of the neural network is called the input layer. Each node in this layer receives some information as input and passes it on to nodes in the following layers. Most often there are no connections between the nodes of a single layer, and the last node of the chain outputs the result of the neural network.

The nodes in the middle are called hidden nodes because, unlike the input and output nodes, they have no connection to the outside world. They are activated only when the previous layers are activated.

Deep learning is essentially a technique for training a neural network that uses many layers to solve complex problems (such as speech recognition) by finding patterns. In the eighties, most neural networks were single-layer because of the high cost of computation and the limited availability of data.
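As a minimal sketch of that layered structure, here is the forward pass of a tiny network in Python with NumPy: an input layer, one hidden layer whose every node sees every input, and a single output node. The weights are random and untrained, so the output is meaningless; it only shows how a signal flows through the layers.

```python
# A minimal sketch of the layered structure described above: an input layer,
# one hidden layer, and an output node, with random (untrained) weights.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])          # input layer: 3 features

W1 = rng.normal(size=(4, 3))            # hidden layer: 4 nodes
b1 = np.zeros(4)
hidden = sigmoid(W1 @ x + b1)           # each hidden node sees every input

W2 = rng.normal(size=(1, 4))            # output layer: 1 node
b2 = np.zeros(1)
output = sigmoid(W2 @ hidden + b2)      # the last node outputs the result
print(output)
```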

If we consider machine learning as a branch or variant of artificial intelligence, then deep learning is a specialized subtype of that branch.

Machine learning uses computer intelligence that does not produce the answer immediately. Instead, the code runs on test data and, judging by the correctness of its results, adjusts its course. The success of this process usually relies on a variety of techniques, special software, and computer science concepts describing statistical methods and linear algebra.

Deep learning methods

Deep learning methods are divided into two main types:

  • Supervised learning
  • Unsupervised learning

The first method uses specially selected data to achieve the desired result. It requires quite a lot of human intervention, since the data has to be selected manually. However, it is convenient for classification and regression.

Imagine that you own a company and want to determine the effect of bonuses on the length of your employees' contracts. With pre-collected data, a supervised learning method would be indispensable and very effective.
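Assuming a tiny invented dataset, the bonus example could look like the following scikit-learn sketch: the bonus amounts are the inputs, and the observed contract lengths are the answers the method learns from.

```python
# A sketch of the bonus example above as a supervised regression problem.
# The numbers are invented; in practice you would use your own HR data.
from sklearn.linear_model import LinearRegression

bonuses = [[0], [500], [1000], [2000], [3000]]      # bonus paid, per employee
tenure_months = [6, 9, 14, 22, 30]                  # how long each one stayed

model = LinearRegression().fit(bonuses, tenure_months)
print(model.predict([[1500]]))  # predicted tenure for a 1500 bonus
```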

The second method does not rely on pre-prepared answers or predefined procedures. It aims to identify hidden patterns in data. It is typically used for clustering and association tasks, such as grouping customers by behavior. "Customers who bought this also bought..." on Amazon is a variant of the association task.
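A comparable unsupervised sketch, again on invented data (visits per month and average purchase), groups customers by behavior with k-means from scikit-learn; note that no labels are supplied, the algorithm finds the groups itself.

```python
# A sketch of the unsupervised case: grouping customers by behaviour with
# k-means. The features (visits per month, average basket) are invented.
from sklearn.cluster import KMeans

customers = [[2, 10], [3, 12], [25, 80], [30, 95], [28, 70], [1, 8]]
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(customers)
print(clusters)  # e.g. [0 0 1 1 1 0]: two behavioural groups, no labels given
```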

While supervised learning is often quite convenient, the more complex variant is still better in some cases. Deep learning has proven itself as an approach in which the neural network does not require constant human supervision.

The Importance of Deep Learning

Computers have long used technologies for recognizing certain features in an image, but the results were far from successful. Deep learning has had an incredible impact on computer vision, and it is these two techniques that currently solve recognition problems.

In particular, Facebook has succeeded in recognizing faces in photographs using deep learning. This is not a simple improvement of the technology but a turning point that overturns earlier assumptions: “A human can determine with 97.53% accuracy whether two different photographs show the same person. A program developed by the Facebook team can do this with 97.25% accuracy, regardless of the lighting or whether the person is looking directly at the camera or turned sideways to it.”

Speech recognition has also undergone significant changes. The team at Baidu, one of China's leading search engines, has developed a speech recognition system that managed to outpace humans in the speed and accuracy of typing text on mobile devices, in both English and Mandarin.

What is especially interesting is that writing a common neural network for two completely different languages did not require much work: “Historically, people saw Chinese and English as two completely different languages, so each of them required a different approach,” says Andrew Ng, head of the Baidu research center. “Learning algorithms are now so generalized that you can simply learn.”

Google uses deep learning to manage energy in the company's data centers, where it has cut the cost of cooling resources by 40%. That amounts to roughly a 15% improvement in energy efficiency and millions of dollars in savings.

Deep learning microservices

Here is a short overview of services related to deep learning.

Illustration Tagger. Built on Illustration2Vec, this service lets you tag images with a rating of "protected", "questionable", "dangerous", "copyright", or "general" in order to understand the content of an image in advance.

  • An add-on to Theano from Google
  • Written in Python and NumPy
  • Often used to solve a specific range of problems
  • Not general-purpose; focused on machine vision
  • Written in C++
  • Has a Python interface

Online courses on deep learning

Google and Udacity have teamed up to create a free course on deep learning, part of the Udacity Machine Learning course. It is led by experienced developers who want to advance the field of machine learning and, in particular, deep learning.

Another popular option is the machine learning course from Andrew Ng, supported by Coursera and Stanford.

  1. Machine Learning - Stanford by Andrew Ng on Coursera (2010-2014)
  2. Machine Learning - Caltech by Yaser Abu-Mostafa (2012-2014)
  3. Machine Learning - Carnegie Mellon by Tom Mitchell (Spring 2011)
  4. Neural networks for machine learning - Geoffrey Hinton on Coursera (2012)
  5. Neural networks class - Hugo Larochelle from Université de Sherbrooke (2013)

Books on deep learning

While the resources in the previous section draw on a fairly extensive knowledge base, Grokking Deep Learning, by contrast, is aimed at beginners. As the authors put it: “If you have finished 11th grade and have a rough understanding of how to write Python, we will teach you deep learning.”

A popular alternative is Deep Learning Book, whose title speaks for itself. It is especially good because it covers all the math you will need to get into this area.

  1. "Deep Learning" by Yoshua Bengio, Ian Goodfellow and Aaron Courville (2015)
  2. “Neural networks and deep learning” by Michael Nielsen (2014)
  3. "Deep Learning" from Microsoft Research (2013)
  4. “Deep Learning Tutorials” from LISA Laboratory, University of Montreal (2015)
  5. “neuraltalk” by Andrej Karpathy
  6. "Introduction to Genetic Algorithms"
  7. "Modern approach to artificial intelligence"
  8. "Overview of deep learning and neural networks"

Videos and lectures

Deep Learning Simplified is a wonderful YouTube channel; its first video is a good place to start.

Deep learning is changing the paradigm of working with texts, but it is met with skepticism by computational linguists and data scientists. Neural networks are a powerful but trivial machine learning tool.

03.05.2017 Dmitry Ilvovsky, Ekaterina Chernyak

Neural networks make it possible to find hidden connections and patterns in texts, but those connections cannot be presented explicitly. Neural networks, although powerful, remain a fairly trivial tool, and as such they arouse skepticism among companies developing industrial data analysis solutions and among leading computational linguists.

The general fascination with neural network technologies and deep learning has not bypassed computational linguistics, the automatic processing of natural language texts. At recent conferences of the Association for Computational Linguistics (ACL), the main scientific forum in the field, the vast majority of papers were devoted to the use of neural networks, both for solving already known problems and for exploring new ones that had not been solved with standard machine learning methods. The increased attention of linguists to neural networks has several reasons. First, using neural networks significantly improves the quality of solutions to some standard problems of text and sequence classification; second, it reduces the labor involved in working directly with texts; third, it makes it possible to solve new problems (for example, to create chatbots). At the same time, neural networks cannot be considered a fully independent mechanism for solving linguistic problems.

The first work on deep learning dates back to the middle of the 20th century. In the early 1940s, Warren McCulloch and Walter Pitts proposed a formal model of the human brain, the artificial neural network, and a little later Frank Rosenblatt generalized their work and created a neural network model on a computer. The first work on training neural networks with the backpropagation algorithm dates back to the 1960s (the algorithm computes the prediction error and minimizes it using stochastic optimization methods). However, it turned out that, despite the beauty and elegance of the idea of simulating the brain, training "traditional" neural networks takes a lot of time, and classification results on small data sets are comparable to those obtained by simpler methods, such as support vector machines (SVM). As a result, neural networks were forgotten for 40 years, but today they are again in demand for working with large volumes of unstructured data, images, and texts.

From a formal point of view, a neural network is a directed graph of a given architecture whose vertices, or nodes, are called neurons. The first level of the graph contains the input nodes and the last level the output nodes, whose number depends on the task. For example, for classification into two classes, one or two neurons can be placed at the output layer of the network; for classification into k classes, k neurons. All other levels in the neural network graph are usually called hidden layers. All neurons at one level are connected by edges to all neurons of the next level, and each edge has a weight. Each neuron is assigned an activation function that models the operation of biological neurons: they are "silent" when the input signal is weak, and when its value exceeds a certain threshold they fire and pass the value further along the network. The task of training a neural network from examples (that is, from "object - correct answer" pairs) is to find the edge weights that best predict the correct answers. Clearly it is the architecture, the topology of the neural network graph, that is its most important parameter. Although there is no formal definition of "deep networks" yet, it is generally accepted that deep networks are all those that consist of a large number of layers or have "non-standard" layers (for example, containing only selected connections or using recursion with other layers).
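How "finding the weights that best predict the correct answers" works can be sketched on the smallest possible case: a single sigmoid neuron trained by gradient descent on an invented AND-like dataset. Backpropagation through a deep network applies the same error-driven updates, layer by layer.

```python
# A minimal sketch of training as described above: find weights that
# minimise the prediction error, here by gradient descent on one sigmoid
# neuron. A full backpropagation pass through many layers propagates the
# same kind of error signal layer by layer. The data is invented.
import numpy as np

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])          # a simple AND-like target

w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # forward pass
    grad = p - y                            # error signal (cross-entropy)
    w -= lr * X.T @ grad / len(y)           # update edge weights
    b -= lr * grad.mean()
print(np.round(p))                          # -> [0. 0. 0. 1.]
```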

So far the most successful application of neural networks has been image analysis, but neural network technologies have also radically changed the way text data is handled. Whereas previously each element of a text (a letter, word, or sentence) had to be described by many features of various kinds (morphological, syntactic, semantic, and so on), in many tasks the need for complex descriptions now disappears. Theorists and practitioners of neural network technologies often speak of "representation learning": in raw text, broken down only into words and sentences, a neural network is able to find dependencies and patterns and compose a feature space on its own. Unfortunately, in such a space a human will not understand anything: during training, the neural network assigns each element of the text a dense vector of numbers representing the discovered "deep" relationships. The emphasis when working with text thus shifts from constructing a subset of features and searching external knowledge bases to selecting data sources and marking up texts for the subsequent training of a neural network, which requires substantially more data than standard methods. It is precisely because of the need for large amounts of data, and because of poor interpretability and unpredictability, that neural networks are not in demand in real industrial-scale applications, unlike other well-established learning algorithms such as random forests and support vector machines. Nevertheless, neural networks are used in a number of automatic text processing tasks (Fig. 1).

One of the most popular applications of neural networks is the construction of word vectors, an area related to distributional semantics: it is believed that the meaning of a word can be understood from the meaning of its context, from the surrounding words. Indeed, if we are unfamiliar with some word in a text in a known language, in most cases we can guess its meaning. The mathematical model of a word's meaning is its word vector: a row in a large "word-context" matrix built from a sufficiently large corpus of texts. The "contexts" of a particular word may be its neighboring words, words occurring in the same syntactic or semantic construction with it, and so on. The cells of such a matrix may contain frequencies (how many times a word occurs in a given context), but more often the positive pointwise mutual information (PPMI) coefficient is used, showing how non-random the appearance of a word in a particular context was. Such matrices can be used quite successfully for clustering words or for finding words close in meaning to a query word.
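For illustration, here is a sketch of turning a raw co-occurrence matrix into a PPMI-weighted word-context matrix with NumPy; the counts are invented, and a real matrix would have many thousands of rows and columns.

```python
# A sketch of a word-context matrix with PPMI weights, as described above,
# computed from raw co-occurrence counts (the counts here are invented).
import numpy as np

# counts[i, j] = how many times word i appeared in context j
counts = np.array([[10, 0, 2],
                   [ 1, 8, 0],
                   [ 0, 1, 9]], dtype=float)

total = counts.sum()
p_wc = counts / total                      # joint probabilities
p_w = p_wc.sum(axis=1, keepdims=True)      # word marginals
p_c = p_wc.sum(axis=0, keepdims=True)      # context marginals

with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))
ppmi = np.maximum(pmi, 0)                  # keep only "non-random" co-occurrences
print(ppmi.round(2))
```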

In 2013, Tomas Mikolov published a paper [1] proposing the use of neural networks to train word vectors of smaller dimension: from tuples (word, contexts), a neural network of the simplest architecture was trained, and at the output each word was assigned a vector of 300 elements. It turned out that such vectors convey the semantic proximity of words better. For example, one can define arithmetic operations of addition and subtraction of meanings on them and obtain equations like "Paris - France + Russia = Moscow" or "king - man + woman = queen", or find the odd word out in the list "apple, pear, cherry, kitten". The paper presented two architectures, skip-gram and CBOW (Continuous Bag of Words), under the general name word2vec. As was later shown [2], word2vec is nothing more than a factorization of a word-context matrix with PPMI weights. It is now customary to classify word2vec as distributional semantics rather than deep learning, but the initial impetus for this model came from the use of a neural network. In addition, word2vec vectors turned out to be a convenient representation of a word's meaning that can be fed as input to the deep neural networks used for text classification.
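For example, skip-gram vectors can be trained with the word2vec implementation in the gensim library (the sketch assumes gensim 4, where the dimension parameter is called vector_size; older versions called it size). A four-sentence corpus is far too small to yield meaningful analogies; real experiments use corpora of millions of words.

```python
# A sketch of training skip-gram vectors with gensim's word2vec
# implementation and testing the "meaning arithmetic" described above.
# The toy corpus is invented and far too small for real analogies.
from gensim.models import Word2Vec

sentences = [["king", "rules", "the", "country"],
             ["queen", "rules", "the", "country"],
             ["man", "walks"], ["woman", "walks"]]

model = Word2Vec(sentences, vector_size=300, sg=1, min_count=1)  # sg=1: skip-gram
# king - man + woman = ?
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```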

The task of text classification is one of the most pressing for marketers, especially when it comes to analyzing consumer opinions or attitudes toward a product or service, so researchers continually work to improve the quality of its solution. However, opinion analysis is a task of classifying sentences rather than whole texts: in a positive review, a user may write one or two negative sentences, and it is important to be able to identify and analyze them as well. A well-known difficulty in classifying sentences lies in the variable length of the input: since sentences in texts can be of arbitrary length, it is not obvious how to feed them to a neural network. One approach is borrowed from image analysis and involves convolutional neural networks (CNN) (Fig. 2).

The input of a convolutional neural network is a sentence in which each word is already represented by a vector (a vector of vectors). Typically, pre-trained word2vec models are used to represent words as vectors. The convolutional neural network consists of two layers: a "deep" convolution layer and a regular hidden layer. The convolution layer in turn consists of filters and a "subsampling" layer. A filter is a neuron whose input is formed by windows that move through the text and select a certain number of consecutive words (for example, a window of length three will select the first three words, words two through four, three through five, and so on). At the output of a filter, a single vector is formed that aggregates the vectors of all the words entering it. Then, at the subsampling layer, a single vector corresponding to the whole sentence is produced, computed as the component-wise maximum over all the filter output vectors. Convolutional neural networks are easy to train and implement. A standard backpropagation algorithm is used to train them, and because the filter weights are shared (the weight for the i-th word of a window is the same whichever window is being processed), the number of parameters of a convolutional neural network is small. From the point of view of computational linguistics, convolutional neural networks are a powerful classification tool that, however, has no linguistic intuition behind it, which significantly complicates the analysis of the algorithm's errors.
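A sketch of such a network in Keras (assuming TensorFlow is installed; all sizes are invented) might look as follows: an embedding layer supplies the word vectors, Conv1D filters slide over three-word windows, and global max pooling performs the "subsampling" step before the regular hidden layer.

```python
# A sketch of a convolutional network for sentence classification,
# following the scheme above. Shapes and sizes are invented.
from tensorflow import keras

max_len, vocab, dim = 50, 10000, 300   # sentence length, vocabulary, vector size

model = keras.Sequential([
    keras.layers.Input(shape=(max_len,)),            # word indices of one sentence
    keras.layers.Embedding(vocab, dim),              # look up word vectors
    keras.layers.Conv1D(100, 3, activation="relu"),  # filters over 3-word windows
    keras.layers.GlobalMaxPooling1D(),               # "subsampling": max over positions
    keras.layers.Dense(64, activation="relu"),       # the regular hidden layer
    keras.layers.Dense(1, activation="sigmoid"),     # positive / negative opinion
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```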

Sequence classification is the task of assigning one label to each word: morphological analysis (each word is assigned a part of speech), named entity extraction (determining whether each word is part of a person's name, a geographical name, and so on), etc. When classifying sequences, methods are used that take the word's context into account: if the previous word is part of a person's name, the current one may also be part of that name but is unlikely to be part of an organization's name. Recurrent neural networks, which extend the idea of the language models proposed at the end of the last century, help put this requirement into practice. A classical language model predicts the probability that word i will occur after word i-1. Language models can also be used to predict the next word: which word is most likely to appear after this one?

Training language models requires large corpora: the larger the training corpus, the more pairs of words the model "knows". Using neural networks to build language models reduces the amount of data that must be stored. Imagine a simple network architecture in which words i-2 and i-1 are the input and the neural network predicts word i at the output. Depending on the number of hidden layers and the number of neurons in them, the trained network can be stored as a few dense matrices of relatively small dimension. In other words, instead of the training corpus and all its word pairs, it stores only several matrices and a list of unique words. However, such a neural language model cannot take long-range connections between words into account. This problem is solved by recurrent neural networks (Fig. 3), in which the internal state of the hidden layer is not only updated when a new word arrives at the input but is also passed on to the next step. Thus the hidden layer of a recurrent network accepts two types of input: the state of the hidden layer at the previous step and the new word. If a recurrent neural network processes a sentence, its hidden states allow long-range connections in the sentence to be remembered and carried along. It has been verified experimentally many times that recurrent neural networks remember the gender of the subject in a sentence and choose the correct pronouns (she - her, he - his) when generating a sentence, but no one has yet managed to show explicitly how exactly this kind of information is stored in the neural network or how it is used.
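A recurrent language model of this kind can be sketched in Keras as follows (again assuming TensorFlow; the vocabulary and layer sizes are invented): the LSTM layer carries its hidden state from word to word, and the softmax output gives the probability of the next word.

```python
# A sketch of a recurrent language model: the LSTM layer passes its hidden
# state from step to step, as described above, and the final state is used
# to predict the next word. All sizes are invented.
from tensorflow import keras

vocab, dim = 10000, 128

model = keras.Sequential([
    keras.layers.Input(shape=(None,)),                # a sequence of word indices
    keras.layers.Embedding(vocab, dim),               # word -> dense vector
    keras.layers.LSTM(64),                            # hidden state passed step to step
    keras.layers.Dense(vocab, activation="softmax"),  # probability of the next word
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```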

Recurrent neural networks are also used for text classification. In this case, the outputs of the intermediate steps are not used, and the last output of the network returns the predicted class. Today, bidirectional recurrent networks (passing the hidden state not only "to the right" but also "to the left"), with several dozen neurons in the hidden layer, have become a standard tool for text and sequence classification, as well as text generation, and have essentially displaced other algorithms.

A further development of recurrent neural networks is the seq2seq architecture, consisting of two connected recurrent networks, one of which is responsible for representing and analyzing the input (for example, a question, or a sentence in one language) and the other for generating the output (an answer, or a sentence in another language). Seq2seq networks underlie modern question answering systems, chatbots, and machine translation systems.

In addition to convolutional neural networks, so-called autoencoders are used for text analysis. They are used, for example, to create effects on images in Photoshop or Instagram, and in linguistics they have found application in the problem of dimensionality reduction (finding the projection of a vector representing a text onto a space of lower dimension). Projection onto a two-dimensional space makes it possible to represent a text as a point on a plane and to visually depict a collection of texts as a set of points, that is, it serves as a means of preliminary analysis before clustering or classifying texts. Unlike the classification task, the dimensionality reduction task has no clear quality criteria, but the images obtained with autoencoders look quite "convincing". From a mathematical point of view, an autoencoder is an unsupervised neural network that learns the identity function f(x) = x and consists of two parts: an encoder and a decoder. The encoder is a network with several hidden layers of decreasing numbers of neurons; the decoder is a similar network with increasing numbers of neurons. They are joined by a hidden layer that has as many neurons as the new lower-dimensional space should have dimensions, and it is this layer that is responsible for the dimensionality reduction. Like convolutional neural networks, an autoencoder has no linguistic interpretation, so it should be considered an engineering tool rather than an analytical one.
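A minimal Keras sketch of such an autoencoder (sizes invented): an encoder with shrinking layers, a two-neuron bottleneck that yields the 2-D projection of a text vector, and a mirror-image decoder trained to reproduce its own input.

```python
# A sketch of an autoencoder as described above: shrinking encoder layers,
# a 2-neuron bottleneck for plotting texts on a plane, and a mirror-image
# decoder trained to reproduce the input. Sizes are invented.
from tensorflow import keras

input_dim = 300                                 # e.g. a vector representing a text

model = keras.Sequential([
    keras.layers.Input(shape=(input_dim,)),
    keras.layers.Dense(128, activation="relu"), # encoder
    keras.layers.Dense(2),                      # bottleneck: the 2-D projection
    keras.layers.Dense(128, activation="relu"), # decoder
    keras.layers.Dense(input_dim),
])
# Training would call model.fit(x, x) so the network learns f(x) = x.
model.compile(optimizer="adam", loss="mse")
```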

Despite the impressive results, a neural network cannot be considered an independent tool for text analysis (searching for patterns in language), much less for understanding text. Yes, neural networks make it possible to find hidden connections between words and discover patterns in texts, but until those connections can be presented in an interpretable form, neural networks will remain fairly trivial machine learning tools. Moreover, deep learning is not yet in demand in industrial analytical solutions, since it requires unreasonable costs for data preparation and yields unpredictable results. Even the research community is critical of attempts to make neural networks a universal tool. In 2015, Chris Manning, head of the computational linguistics group at Stanford and president of the ACL, clearly outlined the scope of applicability of neural networks [4], including in it the tasks of text classification, sequence classification, and dimensionality reduction. Nevertheless, thanks to the marketing and popularization of deep learning, attention to computational linguistics itself and to its new applications has grown.

Literature

  1. Tomas Mikolov et al. Efficient Estimation of Word Representations in Vector Space, arxiv.org. URL: http://arxiv.org/pdf/1301.3781.pdf
  2. Levy Omer, Yoav Goldberg, Ido Dagan. Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics 3. - 2015. - P. 211–225. URL: https://www.transacl.org/ojs/index.php/tacl/article/view/570/124 (accessed: 05/18/2017).
  3. Pavel Velikhov. Machine learning for understanding natural language // Open Systems.DBMS. - 2016. - No. 1. - P. 18–21. (accessed: 05/18/2017).
  4. Christopher Manning. Computational linguistics and deep learning. Computational Linguistics. - 2016. URL: http://www.mitpressjournals.org/doi/full/10.1162/COLI_a_00239#.WQH8MBhh2qA (accessed: 05/18/2017).

Dmitry Ilvovsky ([email protected]) is an employee of the International Laboratory for Intelligent Systems and Structural Analysis; Ekaterina Chernyak ([email protected]) is an instructor at the Center for Continuing Education, Faculty of Computer Science, National Research University Higher School of Economics (Moscow). The work was carried out within the framework of the Basic Research Program of the National Research University Higher School of Economics.



This guide is intended for anyone who is interested in machine learning but doesn't know where to start. The content is aimed at a wide audience and will be quite superficial. But who cares? The more people who become interested in machine learning, the better.

Object recognition using deep learning

You may have already seen this famous xkcd comic. The joke is that any 3-year-old can recognize a photo of a bird, but getting a computer to do it has taken the best computer scientists over 50 years. In the last few years, we have finally found a good approach to object recognition using deep convolutional neural networks. That sounds like a string of made-up words from a William Gibson science fiction novel, but it will all make sense once we take them one by one. So let's do it: let's write a program that recognizes birds!

Let's start simple

Before learning how to recognize pictures of birds, let's learn how to recognize something much simpler: the handwritten digit "8".
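As a preview of where this is going, here is a hedged Keras sketch of exactly that task (it assumes TensorFlow is installed): a small dense network trained on the standard MNIST dataset to answer one question, "is this an 8?".

```python
# A sketch of the "recognize a handwritten 8" task using Keras and the
# MNIST dataset: a small dense network classifying images as 8 vs not-8.
import numpy as np
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784) / 255.0          # flatten 28x28 grayscale images
x_test = x_test.reshape(-1, 784) / 255.0
is_eight_train = (y_train == 8).astype(np.float32)  # 1.0 if the digit is an 8
is_eight_test = (y_test == 8).astype(np.float32)

model = keras.Sequential([
    keras.layers.Input(shape=(784,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),    # probability the image is an 8
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, is_eight_train, epochs=3, verbose=0)
print(model.evaluate(x_test, is_eight_test, verbose=0))  # [loss, accuracy]
```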

There is a lot of talk and writing about artificial neural networks today, both in the context of big data and machine learning and outside it. In this article we will recall the meaning of the concept, once again outline its scope of application, and discuss an important approach associated with neural networks, deep learning: we will describe its concept as well as its advantages and disadvantages in specific use cases.

What is a neural network?

As you know, the concept of a neural network (NN) comes from biology and is a somewhat simplified model of the structure of the human brain. But let's not delve into the wilds of natural science: the easiest way is to imagine a neuron (including an artificial one) as a kind of black box with many inputs and one output.

Mathematically, an artificial neuron transforms a vector of input signals (impacts) X into a vector of output signals Y using a function called the activation function. Within an artificial neural network (ANN), three types of neurons operate: input neurons (receiving information from the outside world: the values of the variables of interest), output neurons (returning the desired variables, for example predictions or control signals), and intermediate neurons, which perform certain internal ("hidden") functions. A classical ANN thus consists of three or more layers of neurons, and in the second and subsequent layers ("hidden" and output), each element is connected to all elements of the previous layer.

It is important to remember the concept of feedback, which determines the type of ANN structure: feedforward (signals travel sequentially from the input layer through the hidden layers to the output layer) and recurrent, in which the network contains connections going back, from more distant to nearer neurons. These concepts constitute the minimum information needed to move to the next level of understanding ANNs: training a neural network, classifying training methods, and understanding the principles behind each of them.

Neural network training

We should not forget why such categories are used at all, otherwise there is a risk of getting bogged down in abstract mathematics. In practice, artificial neural networks are a class of methods for solving certain practical problems, chief among them pattern recognition, decision making, approximation and data compression, and, most interesting for us, cluster analysis and forecasting.

Without going to the other extreme and into the details of how ANN methods work in each specific case, let us remind ourselves that under any circumstances it is a neural network's ability to learn (with a teacher or "on its own") that is the key to using it for practical problems.

In general, training an ANN is as follows:

  1. the input neurons receive variables ("stimuli") from the external environment;
  2. in accordance with the information received, the free parameters of the neural network change (this is where the intermediate layers of neurons do their work);
  3. as a result of these changes in structure, the network "reacts" to information in a different way.

This is the general algorithm for training a neural network (recall Pavlov's dog: yes, the internal mechanism of forming a conditioned reflex is exactly that, and let's immediately forget it, since our context involves technical concepts and examples).

Clearly, a universal learning algorithm does not exist and most likely cannot exist; conceptually, approaches to learning are divided into supervised and unsupervised learning. The first assumes that for each input ("training") vector there is a required value of the output ("target") vector; the two together form a training pair, and the whole set of such pairs is the training set. In unsupervised learning, the training set consists only of input vectors, a situation that is more plausible from the point of view of real life.

Deep learning

The concept of deep learning belongs to a different classification and denotes an approach to training so-called deep structures, which include multi-level neural networks. A simple example from image recognition: the machine must be taught to identify increasingly abstract features in terms of other abstract features, that is, to determine mathematically the relationship between the expression of the whole face, the eyes and the mouth, and, ultimately, clusters of colored pixels. In a deep neural network, each level of features thus has its own layer; clearly, training such a "colossus" requires appropriate researcher experience and hardware. Conditions favorable to deep neural learning took shape only by 2006, and eight years later we can speak of the revolution this approach has produced in machine learning.

First of all, in the context of our article it is worth noting the following: in most cases, deep learning is not supervised by a human. That is, this approach involves training a neural network without a teacher. This is the main advantage of the "deep" approach: supervised machine learning, especially with deep structures, requires enormous time and labor costs. Deep learning is an approach that models human abstract thinking (or at least represents an attempt to approach it) rather than merely using it.

The idea, as usual, is wonderful, but the approach runs into quite natural problems, first of all those rooted in its claims to universality. In fact, while deep learning approaches have achieved notable success in image recognition, natural language processing still raises far more questions than answers. It is obvious that in the next n years it is unlikely that anyone will manage to create an "artificial Leonardo da Vinci" or even, at the very least, an "artificial homo sapiens".

However, artificial intelligence researchers already face a question of ethics: the fears expressed in every self-respecting science fiction film, from "Terminator" to "Transformers", no longer seem funny (modern sophisticated neural networks can already be considered a plausible model of an insect brain's operation!), but for now they are clearly unnecessary.

The ideal technological future appears to us as an era when humans will be able to delegate most of their powers to machines, or at least allow machines to take over a significant part of their intellectual work. The concept of deep learning is one step toward that dream. The road ahead is long, but it is already clear that neural networks and the ever-evolving approaches associated with them will in time be able to realize the aspirations of science fiction writers.






