What's new in Yandex's search algorithms. New Yandex algorithm "Korolev"


On August 22, 2017, Yandex launched a new version of its search algorithm - "Korolev". Its essence can be summed up briefly and succinctly in the words of the Yandex press release:

The launch of the algorithm took place at the Moscow Planetarium and was accompanied by reports from the algorithm developers, a ceremonial pressing of the launch button, and even a call to the ISS and a live broadcast with the cosmonauts.

The full video of the presentation can be viewed right here, and below we will look at the main changes and answers to frequently asked questions. The information is accompanied by comments from Yandex employees on the company blog, as well as quotes from official sources.

What has changed in Yandex search?

"Korolev" is a continuation of the "Palekh" algorithm, introduced in November 2016. "Palekh" was the first step towards semantic search, whose task is to better understand the meaning of pages.

"Korolev" is now able to understand the meaning of the entire page, not just the title, as was the case after the announcement of "Palekh".


The algorithm should improve results for rare and complex queries.

Documents may not contain many of the query words, so traditional text-relevance algorithms cannot cope with this task.

It looks something like this:

Google uses a similar algorithm – RankBrain:

The Korolev algorithm applies to all queries, including commercial ones. However, its impact is most noticeable on multi-word queries. Yandex has confirmed that the algorithm works across all searches.

Of course, the goal of the algorithm is to improve the quality of results for rare and complex queries. Let's check on rare and complex commercial queries tied to the name of a specific item. In this case, Yandex really does understand what we are talking about. True, the results are mostly reviews and articles rather than commercial sites.


And in this case, the search engine realized that I was most likely interested in a drone or quadcopter. Of course, search results start from Yandex.Market:


But in some cases Yandex is powerless...


How it works (+ 11 photos from the presentation)

Let's take a closer look at the presentation of the new algorithm. Below there will be only excerpts of the most interesting moments with our comments and slides from the presentation.

The new version of search is based on a neural network, which consists of a large number of neurons. A neuron has one output and several inputs; it sums the information it receives and, after a transformation, passes it on.
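To make that description concrete, here is a minimal sketch (not Yandex code) of such a neuron in Python: it multiplies each input by a weight, sums the results, and passes the sum through a non-linear transformation before passing it on.

```python
import math

def neuron(inputs, weights, bias):
    """A single neuron: weighted sum of the inputs plus a bias, passed through a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # non-linear transformation of the summed signal

# Example: a neuron with three inputs and one output.
print(neuron([0.2, 0.7, 0.1], weights=[0.5, -0.3, 0.8], bias=0.1))
```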


A neural network can perform much more complex tasks and can be trained to understand the meaning of text. To do this, it needs to be given many training examples.

Yandex began work in this direction with the DSSM model, which consists of two parts corresponding to the query and the page. The output is an estimate of how close they are in meaning.


To train a neural network, you need many training examples.


    Negative examples are pairs of texts that are not related in meaning.

    Positive examples are query-text pairs that are related in meaning.

According to the presentation, Yandex used an array of data on user behavior in search results for training, and considered a query and the page that users often click on for that query to be related in meaning. But as Mikhail Slivinsky later explained, user satisfaction with search results is measured not only by clicks:


As Alexander Sadovsky said earlier in the Palekh presentation, the presence of a click does not mean that a document is relevant, and its absence does not mean that it is irrelevant. The Yandex model predicts whether a user will stay on the site and takes into account many other user-satisfaction metrics.

After training, the model represents a text as a set of 300 numbers - a semantic vector. The closer two texts are in meaning, the more similar their vectors.
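To illustrate what "the more similar their vectors" means in practice, here is a minimal sketch of comparing two 300-number semantic vectors by cosine similarity. The vectors below are random placeholders; a real model would produce them from text.

```python
import random

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: values near 1.0 mean very similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

# Placeholder 300-number "semantic vectors"; in reality the trained model computes these.
query_vector = [random.uniform(-1, 1) for _ in range(300)]
page_vector = [random.uniform(-1, 1) for _ in range(300)]

print(cosine_similarity(query_vector, page_vector))
```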


Neural models have been used in Yandex search for a long time, but in the Korolev algorithm the influence of neural networks on ranking has been increased.

Now, when assessing semantic proximity, the algorithm looks not only at the title, but also at the text of the page.

In parallel, Yandex was working on an algorithm for comparing the meanings of queries using neural networks. For example, if the search engine knows the best answer for one query, and the user enters a query very close to it, then the results should be similar. To illustrate this approach, Yandex gives the example "lazy cat from Mongolia" - "manul".


In Palekh, neural models were used only at the very last stages of ranking, on roughly the top 150 documents. Therefore, at earlier stages some documents that could have been good answers were lost. This is especially important for complex and low-frequency queries.

Now, instead of calculating the semantic vector at query time, Yandex does the calculations in advance, during indexing. "Korolev" performs calculations on 200 thousand documents per query, instead of the 150 under "Palekh". This method of preliminary calculation was first tested on Palekh; it made it possible to save computing power and to match the query not only against the title but also against the text.


At the indexing stage, the search engine takes the full text, performs the necessary operations and obtains a value. As a result, for all words and popular word pairs, an additional index is formed with a list of pages and their preliminary relevance to the query.
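A schematic sketch of this precomputation idea, with a purely hypothetical vectorize() stand-in for the neural model and made-up documents: page vectors are computed once at indexing time, so at query time only the query itself has to be vectorized and compared against stored values.

```python
def vectorize(text):
    """Placeholder for the neural model that turns a text into a semantic vector."""
    # A toy stand-in: simple character statistics instead of a real neural network.
    return [sum(ord(c) for c in text) % 97 / 97.0, len(text) % 31 / 31.0]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Indexing stage: precompute and store a vector for every document.
documents = {
    "doc1": "lazy cat from mongolia",
    "doc2": "road rules of the russian federation",
}
semantic_index = {doc_id: vectorize(text) for doc_id, text in documents.items()}

# Query stage: vectorize only the query and rank documents using the stored vectors.
query_vec = vectorize("manul")
ranked = sorted(semantic_index, key=lambda d: dot(query_vec, semantic_index[d]), reverse=True)
print(ranked)
```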

The Yandex team that designed and implemented the new search launches it.



Running the algorithm:


Artificial Intelligence Training

For many years, collecting data for machine learning at Yandex was handled by assessors, who evaluate the relevance of documents to a query. From 2009 to 2013, the search engine received more than 30 million such ratings.


During this time, image and video search, internal classifiers and algorithms appeared: the number of Yandex projects increased.


Since they all worked on machine learning technologies, more assessments and more assessors were required. When there were more than 1,500 assessors, Yandex launched the Toloka crowdsourcing platform, where anyone can register and complete tasks.

For example, these are the tasks found in Toloka:


Or these:


If you want to learn more about how users evaluate the relevance of answers, and to understand which parameters of the search results are assessed, we recommend reading the task instructions or even trying the training tasks.

Over the course of several years, the service attracted more than 1 million people who made more than 2 billion ratings. This allowed Yandex to make a huge leap in the scale and volume of training data. In 2017 alone, more than 500,000 people completed tasks.


Among the tasks are:

  • Assessing the relevance of documents;
  • Tasks related to maps - this is how the accuracy of data about organizations in the Directory database is checked;
  • Tasks for tuning the speech technologies behind voice search.

The rules that Yandex wants to teach the algorithm are open to all registered users in the form of instructions for Toloka employees. For some tasks, people's subjective opinions are simply collected.

Here is an excerpt from the instructions on how Yandex determines the relevance of a document:


The quality of ratings is very important to Yandex. Ratings can be subjective, so each task is given to several people at once, and then a mathematical model evaluates the distribution of votes, taking into account the degree of trust in each worker and the expertise of each participant. For each "toloker", data on the accuracy of their assessments on each project is stored and compiled into a single rating.
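As a rough illustration of combining several assessors' votes with different levels of trust (the labels and trust weights below are invented; Yandex's actual model is not public):

```python
from collections import defaultdict

def aggregate_votes(votes):
    """votes is a list of (label, assessor_trust) pairs; the label with the highest
    total trust-weighted support wins."""
    totals = defaultdict(float)
    for label, trust in votes:
        totals[label] += trust
    return max(totals, key=totals.get)

# Three assessors rate the same document; the two more trusted ones outweigh the third.
votes = [("relevant", 0.9), ("relevant", 0.8), ("irrelevant", 0.4)]
print(aggregate_votes(votes))  # -> "relevant"
```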

That is why you cannot complain that the bias of assessors ruined your site.

Thus, an additional group of factors has appeared in Yandex:

  • The meaning of the page and its relevance to the query;
  • Whether the document is a good answer to similar user queries.

What has changed in the Yandex top?

The algorithm was apparently launched somewhat earlier than the presentation: according to third-party services (for example, https://tools.pixelplus.ru/updates/yandex), changes in the search results began in early August, but it is unknown whether this is related to the "Korolev" algorithm.




Based on these data, we can hypothesize that the decrease in the share of main pages in the top 100 and the decrease in the age of documents within the top 100 are associated with the new algorithm, which helps surface more relevant answers.

True, there are no noticeable changes in the top 10, top 20 or top 50. Perhaps they are not there or they are insignificant. We also did not notice any significant changes in search results for promoted queries.

Textual relevance in the standard sense has not gone away. Collections and broader responses to multi-word queries contain a large number of pages with occurrences of query words in the title and text:


The freshness of the search results also matters. An example from a Yandex presentation contains a number of recent results with the entire search phrase.



Although, given that the algorithm performs its calculations at indexing time, Korolev can theoretically influence how fresh results from the quick-indexing bot are blended into the results.

Is it necessary to somehow optimize texts for "Korolev"?

Quite the contrary: the more a search engine learns to determine the meaning of the text, the fewer occurrences of keywords are required and the more meaning is required. But the principles of optimization do not change.


For example, back in 2015, Google talked about the RankBrain algorithm, which helps search respond better to multi-word queries asked in natural language. It works well, as users noted in numerous publications comparing Yandex and Google search after the announcement of the new version of the algorithm.


That announcement was not accompanied by a large-scale presentation and did not greatly influence specialists' work. No one purposefully "optimizes for RankBrain", so in Yandex this does not globally change a specialist's work either. Yes, there is a trend to search for and include so-called LSI keywords in the text, but these are clearly not just words frequently repeated on competitors' pages. We expect SEO services to develop in this direction.

Yandex also states that the algorithm analyzes the meaning of other queries that bring users to the page. In the future, this should give the same or similar results for synonymous queries, whereas analysis of the results now sometimes shows no intersections between the results for synonymous queries. Let's hope that the algorithm will help eliminate such inconsistencies.

But Yandex cannot yet find (or finds only with difficulty) documents that are close in meaning to the query but do not contain the query words.


Advice:

    Make sure the page responds to the queries it is optimized for and that users click on.

    Make sure the page still includes words from search queries. We are not talking about direct occurrences, just check if the words from the queries are in any form on the page.

    Topical words can add extra relevance to a page, but they're clearly not just words that are repeated frequently on competitors' pages. We expect the development of SEO services in this direction.

    For key phrases for which a page already ranks well, check whether the bounce rate is below the site average. If the site ranks high for a query and the user finds what they need, the site can also be shown for keywords that are similar in meaning (if any).

    Search clicks indicate user satisfaction with the result. This is not new, but it’s worth checking the snippets again for key queries. Perhaps somewhere it will be possible to increase the click-through rate.

How to check the impact of an algorithm on your website?

For sites without pronounced seasonality, you can compare the number of low-frequency key phrases that brought visitors to the site before and after the algorithm was launched. For example, take a week in July and a week in August.


Select “Reports – Standard reports – Sources – Search queries”.

Selecting visits from Yandex:

Then filter only those queries for which there was exactly 1 click. Additionally, it is worth excluding phrases containing the brand name.
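If you export the search-query report to CSV, the same filtering can be scripted. Below is a sketch that assumes (purely for illustration) columns named "query" and "clicks" and a made-up brand name; adjust the names to match your actual export.

```python
import csv

BRAND = "mybrand"  # hypothetical brand name to exclude

def low_frequency_queries(path):
    """Count queries from an exported report that brought exactly 1 click
    and do not contain the brand name (column names are assumptions)."""
    count = 0
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if int(row["clicks"]) == 1 and BRAND not in row["query"].lower():
                count += 1
    return count

# Compare, for example, a week in July with a week in August.
print(low_frequency_queries("queries_july.csv"))
print(low_frequency_queries("queries_august.csv"))
```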



You can also look for search phrases whose words do not appear in your text at all. Such phrases were present among low-frequency queries before, but now there may be noticeably more of them.

Prospects and forecast

    The search engine will be able to find documents that are close in meaning to the query even better. The presence of occurrences will become even less important.

    Personalization will be added to the current algorithm.

    In the long run, good materials that answer a user's question may receive even more traffic from micro-frequency, rare, or semantically similar queries.

    For low-frequency keywords, competition may increase due to the greater relevance of non-optimized documents.

    Hypothesis: with the help of such algorithms, Yandex can better assess how semantically related a linking page is to the page it links to, and take this into account when evaluating external links - if this can be a significant factor at all, given the weak influence of links in Yandex.

    We should expect further changes related to neural networks in other Yandex services.

Questions and answers

Question: since Yandex evaluates clicks, does this mean that the manipulation of behavioral factors will gain momentum?


Question: Is Korolev connected with Baden-Baden?


Question: How do I turn on the new Yandex search?

Answer: on the Yandex blog and in search queries, there were many questions about how to enable or install the new search. There is no way to do this, and no need: the new algorithm is already working, and no additional settings are required.

“Korolev” is not Minusinsk, not Baden-Baden. This is not a punitive filter. By the way, it is not an add-on - it is part of the main Yandex algorithm.

“Korolev” works on the basis of a self-learning neural network and affects rather rare multi-word queries, primarily informational ones, which are aimed at clarifying the meaning - low-frequency (LF) and micro-LF, including voice search, different natural variants of queries, such as “a film where a man wears different shoes.”

This algorithm was created in order to improve the quality of results for such queries, similar to Google’s RankBrain, which has been doing this task well for a long time, and even now, according to the first measurements, it works better than Yandex for such queries.

Before this, there was and is the “Palekh” algorithm, which already began to search by meaning, but did it in real time and compared only the meaning of the request and the title - Page Title.

"Korolev" analyzes not only the Title but the entire page as a whole, showing in the results even those pages that do not mention the query words but whose meaning fits the query. At the same time, it determines the essence of the page in advance, at the indexing stage - as a result, the speed and the number of processed pages have increased dramatically.

The "about a third" figure here may be an exaggeration - no one has yet measured the real share of queries that "Korolev" will affect.

Other articles about "Korolev":

There are many points I haven’t covered here yet; it’s worth reading about them in other articles. I have chosen here only the best, truly worthwhile ones:

Opinions of various experts:

Additional official sources:

Some excerpts from the opinions linked above:

Dmitry Shakhov

"Korolev" will pass search engine optimization by, at least at this stage. The task of search is to provide answers to queries for which there are no documents with keyword occurrences. Search solves this problem: Hummingbird in Google, Palekh and Korolev in Yandex. Queries for which there are no documents are not of interest to search engine optimization - that is exactly why there are no documents for them.

Arthur Latypov

Many expected that soon after "Palekh" an algorithm would appear that works in a similar way, but based on content rather than titles. So far we have not noticed any surges in traffic on the sites we monitor; we will watch more closely and follow how the algorithm develops. Interestingly, in the past, to improve ranking for a large number of queries, including related ones, SEO texts were prepared - some better, some worse, people called them different things, but the meaning did not change.
We expect that optimizers will use LSI more when preparing text optimization.

Accordingly, SEO services will develop. Let me remind you that preparing a list of SEO words, related terms, and related queries for content preparation and optimization has been used by specialists for several years. Therefore, there will be no major changes in mechanics, at least for now.

As a result, we pay more attention to:

Content quality;
query intent;
monitoring the search results.

And, of course, it’s always interesting after the launch of a new algorithm from Yandex to analyze what has changed and see what happens next.

Dmitry Sevalnev

In fact, a number of new factors have been introduced that take into account:

the semantic correspondence of the query-document pair (across the entire text of the document, not just the Title, as previously in the Palekh algorithm);
the quality of the document's answer to search queries that are similar in meaning.

Even so, there will be no global changes for the SEO industry. The most significant changes will affect the "long tail" of queries, for which SEO specialists do little targeted work. There are many such queries, they are rare, and they often bring only single visits to the site.

The importance of a number of factors that specialists have been studying since LSI became a "fashionable topic" may increase.

Oleg Shestakov, CTO & Founder at Rush Analytics Russia

The announcement of the Korolev algorithm was probably the biggest ever in terms of the show. From a technology point of view, it cannot be said that this is some new technological breakthrough. The essence of the innovation: Yandex's neural networks now evaluate the query-document correspondence not just in terms of occurrences of the query and its variations (lemmas, parts of the query, etc.) in the document text, but also in terms of the meaning of the query. A neural network trained on big data can now determine the relevance of a document to a query even if there are no occurrences of the query words in the document. In fact, this is not a new technology - it was also used in the Palekh algorithm, although that only took document titles into account. That is, the real innovation here is that Yandex engineers were able to scale a very resource-intensive algorithm by several orders of magnitude - the algorithm can now evaluate hundreds of thousands of documents rather than 150 lines of text, as before.

How will the algorithm affect the SEO market?

— Globally, it will not. This is just one part of the algorithm, and most other factors have worked and will continue to work. The algorithm should most strongly affect low-frequency queries and some mid-frequency ones.

— We will have to pay more attention to the quality of texts. Now, to bring a page to the TOP, its text must contain as many synonyms and words related to the query as possible in order to pass the factors of the new algorithm, because it now takes exactly such words into account, not just "direct occurrences". There is no magic here: the neural network is trained by assessor-teachers and still works with the texts of real sites, finding words that are related in meaning. This means you can conduct a similar analysis and extract these words from the TOP documents. Competent SEO specialists started doing this several years ago. In simple terms - the same LSI, just viewed from a different angle.

— The market for cheap copywriting will begin to collapse, and this is very good. A task to write text in the format “3 direct occurrences, 4 diluted and length 2500 characters” will generate texts that will be poorly ranked.

Now we need story texts. We, as SEO specialists, must tell the story about the client’s product in every detail, describing the product from all sides - with this approach it will be physically difficult to miss important thematic words of the query. Please note that webmasters who make money on article sites have been writing story texts for a very long time, even about alimony lawyers, with excellent layout, disclosure of the topic and points of interest. What is the result? They have a ton of traffic and TOP rankings, plus a total victory over dry law firm websites.

Content production will become somewhat more expensive and more professional. SEO companies will either stop writing SEO bullshit and build serious content teams in-house, or their clients will lose rankings in search. Yandex hinted at this yesterday.

Alexander Alaev

"Korolev" is not about SEO at all. SEO works with queries that are asked many times, whose meaning is clear, and for which there are thousands of relevant answers. The task of a search engine in the commercial segment is to find the best candidates by commercial criteria, not to look for meaning. That is why commercial search results will not change, at least not noticeably.

But the owners of information resources should once again pay attention to the quality of the content, target their publications not to search queries, but to the interests of users, and write in human, simple language.


The website ranking algorithm in Yandex is constantly subject to changes and additions: new functionality is added, restrictions and filters are updated... For a very long time, all ranking algorithms were tracked only within the company, and when something suddenly changed, users were indignant and, frankly speaking, understood little.

It took a lot of time to research Yandex's ranking algorithms and to find answers about filters and how not to end up on the "black list". Now everything is a little simpler, but not so simple that we can ignore analyzing how Yandex works.

Yandex algorithms already have quite a long history of creation and development, going back to the distant 1997. Since then, Yandex has changed, and new algorithms and new filters have appeared. Let's start our "debriefing", perhaps, with the "freshest" algorithms.

New Yandex algorithm “Baden-Baden”. 2017

Yandex has a new algorithm for detecting text spam called "Baden-Baden".
The algorithm was created to combat attempts to inflate relevance by writing texts that are useless to the user and "over-optimized" (stuffed with a large number of keyword occurrences).

As stated in the Yandex blog, the algorithm that detects text spam has been significantly changed and improved. The authors of the publication themselves claim that this algorithm “is part of the general ranking algorithm; the result of its work may be a deterioration in the positions of over-optimized pages in search results.” And what could this mean?

Firstly, if it has been "reworked and improved," then most likely this algorithm was created to replace the already familiar "overspam" and "over-optimization" filters. And if it really "is part of the overall ranking algorithm," then it will, of course, be more difficult to diagnose the "penalties" imposed by this algorithm.

New Yandex algorithm 2016. "Palekh"

The algorithm tries to match the meaning of the query using neural networks, rather than simply comparing keywords, as was usually done. This was done in order to provide better results for the rarest user queries. The new algorithm is based on neural networks and helps Yandex find a match between a search query and page titles even if they have no key phrases in common. To understand what actually happened, here are a few quotes from the official Yandex company blog:

In our case, we are dealing not with pictures but with texts - the texts of search queries and web-page titles - but training follows the same scheme: using positive and negative examples. Each example is a query-title pair. Examples can be selected using the statistics accumulated by the search engine. By learning from user behavior, the neural network begins to "understand" the semantic correspondence between a query and page titles.

The semantic vector is used not only in Yandex search, but also in other services - for example, in Pictures. There it helps to find images on the Internet that most closely match a text query.

Semantic vector technology has enormous potential. For example, not only titles but also the full texts of documents can be translated into such vectors - this will make it possible to compare queries and web pages even more accurately.
The introduction of the new Yandex algorithm is another significant argument in favor of promoting low-frequency queries for those involved in developing and promoting websites. The prospects for the development of the new Yandex algorithm only confirm that this direction is the right one, because in the near future we will be talking about improved recognition not only of titles, but of the entire text of a document (!).

In Yandex, the query frequency distribution graph is presented in the form of a bird with a beak, a body and a long tail, characteristic of a firebird:

  • Beak - the highest frequency queries. The list of such requests is not very large, but they are asked very, very often.
  • Torso - mid-frequency queries.
  • Tail - low-frequency and micro-low-frequency queries. “Individually they are rare, but together they form a significant part of the search flow, and therefore add up to a long tail.”

This tail belongs to a bird that appears quite often in Palekh miniatures. That is why the algorithm was called “Palekh”.

All Yandex algorithms. (2007-2017)

  • July 2, 2007. "Version 7". A new ranking formula, an increase in the number of factors, the announcement took place only on searchengines.guru.
  • December 20, 2007 and January 17, 2008. "Version 8" and "Eight SP1". Authoritative resources received a significant advantage in ranking; filtering of link "runs" used to inflate link factors was introduced.
  • May 16, July 2, 2008. “Magadan” (Fast Rank for quick selection of applicants, softness, expansion of the base of abbreviations and synonyms, expanded document classifiers), “Magadan 2.0” (uniqueness of content, new classifiers of user requests and documents).
  • September 11, 2008. “Nakhodka” (taking into account stop words in a search query, a new approach to machine learning, thesaurus).
  • April 10, June 24, August 20, August 31, September 23, September 28, 2009.
    “Arzamas / Anadyr” (taking into account the user’s region, removing homonymy), “Arzamas 1.1” (new regional formula for a number of cities, except Moscow, St. Petersburg and Yekaterinburg), “Arzamas 1.2” (new classifier of geo-dependence of queries), “Arzamas+ 16" (independent formulas for 16 regions of Russia), "Arzamas 1.5" (new general formula for geo-independent queries), "Arzamas 1.5 SP1" (improved regional formula for geo-independent queries).
  • November 17, 2009. “Snezhinsk” (launch of MatrixNet machine learning technology, multiple increase in the number of ranking factors, 19 local formulas for the largest regions of Russia, dramatic changes in search results).
  • December 22, 2009. March 10, 2010. “Konakovo” (unofficial name, but later it will be Obninsk, its own formulas for 1250 cities throughout Russia), “Konakovo 1.1” (“Snezhinsk 1.1”) - updating the formula for geo-independent queries.
  • September 13, 2010. “Obninsk” (reconfiguring the formula, increasing productivity, new factors and ranking for geo-independent queries, the share of which in the flow is more than 70%).
  • December 15, 2010. “Krasnodar” (Spectrum technology and increasing the variety of search results, decomposing the user’s request into intents), further: increasing the localization of search results for geo-dependent queries, independent formulas for 1250 cities in Russia.
  • August 17, 2011. “Reykjavik” (taking into account the language preferences of users, the first step in personalizing the results).
  • December 12, 2012. “Kaliningrad” (significant personalization of search results: hints, taking into account the user’s long-term interests, increasing relevance for “favorite” sites).
  • May 30, 2013. “Dublin” (further personalization of search results: taking into account the immediate interests of users, adjusting search results to the user directly during the search session).
  • March 12, 2014. “Nachalovo”*, “No links” (cancellation of taking into account links / a number of link factors in the ranking for groups of commercial queries in the Moscow region).
  • June 5, 2014. “Odessa”*, “Islands” (new “island” design of delivery and services, introduction of interactive responses, later the experiment was considered unsuccessful and completed).
  • April 1, 2015. "Amsterdam"*, "Object answer" (an additional card with general information about the subject of the query to the right of the search results; Yandex classified and stored tens of millions of different search objects in its database).
  • May 15, 2015. “Minusinsk” (downgrading in the ranking of sites with an excessive number and share of SEO links in the link profile, mass removal of SEO links, further return to taking into account link factors in ranking for all queries in the Moscow region).
  • September 14, 2015(± 3 months). “Kirov”*, “Multi-Armed Bandits of Yandex” (randomized addition to the numerical value of the relevance of a number of documents with a rating of “Rel+”, in order to collect additional behavioral information in the Moscow region, later randomization was introduced in the regions of Russia).
  • February 2, 2016. "Vladivostok" (taking into account a site's adaptation to viewing on mobile devices; adapted projects are boosted in mobile search results).
    * - unofficial names of algorithms, cities are selected at the discretion of the author in order to maintain order.

And, as of this moment (things keep developing), the most recent algorithms are the ones described above, with the epic and extraordinary names "Palekh" and "Baden-Baden".

All Yandex filters and their types.

Yandex has many filters that can be applied both to a site as a whole and to its individual pages. Unfortunately, it is not always clear which of the many filters has been applied to a site and for which violations - nowadays any slight deviation when using standard promotion methods can be recognized as "overspam". The result: pessimization.

All Yandex filters (depending on their appearance) can be divided into 3 types:

Pre-filters: discount the value of any factors even before the relevance of the site is calculated. The effect of pre-filters may not be immediately noticeable - usually it manifests itself in the site “sticking” in some places (the site has reached page 2 and does not move further, despite building up the link mass, for example).
Post-filters: they zero out the value of one factor or another after the site's relevance has been calculated. It is hard not to notice this type of filter - it manifests itself as a sharp drop in positions and in traffic from Yandex. Almost all filters against the inflation of internal factors can be classified as post-filters.
Filtering before issuing: this is when the relevance of a site is calculated, but for some reason it is not allowed to appear in the search results.
Ban: It’s rare, but it still happens that a site is completely excluded from the search results for gross violations of the search license.

Apparently, Yandex is quite demanding about the quality of sites and, at every suitable occasion, reminds us of its official position: develop your site, focus on the "live" user, and if Yandex rates your site "excellent", it will not be ignored. Optimize your site so that optimization does not harm, but, on the contrary, helps users navigate it.

Today we announced a new search algorithm “Palekh”. It includes all the improvements we've been working on lately.

For example, search now uses neural networks for the first time to find documents not by the words that are used in the query and in the document itself, but by the meaning of the query and the title.

For many decades, researchers have been grappling with the problem of semantic search, in which documents are ranked based on semantic relevance to a query. And now it's becoming a reality.

In this post I will try to talk a little about how we did it and why it is not just another machine learning algorithm, but an important step into the future.

Artificial intelligence or machine learning?

Almost everyone knows that modern search engines work using machine learning. Why should the use of neural networks for search be discussed separately? And why only now, when the hype around this topic has not subsided for several years? I'll try to tell the history of the issue.

Internet search is a complex system that appeared a long time ago. At first it was just a search for pages, then it turned into a problem solver, and now it is becoming a full-fledged assistant. The larger the Internet, and the more people there are, the higher their demands, the more difficult the search has to become.

The era of naive search

At first there was just a word search - an inverted index. Then there were too many pages, they needed to be ranked. Various complications began to be taken into account - word frequency, tf-idf.

The Age of Links

Then there were too many pages on any topic, an important breakthrough occurred - they began to take into account links, PageRank appeared.

The Age of Machine Learning

The Internet became commercially important, and many scammers emerged trying to fool the simple algorithms that existed at the time. A second important breakthrough occurred - search engines began to use their knowledge of user behavior to understand which pages are good and which are not.

Somewhere at this stage, the human mind was no longer sufficient to figure out how to rank documents. The next transition happened: search engines began to actively use machine learning.

One of the best machine learning algorithms was invented at Yandex - MatrixNet. We can say that ranking is helped by the collective intelligence of users and the "wisdom of the crowd". Information about sites and people's behavior is converted into many factors, each of which is used by MatrixNet to build a ranking formula. In fact, the ranking formula is written by a machine (it turned out to be about 300 megabytes).

But "classical" machine learning has a limit: it only works where there is a lot of data. A small example: millions of users enter the query [VKontakte] to find the same site. In this case, their behavior is such a strong signal that the search does not force people to look at the results, but suggests the address immediately as the query is typed.

But people are more complex, and they want more and more from search. Now up to 40% of all queries are unique, that is, they are not repeated even twice during the entire observation period. This means that the search engine does not have enough data on user behavior, and MatrixNet is deprived of valuable factors. Such queries in Yandex are called the "long tail", since together they make up a significant proportion of search traffic.

The Age of Artificial Intelligence

And now it’s time to talk about the latest breakthrough: a few years ago, computers became fast enough, and there was enough data to use neural networks. The technologies based on them are also called machine intelligence or artificial intelligence - because neural networks are built in the image of neurons in our brain and try to emulate the work of some of its parts.

Machine intelligence is much better than older methods at tasks that humans can do, such as speech recognition or pattern recognition in images. But how does this help the search?

As a rule, low-frequency and unique queries are quite difficult to search for and it is much more difficult to find a good answer for them. How to do it? We have no hints from users (which document is better and which is worse), so to solve the search problem we need to learn to better understand the semantic correspondence between two texts: the query and the document.

It's easy to say

Strictly speaking, artificial neural networks are one of the methods of machine learning. A separate publication was dedicated to them quite recently. Neural networks show impressive results in the analysis of natural information - sound and images. This has been happening for several years now. But why haven't they been used as actively in search until now?

The simple answer is that talking about meaning is much more difficult than talking about the object in a picture or about turning sounds into decoded words. Nevertheless, in the search for meaning, artificial intelligence has indeed begun to come from the area where it has long been king - image search.

A few words about how this works in image search. You take an image and use neural networks to transform it into a vector in N-dimensional space. Take the request (which can be either in text form or in the form of another picture) and do the same with it. And then you compare these vectors. The closer they are to each other, the more the picture matches the request.

Ok, if it works in images, why not apply the same logic to web search?

The devil is in technology

Let us formulate the problem as follows. At the input we have a user query and a page title. We need to understand how well they correspond to each other in meaning. To do this, we represent the query text and the title text as vectors whose scalar product is larger the more relevant the document with that title is to the query. In other words, we want to train the neural network so that it generates similar vectors for texts that are close in meaning, while for semantically unrelated queries and titles the vectors should differ.

The complexity of this task lies in selecting the correct architecture and method for training the neural network. Quite a few approaches to solving the problem are known from scientific publications. Probably the simplest method here is to represent texts as vectors using the word2vec algorithm (unfortunately, practical experience suggests that this is a rather poor solution for the problem at hand).

DSSM

In 2013, researchers from Microsoft Research described their approach, which was called the Deep Structured Semantic Model.

The model input is the texts of queries and titles. To reduce the size of the model, an operation the authors call word hashing is performed on them. Start and end markers are added to the text, after which it is split into letter trigrams. For example, for the query [palekh] we get the trigrams [#pa, pal, ale, lek, ekh, kh#]. Since the number of different trigrams is limited, we can represent the query text as a vector several tens of thousands of elements in size (the size of our alphabet to the 3rd power). The elements of the vector corresponding to the trigrams of the query are set to 1, the rest to 0. In essence, we thus mark which trigrams from the text occur in a dictionary consisting of all known trigrams. If you compare such vectors, you can only find out whether identical trigrams are present in the query and the title, which is not particularly interesting. Therefore, they now need to be converted into other vectors, which will already have the semantic proximity properties we need.
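A small sketch of the word-hashing step described above: boundary markers are added, the word is split into letter trigrams, and the text is then represented as a "bag" of the trigrams found in a dictionary (the tiny dictionary here is only for illustration).

```python
def letter_trigrams(word, marker="#"):
    """Split a word into letter trigrams with start/end markers, as in DSSM word hashing."""
    marked = marker + word + marker
    return [marked[i:i + 3] for i in range(len(marked) - 2)]

def to_sparse_vector(text, trigram_index):
    """Mark which known trigrams occur in the text - a 'bag of trigrams' vector."""
    vector = [0] * len(trigram_index)
    for word in text.split():
        for tri in letter_trigrams(word):
            if tri in trigram_index:
                vector[trigram_index[tri]] = 1
    return vector

print(letter_trigrams("palekh"))  # ['#pa', 'pal', 'ale', 'lek', 'ekh', 'kh#']

# A toy dictionary of known trigrams (real systems use tens of thousands of them).
all_trigrams = dict.fromkeys(letter_trigrams("palekh") + letter_trigrams("painting"))
vocab = {tri: i for i, tri in enumerate(all_trigrams)}
print(to_sparse_vector("palekh painting", vocab))
```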

After the input layer, as expected in deep architectures, there are several hidden layers for both the query and the title. The last layer, 128 elements in size, serves as the vector used for comparison. The output of the model is the scalar product of the final title and query vectors (to be precise, the cosine of the angle between the vectors is calculated). The model is trained so that for positive training examples the output value is large, and for negative ones it is small. In other words, by comparing the vectors of the last layer, we can calculate the prediction error and modify the model so that the error decreases.
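A minimal sketch of such a two-tower model in PyTorch, with made-up layer sizes and random stand-in data; it illustrates the idea rather than the actual Yandex or Microsoft implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

INPUT_DIM = 30_000  # e.g. a bag of letter trigrams

def make_tower():
    """A small tower that maps a sparse text vector to a 128-dimensional semantic vector."""
    return nn.Sequential(
        nn.Linear(INPUT_DIM, 300), nn.ReLU(),
        nn.Linear(300, 300), nn.ReLU(),
        nn.Linear(300, 128),  # the vector that is used for comparison
    )

query_tower, title_tower = make_tower(), make_tower()

def score(query_bag, title_bag):
    """Relevance estimate: cosine of the angle between the two semantic vectors."""
    return F.cosine_similarity(query_tower(query_bag), title_tower(title_bag), dim=-1)

# One training step: a query, one positive title (index 0) and three negatives.
query = torch.rand(1, INPUT_DIM)
titles = torch.rand(4, INPUT_DIM)
scores = score(query.expand(4, -1), titles)
# Softmax over the candidates: the positive pair should receive the highest score.
loss = F.cross_entropy(scores.unsqueeze(0) * 10, torch.tensor([0]))
loss.backward()
print(float(loss))
```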

We at Yandex are also actively researching models based on artificial neural networks, so we became interested in the DSSM model. Next we will talk about our experiments in this area.

Theory and practice

A characteristic property of the algorithms described in the scientific literature is that they do not always work out of the box. The fact is that an "academic" researcher and an industrial researcher work in significantly different conditions. As a starting point (baseline) against which the author of a scientific publication compares their solution, some well-known algorithm must be used - this ensures reproducibility of the results. Researchers take the results of a previously published approach and show how they can be surpassed. For example, the authors of the original DSSM compare their model with the BM25 and LSA algorithms using the NDCG metric. For an applied researcher who deals with search quality in a real search engine, the starting point is not one specific algorithm, but the entire ranking as a whole. The goal of a Yandex developer is not to beat BM25, but to achieve an improvement on top of the entire set of previously introduced factors and models. Thus, the baseline for a researcher at Yandex is extremely high, and many algorithms that have scientific novelty and show good results with an "academic" approach turn out to be useless in practice, since they do not really improve search quality.

In the case of DSSM, we ran into the same problem. As often happens, in "combat" conditions the exact implementation of the model from the paper showed rather modest results. A fair amount of tweaking was required before we obtained results that were interesting from a practical point of view. Here we will describe the main modifications to the original model that allowed us to make it more powerful.

Large input layer

In the original DSSM model, the input layer is a set of letter trigrams. Its size is 30,000. The trigram approach has several advantages. Firstly, there are relatively few of them, so working with them does not require large resources. Secondly, their use makes it easier to identify typos and misspelled words. However, our experiments showed that representing texts as a “bag” of trigrams noticeably reduces the expressive power of the network. Therefore, we radically increased the size of the input layer, including, in addition to letter trigrams, about 2 million more words and phrases. Thus, we represent the query and header texts as a joint “bag” of words, word bigrams and letter trigrams.
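A sketch of that enlarged representation: a query or title becomes a joint "bag" of its words, word bigrams and letter trigrams (this toy function only illustrates the idea, not the production feature extractor).

```python
def joint_bag(text):
    """Represent a text as a combined set of words, word bigrams and letter trigrams."""
    words = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    trigrams = []
    for w in words:
        marked = "#" + w + "#"
        trigrams += [marked[i:i + 3] for i in range(len(marked) - 2)]
    return set(words) | set(bigrams) | set(trigrams)

print(joint_bag("palekh painting"))
```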

Using a large input layer leads to an increase in model size, training time, and requires significantly more computing resources.

Difficult to learn: how a neural network fought with itself and learned from its mistakes

Training the original DSSM consists of showing the network a large number of positive and negative examples. These examples are taken from search results (apparently, the Bing search engine was used). Positive examples are the titles of clicked documents; negative examples are the titles of documents that received no clicks. This approach has certain disadvantages: the absence of a click does not always indicate that a document is irrelevant, and the opposite is also true - the presence of a click does not guarantee relevance. Essentially, by learning in the way described in the original article, we strive to predict the attractiveness of titles, given that they are shown in the search results. This is not bad in itself, but it is only indirectly related to our main goal - learning to understand semantic proximity.

During our experiments, we discovered that the result could be significantly improved if we used a different strategy for selecting negative examples. To achieve our goal, good negative examples are those documents that are guaranteed to be irrelevant to the query, but at the same time help the neural network to better understand the meaning of words. Where can I get them from?

First try

First, let's just take the title of a random document as a negative example. For example, for the query [Palekh painting], a random title could be "Road Rules 2016 of the Russian Federation". Of course, it is impossible to completely rule out that a document randomly selected from billions turns out to be relevant to the query, but the probability is so small that it can be neglected. This way we can very easily get a large number of negative examples. It would seem that we can now teach our network exactly what we want - to distinguish good documents that interest users from documents that have nothing to do with the query. Unfortunately, the model trained on such examples turned out to be rather weak. A neural network is a clever thing and will always find a way to simplify its job. In this case, it simply started looking for the same words in the queries and titles: present - a good pair, absent - a bad one. But we can do that ourselves. What matters to us is that the network learns to distinguish non-obvious patterns.

Another attempt

The next experiment was to add words from the query to the titles of negative examples. For example, for the request [Palekh painting] the random title looked like [Road Rules 2016 of the Russian Federation painting]. The neural network had a little more difficulty, but, nevertheless, it quickly learned to distinguish natural pairs well from those compiled manually. It became clear that we would not achieve success using such methods.

Success

Many obvious solutions become obvious only after they are discovered. This time was no exception: after a while it was discovered that the best way to generate negative examples is to force the network to "fight" itself and learn from its own mistakes. Among hundreds of random titles, we chose the one that the current neural network considered the best. But since this title is still random, it is highly likely that it does not match the query. And it is these titles that we began to use as negative examples. In other words, you can show the network the best of the random titles, train it, then find the new best random titles, show them to the network again, and so on. Repeating this procedure over and over, we saw the quality of the model noticeably improve, and more and more often the best of the random pairs resembled real positive examples. The problem was solved.
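A schematic sketch of this "fight against itself" procedure; the scoring function, candidate titles and training loop below are placeholders, not the real system.

```python
import random

def hardest_negative(model_score, query, candidate_titles):
    """Among random candidate titles, pick the one the current model rates highest -
    it becomes the negative example for the next training step."""
    return max(candidate_titles, key=lambda title: model_score(query, title))

def model_score(query, title):
    """Toy stand-in for the current neural model: word overlap plus a little noise."""
    return len(set(query.split()) & set(title.split())) + random.random() * 0.1

query = "palekh painting"
random_titles = ["road rules 2016 of the russian federation",
                 "painting the garden fence",
                 "weather in moscow"]
print(hardest_negative(model_score, query, random_titles))  # a hard, but irrelevant, negative

# Outline of the loop: mine a hard negative, update the model, repeat.
# for step in range(num_steps):
#     negative = hardest_negative(model_score, query, sample_random_titles())
#     update_model(query, positive_title, negative)  # hypothetical training update
```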

Such a training scheme is usually called hard negative mining in the scientific literature. It should also be noted that similar solutions have become widespread in the scientific community for generating realistic-looking images; this class of models is called Generative Adversarial Networks.

Different goals

Researchers at Microsoft Research used document clicks as positive examples. However, as already mentioned, this is a rather unreliable signal of the semantic correspondence between a title and a query. After all, our task is not to raise the most-visited sites to the top of the results, but to find genuinely useful information. Therefore, we tried using other characteristics of user behavior as the training target. For example, one model predicted whether a user would stay on the site or leave; another predicted how long they would stay. As it turned out, results can be improved noticeably if you optimize a target metric of this kind - one that indicates that the user found what they needed.

Profit

Ok, what does this give us in practice? Let's compare the behavior of our neural model with a simple text factor based on the correspondence between the query words and the text - BM25. It comes from the days when ranking was simple, and it is now convenient to use it as a baseline.
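For reference, here is a simplified sketch of the BM25 text factor mentioned above, using the standard formula with common default parameters and a made-up toy corpus:

```python
import math

def bm25(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Simplified BM25: rewards occurrences of query words in the document, weighted by
    inverse document frequency and normalized by document length."""
    avg_len = sum(len(d) for d in corpus) / len(corpus)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)
        idf = math.log(1 + (len(corpus) - df + 0.5) / (df + 0.5))
        tf = doc_terms.count(term)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc_terms) / avg_len))
    return score

corpus = [
    "the book of kells wikipedia".split(),
    "irish gospels of the eighth century".split(),
    "road rules of the russian federation".split(),
]
query = "book of kells".split()
for doc in corpus:
    print(round(bm25(query, doc, corpus), 3), doc)
```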

Let's take the query [Book of Kells] as an example and see what values the factors take on different titles. As a control, let's add a clearly irrelevant result to the list of titles.

All factors in Yandex are normalized to a fixed interval. As expected, BM25 has high values for titles that contain query words, and quite predictably the factor is zero for titles that have no words in common with the query. Now notice how the neural model behaves. It recognizes the connection between the query and the Russian-language title of the relevant Wikipedia page just as well as the connection with the title of an article in English! In addition, the model seems to have "seen" the connection between the query and a title that does not mention the Book of Kells but contains a similar phrase ("Irish gospels"). The value of the model for the irrelevant title is significantly lower.

Now let's see how our factors will behave if we reformulate the request without changing its meaning: [gospel of Kells].

For BM25, the reformulation of the query turned into a real disaster - the factor became zero on relevant headings. And our model demonstrates excellent resistance to reformulation: relevant headlines still have a high factor value, while an irrelevant headline still has a low factor value. It seems that this is exactly the behavior we expected from a thing that claims to be able to “understand” the semantics of a text.

Another example. Request [a story in which a butterfly was crushed].

As we can see, the neural model was able to give a high score to the title with the correct answer, despite the complete absence of words in common with the query. Moreover, it is clearly visible that titles that do not answer the query but are related in meaning also receive a fairly high factor value. It is as if our model had "read" Bradbury's story and "knew" that this is exactly what the query is about!

What's next?

We are at the very beginning of a long and very interesting journey. Apparently, neural networks have great potential for improving rankings. The main directions that need active development are already clear.

For example, it is obvious that the title contains incomplete information about a document, and it would be good to learn how to build a model based on the full text (as it turned out, this is not an entirely trivial task). Further, we can imagine models with a significantly more complex architecture than DSSM - there is reason to believe that this will let us handle some natural-language constructions better. We see our long-term goal as creating models that can "understand" the semantic correspondence between queries and documents at a level comparable to that of a human. There will be many difficulties on the way to this goal - which will only make the journey more interesting. We promise to talk about our work in this area; follow the next publications.

We all know firsthand about the existing algorithms of the Yandex and Google search engines. It is to comply with their "constantly updated" rules that all optimizers rack their brains over ever newer ways to get to the TOP of the search results. Among the latest innovations that site owners have felt from the search engines are requirements for the mobile-friendliness of Internet resources and the demotion in search of sites that do not know how to buy links properly. Which algorithms, introduced so far, have significantly influenced site ranking? In fact, not all optimizers know which technologies were created, when, and why, in order to give each site its fairest possible position in the search and to clear the results of "junk". We will look at the history of the creation and development of search algorithms in this article.

Yandex: types of algorithms from conception to today

The algorithms were not all created in one day, and each of them went through many stages of refinement and transformation. The bulk of the names of Yandex algorithms consist of city names. Each of them has its own operating principles, points of interaction and unique functional features, harmoniously complementing each other. We will consider further what algorithms Yandex has and how they affect sites.

In addition to the information about search algorithms, I also suggest reading tips on creating high-quality SEO content suitable for the Google and Yandex search engines.

Magadan

The Magadan algorithm recognizes abbreviations and treats nouns and verbs as equivalent forms. It was first launched in test mode in April 2008, and the second, permanent version was released in May of the same year.

Features

"Magadan" provides the user who wrote the abbreviation with websites and transcripts. For example, if you entered the request for the Ministry of Internal Affairs into the search bar, then in addition to sites with such a keyword, the list will also contain those that do not have an abbreviation, but have the decoding “Ministry of Internal Affairs”. Transliteration recognition gave users the opportunity not to think in what language to write names correctly, for example, Mercedes or Mercedes. In addition to all this, Yandex included almost a billion foreign sites in the indexing list. Recognition of parts of speech and recognition of them as equivalent search queries allowed sites with different key phrases to be included in one search. That is, now, for the keyword “website optimization”, sites with the phrase “optimize website” are also displayed in the search results.

Results

After the launch of the Magadan algorithm, things became more difficult, mainly for low-authority sites. In the rankings, the positions of low-traffic and young resources for relevant queries dropped, while authoritative ones, even with low-quality content, moved into the top positions thanks to the handling of morphology and keyword dilution. Due to the inclusion of transliteration, foreign resources also entered the TOP of the Runet. That is, an optimized text on a topic could end up on the second page only because there was, supposedly, a more-visited site on the same topic or a similar foreign one. Because of this, competition for low-frequency keywords and foreign phrases increased sharply. Advertising also became more expensive - rates went up, because previously sites competed on only one specific query, and now they also compete with "colleagues" via morphological variants, transliteration, and words converted into another part of speech.

Nakhodka

The "Nakhodka" algorithm brought an expanded thesaurus and careful attention to stop words. It entered the ring immediately after Magadan and has ranked the main search results since September 2008.

Features

This was an innovative approach to machine learning - ranking became clearer and more correct. The expanded dictionary of word connections and the attention to stop words in the Nakhodka algorithm greatly influenced the search results. For example, the query "SEO optimization" was now associated with related keyword forms, commercial sites were diluted with information portals, expanded snippets with answers appeared in the list, and Wikipedia was displayed in a special way.

Results

Commercial sites placed greater emphasis on sales queries, since competition for informational, non-specific phrases increased severalfold. Information platforms, in turn, were able to expand their monetization through recommendation pages and affiliate programs. Top information sites promoted for commercial queries began to sell links to order. Competition became tougher.

Arzamas

The Arzamas algorithm introduced lexical statistics for search queries and geographic linking of sites. The first version of Arzamas (April 2009), without geo-dependence, went straight into the main search results, and Arzamas 2, with a classifier for linking a site to a region, was announced in August 2009.

Peculiarities

Resolving homonyms made life easier for users: the phrase "American Pie" now returned only film-related sites, without dessert recipes, as could happen before. Regional linking was a breakthrough, pushing key phrases with a city name appended several positions down. Now a user could simply type "restaurants" and see only sites from their own city among the leaders. Previously, you would have had to enter a more specific phrase, such as "restaurants in St. Petersburg", or Yandex might respond with "refine the query - too many options found". Geo-independent keywords returned only sites relevant to the query, from any region, without regional linking.
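
Yandex's actual regional classifier is not public; a minimal sketch of the idea - split queries into geo-dependent and geo-independent and attach the user's region only to the former - might look like this (the word list is invented for the example).

```python
# Minimal sketch, not Yandex's real classifier: geo-dependent queries get the
# user's region attached, geo-independent ones are searched as-is.

GEO_DEPENDENT_HINTS = {"restaurants", "taxi", "pizza delivery", "dentist"}  # toy list

def rewrite_query(query: str, user_region: str) -> str:
    is_geo_dependent = query.lower() in GEO_DEPENDENT_HINTS
    return f"{query} {user_region}" if is_geo_dependent else query

print(rewrite_query("restaurants", "Saint Petersburg"))       # "restaurants Saint Petersburg"
print(rewrite_query("how to tie a tie", "Saint Petersburg"))  # unchanged
```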

Results

Hooray! Sites from small regions finally stopped competing with big cities, and it became much easier to reach the TOP in your own region. It was during this period that the "regional promotion" service appeared. The Arzamas algorithm allowed small companies to develop faster in their area, but a catch remained: Yandex could not determine the geolocation of every site, and, as you can imagine, resources without a regional assignment were left, to put it mildly, in a not very pleasant place. Reviewing an application for geo-dependence could take several months, and young sites without traffic and link mass (there was a TIC threshold) could not even submit a request to be assigned a region. A double-edged sword.

Snezhinsk

The Snezhinsk algorithm strengthened geo-dependence and refined the matching of query relevance to search results using the Matrixnet machine-learning technology. It was announced in November 2009, and an improved model named Konakovo went into operation in December of the same year.

Peculiarities

Search results became a more accurate match for the questions entered. Geolocation now plays a special role: commercial sites that Snezhinsk did not associate with a region dropped out of the search results, while keywords not tied to a location were identified with information resources. The complex architecture for calculating relevance greatly complicated the life of optimizers, who noticed that the slightest change in one of the factors instantly changed a site's position in the results.
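
Matrixnet itself is Yandex's proprietary implementation of gradient boosting over decision trees, and its factors are not disclosed; purely as an analogy, a machine-learned relevance model over several ranking factors could be sketched with scikit-learn (the factor names and training data below are invented).

```python
# Analogy only: a generic gradient-boosted model trained on made-up ranking
# factors, standing in for the proprietary Matrixnet.
from sklearn.ensemble import GradientBoostingRegressor

# Each row: [text_relevance, link_quality, behavioral_score, region_match]
X = [
    [0.9, 0.7, 0.8, 1.0],
    [0.4, 0.9, 0.2, 0.0],
    [0.7, 0.1, 0.6, 1.0],
    [0.2, 0.3, 0.1, 0.0],
]
y = [1.0, 0.3, 0.7, 0.05]  # made-up relevance judgments

model = GradientBoostingRegressor(n_estimators=50).fit(X, y)
print(model.predict([[0.8, 0.5, 0.7, 1.0]]))  # predicted relevance for a new page
```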

Results

At the time it was noted that buying external links to young sites improved their performance far more sluggishly than a similar purchase did for a site that had been on the market for a long time. New methods of determining how relevant content is to a query removed pages whose texts were oversaturated with key phrases from the results. A new era of quality text began, in which moderation was required in everything; without it, a site could simply fall under spam sanctions. Commercial resources began to panic, because it was almost impossible to reach the TOP with geo-independent keywords (which were the highest-frequency ones). In this connection, a post appeared on the Yandex blog saying that, ideally, the first pages should show commercial organizations that do not write beautifully but do their job well, although for that the algorithms would have to learn to evaluate the quality of the services offered. Since this was an impossible task at the time, the reputation of commercial Internet resources, both online and offline, played a key role in the results.

Obninsk

The Obninsk algorithm improved ranking, expanded the geographic base of Internet sites, and reduced the impact of artificial SEO links on site performance. It was launched in September 2010.

Peculiarities

The popularity of buying link mass fell, and the concept of a "link explosion", which everyone now feared, appeared. Competitors could harm each other by misleading the algorithm: buying a huge number of links from "bad sources" to a "colleague" would knock that competitor out of the results, sometimes for a long time. Geo-sensitive words were added more often to different pages of commercial sites to draw the robot's attention to the region they work with.

Results

Commercial sites began to take their reputation more seriously, which is good news, although many still resorted to dirty methods (artificially inflating traffic and buying reviews). After the release of Obninsk, buying permanent links and articles became more popular; ordinary link purchases no longer influenced ranking as much as before, and if the source of a backlink fell under sanctions, it could trigger a chain reaction. High-quality SEO texts became a mandatory attribute of any resource, and a young site with unique, properly optimized content could reach the TOP.

Krasnodar

The Krasnodar algorithm introduced the Spectrum technology for diversifying search results, expanded snippets, and indexing of social networks. It was launched in December 2010.

Peculiarities

The Spectrum technology was created to classify queries into categories and was used when non-specific key phrases were entered. Krasnodar diversified the results, offering such a user a wider range of options. For example, for the phrase "photo of Moscow" the results could include not only general cityscapes but also photographs grouped into categories such as "sights", "maps", and "restaurants". Emphasis was placed on unique names of things (sites, models, products), so specifics began to stand out. Rich snippets made it possible to show users an organization's contacts and other details right in the search results.
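
How Spectrum actually classifies queries is not documented; as a toy illustration, diversifying the results for a non-specific query might amount to interleaving the best candidate from each detected category (the categories and URLs below are made up).

```python
# Toy illustration of result diversification for an ambiguous query:
# take the best candidate from each intent category in turn.
from itertools import zip_longest

candidates_by_category = {
    "sights":      ["moscow-sights.example/1", "moscow-sights.example/2"],
    "maps":        ["maps.example/moscow"],
    "restaurants": ["eat.example/moscow-top10", "eat.example/reviews"],
}

def diversify(groups: dict[str, list[str]], limit: int = 5) -> list[str]:
    """Interleave one result per category until the limit is reached."""
    mixed = []
    for row in zip_longest(*groups.values()):
        mixed.extend(url for url in row if url is not None)
    return mixed[:limit]

print(diversify(candidates_by_category))
```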

Results

The ranking of commercial sites changed significantly, with special attention paid to details (product cards, separating the short description from the full one). The VK social network began to be indexed, and member profiles became visible directly in the search results. Forum posts could rank first if they answered the user's question more thoroughly than other sites did.

Reykjavik

The Reykjavik algorithm introduced personalization of search results and added "Wizard" technologies for displaying preliminary answers to a query. The input-hint formula was also improved. The algorithm was launched in August 2011.

Peculiarities

The motto of personalized search results is "every user gets their own results". The system remembered searchers' interests through cookies, so if a user's queries were more often related to, say, foreign resources, those resources appeared among the leaders of their results the next time. Hints in the search bar were updated every hour, expanding the possibilities for specific searches. Competition for high-frequency queries grew with incredible force.
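
The cookie format and boost values Yandex used are not public; a minimal sketch of the idea - remember which topics a user clicks and nudge matching results upward next time - could look like this (the profile and the boost factor are invented).

```python
# Minimal sketch of cookie-style personalization: remembered topic clicks
# give a small boost to results on the same topic next time.
from collections import Counter

user_profile = Counter({"foreign_resources": 5, "recipes": 1})  # read from a cookie, say

def personalized_score(base_score: float, topic: str, profile: Counter) -> float:
    boost = 0.05 * profile.get(topic, 0)  # invented boost factor
    return base_score + boost

print(personalized_score(0.70, "foreign_resources", user_profile))  # strongly boosted (~0.95)
print(personalized_score(0.70, "recipes", user_profile))            # barely boosted (~0.75)
```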

Results

Reputable news sites reached the TOP more often thanks to their expanded semantic cores (a huge number of different low-frequency key queries). After the release of Reykjavik, increasing the number of pages targeting specific search queries on information sites started to play a major role. Every site tried to get into the user's bookmarks in order to become part of the personalization system, using RSS subscriptions and pop-up banners prompting visitors to bookmark the site. Internet resources began to pay more attention to an individual approach rather than pressuring the masses.

Kaliningrad

The Kaliningrad algorithm brought global personalization of search and of the search bar, with a focus on behavioral factors. Its launch in December 2012 significantly raised the cost of SEO services.

Peculiarities

User interests turned the entire search results upside down: site owners who previously did not care about visitor comfort began to lose traffic at lightning speed. Yandex now divided interests into short-term and long-term, updating its databases once a day. This meant that today and tomorrow, for the same query, the same user could be shown completely different results. Interests now play a special role: a user who had previously searched for travel sees taxi services when typing "taxi", while someone who constantly watches films gets everything about the comedy "Taxi" instead. In the search bar of every user "hungry to find information", hints based on previous interests now appear in the first positions.

Results

Optimizers began to use more and more ways to retain users: usability and design improved, and content became more diverse and of higher quality. On exit, windows like "Are you sure you want to leave the page?" could pop up, with the sad face of some creature staring at the user. Well-thought-out internal linking and an always accessible menu improved user-activity indicators, which raised sites' positions in the results. Sites that a wide range of Internet users found unclear were first simply demoted and then left hanging at the end of the list of results.

Dublin

The Dublin algorithm improved personalization by identifying the user's current goals. This modernized version of Kaliningrad was released in May 2013.

Peculiarities

The technology includes a function for tracking users' changing interests: if two completely different kinds of searches occur over a certain period, the algorithm prefers the more recent one when forming the results.

Results

For websites, almost nothing changed. The struggle continued not just for traffic but for better behavioral indicators. Old site layouts started to be abandoned, since it was easier to build a new one than to try to fix the old. The supply of website template services grew, and competition for convenient and attractive web layouts began.

Islands

The Islands algorithm introduced interactive blocks in search results, allowing the user to interact with a site directly on the Yandex search page. It was launched in July 2013, with an invitation to webmasters to actively support the beta version and use templates for creating interactive "islands". The technology is currently being tested behind closed doors.

Peculiarities

Now, when searching for information that can be answered right on the results page, the user is offered "islands" - forms and other elements that can be used without visiting the site. For example, if you are looking for a specific film or restaurant, the search shows, next to the film result, blocks with its poster, title, cast, showtimes in your city's cinemas, and a ticket-purchase form. A restaurant result displays its photo, address, phone numbers, and a table-reservation form.

Results

At first nothing significant changed in the ranking of sites. The only noticeable change was the appearance of web resources with interactive blocks in first place and to the right of the results. If the number of sites taking part in the beta test had been significant, they could have displaced regular sites thanks to their attractiveness to users. SEOs started thinking about improving the visibility of their content in the results by adding more photos, videos, ratings, and reviews. Online stores fared better: correctly configured product cards can make an excellent interactive "island".

Minusinsk

Under the Minusinsk algorithm, when SEO links purchased to distort ranking results are identified, a filter is applied to the site that significantly worsens its positions. Minusinsk was announced in April 2015 and came fully into force in May of the same year.

Peculiarities

Before the release of Minusinsk, back in 2014, Yandex experimentally disabled the influence of SEO links for many commercial keys in Moscow and analyzed the results. The outcome was predictable: purchased link mass was still being used, and for a search engine this is spam. The release of Minusinsk marked the day when site owners had to clean up their link profiles and redirect the budgets spent on link promotion toward improving the quality of their resources.

Results

"Reputable" sites that had reached the TOP through bulk link buying flew off the first pages, and some received sanctions for violating the rules. High-quality young sites that did not rely on backlinks suddenly found themselves in the TOP 10. Sites "caught in the sweep" that did not want to wait created new sites, moving the content over and parking the old ones, or got creative with redirects. About three months later, a hole was found in the algorithm that allowed this filter to be lifted almost instantly.

Usability and content began to be improved en masse. Links were purchased with even greater care, and monitoring backlinks became one of the optimizer's functional responsibilities.

According to today's data, careless link buying can trigger the filter even at around 100 links, but if the link mass is properly diluted, you can still safely buy thousands of links, much as in the good old days. In essence, link budgets have grown significantly, because that dilution is achieved with crowd links and mentions.

Vladivostok

The Vladivostok algorithm introduced into search a check of a site's full compatibility with mobile devices. The project fully launched in February 2016.

Peculiarities

Yandex took another step toward mobile users, for whom the Vladivostok algorithm was developed. To rank better in mobile search, a website must now meet mobile-friendliness requirements: to get ahead of competitors, an Internet resource must display correctly on any device, including tablets and smartphones. Vladivostok checks for the absence of Java and Flash plugins, adaptation of the content to the screen resolution (text fitting the width of the display), readability of the text, and the ability to comfortably tap links and buttons.
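
The exact checks Vladivostok runs are internal to Yandex; a rough self-check in the same spirit - viewport meta tag present, no Flash or Java embeds - can be sketched for a page you control.

```python
# Rough self-check sketch in the spirit of the listed requirements:
# a viewport meta tag should be present, Flash/Java embeds should be absent.
import re

def mobile_friendly_hints(html: str) -> dict[str, bool]:
    return {
        "has_viewport_meta": bool(re.search(r'<meta[^>]+name=["\']viewport["\']', html, re.I)),
        "no_flash":          "application/x-shockwave-flash" not in html.lower(),
        "no_java_applet":    "<applet" not in html.lower(),
    }

sample = '<head><meta name="viewport" content="width=device-width, initial-scale=1"></head>'
print(mobile_friendly_hints(sample))
```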

Results

By the time Vladivostok launched, only 18% of sites turned out to be mobile-friendly; the rest had to quickly get rid of the "heavy" page elements that were not displayed or prevented content from rendering correctly on smartphones and tablets. The main factor influencing a site's ranking in mobile results is the behavior of the mobile user - at least for now: there are still not many perfectly mobile-friendly sites, so the free places in the results are taken by those able to provide the most comfortable experience, even if an imperfect one. Sites not adapted to mobile devices are not thrown out of mobile search but are simply ranked lower than those that have achieved better results in serving mobile users. At the moment, the most popular type of layout ordered from developers is adaptive, not a separate mobile version, as one might think. Sites that meet all of the algorithm's requirements receive the maximum mobile traffic in their niche.

Google: history of creation and development of algorithms

Google's algorithms and filters are still not fully understood by Russian-speaking optimizers. Google has always preferred to hide the details of its ranking methods, explaining that "decent" sites have nothing to fear, while "dishonest" ones are better off not knowing what awaits them. As a result, legends are still told about Google's algorithms, and a lot of information only surfaced after questions were put to support when a site sagged in the results. Google made so many minor improvements that they were impossible to count, and when asked what exactly had changed, it simply stayed silent. Let's look at the main algorithms that significantly influenced site positions.

Caffeine

The Caffeine algorithm allowed several pages of the same site to appear on the first page of results for a brand query and added a preview option. It was launched in June 2010.

Peculiarities

Company websites were highlighted when searching by brand, and a "magnifying glass" for previews appeared next to the result line. Brand keywords gave a positive growth trend to the positions of the resource as a whole. The PageRank index was updated, with PR rising for well-known and heavily visited sites.

Results

SEOs began to pay more attention to website branding, including color schemes, logos, and names. Brand keywords made a site's pages stand out in the results, and when a visitor moved from such a query to the main page, the site's positions grew (if the resource had not already been a leader). Optimizers began to buy more links to increase citation. It became almost impossible for young, little-known brands to break into the TOP.

Panda

The Panda algorithm is a technology for checking a website's content for quality and usefulness, taking many SEO factors into account. Sites using "black hat" SEO are excluded from search. Panda was announced in January 2012.

Peculiarities

Panda went out into the search and cleaned it of junk - that is exactly how it felt after many sites irrelevant to their key queries disappeared from Google's results. The algorithm pays attention to keyword spam and uneven keyword use, uniqueness of content, regularity of publications and updates, and user activity and interaction with the site. A visitor scrolling a page to the bottom at reading speed was considered a positive factor.
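
Panda's actual scoring is not public; as a simple illustration of the "keyword spam" factor, a keyword-density check on your own text might look like this (what counts as "too high" is a matter of judgment, not a published number).

```python
# Toy keyword-density check in the spirit of the "keyword spam" factor:
# too high a share of one keyword in the text is a spam signal.
def keyword_density(text: str, keyword: str) -> float:
    words = text.lower().split()
    return words.count(keyword.lower()) / max(1, len(words))

article = "buy phones cheap buy phones today buy phones online"
print(f"{keyword_density(article, 'buy'):.0%}")  # 33% - far above a natural level
```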

Results

After Panda was switched on, a huge number of sites fell under Google sanctions, and at first everyone assumed this was due to participation in link pyramids and the purchase of link mass. As a result, SEOs tested the algorithm and analyzed its impact, concluding that Panda does indeed check the quality of a site in terms of its value to visitors. Internet resources stopped copy-pasting and actively took up copywriting. Behavioral factors were improved by restructuring sites into more convenient layouts, and internal linking within articles using special highlights became an important part of optimization. The popularity of SEO as a service skyrocketed. It was noticed that sites that did not comply with Panda's rules disappeared from the search very quickly.

Page Layout

The Page Layout algorithm is a technology for combating search spam that calculates the ratio of useful content to spam on a site's pages. It was launched in January 2012 and updated through 2014.

Peculiarities

Page Layout was created after numerous user complaints about unscrupulous site owners whose pages contained very little relevant content, or where the required data was hard to reach and sometimes absent altogether. The algorithm calculated the percentage of relevant content and spam on a page for an incoming query. Sites that did not meet the requirements were sanctioned and removed from the search. Violations also included a site header stuffed with advertising, where reading the text required scrolling to the second screen.
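
Google has never published the formula; a crude stand-in for "share of useful content versus ads on a page" could compare the text inside ad containers with the rest of the text. The "ad-block" class name below is purely an assumption made for the example.

```python
# Crude stand-in for a content-vs-ads ratio check; "ad-block" as the ad
# container class is an assumption made only for this example.
import re

def content_to_ad_ratio(html: str) -> float:
    """Share of visible text that sits outside the assumed ad containers."""
    ad_blocks = re.findall(r'<div class="ad-block">(.*?)</div>', html, re.S)
    ad_chars = sum(len(re.sub(r"<[^>]+>", "", block)) for block in ad_blocks)
    total_chars = len(re.sub(r"<[^>]+>", "", html))
    content_chars = max(0, total_chars - ad_chars)
    return content_chars / max(1, content_chars + ad_chars)

page = '<div class="ad-block">BUY NOW BUY NOW</div><p>A long useful article body about the query topic.</p>'
print(round(content_to_ad_ratio(page), 2))
```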

Results

Sites that were overly spammed with advertising fell from their positions, even when the content on their pages was moderately optimized for keywords. Pages not relevant to queries were demoted in the results. But there were not that many sites that blatantly ignored the rules and did not care about visitor comfort: after three updates of the algorithm, the approximate share of resources that fell under the filter turned out to be no more than 3%.

Venice

The Venice algorithm georeferences a site to a specific region, taking into account the city names present on its pages. It was launched in February 2012.

Peculiarities

Venice required webmasters to have an "About Us" page on their sites listing a location address, regardless of whether the company actually had a physical office there. In context, the algorithm looked for city names in order to show a separate page for the region mentioned on it. Webmasters began using schema.org markup (generated, for example, with schema-creator.org) to tell the search robot their geographic location.
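
A minimal example of the kind of structured data that conveys a business address is schema.org LocalBusiness, shown here serialized as JSON-LD; every detail below is a placeholder.

```python
# Minimal sketch: emitting schema.org LocalBusiness JSON-LD so a crawler can
# read the site's location. All values below are placeholders.
import json

local_business = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Cafe",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Saint Petersburg",
        "streetAddress": "Nevsky Prospekt 1",
    },
    "telephone": "+7-000-000-00-00",
}

snippet = f'<script type="application/ld+json">{json.dumps(local_business)}</script>'
print(snippet)
```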

Results

Sites now appeared in the results only for the regions they mentioned on their pages (geo-independent queries aside). Optimizers began actively including geo-sensitive keywords and adding microdata. The content of each page is personalized for a specific city or region as a whole, and localized link building began to be used actively to raise positions in the chosen region.

Penguin

The Penguin algorithm is a smart technology for determining site weight and backlink quality, a system for correcting inflated authority indicators of Internet resources. It was launched into search in April 2012.

Peculiarities

Penguin is aimed at the war against purchased backlinks, that is, an unnatural, artificial build-up of site authority. The algorithm forms its base of significant resources based on the quality of their backlinks. The motivation for launching Penguin was the rise of link optimizers, when any link to a web resource carried equal weight and raised that site in the results. In addition, ordinary social network profiles had begun to rank in search on a par with standard resources, which further popularized promoting sites through social signals. At the same time, the system began to combat irrelevant insertion of search queries into link texts and domain names.
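
Penguin's real signals are far more involved and are not public; as a toy illustration only, flagging a backlink profile dominated by known low-quality sources might look like this (the domains and the threshold are invented).

```python
# Toy sketch only: flag a backlink profile whose share of links from known
# low-quality sources is too high. Lists and threshold are invented.
LOW_QUALITY_DOMAINS = {"cheap-links.example", "link-farm.example"}

def looks_unnatural(backlinks: list[str], threshold: float = 0.5) -> bool:
    bad = sum(1 for domain in backlinks if domain in LOW_QUALITY_DOMAINS)
    return bad / max(1, len(backlinks)) > threshold

profile = ["cheap-links.example", "news-portal.example", "cheap-links.example", "link-farm.example"]
print(looks_unnatural(profile))  # True: 3 of the 4 links come from low-quality sources
```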

Results

Penguin "sank" many sites in the results for unnatural backlink growth and for content irrelevant to user queries. The importance of directories and link-selling sites quickly dropped to a minimum, while authoritative resources (news sites, thematic and near-thematic sites) grew before our eyes. With the introduction of Penguin, PR was recalculated for almost all public sites. The popularity of mass backlink buying dropped sharply. Sites began to match key phrases to the content of their pages as closely as possible - a "relevance mania" set in. Installing social buttons on pages as modules became widespread thanks to the rapid indexing of social network accounts in search.

Pirate

The Pirate algorithm is a technology for responding to user complaints and identifying copyright infringement. The system was launched in August 2012.

Peculiarities

Pirate accepted complaints from authors about violations of their copyright by site owners. Besides texts and images, the brunt fell on sites with video content that hosted pirated camcorder copies of films from cinemas. Descriptions and reviews of the videos were also filtered: copy-pasting was now forbidden under pain of sanctions. With a large number of complaints against a site, it was thrown out of the search results.

Results

In the first month of Google's Pirate, millions of video files violating the rights of copyright holders were blocked from viewing on almost all sites, including video hosting services and online cinemas. Sites containing only pirated content were sanctioned and dropped from the search. The massive cleanup of "stolen" content is still ongoing.

Hummingbird

The Hummingbird algorithm introduced technology for understanding the user when a query does not match exact phrases. This system for "identifying exact desires" was launched in September 2013.

Peculiarities

Now the user no longer had to rephrase a query to find the information they actually needed. Hummingbird made it possible to search not by direct exact matches but by "deciphered wishes". For example, if a user typed the phrase "places for recreation" into the search bar, Hummingbird ranked sites about sanatoriums, hotels, spa centers, swimming pools, and clubs in the results. In effect, the algorithm grouped a standard database with human phrasings of what people describe. This understanding system changed the search results significantly.

Results

With the help of Hummingbird, SEOs were able to expand their semantic cores and bring more users to their sites through morphological variants of keys. Site ranking became more precise, since not only direct occurrences of key phrases and text-relevant queries are now taken into account, but also the topical wishes of users. The concept of LSI copywriting appeared - writing text with latent semantic indexing in mind. That is, articles were now written not only with keyword insertions but with as many synonyms and related phrases as possible.
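
There is no official recipe for "LSI copywriting"; as a small illustration of the idea, you could check what share of a hand-made list of related terms a draft text actually covers (the vocabulary here is invented).

```python
# Toy sketch of an "LSI coverage" check: how many related terms from a
# hand-made vocabulary appear in a draft text. The vocabulary is invented.
RELATED_TERMS = {"vacation", "resort", "hotel", "spa", "swimming pool", "relax"}

def lsi_coverage(text: str) -> float:
    found = {term for term in RELATED_TERMS if term in text.lower()}
    return len(found) / len(RELATED_TERMS)

draft = "Our resort offers a spa, a swimming pool and quiet places to relax."
print(f"{lsi_coverage(draft):.0%}")  # 67% of the related terms are covered
```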

Pigeon

The Pigeon algorithm is a system for localizing users and linking search results to their location. The technology was launched in July 2014.

Peculiarities

The user's location now plays a key role in delivering results, and organic search became all about geolocation. Linking sites to Google Maps took on a special role. Given a query, the algorithm first looks for the sites closest to the visitor's location or with targeted content, then moves outward. Organic results changed significantly.
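
Google has not published how Pigeon weighs distance; purely as an illustration of "closest first", candidate results could be ordered by their distance from the user, for example with the haversine formula (the coordinates and site names below are made up).

```python
# Toy sketch of location-aware ordering: candidates closer to the user come first.
import math

def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = math.sin((lat2 - lat1) / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(h))

user = (55.75, 37.62)  # somewhere in Moscow
candidates = {"pizzeria-center.example": (55.76, 37.60), "pizzeria-suburb.example": (55.95, 37.30)}
ranked = sorted(candidates, key=lambda site: haversine_km(user, candidates[site]))
print(ranked)  # nearer site first
```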

Results

Local sites quickly rose in the rankings and received local traffic, while platforms without a geographic focus fell in position. The struggle for each city began again, and the number of cases grew in which near-identical sites with lightly reworked content and links targeted different areas. Before accurate information about the rollout of Pigeon in Russian-language search appeared, many webmasters assumed they were under Penguin sanctions.

Mobile-Friendly

The Mobile-Friendly algorithm introduced a check of sites' adaptability to mobile devices. The system was launched in April 2015 and quickly earned nicknames such as "Mobile Armageddon" (mobilegeddon) and "Mobile Apocalypse" (mobilepocalypse, mobocalypse, mopocalypse).

Peculiarities

Mobile-Friendly launched a new era for mobile users, urging optimizers to make mobile visits to their sites comfortable without delay. Adaptability to mobile devices became one of the most important indicators of how much site owners care about their visitors. Non-responsive platforms had to fix their shortcomings quickly: get rid of plugins not supported on tablets and smartphones, adjust text size to different screen resolutions, and remove modules that prevent visitors with small screens from navigating the site. Some simply created a separate mobile version of their resource.

Results

Resources prepared in advance for this turn of events received special prominence among other sites in the results, and traffic from various non-desktop devices to such sites grew by more than 25%. Completely non-responsive sites were demoted in mobile search. The focus on mobility had its effect: heavy scripts on resources were minimized, and advertising and pages naturally began to load faster, given that most tablet and smartphone users are on mobile Internet, which is several times slower than a standard connection.

Summary

That's the whole story. Now you know how search has developed over the years, both for ordinary users and for the sites that got caught in the crossfire. Each of the search algorithms described above is periodically updated. But this does not mean that optimizers and webmasters have anything to fear (unless, of course, you use black hat SEO); still, it is worth keeping an eye out so as not to sag unexpectedly in the search because of the next new filter.






