Removing pages from the search engine index. Why are pages excluded from search?


When it comes to exporting data on indexed and excluded pages, Yandex has finally done what was needed.

We now have a very good tool that gives us some very useful information.

Today I will tell you about this information and how you can use it to promote your sites.

Go to Yandex.Webmaster, to the "Indexing" section.

Here is the picture that appears in front of you:

This data on excluded pages gives us a lot of information.

Let's start with the Redirect category:

Usually a redirect does not pose any problems; it is simply a technical part of the site.

This is an ordinary duplicate page. I would not say this is all that critical: of the two pages, Yandex simply considered the second one more relevant.

Yandex even adds its own comment: The page duplicates the page http://site/?p=390, which is already present in the search. Tell the robot your preferred address using a 301 redirect or the rel="canonical" attribute.

You can use this in the following way: sometimes the pages you are promoting drop out of the index while their duplicates, on the contrary, appear in it. In this case, you just need to set the canonical URL on both of these pages to point to the one you are promoting.

After that, submit both of these pages for re-crawling by the robot.
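
For reference, here is a minimal sketch of how the canonical link looks in a page's source code (the address is the example one from Yandex's comment above; substitute the URL of the page you are promoting):

<head>
  <!-- Place the same rel="canonical" link on the promoted page and on its duplicate -->
  <link rel="canonical" href="http://site/?p=390">
</head>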

This is a page whose source code specifies a canonical URL pointing to the desired page.

Everything is fine here; this is a normal process for the site.

Yandex also writes a hint here: The page is indexed at the canonical address http://site/?p=1705, which is specified in the rel="canonical" attribute in its source code. Correct or remove the canonical attribute if it is incorrect. The robot will track the changes automatically.

This usually happens when you have deleted some pages but did not set up a redirect, or the server was not configured to return a 404 error.

This does not cause any harm to website promotion.

Now we come to the most interesting part: pages of insufficient quality.

That is, pages of our website were removed from the Yandex index because they were considered to be of insufficient quality.

Of course, this is a very important signal that you have serious problems with these pages.

But not everything is as clear-cut as it seems.

Often these are pagination pages, search pages, or other junk, and such pages are correctly excluded from the index.

But sometimes online store product cards are excluded from the index, and they are excluded by the thousands. This certainly indicates that there are serious problems with your product card pages.

I've looked at many online stores this week and almost all of them have something similar. Moreover, pages are dropping out by the tens of thousands.

One such problem is having several identical pages where only the product color differs; Yandex then considers them to be one page.

In this case, you either make a single page with a color selector or rework the other pages so that they differ.

But it is certainly worth saying that this is a HUGE help for all online store owners: it makes clear to you which pages dropped out and why.

Here you need to work on the quality of these pages. Maybe these pages duplicate others, maybe not.

Sometimes there is simply no text on such pages. Some have no price, and Yandex removes such pages from the index.

I also noticed that if a product card has the status "Product out of stock", such a page is also removed from the Yandex index.

In general, there is work to do.

I'll tell you about other interesting features on Monday at my seminar.

And one more thing. Many people know about this problem with Yandex.Advisor:

That is, you paid for a click from Yandex.Direct, and Yandex.Advisor takes the customer you paid for off to Yandex.Market.

This is truly an outrageous case.

As I understand it, Yandex will not change anything.

Okay, then I'll change it myself.

Yandex.Advisor primarily affects online stores, and online stores mostly run on these engines: Bitrix, Joomla, Webasyst.

So for these engines I am writing an Advisor blocker. That is, when you install this plugin on your engine, Yandex.Advisor will not work on your site.

I will later send these plugins for free to everyone who comes to my seminar.

I chose the most popular engines that online stores run on. Service sites do not need this, but for online stores it is essential.

If you have questions, ask them.

Link "Saved Copy" on the search results page sometimes allows you to find out very interesting things that were promptly removed from a site. This happens with news or some controversial publications. This is a feature of the job search engines makes site visitors very happy. But now, acting as administrators, we, on the contrary, are interested in ensuring that unnecessary pages that we have already removed from our site are removed from search results as soon as possible. Again, sooner or later this will happen. To make this happen early, the form " Remove URL" (rice. 4.3):

Fig. 4.3. URL removal form

Of course, this form does not guarantee instant removal of the result, but it speeds it up.

Check site

Is our site in the search database at all? The answer to this question is given by the "Check site" page (Fig. 4.4):

Fig. 4.4. Site check

A regular search page opens, with the template rhost="ru.narod.v-rn"|rhost="ru.narod.v-rn.*" already entered in the query string (Fig. 4.5):

Fig. 4.5. Site check results

We are already familiar with this template from the first lecture. Our newly created website is not yet in the search database. Let's check the Internet University of Information Technologies website using the same form. The query string will contain the pattern rhost="ru.intuit"|rhost="ru.intuit.*" (Fig. 4.6):

Fig. 4.6. The result of checking the INTUIT website

This site is also listed in Yandex.Catalogue, so the results page displays the corresponding heading. Some time later, we enter the value v-rn.narod.ru once again. The site has now been indexed and appears in the database (Fig. 4.7):

Fig. 4.7. Checking the site after indexing

Note that in this case the contents of the title tag are displayed as the description of the site. This is why many sites have very long titles that often contain a description of the site.

My sites

Chapter "My sites" contains a list of resources that can be managed by you. This is a kind of starting point for monitoring your sites. At first the list is empty, so we enter the site address and click on the button "Add"(rice. 4.8):

Fig. 4.8. Adding an address in the "My Sites" section

Some time later, after the site has been indexed, the report will contain information about it (Fig. 4.9):

Fig. 4.9. An indexed site in the "My Sites" section

Looking at this list, a question arises: could I add someone else's site in this way? A system for verifying management rights was created specifically to prevent such cases. It involves asking you to upload certain information to your website. If you manage to do this, Yandex will consider you the owner of the site. In other words, the site owner is the person who can change its content, and this is what Yandex checks.

In our case, we are logged in as the user [email protected], who is automatically the owner of the site v-rn.narod.ru. Therefore, no prompt to verify management rights appears - we automatically receive confirmed rights.

Site errors are cases where the search robot could not access certain pages. These can be simple non-existent pages (404 errors) or links to protected parts of the site that are prohibited from indexing (see the "robots.txt" file below).

The "Pages loaded" field displays the total number of site pages that the Yandex search robot managed to crawl.

Fig. 4.10. Information about v-rn.narod.ru

Hyperlink "Site structure" leads to the structure, which, again, Yandex sees. Only those subsections that contain more than 10 pages and occupy more than 1% of the total are displayed here. Therefore, the actual structure - the one we could see on the local computer or on FTP - will be different from the one presented.

Sitemaps- an excellent way to specify the priority of page indexing for sites whose content is frequently updated. For example, most news feeds contain subsections where news is frequently posted. Other subsections - archive, information about the site (or company), mailing list - are updated less frequently. Fast indexing of the most updated materials will allow you to display current data in search results, which means it will help attract new visitors to the resource. Sitemaps are created using XML markup, the specific specifications of which are provided at official website.
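
As a rough illustration, a minimal sitemap.xml might look like this (the subsections /news/ and /archive/ are hypothetical; see the official specification for the full set of optional tags):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- A frequently updated news subsection: high priority, crawled often -->
  <url>
    <loc>http://v-rn.narod.ru/news/</loc>
    <changefreq>daily</changefreq>
    <priority>0.9</priority>
  </url>
  <!-- A rarely updated archive subsection: low priority -->
  <url>
    <loc>http://v-rn.narod.ru/archive/</loc>
    <changefreq>monthly</changefreq>
    <priority>0.3</priority>
  </url>
</urlset>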

Chapter "Index" contains very interesting information and tools ( rice. 4.11):

Fig. 4.11. The "Index" section

At the end of August 2007, Yandex stopped supporting features such as searching for pages that link to a given page and searching for words contained only in the text of links to a given page. They were convenient both for site owners and for "robots" - programs written to study Yandex rankings and try to manipulate them. Since such manipulation degraded the quality of search, the corresponding tools were switched off. However, the "Index" section still allows you to see which external pages link to a subsection of your site, using the "External links" tool. In other words, part of the previous functionality is still available in this service.

Chapter "Requests" allows you to see what search words the site appears in search results ( rice. 4.12):

enlarge imageRice. 4.12."Requests" section

Another tool for webmasters, Yandex.Metrica, which we will look at later, has tools that duplicate this section.

Chapter "Tools" contains, as the name suggests, tools for checking the file robots.txt and changing the case of the site name in search results ( rice. 4.13):

Fig. 4.13. The "Tools" section

As a rule, any website has sections that should not be indexed by search engines: administrative folders, users' personal data, working materials. To inform all search engines, and Yandex in particular, that certain sections should not be crawled, a plain text file named "robots.txt" is placed in the root directory of the site; the file name must be exactly that. For example, on a real website this file is located at http://www.intuit.ru/robots.txt. Its contents are:

User-agent: *
Disallow: /cgi-bin/
Disallow: /w2k-bin/
Disallow: /admin/
Disallow: /w2admin/
Disallow: /user/
Disallow: /diploma/

The language of this file is quite simple - it is not a programming language or even HTML. In this case, indexing of the cgi-bin, w2k-bin, admin and the other listed directories, with all their contents, is prohibited for all search engines ("User-agent: *"). Let's create our own robots.txt file for our site. For example, let's prohibit indexing of a specific folder. Go to the Workshop and click the "Create a folder" link (Fig. 4.14):

Fig. 4.14. Workshop, "Create Folder" link

The name of the folder can be completely arbitrary, but following tradition, let's call it admin (Fig. 4.15):

Fig. 4.15. Creating the "admin" folder

Inside this folder we create an html page using the "Create html file" hyperlink (Fig. 4.16):

Fig. 4.16. Workshop, "Create html file" hyperlink

Enter an arbitrary file name, say, main.html (Fig. 4.17):

Fig. 4.17. Creating an html page

Then we move on to editing the created page. Let's write on it that this is a page that should not be accessible to search engines (Fig. 4.18):

Fig. 4.18. Editing an html page

Of course, all the steps we have just taken simply use the Workshop's functionality. We would get exactly the same result if we created the folder and then the page in Dreamweaver and then uploaded them via FTP. The main thing is that the created page is now visible at http://v-rn.narod.ru/admin/main.html (Fig. 4.19):

Fig. 4.19. A page that will be hidden from search engines

An important note: this page is completely accessible to all users, and it will remain so. If we wanted only authorized users to have access to it, that problem would have to be solved with web programming. But this has nothing to do with prohibiting indexing by search engines.

Now we launch Notepad, create the file "robots.txt" and enter our rule into it (Fig. 4.20):

Fig. 4.20. The rule for the site v-rn.narod.ru
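
The figure itself is not reproduced here, but judging by the folder we have just created and the check result shown below, the file would contain a rule along these lines (a sketch: one rule disallowing the admin folder for all robots):

User-agent: *
Disallow: /admin/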

All that remains is to upload this file to the root folder of the site. Go to the Workshop and click the "Upload files" link (Fig. 4.21):

Fig. 4.21. Workshop, "Upload files" hyperlink


Everything is ready. We return to the Yandex.Webmaster tools and follow the "Analysis of the robots.txt file" link (see Fig. 4.13). This loads the contents of the robots.txt file that is already on our website; it is displayed in the top field of the page. We then enter the address we want to check - v-rn.narod.ru/admin/main.html (Fig. 4.23):

Fig. 4.23. Analysis of the robots.txt file

As we intended, this page will not be indexed - the result is "forbidden by the /admin/ rule" (Fig. 4.24):

Fig. 4.24. The result of analyzing the robots.txt file

To analyze the robots.txt file of any site - not just your own - there is a public version of the tool, which works even for unauthorized users.

Tool "Site name register" users will undoubtedly appreciate who love to write email address as [email protected]. It allows you to change the site address in search results ( rice. 4.25):

Fig. 4.25. Changing the case of the site name

Of course, after the change, the site will still be accessible via a regular link like v-rn.narod.ru.

How to temporarily remove your pages from Google search results

This tool allows you to temporarily block your website's pages from Google search results. There are separate instructions on how to remove pages from Google Search that do not belong to you.

Important Notes

How to temporarily exclude a page from Google search results

  1. The URL must belong to a property you own in Search Console. If this is not the case, you need to follow other instructions.
  2. Go to the URL removal tool.
  3. Click Temporarily hide.
  4. Specify the relative path to the desired image, page, or directory. Take the following requirements into account:
    • The case of characters in URLs matters: example.com/Stranitsa and example.com/stranitsa are not the same URL.
    • The path must be relative to the root directory of your property in Search Console.
    • Options with the prefixes http and https, as well as with and without the www subdomain, mean the same thing. Therefore, if we talk about example.com/stranitsa , then:
      • https://example.com/stranitsa is no different;
      • http://example.com/stranitsa is no different;
      • https://www.example.com/stranitsa is no different;
      • http://www.example.com/stranitsa is no different;
      • http://m.example.com/stranitsa is different. The m. subdomain (like any other subdomain) makes it a different URL.
    • To hide an entire site, do not specify a path and in the next step select the option Clear cache and temporarily hide all URLs that start with....
  5. Click Continue.
  6. Select the desired action from those listed below.
  7. Click Send request. It may take up to a day to process. We do not guarantee that the request will be fulfilled. Check the status of your request. If it was rejected, click More details to view more information.
  8. Send additional requests, indicating all the URLs that could open the same page, as well as case-changed URL variations if your server supports them. In particular, the following URLs can point to the same page:
    • example.com/mypage
    • example.com/MyPage
    • example.com/page?1234
  9. If you want to permanently remove a URL from search results, please read the next section.

Permanent removal

The URL removal tool removes pages only temporarily. If you want to permanently remove content from Google search results, take additional steps:

  • Remove or change the site content (images, pages, directories) and make sure the server returns a 404 (Not Found) or 410 (Gone) status code. Files that are not in HTML format (such as PDF) must be completely removed from the server. Learn more about HTTP status codes...
  • Block access to the content, for example by setting a password.
  • Prevent the page from being indexed using the noindex meta tag (a sketch follows this list). This method is less reliable than the others.
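
A minimal sketch of the noindex meta tag mentioned in the last item, placed in the head of the page you want to keep out of the index:

<head>
  <!-- Ask robots not to include this page in their index -->
  <meta name="robots" content="noindex">
</head>

Note that the robot must be able to crawl the page to see this tag, so such a page should not also be blocked in robots.txt.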

Cancel URL removal

If you want to restore a page to search results before the temporary block ends, open the status page in the tool and click Re-enable next to the completed URL removal request. It may take several days for your request to be processed.

Using the tool for purposes other than its intended purpose

The URL removal tool is intended for urgent blocking of content, for example in cases where sensitive data has been accidentally exposed. Using this tool for other purposes can have negative consequences for your website.

  • Don't use the tool to remove unnecessary items, such as old pages with a 404 error message. If you've changed the structure of your site and some of the URLs in Google's index are out of date, search robots will detect this and re-crawl them, and old pages will be gradually excluded from search results. There is no need to request an urgent update.
  • Do not use the tool to remove crawl errors from your Search Console account. This feature prevents addresses from appearing in Google search results, not in your Search Console account. You don't need to remove such URLs manually; over time they will be excluded automatically.
  • Do not use the URL removal tool when completely redesigning your site from scratch. If your site is subject to manual action or was purchased from a previous owner, we recommend submitting a reconsideration request in which you describe what changes you made and what problems you encountered.
  • Do not use the tool to "shut down" a site after it has been hacked. If your site has been hacked and you want to remove pages with malicious code from the index, use the tool to block the new URLs created by the attacker, such as http://www.example.com/buy-cheap-cialis-skq3w598.html. However, we do not recommend blocking all pages on the site or URLs that will need to be indexed in the future. Instead, remove the malicious code so that Google robots can re-crawl your site.
  • Do not use the URL removal tool to get the correct "version" of your site indexed. On many sites, the same content and files can be found at different URLs. If you do not want your content to appear in duplicate in search results, do not try to solve this by blocking the unwanted URL versions with the removal tool: this will not preserve your preferred version of the page, but will remove all versions of the URL (with the http or https prefix, and with and without the www subdomain).


Site pages may disappear from Yandex search results for several reasons:

  • An error occurred when the robot loaded or processed the page - the server response had a 3XX, 4XX, or 5XX HTTP status. The "Checking the server response" tool will help you identify the error.
  • Page indexing is prohibited in the robots.txt file or using a meta tag with the noindex directive.
  • The page redirects the robot to other pages.
  • The page duplicates the content of another page.
  • The page is not canonical.

The robot continues to visit pages excluded from search, and a special algorithm checks the likelihood of showing them in the results before each update of the search database. Thus, a page may appear in search within two weeks after the robot learns that it has changed.

If you have fixed the problem that caused the page to be removed, submit the page for re-crawling. This will inform the robot about the changes.

Questions and answers about pages excluded from search

The page has the Description and Keywords meta tags and the title element, and it meets all the requirements. Why isn't it in the search?

The algorithm checks site pages not only for the presence of all the necessary tags, but also for the uniqueness and completeness of the content, how up to date and relevant it is, and many other factors. At the same time, you should pay attention to the meta tags: for example, the Description meta tag and the title element may be generated automatically and simply repeat each other.
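
To illustrate, a sketch of a page head where the title and description complement rather than repeat each other (the product and the wording are made up):

<head>
  <title>Oak office desk with two drawers - buy online</title>
  <meta name="description" content="Solid oak office desk, 120x60 cm, two drawers. In stock, delivery in 2 days.">
</head>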

If the site has a large number of almost identical products that differ only in color, size or configuration, they may also be left out of the search. Pagination pages, product selection or comparison pages, and image pages with no text content at all can be added to this list as well.

Pages that appear in the excluded list open normally in the browser. What does this mean?

This can happen for several reasons:

  • The headers that the robot sends to the server differ from the headers the browser sends. Therefore, excluded pages may still open correctly in the browser.
  • If a page was excluded from search because of an error loading it, it will disappear from the list of excluded pages only when it becomes available the next time the robot accesses it. Check the server response for the URL you are interested in (see the example below this list). If the response contains the HTTP status 200 OK, wait for the robot to visit again.
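
One simple way to check the server response outside the browser is to send a HEAD request from the command line (curl is shown here; the URL is a placeholder):

curl -I http://example.com/stranitsa
# HTTP/1.1 200 OK        - the page is available; wait for the robot to visit again
# HTTP/1.1 404 Not Found - the page really is missing and will stay excluded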

The "Excluded Pages" list shows pages that are no longer on the site. How to remove them?

In the "Pages in search" section, the "Excluded pages" list displays pages that the robot accessed but did not index (these may be pages that no longer exist but were previously known to the robot).

A page is removed from the excluded list if:

  • it is unavailable to the robot for some time;
  • it is not referenced by other site pages or external sources.

The presence and number of excluded pages in the service should not affect the site’s position in search results.






