Description of the sitemap.xml file. A detailed Sitemap guide


The sitemap.xml file is a tool that allows webmasters to inform search engines about the site pages available for indexing. In the XML map you can also specify additional page parameters: the date of the last update, the update frequency and the priority relative to other pages. The information in sitemap.xml can influence the behavior of the search crawler and, in general, the indexing of new documents. The sitemap contains directives for including pages in the crawl queue and complements robots.txt, which contains directives for excluding pages.

In this guide you will find answers to all questions regarding the use of sitemap.xml.

Do I need sitemap.xml

Search engines use the sitemap to find new documents on the site (these can be HTML documents or media content) that are not accessible through navigation but need to be crawled. Having a link to a document in sitemap.xml does not guarantee that it will be crawled or indexed, but most often the file helps large sites get indexed better. In addition, data from XML maps is used when determining canonical pages, unless a canonical page is explicitly indicated in the rel=canonical tag.

Sitemap.xml is important for sites where:

  • Some sections are not accessible through the navigation menu.
  • There are many isolated or poorly connected pages.
  • Technologies that are poorly supported by search engines are used (for example, Ajax, Flash or Silverlight).
  • There are a lot of pages and there is a chance that the search crawler will miss new content.

If this is not your case, then most likely you do not need sitemap.xml. For sites where every page important for indexing is available within two clicks, where JavaScript or Flash technologies are not used to display content, where canonical and regional tags are used where necessary, and where fresh content appears no more often than the robot visits the site, there is no need for a sitemap.xml file.

For small projects, if the only problem is a deep level of document nesting, it can easily be solved with an HTML sitemap, without resorting to XML maps. But if you decide that you still need sitemap.xml, then read this guide in its entirety.

Technical information

  • Sitemap.xml is a text file in XML format. However, search engines also support a plain text format (see the next section).
  • Each sitemap can contain a maximum of 50,000 addresses and weigh no more than 50 MB (10 MB for Yandex).
  • You can use gzip compression to reduce the size of the sitemap.xml file and increase its transfer speed. In this case, use the .gz extension (sitemap.xml.gz). The weight restrictions still apply to the uncompressed sitemap.
  • The location of the sitemap determines the set of URLs that can be included in it. A map containing the addresses of the pages of the entire site should be located in the root. If the sitemap is located in a folder, then all URLs in this sitemap must be located in this folder or deeper (for example, a sitemap at http://site.ru/catalog/sitemap.xml may only list URLs under /catalog/).
  • Addresses in sitemap.xml must be absolute.
  • The maximum URL length is 2048 characters (1024 characters for Yandex).
  • Special characters in URLs (such as the ampersand "&" or quotes) must be escaped as HTML entities (see the example after this list).
  • The pages specified in the map must return a 200 HTTP status code.
  • The addresses listed in the map must not be blocked in the robots.txt file or in robots meta tags.
  • The sitemap itself must not be blocked in robots.txt, otherwise the search engine will not crawl it. The file itself may appear in the index; this is normal.
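For instance, an ampersand inside an address must be written as the XML entity &amp; within the loc tag (the URL below is a hypothetical placeholder):

<loc>http://site.ru/catalog?page=1&amp;sort=price</loc>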

XML map formats

Search engines support a simple text sitemap format, which simply lists the URLs of pages without additional parameters. In this case, the file must be UTF-8 encoded and have the .txt extension.
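For example, such a sitemap.txt contains nothing but one URL per line (the addresses are placeholders):

http://site.ru/
http://site.ru/page/
http://site.ru/page1/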

Search engines also support the standard XML protocol. Google additionally supports sitemaps for images, videos, and news.

An example sitemap containing only one address:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://сайт/</loc>
    <lastmod>2018-06-14</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.9</priority>
  </url>
</urlset>

XML tags
urlset (required) - the parent tag; encapsulates the file and references the current protocol standard.
url (required) - the parent tag for each URL entry.
loc (required) - the document URL; must be absolute.
lastmod - the date of the last modification of the document in Datetime format.
changefreq - the frequency of page changes (always, hourly, daily, weekly, monthly, yearly, never). The value of this tag is a recommendation to search engines, not a command.
priority - the URL priority relative to other addresses (from 0 to 1), used for crawl ordering. If not specified, the default is 0.5.

XML map for images

Some optimizers insert links to images into sitemap.xml in the same way as links to HTML documents. This can be done, but for Google it is better to use the image extension of the standard protocol and send additional information about the images along with the URLs. Creating XML image maps is useful when images need to be crawled and indexed but are not directly accessible to the bot (for example, when JavaScript is used).

An example of a sitemap containing one page and its associated images:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://example.com/primer.html</loc>
    <image:image>
      <image:loc>http://example.com/kartinka.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>http://example.com/photo.jpg</image:loc>
      <image:caption>View of Balaklava</image:caption>
      <image:geo_location>Sevastopol, Crimea</image:geo_location>
      <image:license>http://creativecommons.org/licenses/by-nd/3.0/legalcode</image:license>
    </image:image>
  </url>
</urlset>

XML tags
image:image (required) - information about one image. A maximum of 1,000 images can be listed per page.
image:loc (required) - the path to the image file. If a CDN is used, it is acceptable to link to another domain as long as it is verified in the webmaster panel.
image:caption - a caption for the image (may contain long text).
image:title - the image title (usually short text).
image:geo_location - the place where the image was taken.
image:license - the URL of the image license. Used for advanced image search.

XML map for video

Similar to the image map, Google also has a video sitemap extension in which you can specify detailed information about video content, which affects how it is displayed in video search. A video sitemap is needed when the site uses videos that are hosted locally and indexing these videos is difficult because of the technologies used. If you are embedding a video from YouTube on your website, a video sitemap is not needed.
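A minimal sketch of such a video sitemap, based on Google's video extension of the protocol (all URLs, the title, description and duration below are hypothetical placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>http://site.ru/video-page.html</loc>
    <video:video>
      <!-- hypothetical values; thumbnail_loc, title and description are required by the extension -->
      <video:thumbnail_loc>http://site.ru/thumbs/video1.jpg</video:thumbnail_loc>
      <video:title>Example video title</video:title>
      <video:description>A short description of the video.</video:description>
      <video:content_loc>http://site.ru/videos/video1.mp4</video:content_loc>
      <video:duration>600</video:duration>
    </video:video>
  </url>
</urlset>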

News Sitemap

If you have news content on your site and participate in Google News, it is useful to use a news sitemap: Google will then quickly find your latest materials and index all news articles. Such a sitemap should contain only the addresses of pages published in the last 2 days and no more than 1,000 URLs.
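A minimal sketch of a news sitemap entry, based on Google's news extension of the protocol (the URL, publication name and article title are hypothetical placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>http://site.ru/news/article.html</loc>
    <news:news>
      <!-- hypothetical publication data; name and language identify the source -->
      <news:publication>
        <news:name>Example News</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2018-06-14</news:publication_date>
      <news:title>Example article title</news:title>
    </news:news>
  </url>
</urlset>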

Using multiple sitemaps

If necessary, you can use several sitemaps, combining them into one index sitemap. Multiple sitemap.xml files are used in cases where:

  • The site uses several engines (CMS).
  • The site has more than 50,000 pages.
  • It is necessary to set up convenient error tracking in sections.

In the latter case, each large section of the site gets its own sitemap.xml, and all of them are added to the webmaster panel, where it is convenient to see which section has the most errors (see the section on finding errors in the sitemap).

If you have 2 or more sitemaps, they need to be combined into an index sitemap, which looks the same as a regular sitemap (except for the sitemapindex and sitemap tags instead of urlset and url), has similar restrictions and can only link to regular XML maps (not to other index maps).

Example Sitemap Index:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap-blog.xml.gz</loc>
    <lastmod>2004-10-01T18:23:17+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap-webinars.xml.gz</loc>
    <lastmod>2005-01-01</lastmod>
  </sitemap>
</sitemapindex>

sitemapindex (required) - specifies the current protocol standard.
sitemap (required) - contains information about an individual sitemap.
loc (required) - the sitemap location (in XML, TXT or RSS format for Google).
lastmod - the time the sitemap was changed. Allows search engines to quickly discover new URLs on large sites.

How to create sitemap.xml

Ways to create an XML sitemap:

  • Internal CMS tools. Many CMSs already support sitemap creation. To find out, read the documentation for your CMS, look through the menu items in the admin panel, or contact the engine's technical support. Try opening https://yoursite.com/sitemap.xml on your site; it may already exist and be generated dynamically.
  • External plugins. If the CMS does not have functionality for generating a sitemap but supports plugins, search for a plugin that covers sitemap.xml for your engine and install it. In some cases, you may need to ask programmers to write such a plugin for you.
  • A separate script on the site. Knowing the XML map protocol and its technical limitations, you can create sitemap.xml yourself by adding a generation script to cron. If you are not a programmer, use the other items in this list.
  • Sitemap generators. There are many sitemap.xml generators that crawl your site and give you a ready-made map to download. The disadvantage is that every time the site is updated, you need to generate the sitemap manually.
  • Parsers. Desktop programs designed for technical site analysis usually provide the option to download a sitemap.xml generated from the crawled pages. This works similarly to sitemap generators, only it runs locally on your machine.

Popular online sitemap generators

XML-Sitemaps.com

Allows you to get sitemap.xml in a few clicks. Supports XML, HTML, TXT and GZ formats. Convenient to use for small sites (up to 500 pages).

A similar generator, but with a few more settings; it allows you to create a map of up to 2000 pages for free.

Has many settings, allows you to import URLs from a CSV file. Scans up to 500 URLs for free.

There is no limit on the number of pages to scan, but for large sites the generation process may hang for several tens of minutes.

Local programs for generating XML Sitemap

G-Mapper Sitemap Generator

Free desktop version of the sitemap generator for Windows.

Screaming Frog SEO Spider

A flexible sitemap generation tool with many settings. Convenient if you already use Screaming Frog for other SEO tasks. After crawling the site, use the menu item Sitemaps -> Create XML Sitemap.

Netpeak Spider

A less flexible but still convenient solution for quick sitemap.xml generation. After crawling the site, use the menu item Tools -> Generate Sitemap.

If the main purpose of robots.txt is to prohibit indexing, then sitemap.xml performs exactly the opposite task: it is responsible for accelerating and completing site indexing.

Sitemap.xml tells the search engine how often it needs to re-index pages. In this regard, a sitemap is especially important for sites with regularly updated content (news portals, etc.). Additionally, sitemap.xml lists all the important pages of the site with an indication of their priority.

Requirements for a site map

A sitemap is an XML file that lists a website's URLs together with the metadata associated with each URL (the date it was last modified, how often it changes, and its priority at the site level) so that search engines can crawl the site more intelligently.

The total number of sitemap.xml files on the site should not exceed 1,000, while the number of records (URLs) in each should not exceed 50,000.

If you need to list more than 50,000 URLs, you should create multiple Sitemaps.

The sitemap can be compressed with the gzip archiver to reduce its size, but the size of each sitemap in expanded (unzipped) form should not exceed 10 megabytes.

The sitemap does not have to be an xml file. The protocol allows generating the map as a syndication feed (RSS or Atom) or as a simple text file listing URLs line by line. But such "site maps" either do not include all site URLs (in the case of syndication) or do not carry the additional important information (the date and time of page content modification) that is precisely why sitemaps are used in SEO.

By providing a timestamp of the last modification, you allow search engine crawlers to retrieve only a portion of the sitemap files: the crawler can fetch only those sitemap files (pages) that have been modified since a specific date. This mechanism of partial retrieval from sitemap.xml allows new URLs on large sites to be discovered quickly. In addition, it reduces the load both on the server and on the search engine crawler. And search engines really appreciate the latter.

Combining a sitemap with robots.txt and robots meta tags

When used correctly, the instructions in sitemap.xml, robots.txt and robots meta tags should complement each other. There are three most important rules for the interaction of these instructions:

  • sitemap.xml, robots.txt and robots meta tags should not contradict each other;
  • all pages excluded (blocked) in robots.txt and robots meta tags must also be excluded from sitemap.xml;
  • all indexable pages allowed in robots.txt must be contained in sitemap.xml.

Exceptions to the three rules

There are exceptions to these three rules, and, as usual, they concern pagination pages. Starting from the second pagination page onward, we write noindex, follow in the robots meta tags, while in robots.txt the pagination pages are not closed from indexing.

In the sitemap, set the date and time of change for such pages equal to the date and time of change of the main (first) page of the catalog. In principle, we can agree with this.

Old-school optimizers advise adding only unindexed or changed pages to the sitemap.xml file and removing pages that are already in the index. It is harder to agree with this opinion: if the lastmod field is present and filled in correctly, there is no need for such tricks.

Main problems when using sitemap.xml

In practice, I have most often encountered the following errors:

  1. Inconsistency of sitemap.xml with the site pages: an outdated sitemap. This problem occurs when the sitemap is generated not dynamically but episodically, by launching some service in the CMS or even a third-party service. This creates a mass of dead pages returning a 404 error (if a page was physically deleted or moved to another location and its URL changed). In addition, new pages are indexed much more slowly because they are not in sitemap.xml.
  2. The next error is an incorrect sitemap.xml structure. This error occurs, as a rule, on "home-written" CMSs or when using faulty plugins for popular CMSs. In this case, the sitemap.xml file is generated in violation of the structure described by the protocol.
  3. A variation of this error is incorrect handling of the record modification date. From the protocol's point of view this is not an error, since the lastmod field is optional. From the point of view of SEO and search engines, the lack of a correct value in this field (one coinciding with an actual change in content) completely neutralizes the value of the entire sitemap.xml file. As mentioned above, search engines will re-index those pages whose lastmod field has changed. What happens if this field changes simultaneously for all records (pages) of the site, so that the modification date is the same for all site files? Most likely, the search engine will not pay attention to the sitemap, and the site will be re-indexed in the usual way, while deeply nested pages either are not re-indexed at all or take a very long time to be re-indexed. So you should either not use the lastmod field at all (which is bad) or set it to the date of the last significant change of the page, for example, when the price has changed, the product is out of stock or the description has changed.
  4. The next group of errors are logical ones, caused by a violation of the three rules for combining robots.txt and sitemap.xml. In this case, you can observe a page constantly entering the index and immediately leaving it. However, this will not be observed if there is a noindex meta tag together with an entry in sitemap.xml: the crawler (robot, search engine spider) that visits the page simply will not index it.
  5. The last mistake often found on websites is the presence of "orphan pages." These are pages that have a link from the sitemap but not a single direct link from any of the site's pages. This often happens because pages were deleted "logically" (for example, placed in the trash in WordPress) rather than physically. It is also observed on sites where access to product cards is done using scripts and filters in a way that does not allow the results of these scripts to be indexed. There may be other reasons for the appearance of such orphan pages. All this reduces the trust of search engines in the site and is a negative ranking signal.

According to the protocol, after changing the sitemap you can ping the search engines again. To do this, you need to send a request of the following type.
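A sketch of such a ping request under the sitemaps.org protocol (the sitemap URL is passed URL-encoded in the sitemap parameter; Google's ping endpoint shown below has since been deprecated):

http://www.google.com/ping?sitemap=http%3A%2F%2Fsite.ru%2Fsitemap.xml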

You are just a cretin if you didn't give the sitemap proper attention in your time. It is enough to understand the issue once to avoid a large number of mistakes in the future, so let's do it now.

Your humble servant was also such a cretin in his younger years, when he was just starting to promote websites in one office. At that time I came across one website for promotion which, it should be said, was just crap. And this crap had problems with indexing. Naturally, if the site had been of sufficient quality, both search engines would have indexed it no matter the problems, but the owners relied on a normal designer, layout designer and programmer, and in such a case the SEO specialist can only, so to speak, open a bottle with scissors. I tried everything on it: setting last-modified, speeding up indexing using the fastbot that was fashionable at the time, and buying links. And only then did it turn out that the problem was that the sitemap was not updated automatically! When I updated it, all the pages flew into the index.

What is a sitemap and why is it needed?

What is a sitemap? It is a file with information about the site pages that need to be indexed. Typically, a sitemap is created for Yandex and Google to notify search robots about pages that need to be included in the index. Using a sitemap, you can also tell how often updates occur and which web documents are most important to index. Yandex Webmaster covered it very well:

[yt=INGCBkR26eo]

Does having a sitemap affect promotion?

If you do not have a sitemap, this does not mean that search engines will not index the resource. Search robots often crawl sites quite well without it and include them in the search. But sometimes glitches occur, because of which it is not always possible to find all web documents. The main reasons are:

  1. Sections of the site that can only be reached by making a long chain of transitions;
  2. Dynamic URLs.

So, creating sitemap.xml helps solve this problem in many ways. The file affects SEO only insofar as it facilitates and speeds up the indexing of pages. It also increases the chance that web pages will be indexed before competitors can copy the content and publish it on their own sites.

What other formats does a sitemap come in, and why is it made in XML format?

We have figured out why you need a site map. Now let's look at the formats it can be made in:

  1. In HTML format. It is created as an ordinary page with addresses leading to the main sections of the resource. This type of map helps visitors find their way quickly and is designed more for people than for search robots. An HTML sitemap can hold only a limited number of links (no more than 100): if there are more, not all of them will be included in the index, or search robots may exclude such a page from the search entirely for having an excessive number of URLs, even internal ones.
  2. Creating an XML sitemap file. There are no critical restrictions on the number of links, and search engines index it better, because the sitemap XML file contains full information in a form understandable to the robot. It is especially important for projects with hundreds or thousands of documents of equal importance where links to all of them must be listed. This type of sitemap can hold up to 50 thousand URLs, and in addition, you can set the update frequency and an approximate priority, which cannot be said about a map in HTML format. For these reasons, a sitemap is almost always created in XML.

Here's more information about this file:

[yt=ti3NKPknHDA]

How to make the right sitemap

Let's look at how to make a proper xml map. The following requirements must be met:

  1. The file size should be no more than 10 MB;
  2. The map should contain no more than 50,000 links. In cases where there are more links, you can create several maps and include them in the main xml map;
  3. The sitemap address should be entered in robots.txt;
  4. Also upload the sitemap to Yandex and Google (how to add a file is described below);
  5. Search engines must have access to the map. It is necessary to use special tags that let search engines understand that this is a map and not something else;
  6. The sitemap must have UTF-8 encoding.

Let me give you a simple example of a map:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://site.ru/</loc>
    <lastmod>2016-11-20T19:45:08+03:00</lastmod>
    <changefreq>always</changefreq>
    <priority>0.9</priority>
  </url>
  <url>
    <loc>http://site.ru/category/</loc>
    <lastmod>2016-11-20T19:46:38+03:00</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.6</priority>
  </url>
  <url>
    <loc>http://site.ru/page/</loc>
    <lastmod>2016-11-20T19:48:41+03:00</lastmod>
    <changefreq>yearly</changefreq>
    <priority>0.4</priority>
  </url>
</urlset>

The url and loc tags are required. The first contains all the information about a specific URL. The second contains the address itself.

The lastmod, changefreq, priority tags are not mandatory, but it is still recommended to use them.

Lastmod in the sitemap is responsible for the date of the last update.

Changefreq indicates the frequency of page changes. The values can be as follows:

  1. Hourly – updates hourly;
  2. Always – always updated;
  3. Weekly – updated once a week;
  4. Daily – updates occur daily;
  5. Monthly – updates occur once a month;
  6. Yearly – once a year;
  7. Never – not updated (it is better not to use this value).

Priority tells search engines how important a page is compared to others. The priority can be set from 0.1 (low) to 1 (high).

This was just an example map; you do not need to specify these exact values. In general, it is recommended to set priority as follows: the maximum for the home page (1), an average value for categories (0.6), and the minimum for posts (0.4).

Now let's look at an example where there are more than 50 thousand links. In this case, the file includes other maps:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://site.ru/sitemaps/sitemap01.xml</loc>
    <lastmod>2016-11-20T21:37:28+03:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://site.ru/sitemaps/sitemap02.xml</loc>
    <lastmod>2016-11-20T21:37:29+03:00</lastmod>
  </sitemap>
</sitemapindex>

How to create a sitemap

There are several ways to create an xml map, let's look at them:

  1. Download the map using an online generator from another resource;
  2. Generate using a special program. But it is worth considering that programs of this kind are mostly paid. An example of such a generator: Wonder WebWare SiteMap Generator. Screaming Frog also has this feature;
  3. Create a sitemap manually;
  4. Automatically create a map using a CMS (for example, such a function is available on WordPress).

Here is an option on how to make a sitemap without the help of plugins:

[yt=Tnfy601BUZc]

Plugins for creating sitemaps on WordPress

You can create a sitemap in WordPress using a special plugin called Google XML Sitemaps. Everything is simple here: download the plugin, install it, then start creating the file. To do this, open Console → Settings and select XML-Sitemap. Next, adjust the settings. We leave the priority at its default.

The robots.txt and sitemap.xml files make it possible to organize site indexing. These two files complement each other well, although they solve opposite problems. If robots.txt serves to prohibit indexing of entire sections or individual pages, then sitemap.xml, on the contrary, tells search robots which URLs need to be indexed. Let's analyze each file separately.

Robots.txt file

robots.txt is a file in which rules are written that restrict search robots' access to directories and files of the site, in order to keep their contents out of the search engine index. The file must be located in the root directory of the site and be available at site.ru/robots.txt.

In robots.txt, you need to block all duplicate and service pages of the site from indexing. Public CMSs often create duplicates: articles can be accessed at several URLs at the same time, for example, in categories site.ru/category/post-1/, tags site.ru/tag/post-1/ and the archive site.ru/arhive/post-1/. To avoid duplicates, it is necessary to prohibit indexing of the tags and the archive; only the categories will remain in the index. By service pages, I mean the pages of the administrative part of the site and automatically generated pages, for example, on-site search results.

It is simply necessary to get rid of duplicates, as they deprive the site's pages of uniqueness. After all, if the index contains several pages with the same content accessible at different URLs, the content of none of them will be considered unique. As a result, search engines will lower the positions of such pages in the search results.
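As a sketch for the example URL structure above (the exact paths depend on your CMS), the duplicates in the tags and the archive could be blocked using the Disallow directive described in the next section:

User-agent: *
# Block duplicate article URLs in tags and the archive
Disallow: /tag/
Disallow: /arhive/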

Robots.txt directives

Directives are rules, or one might say commands, for search robots. The most important one is User-agent; with its help you can set rules for all robots or for a specific bot. This directive is written first, and all other rules are indicated after it.

# For all robots
User-agent: *

# For the Yandex robot
User-agent: Yandex

Another mandatory directive is Disallow; with its help, sections and pages of the site are closed. Its opposite is the Allow directive, which, on the contrary, forcibly allows indexing of the specified sections and pages of the site.

# Prohibit indexing of the section
Disallow: /folder/

# Allow indexing of the subsection with pictures
Allow: /folder/images/

To indicate the main mirror of the site (for example, with or without www), the Host directive is used. Note that the main mirror is written without specifying the http:// protocol, but the https:// protocol must be specified. Host is understood only by the Yandex and Mail.ru bots, and the directive should be entered only once.

# If the main mirror works over the http protocol without www
Host: site.ru

# If the main mirror works over the https protocol with www
Host: https://www.site.ru

Sitemap is a directive indicating the path to the sitemap.xml file. The path must be specified in full, including the protocol, and this directive can be written anywhere in the file.

# Specify the full path to the sitemap.xml file
Sitemap: http://site.ru/sitemap.xml

To simplify writing rules, there are special symbolic operators:

  • * - denotes any number of characters, as well as their absence;
  • $ - means that the character before the dollar sign is the last one;
  • # - denotes a comment; everything on the line after this operator is ignored by search robots.

Having familiarized yourself with the basic directives and special operators, you can already sketch out the contents of a simple robots.txt file.

User-agent: *
Disallow: /admin/
Disallow: /arhive/
Disallow: /tag/
Disallow: /modules/
Disallow: /search/
Disallow: *?s=
Disallow: /login.php

User-agent: Yandex
Disallow: /admin/
Disallow: /arhive/
Disallow: /tag/
Disallow: /modules/
Disallow: /search/
Disallow: *?s=
Disallow: /login.php
# Allow the Yandex robot to index images in the modules section
Allow: /modules/*.png
Allow: /modules/*.jpg

Host: site.ru

Sitemap: http://site.ru/sitemap.xml

A detailed description of all directives with examples of their use can be found in the help section on the Yandex website.

Sitemap.xml file

sitemap.xml is the so-called site map for search engines. The sitemap.xml file contains information for search robots about the site pages that need to be indexed. The file must contain the URL addresses of the pages; indicating the priority of pages, the re-crawl frequency and the date and time of last modification is optional.

It should be noted that sitemap.xml is not required and search engines may not take it into account, but at the same time all search engines say that having the file is desirable and helps index the site correctly, especially if pages are created dynamically or the site has a complex nesting structure.

There is only one conclusion: the robots.txt and sitemap.xml files are necessary. Correctly configured indexing is one of the factors that places site pages higher in search results, and that is the goal of any more or less serious site.

Sitemap (sitemap.xml) is a special file in .xml format, stored in the root directory of the server, that contains information about the site pages that need to be indexed.


Sitemap.xml is compiled using a special syntax understandable to search engines, listing all pages to be indexed with an indication of their importance, the date of the last update and the approximate update frequency.

There are two main files that any web project must have: robots.txt and sitemap.xml. If your project does not have them, or they are filled out incorrectly, then with a high degree of probability you are seriously harming your resource and not allowing it to reveal its full potential.


Sitemap in HTML format

Sitemaps come in 2 main types or formats: the HTML sitemap and the XML sitemap file. An HTML sitemap is a page of the site that lists links, usually links to the most important sections and pages of the site. An HTML sitemap is designed more for people than for robots and helps visitors quickly navigate the main sections of the site. For a sitemap in the form of an HTML page, there are serious restrictions on the number of links on one page. If there are too many links on the page, not all of them may be indexed, or the sitemap page may even be excluded from the search for having an excessive number of links, even internal ones.

For the HTML sitemap to be correctly indexed and adequately perceived by visitors, you should not place more than 100 links on the page. This is more than enough to fit all the sections and subsections that do not fit into the main menu.

Usually a sitemap in HTML format has a tree structure listing expanded sections and subsections. Unnecessarily cumbersome HTML sitemaps are often decorated with graphic elements and CSS styles and complemented with JavaScript. However, an HTML sitemap is not of great importance for search engines.

An HTML sitemap is not a full-fledged sitemap. What do you do if the site has hundreds, thousands, tens of thousands of pages? For this, you need to place links to all pages in a sitemap in XML format.

Sitemap in TXT format (sitemap.txt)

Another way to create a site map as a file is a site map in txt format:

http://site.ru/
http://site.ru/page/
http://site.ru/page1/

It's simple: the sitemap.txt file lists all the necessary links line by line. A sitemap in txt format is an "option for the lazy". The same 50,000-link limitation as for the XML sitemap applies here. However, a TXT sitemap cannot indicate the last modification date or page priority.

XML Sitemap

An XML sitemap is a file in xml format, usually sitemap.xml, located at the root of the site. A sitemap in xml format has many advantages over an html sitemap:

  • Sitemap xml is a special sitemap format that is recognized by all popular search engines, such as Google and Yandex.
  • You can specify up to 50,000 links in an xml sitemap.
  • In the xml sitemap you can specify the relative priority and update frequency of pages.

The contents of the site map are only recommendations for the search robot. For example, if you set a yearly update frequency for a page, search robots will still visit it more often. And if you set the page update frequency to hourly, this does not mean that robots will index the page every hour.

How to create the correct sitemap.xml


The contents of the sitemap.xml file look like:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://site.ru/</loc>
    <lastmod>2015-10-18T18:54:13+04:00</lastmod>
    <changefreq>always</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://site.ru/category/</loc>
    <lastmod>2015-10-18T18:57:09+04:00</lastmod>
    <changefreq>hourly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://site.ru/page/</loc>
    <lastmod>2015-10-18T18:59:37+04:00</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.6</priority>
  </url>
</urlset>

Where the following tags are used:

  • urlset (required) - the parent tag; it contains all URLs;
  • url (required) - a tag that contains information about a specific URL;
  • loc (required) - the URL itself is indicated directly in this tag;
  • lastmod - this tag contains the date the page was last modified;
  • changefreq - the tag is used to indicate how often the page changes: always, hourly, daily, weekly, monthly, yearly, never;
  • priority - indicates the priority of a particular page relative to other pages on the site, from 0.1 (low priority) to 1 (high priority).


The sitemap.xml file must contain a reference to the XML language namespace:

xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"

If the sitemap includes more than 50 thousand links or the size of sitemap.xml exceeds 10 MB, it is recommended to split the sitemap into several files. In this case, an index sitemap must list the links to the individual sitemap files:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://site.ru/sitemaps/sitemap01.xml</loc>
    <lastmod>2015-10-18T18:54:13+04:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://site.ru/sitemaps/sitemap02.xml</loc>
    <lastmod>2015-10-18T18:54:13+04:00</lastmod>
  </sitemap>
</sitemapindex>

The loc and lastmod tags that are already familiar to us are used here, as well as the required tags:

  • sitemapindex - the parent tag, which contains the addresses of all the site maps;
  • sitemap - a tag that contains the parameters of each individual sitemap.

How to create Sitemap.xml

Creating a site map is an important process in which you need to clearly indicate which pages of the site should be indexed and how best to index them. Depending on what type of sitemap we are talking about, there are various ways to create one. There is no point in discussing separately how to create an html sitemap. Let's look at how to make a map in the xml file format. There are several basic ways to create a sitemap, but what they all have in common is where the sitemap is located and how it is discovered by search engines.

As already written above, the sitemap file is located at the root of the site. Search engines can detect a sitemap file on their own, but there are several ways to provide a direct link to the sitemap file(s) for faster discovery. The easiest way is to directly indicate a link (or several links) to the sitemap files in the webmaster tools of Yandex and Google. There you can also check the sitemap, analyze it for correctness, and see which pages from the site map the search engine has found and how many of them are indexed.

The second way to point search engines to the location of a sitemap file is with the Sitemap directive in the robots.txt file.

Sitemap: http://site.ru/sitemap.xml

You can specify several sitemap files in robots.txt, and they will automatically be added to the webmaster tools. We have looked at how a sitemap is found; now let's move on to how to create one.

Basic ways to create a sitemap

  1. Generation of the site map by the site management system, if the CMS has such a built-in capability.
  2. Downloading the site map from an online service. There are many online sitemap generators with different capabilities and limitations. Probably one of the most famous online sitemap generators is Sitemap Generator. It has quite a lot of functionality and will let you generate a sitemap of up to 1,500 pages for free, which is quite a lot. There is also xml-sitemaps.com, which lets you customize sitemap parameters but limits the number of links in the sitemap.xml file to 500.
  3. Downloading a sitemap generator program. Such generator programs are usually paid, but with their help you can regularly generate sitemap xml for one or several sites. A couple of examples of such programs: SiteMap XML Dynamic SiteMap Generator, WonderWebWare SiteMap Generator.
  4. Automatic creation of the sitemap in Joomla, WordPress, Bitrix or ModX.
  5. Creating a sitemap manually.

WordPress sitemap

You can create a sitemap for WordPress using the Google XML Sitemaps plugin. It offers many settings that allow you to exclude some of the materials on your site, and you can also set the expected update frequency. In addition to creating the map, the Google XML Sitemaps plugin notifies many search engines when new materials are published on your blog, inviting them to index them quickly.

You can set the path to the sitemap file yourself in the plugin settings and you can even give it a name different from the classic sitemap.xml.

[yt=5ZmRSR1bbEI]

Joomla sitemap

You can create a sitemap for Joomla using the Xmap component.

Check Sitemap for broken links

So as not to mislead the search robot, sitemap.xml must be configured without errors. Therefore, after each file update, you need to check the sitemap for broken links.

Go to Yandex Webmaster - section “Tools” - “Analysis of Sitemap files”.

Select one of the file upload methods:

  • copy the text of sitemap.xml;
  • submit sitemap URL;
  • upload xml file to the service.

Checking sitemap.xml in Yandex Webmaster

Go to Google Search Console - section "Crawling" - "Sitemaps".

Analysis of Sitemap.xml from PixelPlus

Tool from pixelplus.ru - XML sitemap analysis. It's simple, cool and understandable.

  1. Specify the sitemap (URL) or upload an XML file.
  2. Choose whether to check the server response code for each URL in it.

The tool will allow you to check the correctness of the site map (*.xml file) and will also find:

  • File validity errors.
  • URLs that return a response code other than 200 OK.
  • Other errors (URLs pointing to another domain, excessive file size or number of URLs in it, and so on).

As a reminder, the maximum number of valid URLs in one file is 50,000, and the file size should not exceed 10 MB.

If errors are found (this happens often), the service will tell you which URLs give an incorrect response (deleted, unnecessary, and so on).

Sitemap.xml is an important tool

A site map is one of the important tools for SEO website promotion. It does not matter so much how the sitemap is created. What matters is which links are listed in the sitemap and how often it is updated. Sometimes everything gets uploaded to the sitemap, even links that are prohibited in robots.txt or non-canonical links, and the sitemap is updated once a month or less often. Such an attitude towards the site map can not only make it useless but even worse: it can confuse the search robot, which will negatively affect the indexing and positions of the site in search.

Create a sitemap for your resource. But be careful and think carefully about what to upload to the sitemap and what not.






