How to find out when a page was last modified


Why configure the Last-Modified header

Let's work out how to configure the Last-Modified header as quickly and simply as possible.

First of all, the header is needed mainly to reduce the load on the server and to speed up page indexing. That is why it is worth configuring, especially on large sites with many pages.

The purpose of this header is to tell the client (a browser or a search engine) when a particular page was last changed. On a later visit the client sends that date back to the server in the If-Modified-Since request header. If the page has not changed since then, the server returns the “304 Not Modified” status with no body, so the page is not downloaded again.

If changes were made, the server returns “200 OK” and the page is loaded with the updated content.
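The exchange described above can be sketched as a small PHP function. This is only an illustration, not code from any CMS: the name conditional_status and its parameters are hypothetical, and the function decides the status code without sending anything.

```php
<?php
// Hypothetical helper: decide which status a conditional GET should receive.
// $lastModified    - Unix timestamp of the page's last real change
// $ifModifiedSince - raw value of the If-Modified-Since request header, or null
function conditional_status(int $lastModified, ?string $ifModifiedSince): int
{
    if ($ifModifiedSince === null) {
        return 200; // ordinary request: send the full page
    }
    $since = strtotime($ifModifiedSince);
    if ($since === false) {
        return 200; // unparseable date: treat it as an ordinary request
    }
    // Not changed since the client's copy was made: answer 304 with no body.
    return $lastModified <= $since ? 304 : 200;
}

$changed = gmmktime(10, 0, 0, 6, 20, 2011); // page edited on 20 June 2011, 10:00 GMT
echo conditional_status($changed, 'Mon, 11 Jul 2011 08:00:00 GMT'), "\n"; // 304
echo conditional_status($changed, 'Sat, 11 Jun 2011 08:00:00 GMT'), "\n"; // 200
echo conditional_status($changed, null), "\n";                             // 200
```

In a real page the returned code would be passed to header() before any output, and the script would stop after a 304.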

Correctly setting Last-Modified provides the following benefits:

  • the date of the last content update is shown in the search results;
  • when results are sorted by date, the pages occupy higher positions;
  • page indexing is significantly faster.

    Why do robots index sites with Last-Modified configured faster?

    The answer is simple: if only 20 of a site's 500 pages have changed, the robot does not need to download all 500 in search of new content, because Last-Modified points to the pages that changed.

    Last-Modified is especially important for sites with a large number of pages, because the robot has a limited amount of time to crawl each site and may not reach the pages that matter.
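The saving can be illustrated with a toy calculation. The page list, the dates and the function name pages_to_download below are all made up for the example; a real robot still sends a request for every URL, but only the pages selected here would come back with a full 200 response instead of an empty 304.

```php
<?php
// Hypothetical data: URL => Unix timestamp of the page's last real change.
function pages_to_download(array $pages, int $lastCrawl): array
{
    // Keep only the pages modified after the robot's previous visit.
    return array_keys(array_filter(
        $pages,
        fn(int $modified): bool => $modified > $lastCrawl
    ));
}

$lastCrawl = gmmktime(0, 0, 0, 6, 11, 2011); // previous visit: 11 June 2011
$pages = [
    '/'        => gmmktime(0, 0, 0, 6, 20, 2011),
    '/about'   => gmmktime(0, 0, 0, 1, 5, 2011),
    '/news/42' => gmmktime(0, 0, 0, 6, 15, 2011),
];

print_r(pages_to_download($pages, $lastCrawl)); // lists '/' and '/news/42'
```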

    How to set up Last-Modified

    First, check whether this header is already configured. You can use the services varvy.com, last-modified.com or tools.seo-auditor.com.ru. If the check shows that the header is missing, it is time to set it up.

    If you have a static site, the header code has to be written into every page by hand.

    After each content change, the date in that code is updated manually. Every single time, you ask? Yes, if the site is static.

    If the site is dynamic, the header is set in PHP. The following code is often encountered:

    header("Last-Modified: " . date('r', strtotime($post->post_modified)));

    It is added to header.php. However, it only works for posts and pages and does not work on the main page. It also does not cover taxonomies, archives, or newly added comments. Note, too, that date('r') prints a numeric offset such as +0300 rather than the GMT form that HTTP expects, so gmdate() with an explicit GMT suffix is the safer choice.

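    // A more complete variant: it also answers If-Modified-Since with a 304.
    $LastModified_unix = 1294844676; // time of the last change, as a Unix timestamp
    $LastModified = gmdate("D, d M Y H:i:s \G\M\T", $LastModified_unix);
    $IfModifiedSince = false;
    // Depending on the server setup, the request header arrives in $_ENV or $_SERVER.
    if (isset($_ENV['HTTP_IF_MODIFIED_SINCE']))
        $IfModifiedSince = strtotime(substr($_ENV['HTTP_IF_MODIFIED_SINCE'], 5));
    if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE']))
        $IfModifiedSince = strtotime(substr($_SERVER['HTTP_IF_MODIFIED_SINCE'], 5));
    if ($IfModifiedSince && $IfModifiedSince >= $LastModified_unix) {
        // The client's copy is still current: send the status only, no body.
        header($_SERVER['SERVER_PROTOCOL'] . ' 304 Not Modified');
        exit;
    }
    header('Last-Modified: ' . $LastModified);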

    You can also customize the header by writing two lines in the .htaccess file:

    RewriteRule .* —

    RewriteRule .* —

    But in this case, check that this does not cause problems on the hosting side.

    To avoid writing code, you can use ready-made solutions for setting up Last-Modified. For WordPress, for example, there are the Clearfy and Last Modified Timestamp plugins. The WP Super Cache plugin can also do it: in the advanced settings, activate the “Error 304” item (support for 304 responses is disabled by default, because it can cause problems on some hosts). Other CMSs have their own plugins, or, as a last resort, you can commission a plugin from a developer.

    Keep in mind that setting Last-Modified is not always useful, for example when every page carries a site-wide block whose content changes regularly. In that case search engines may stop treating the information as new and will visit your site less often.

    In other cases, by configuring Last-Modified, you get:

    • traffic savings;
    • a faster website;
    • compliance with the recommendations of Google and Yandex, which significantly speeds up indexing and increases the visibility of pages in search. This is especially noticeable on sites with a large number of pages.

    Last-Modified and If-Modified-Since Headers for WordPress

    Few people pay attention to the Last-Modified and If-Modified-Since HTTP headers when optimizing a site, and that is a mistake. It is important that a page whose content has not changed since the search robot's last visit returns a 304 code. This code means that nothing on this particular page has changed: you did not edit or extend the text, no comments were added to the post, and so on.

    If this HTTP header is missing, then in Yandex, when results are sorted by date, the site will not be visible to most users.

    That is why it is important not only to set it up correctly, but also to update the date to the current one every time you edit a post. This has to be done manually.

    With comments it is simpler: when a visitor adds a comment, the time it was added is written to the $last_modified_time variable automatically, and this becomes the date of the page's last change.
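One way to picture this rule is a helper that takes the later of the post's last edit and its most recent comment. The function name page_last_modified and the sample timestamps are assumptions made for the sketch; only the variable name $last_modified_time comes from the text above.

```php
<?php
// Hypothetical helper: the page's modification time is the later of the
// post's last edit and its newest comment (all values are Unix timestamps).
function page_last_modified(int $postModified, array $commentTimes): int
{
    // array_merge keeps this safe when there are no comments at all.
    return max(array_merge([$postModified], $commentTimes));
}

$postEdited = gmmktime(10, 0, 0, 6, 20, 2011);            // post edited 20 June 2011
$comments   = [
    gmmktime(9, 15, 0, 6, 21, 2011),
    gmmktime(15, 30, 0, 6, 22, 2011),                     // newest comment
];

$last_modified_time = page_last_modified($postEdited, $comments);
echo gmdate('D, d M Y H:i:s', $last_modified_time) . " GMT\n";
// prints: Wed, 22 Jun 2011 15:30:00 GMT
```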

    Why do we need the Last-Modified and If-Modified-Since headers?

    1. When the server sends the 304 code, the remaining PHP scripts on the page are not even started. The page is taken from the cache, which very significantly reduces the load on the server, much to the delight of your hosting provider, and speeds up page loading for the visitor, which is also good news.

    How does this happen?

    When crawling the Internet, Google and Yandex spiders save a copy of each site in their database. This copy serves as a reference for comparison: is everything still the same, or has something changed? If the Last-Modified and If-Modified-Since headers are not configured, or are configured incorrectly, new pages on the site are indexed, but the main page in the search engine's cache is not updated for a long time, and neither is the comment feed.

    But for frequently updated pages (news feeds updated many times a day, actively commented blogs, etc.) caching has one drawback: the information in the cache becomes outdated too quickly, and a person, even after reloading the page, does not see the latest news or the new comments. But that's not so bad. The trouble is that the robot doesn't see them either, unless the correct Last-Modified header is sent.

    header("Last-Modified: ".gmdate("D, d M Y H:i:s ")."GMT");

    If your site is updated frequently (for example, your posts are often commented on), you can make clients revalidate their cached copy with the following set of headers:

    header("Expires: ".gmdate("D, d M Y H:i:s", time() + 7200)." GMT");
    header("Cache-Control: no-cache, must-revalidate");

    This means that the validity of the stored copy must be rechecked with each request.

    How does caching work in browsers?

    If caching is not disabled by calling the no_cache function, then in Firefox and IE the page is stored in the cache, and the cached copy is returned for all subsequent requests.

    To refresh the page and get the latest version, you need to press Ctrl+F5; the usual “Refresh” button (F5) does not help. And I must say, documents can stay in the IE cache for a very, very long time.

    In Opera, the cached page is cleared by pressing the “Refresh” button or the F5 key. The combination Ctrl+F5 in Opera reloads all open tabs. As you understand, if you have a lot of them open, you may grow a beard while waiting.

    If you disable page caching with the no_cache function, then Opera and Firefox, when accessing such a page, use the mechanism with the If-Modified-Since header. So caching still happens, but the browser asks the server whether the page has actually changed, and this is the right way to pose the question.

    Therefore, you need to enable processing of this header as well. I won’t describe what each call means; I’ll just give code that sends the headers correctly and doesn’t cause conflicts on most of the hosts I’ve worked with. This construction works on sweb.ru, eomy.net, timeweb.ru, fastvps.ru and startlogic.com.

    <?php
    // $file_name and $text are placeholders from the original snippet:
    // the file whose modification time is used, and the page body.
    header("Expires: ".gmdate("D, d M Y H:i:s", time() + 7200)." GMT");
    header("Cache-Control: no-cache, must-revalidate");
    header("Vary: Accept-Encoding"); // every header must be sent before any output
    $mt = filemtime($file_name);
    $mt_str = gmdate("D, d M Y H:i:s", $mt)." GMT"; // format the file time, not the current time
    if (isset($_SERVER["HTTP_IF_MODIFIED_SINCE"]) &&
        strtotime($_SERVER["HTTP_IF_MODIFIED_SINCE"]) >= $mt) {
        header("HTTP/1.1 304 Not Modified");
        die;
    }
    header("Last-Modified: ".$mt_str);
    // Accept-Encoding is a request header, so it is not sent in the response.
    echo $text;
    ?>

    So all you have to do is copy this code and add it to your theme's header.php file, ABOVE everything else. That is, this code goes at the very top of the file, BEFORE all the rest of the code.


    Attention! Before adding anything, save this file on your computer so that you can restore the original version if your hosting does not allow such a header configuration.

    We check the result using the Last-Modified and If-Modified-Since header checking service http://last-modified.com/ru/if-modified-since.html


    • If the result is positive, we wipe the sweat from our forehead and go drink tea.
    • If the result is negative, the same construction can be added to the index.php file in the root of your WordPress installation (I encountered this on the timeweb.ru hosting). Likewise, it goes above everything else in the file. Just don’t forget about this when you update WordPress: the index file will be overwritten with its standard version.

    Voila! By correctly setting the Last-Modified and If-Modified-Since headers, we got a bunch of bonuses:

    • We increased page loading speed, which is important for the Google robot and nice for people.
    • We reduced the load on the server, which pleased the hosting provider.
    • Yandex search results will show the date of the last page update, which in some cases is very important for people, and therefore indirectly has a positive effect on behavioral factors.
    • The pages of our site will take part when search results are sorted by date. Yes, yes, advanced users do use this.
    • And, as a consequence of all of the above, the indexing of our site by search engines will speed up considerably.

    In the field of search engine optimization (SEO) there are a lot of myths in circulation. Some of them have a basis, some came out of nowhere. In this note we will look at one of them: the use of the Last-Modified response header.

    Some time ago we received a document entitled “Ingate Recommendations for Web Studios on Promoted Sites.” And one of the “recommendations” was the following:

    After a redesign or on a new site being developed, the date of the last modification of the site pages (Last Modified) must be indicated.

    To add information about the date of the last modification of pages to a site in PHP, you need to insert a script at the very beginning of the source code of each page

    <?php
    header("Last-Modified: " . date("D, d M Y H:i:s", time()) . " GMT");
    ?>

    It was this wild nonsense and frankly crazy code that prompted me to write this note. Here I will try to explain what Last-Modified is, why it is needed, and how browsers and search engines use it.

    What is Last-Modified

    When transmitting information to the client (browsers or search robot), the web server reports quite a lot of additional data. They can be viewed in the browser console, for example:

    configure the server to issue correct response headers (for example, if the page does not exist, issue a 404 error, and if an If-Modified-Since request is received, then issue a 304 code if the page has not been changed since the date specified in the request).

    You can also see that if the server does not honor the conditional GET request, it is no different from a regular request. That is, a Last-Modified header that carries the current time, and is incorrectly formed at that (hello, Ingate!), is not needed at all!
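A short PHP sketch of why that header is malformed. It calls date(), which formats in the server's local time zone, and then glues a GMT label onto the result; on top of that, time() makes the value change on every request, so it never describes a real modification. The time zone and the fixed timestamp below are assumptions chosen only to make the difference visible.

```php
<?php
date_default_timezone_set('Europe/Moscow');    // assumed server time zone, for illustration
$modified = gmmktime(10, 50, 0, 10, 19, 2005); // a fixed moment: 19 Oct 2005 10:50:00 GMT

// The approach of the quoted snippet: local time with a "GMT" label attached.
$wrong = date("D, d M Y H:i:s", $modified) . " GMT";
// The correct form: the same moment formatted in GMT.
$right = gmdate("D, d M Y H:i:s", $modified) . " GMT";

echo $wrong, "\n"; // the local wall-clock time, mislabeled as GMT
echo $right, "\n"; // Wed, 19 Oct 2005 10:50:00 GMT

// A client that parses the wrong value gets a moment later than the real one.
var_dump(strtotime($wrong) > $modified); // bool(true)
```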

    So is Last-Modified necessary or not?

    In general, yes. But it is important to understand that it is not the header itself that matters, but the entire conditional request scenario, which the site must implement in full. Only in that case will we get fast indexing of the site.

    But it is often very difficult to implement this in a ready-made CMS. It may require quite significant changes to the code of the CMS itself.

    For a number of CMSs, however, this can be achieved by enabling page caching. If the CMS caches pages, creating and serving what are essentially static files, then the web server itself will respond correctly to conditional requests. For example, in WordPress this can be achieved using the WP Super Cache plugin:

    Let's check it in action. I enabled this plugin, opened the browser in private mode and made two requests for the same page. It is clearly seen that the second response is the correct one, 304 Not Modified:

    Instead of a conclusion

    So, we have dealt with the Last-Modified header. First, it must convey the date and time when the document was actually modified. Second, the server’s response to a conditional request with the If-Modified-Since header is extremely important.

    And listen less to SEO specialists who don’t know the basics of how the Internet works.

    “In particular, the content of the response that the server gives to the “if-modified-since” request is important. Last-Modified header should give the correct date of the last modification of the document.”

    Let's check how things work with Last-Modified in various CMSs. Connect to the server:

    # telnet www.example.com 80

    and enter the following, ending with an empty line:

    GET /index.html HTTP/1.0
    User-Agent: Mozilla/5.0
    From: something.somewhere.net
    Accept: text/html,text/plain,application/*
    Host: www.example.com
    If-Modified-Since: Wed, 19 Oct 2005 10:50:00 GMT

    If the server returns 304 (Not Modified), then it supports If-Modified-Since and the page has not been modified. Code 200 (OK) means that the page has been changed or that the server ignores the header.
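The same manual check can be scripted. The sketch below is one possible way to do it in PHP with get_headers() and a stream context; the function names status_from_line and check_if_modified_since are hypothetical, and the URL in the usage line is the placeholder host from the telnet example.

```php
<?php
// Hypothetical helper: extract the numeric status from a status line such as
// "HTTP/1.1 304 Not Modified"; returns null if the line does not match.
function status_from_line(string $line): ?int
{
    return preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m) ? (int) $m[1] : null;
}

// Send a conditional GET and return the status code (null on failure).
function check_if_modified_since(string $url, int $since): ?int
{
    $context = stream_context_create(['http' => [
        'method' => 'GET',
        'header' => 'If-Modified-Since: ' . gmdate('D, d M Y H:i:s', $since) . " GMT\r\n",
    ]]);
    $headers = @get_headers($url, false, $context);
    // The first element of the result is the status line of the response.
    return $headers ? status_from_line($headers[0]) : null;
}

// Usage (needs network access):
// var_dump(check_if_modified_since('http://www.example.com/index.html', time() - 5 * 86400));
```

A result of 304 means the conditional request is honored; a steady 200 for a page that has not changed suggests the server ignores If-Modified-Since.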

    If-Modified-Since check in C#

    You can check how If-Modified-Since works using the following C# code:

    private HttpWebResponse GetPage()
    {
        string url = @"http://.....";
        // Place the web request to the server by specifying the URL
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        // No need for a persistent connection
        request.KeepAlive = false;
        // The link that referred us to the URL
        request.Referer = url;
        // The user agent of the browser
        request.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50215)";
        // Instead of HTTP 1.1 I will use HTTP 1.0. When a request tells the server it uses 1.0,
        // the server won't respond with chunked data but will send the response all at once.
        request.ProtocolVersion = new Version(1, 0);
        request.IfModifiedSince = DateTime.Now.AddDays(-5);
        // Get the response from the server.
        // Note: GetResponse() throws a WebException for a 304 answer;
        // catch it and inspect ex.Response to read the status in that case.
        return (HttpWebResponse)request.GetResponse();
    }

    private void TestLastModified(VirtueMartContext db, jos_vm_product product)
    {
        using (HttpWebResponse response = GetPage())
        {
            Debug.Print("Status Code: {0}, Description: {1}\n",
                response.StatusCode, response.StatusDescription);
            string text = WebResponceReader.GetResponceText(response);
            Debug.Print(text.Substring(0, 100));
        }
    }

    Using this method you can confirm that Joomla always returns StatusCode=200 (OK), regardless of the value of request.IfModifiedSince.

    Checking If-Modified-Since via Yandex service

    If we click the “Check server response” button in Yandex Webmaster, we get the following:

    Here again you can see that the site and, accordingly, WordPress without the WP Super Cache plugin do not add the Last-Modified header.

    Well, we’ve sorted out the CMS, but how does Yandex itself work?

    Here is an example: today is July 7, 2011, the content in Joomla was updated on June 20, 2011, and Yandex has a version dated June 11, 2011 in its cache, although the robot has visited more than once since that date. In this case Yandex picks up updates with a very significant delay. The question is why?

    Here is what Platon Shchukin says about this:

    As the robot crawls the site, it will also crawl the specified page, and after the search databases are updated, the page will be updated in the search results. We are working to make this happen as quickly as possible.

    For your part, you can also help the robot index the site faster by using the following recommendations:

    Last-Modified, as the search engines claim, is a very important HTTP header that indicates the date of the last modification of the document, that is, the date of the last change on the page.

    Accordingly, if this header does not exist, or rather is not sent, the site will be deprived of some advantages. In particular, here is what I read on the Internet about the benefits of Last-Modified:

    1. The speed of indexing new pages improves, and in one visit the robot can pick up more pages to index.
    2. The speed of re-indexing pages that you have changed improves. This is very useful; without this header it will take longer for your edits to be picked up.

    In principle, this is already enough to want to check this header and, if necessary, configure it.

    How to check last modified?

    There are several tools; I liked this one the most: http://www.tools.seo-auditor.com.ru/if-modified-since/
    Here I just need to enter the address of the home page or of any article, and then select the search robot, Yandex.

    Last-Modified was found on my website, which is great. But initially it wasn’t there, so how did I set it up?

    How to configure last modified?

    To be honest, it wasn't that easy for me to set it up, maybe because the server runs nginx. I installed AddHeaders, a plugin that sets all the useful HTTP headers, including Last-Modified, but it did not help my site, although about a year ago it successfully activated this header on my site.

    I also installed the premium Clearfy plugin on this blog. It is a useful thing, and it also has an option for sending Last-Modified. I activated the option, but the check showed that the header was still not being returned. In the end, everything was solved by contacting the plugin's technical support: I described the configuration of my server and they gave me specific advice to go to the server control panel and disable this and that (SSI, as far as I remember). No sooner said than done, and now the header is sent.





    

    2024 gtavrl.ru.