The basics of client-side caching explained in plain words and examples: Last-Modified, ETag, Expires, Cache-Control: max-age and other headers, plus caching best practices.

When including external CSS and JavaScript files, we want to keep unnecessary HTTP requests to a minimum.

For this purpose, .js and .css files are served with headers that ensure reliable caching.

But what do you do when one of these files changes during development? All users still have the old version in their cache: until that cache expires, there will be plenty of complaints about broken integration between the server and client sides.

Proper caching and versioning eliminates this problem completely and provides reliable, transparent synchronization of style/script versions.

Simple ETag caching

The simplest way to cache static resources is to use ETag.

It is enough to enable the appropriate server setting (in Apache it is enabled by default), and each file will be served with an ETag header: a hash that depends on the file's modification time, size and (on inode-based file systems) inode.

The browser caches such a file and, on subsequent requests, specifies an If-None-Match header with the ETag of the cached document. Having received such a header, the server can respond with code 304 - and then the document will be taken from the cache.

It looks like this:

First request to the server (cache empty):

GET /misc/pack.js HTTP/1.1
Host: site

In reality the browser also sends a number of other headers (User-Agent, Accept, etc.); they are omitted here for brevity.

Server response

The server responds with the document, code 200 and an ETag:

HTTP/1.x 200 OK
Content-Encoding: gzip
Content-Type: text/javascript; charset=utf-8
Etag: "3272221997"
Accept-Ranges: bytes
Content-Length: 23321
Date: Fri, 02 May 2008 17:22:46 GMT
Server: lighttpd

Next browser request

On the next request the browser adds If-None-Match with the cached ETag:

GET /misc/pack.js HTTP/1.1
Host: site
If-None-Match: "3272221997"

Server response

The server sees that the document has not changed, so it can return a 304 code without resending the document:

HTTP/1.x 304 Not Modified
Content-Encoding: gzip
Etag: "3272221997"
Content-Type: text/javascript; charset=utf-8
Accept-Ranges: bytes
Date: Tue, 15 Apr 2008 10:17:11 GMT

Alternatively, if the document has changed, the server simply responds with 200 and a new ETag.

The Last-Modified + If-Modified-Since combination works in a similar way:

  • the server sends the date of the last modification in the Last-Modified header (instead of ETag)
  • the browser caches the document, and the next time a request for the same document is made, it sends the date of the cached version in the If-Modified-Since header (instead of If-None-Match)
  • the server checks the dates, and if the document has not changed, it sends only the 304 code, without the content (see the example below).

These methods work reliably and well, but the browser still has to make a request for every script or style.
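By analogy with the ETag exchange above, a Last-Modified exchange might look like this (the header values are purely illustrative):

HTTP/1.x 200 OK
Last-Modified: Fri, 02 May 2008 14:20:00 GMT
Content-Type: text/javascript; charset=utf-8

GET /misc/pack.js HTTP/1.1
Host: site
If-Modified-Since: Fri, 02 May 2008 14:20:00 GMT

HTTP/1.x 304 Not Modified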

    Smart caching. Versioning

    The general approach for versioning - in a nutshell:

  • The version (or modification date) is added to all scripts. For example, http://site/my.js will become http://site/my.v1.2.js
  • All scripts are hard cached by the browser
  • When updating the script, the version changes to a new one: http://site/my.v2.0.js
  • The address has changed, so the browser will request and cache the file again
  • The old version 1.2 will gradually fall out of the cache

    Hard caching

    Hard caching is a kind of sledgehammer that completely eliminates requests to the server for cached documents.

    To do this, just add the Expires and Cache-Control: max-age headers.

    For example, to cache for 365 days in PHP:

    Header("Expires: ".gmdate("D, d M Y H:i:s", time()+86400*365)." GMT"); header("Cache-Control: max-age="+86400*365);

    Or you can cache the content permanently using mod_headers in Apache:
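    A minimal sketch of such a mod_headers block (the file-name pattern is an assumption; the header values repeat those used further below):

    <FilesMatch "\.(css|js|gif|png|jpg)$">
        Header set Expires "Mon, 28 Jul 2014 23:30:00 GMT"
        Header set Cache-Control "max-age=315360000"
    </FilesMatch>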

    Having received such headers, the browser hard caches the document for a long time. All further access to the document will be served directly from the browser cache, without contacting the server.

    Most browsers (Opera, Internet Explorer 6+, Safari) DO NOT cache documents if there is a question mark in the address, because they are considered dynamic.

    That's why we add the version to the file name. Of course, with such addresses you have to use a solution like mod_rewrite, we will look at this later in the article.

    P.S. Firefox, however, does cache addresses with question marks...

    Automatic name resolution

    Let's look at how to automatically and transparently change versions without renaming the files themselves.

    Name with version -> File

    The simplest thing is to turn the name with the version into the original file name.

    At the Apache level this can be done with mod_rewrite:

    RewriteEngine on
    RewriteRule ^/(.*\.)v[0-9.]+\.(css|js|gif|png|jpg)$ /$1$2 [L]

    This rule processes all css/js/gif/png/jpg files, removing the version from the name.

    For example:

    /images/logo.v2.gif -> /images/logo.gif
    /css/style.v1.27.css -> /css/style.css
    /javascript/script.v6.js -> /javascript/script.js

    But in addition to cutting out the version, you also need to add hard-caching headers to these files. The mod_headers directives are used for this:

    Header add "Expires" "Mon, 28 Jul 2014 23:30:00 GMT" Header add "Cache-Control" "max-age=315360000"

    Putting it all together, the following Apache config implements the whole scheme:

    RewriteEngine on
    # remove the version and, at the same time, set a variable marking the file as versioned
    RewriteRule ^/(.*\.)v[0-9.]+\.(css|js|gif|png|jpg)$ /$1$2 [L,E=VERSIONED_FILE:1]
    # hard-cache versioned files
    Header add "Expires" "Mon, 28 Jul 2014 23:30:00 GMT" env=VERSIONED_FILE
    Header add "Cache-Control" "max-age=315360000" env=VERSIONED_FILE

    Due to the way the mod_rewrite module works, RewriteRule needs to be placed in the main configuration file httpd.conf or included files, but never in .htaccess , otherwise the Header commands will be run first, before the VERSIONED_FILE variable is set.

    Header directives can be anywhere, even in .htaccess - it doesn't matter.

    Automatically adding version to filename on HTML page

    How to put the version in the name of the script depends on your template system and, in general, the way you add scripts (styles, etc.).

    For example, when using the modification date as a version and the Smarty template engine, links can be set like this:
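    A sketch of what such a template link might look like, assuming a custom Smarty function registered under the name version (its code is shown below):

    <script src="{version src='/js/pack.js'}"></script>
    <link rel="stylesheet" href="{version src='/css/style.css'}">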

    The version function adds the version:

    function smarty_version($args) {
        $stat = stat($GLOBALS["config"]["site_root"] . $args["src"]);
        $version = $stat["mtime"];
        echo preg_replace("!\.(.+?)$!", ".v$version.\$1", $args["src"]);
    }

    Result on the page:
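    For a file whose modification time is, say, the Unix timestamp 1234567890, the output would look roughly like this:

    <script src="/js/pack.v1234567890.js"></script>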

    Optimization

    To avoid unnecessary stat calls, you can store the list of current versions in a separate array:

    $versions["css"] = array("group.css" => "1.1", "other.css" => "3.0", )

    In this case, the current version from the array is simply substituted into the HTML.

    You can also combine both approaches: during development, derive the version from the file's modification date (so it is always current), and in production take the version from the array (for performance).
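    A rough sketch of such a combined helper in PHP; versioned_url() and is_dev() are hypothetical names, and $versions is the array from the example above:

    <?php
    // Returns a versioned URL for a static file.
    // In development the version is the file's mtime (always fresh);
    // in production it is taken from the pre-built $versions array.
    function versioned_url($src) {
        global $versions;
        $ext = pathinfo($src, PATHINFO_EXTENSION);
        if (is_dev()) {  // is_dev() is an assumed project-specific check
            $version = filemtime($GLOBALS["config"]["site_root"] . $src);
        } else {
            $version = $versions[$ext][basename($src)];
        }
        return preg_replace("!\.(.+?)$!", ".v$version.\$1", $src);
    }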

    Applicability

    This caching method works everywhere, including Javascript, CSS, images, Flash movies, etc.

    It is useful whenever the document changes, but the browser should always have the current, up-to-date version.

    Caching plays an important role in the operation of almost any web application: at the level of the database, the web server, and also on the client.

    In this article we will try to understand client caching. In particular, we'll look at what HTTP headers are used by browsers and web servers and what they mean.

    But first, let's find out why client-side caching is needed at all.

    Web pages consist of many different elements: images, CSS and JS files, and so on. Some of these elements are used on several (or many) pages of a site. Client-side caching is the ability of browsers to keep copies of files (server responses) so they don't have to be downloaded again. This significantly speeds up repeat page loads, saves traffic and reduces the load on the server.

    There are several different HTTP headers to control client-side caching processes. Let's talk about each of them.

    HTTP headers for controlling client caching

    First, let's look at how the server and browser interact in the absence of any caching. For a clear understanding, I tried to imagine and visualize the process of communication between them in the form of a text chat. Imagine for a few minutes that the server and browser are people who correspond with each other :)

    Without caching (in the absence of caching HTTP headers)
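    In terms of raw HTTP the exchange boils down to something like this (the values are illustrative); note that the response carries no caching headers at all:

    GET /images/cat.png HTTP/1.1
    Host: site

    HTTP/1.1 200 OK
    Content-Type: image/png
    Content-Length: 51829

    The same full transfer is then repeated for every page that shows cat.png.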

    As we can see, every time it needs to display the cat.png image, the browser downloads it from the server all over again. I think there is no need to explain that this is slow and inefficient.

    The Last-Modified response header and the If-Modified-Since request header

    The idea is that the server adds a Last-Modified header to the file (response) it gives to the browser.
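    For example (the exact time is illustrative):

    Last-Modified: Mon, 01 Dec 2014 00:00:00 GMT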

    The browser now knows that the file was created (or modified) on December 1, 2014. The next time the browser needs the same file, it will send a request with an If-Modified-Since header:
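    If-Modified-Since: Mon, 01 Dec 2014 00:00:00 GMT

    (the value simply echoes the cached Last-Modified date from the earlier response)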

    If the file has not been modified, the server sends an empty response to the browser with a status of 304 (Not Modified). In this case, the browser knows that the file has not been updated and can display the copy it saved last time.

    Thus, by using Last-Modified we avoid re-downloading a large file, getting away with a quick, empty response from the server.

    The ETag response header and the If-None-Match request header

    The ETag mechanism works very much like Last-Modified but, unlike it, is not tied to time. Time is a relative thing.

    The idea is that when the file is created, and each time it changes, the server tags it with a special label, the ETag, and adds it as a header to the response sent to the browser:

    ETag: "686897696a7c876b7e"

    Now the browser knows that the current version of the file has an ETag equal to "686897696a7c876b7e". The next time the browser needs the same file, it will send a request with an If-None-Match header:

    If-None-Match: "686897696a7c876b7e"

    The server can compare the labels and, if the file has not been modified, send an empty response to the browser with a status of 304 (Not Modified). As with Last-modified, the browser will figure out that the file has not been updated and will be able to display a copy from the cache.

    The Expires header

    This header works differently from the ETag and Last-Modified headers described above. Expires defines the "expiration date" ("freshness lifetime") of the file. That is, on the first load the server lets the browser know that it does not plan to change the file before the date specified in Expires:
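    For example (the date is illustrative):

    Expires: Thu, 01 Dec 2016 00:00:00 GMT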

    Next time, the browser, knowing that the “expiration date” has not yet arrived, will not even try to make a request to the server and will display the file from the cache.

    This type of cache is especially relevant for illustrations for articles, icons, favicons, some css and js files, etc.

    The Cache-Control header with the max-age directive

    Cache-Control: max-age works very much like Expires. Here, too, an "expiration period" is defined for the file, but it is set in seconds and is not tied to a specific point in time, which is much more convenient in most cases.

    For reference:

    • 1 day = 86400 seconds
    • 1 week = 604800 seconds
    • 1 month = 2629000 seconds
    • 1 year = 31536000 seconds

    Eg:

    Cache-Control: max-age=2629000;

    The Cache-control header has other directives besides max-age. Let's take a quick look at the most popular ones:

    public
    The fact is that requests can be cached not only by the user’s end client (browser), but also by various intermediate proxies, CDN networks, etc. So, the public directive allows absolutely any proxy server to perform caching just like a browser.

    private
    The directive states that this file (server response) is specific to the end user and should not be cached by various intermediate proxies. At the same time, it does allow caching by the end client (the user's browser). For example, this is relevant for internal user profile pages, requests within a session, etc.

    no-cache
    Specifies that the client must make a request to the server (revalidate) every time before using a cached copy. Sometimes used together with the ETag header described above.

    no-store
    Instructs the client that it should not retain a copy of the request or parts of the request under any circumstances. This is the strictest header, overriding any caches. It was invented specifically for working with confidential information.

    must-revalidate
    This directive tells the cache that it must not serve stale content. The fact is that HTTP, in certain configurations, allows a cache to keep using content whose lifetime has already expired; must-revalidate forbids this and obliges the client to re-check the freshness of such content with the server (for example, using an ETag) before reusing it.

    proxy-revalidate
    This is the same as must-revalidate, but only applies to caching proxies.

    s-maxage
    Practically no different from max-age, except that this directive is honored only by shared caches (proxies, CDNs) and not by the user's browser itself. The "s-" comes from the word "shared" (e.g. a CDN). This directive is intended specifically for CDNs and other intermediate caches; when present, it overrides the values of the max-age directive and the Expires header for those caches. However, if you are not building a CDN, you are unlikely to ever need s-maxage.
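    To make the directives more concrete, here are a few typical combinations (the values are illustrative):

    Cache-Control: public, max-age=31536000
    Cache-Control: private, no-cache
    Cache-Control: no-store
    Cache-Control: public, s-maxage=86400, max-age=600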

    How can I see what headers are used on a site?

    You can view the HTTP request and response headers in the developer tools (for example, the Network tab in Chrome) of your favorite browser.

    The same thing can be seen in any self-respecting browser or http sniffer.

    Setting up caching in Apache and Nginx

    We will not retell the documentation for configuring popular servers here; you can always consult it yourself. Below are a few real-life examples of what the configuration files look like.

    Example Apache configurations to control Expires

    We set different “expiration dates” for different types of files: one year for images and icons, one month for styles, scripts and PDFs, and two days for everything else.

    ExpiresActive On
    ExpiresByType image/jpg "access plus 1 year"
    ExpiresByType image/jpeg "access plus 1 year"
    ExpiresByType image/gif "access plus 1 year"
    ExpiresByType image/png "access plus 1 year"
    ExpiresByType text/css "access plus 1 month"
    ExpiresByType application/pdf "access plus 1 month"
    ExpiresByType text/x-javascript "access plus 1 month"
    ExpiresByType image/x-icon "access plus 1 year"
    ExpiresDefault "access plus 2 days"

    Example Nginx configuration to control Expires

    We set different “expiration dates” for different types of files. One week for images, one day for styles and scripts.

    server {
        #...
        location ~* \.(gif|ico|jpe?g|png)(\?[0-9]+)?$ {
            expires 1w;
        }
        location ~* \.(css|js)$ {
            expires 1d;
        }
        #...
    }

    Apache configuration example for Cache-Control (max-age and public/private/no-cache)

    Header set Cache-Control "max-age=2592000, public"
    Header set Cache-Control "max-age=88000, private, must-revalidate"
    Header set Cache-Control "private, no-store, no-cache, must-revalidate, no-transform, max-age=0"
    Header set Pragma "no-cache"

    Nginx configuration example for Cache-Control on static files

    server {
        #...
        location ~* \.(?:ico|css|js|gif|jpe?g|png)$ {
            add_header Cache-Control "max-age=88000, public";
        }
        #...
    }

    In conclusion

    “Cache everything that can be cached” is a good motto for a web developer. Sometimes you can spend just a few hours on configuration and at the same time significantly improve the user experience of your site, significantly reduce server load and save on traffic. The main thing is not to overdo it and set everything up correctly, taking into account the characteristics of your resource.

    Properly configured caching provides huge performance benefits, saves bandwidth and reduces server costs, but many sites implement caching poorly, creating a race condition that causes interconnected resources to become out of sync.

    The overwhelming majority of caching best practices fall into one of two patterns:

    Pattern #1: immutable content + long max-age

    Cache-Control: max-age=31536000
    • The content of the URL does not change, therefore...
    • The browser or CDN can easily cache the resource for a year
    • Cached content that is younger than the specified max-age can be used without consulting the server

    Page: Hey, I need "/script-v1.js" , "/styles-v1.css" and "/cats-v1.jpg" 10:24

    Cash: I'm empty, how about you, Server? 10:24

    Server: OK, here they are. By the way, Cash, they should be used for a year, no more. 10:25

    Cash: Thank you! 10:25

    Page: Hurray! 10:25

    The next day

    Page: Hey, I need "/script-v2 .js" , "/styles-v2 .css" and "/cats-v1.jpg" 08:14

    Cash: There is a picture with cats, but not the rest. Server? 08:14

    Server: Easy - here's the new CSS & JS. Once again, Cash: their shelf life is no more than a year. 08:15

    Cash: Great! 08:15

    Page: Thank you! 08:15

    Cash: Hmm, I haven't used "/script-v1.js" & "/styles-v1.css" in quite some time. It's time to remove them. 12:32

    Using this pattern, you never change the content of a specific URL, you change the URL itself:
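    For example, with a content hash in each file name (the same names are used in the service worker example near the end of the article), the markup might look like this:

    <script src="/script-f93bca2c.js"></script>
    <link rel="stylesheet" href="/styles-a837cb1e.css">
    <img src="/cats-0e9a2ef4.jpg" alt="cats">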

    Every URL has something that changes along with the content. This could be a version number, a modified date, or a content hash (which is what I chose for my blog).

    Most server-side frameworks have tools that allow you to do things like this with ease (in Django I use ManifestStaticFilesStorage); there are also very small libraries in Node.js that solve the same problems, for example, gulp-rev.

    However, this pattern is not suitable for things like articles and blog posts. Their URLs cannot be versioned and their content may change. Seriously, I often have grammatical and punctuation errors and need to be able to quickly update the content.

    Pattern #2: mutable content, always revalidated on the server

    Cache-Control: no-cache
    • The content of the URL will change, which means...
    • Any locally cached version cannot be used without checking with the server.

    Page: Hey, I need the contents of "/about/" and "/sw.js" 11:32

    Cache: I can't help you. Server? 11:32

    Server: Here you go. Cache, keep them with you, but ask me before using them. 11:33

    Cache: Will do! 11:33

    Page: Thank you! 11:33

    The next day

    Page: Hey, I need the contents of "/about/" and "/sw.js" again 09:46

    Cache: Just a minute. Server, are my copies okay? The copy of "/about/" is from Monday, and "/sw.js" is from yesterday. 09:46

    Server: "/sw.js" has not changed... 09:47

    Cache: Cool. Page, keep "/sw.js". 09:47

    Server: …but I have a new version of "/about/". Cache, hold on to it, but like last time, don't forget to ask me first. 09:47

    Cache: Got it! 09:47

    Page: Great! 09:47

    Note: no-cache does not mean "do not cache"; it means "check" (revalidate) the cached resource with the server before using it. And no-store tells the browser not to cache at all. Likewise, must-revalidate does not mean "always revalidate"; it means the cached resource may be used while it is younger than the specified max-age, and only otherwise must it be revalidated. That's just how it worked out with the caching keywords.

    In this pattern, we can add an ETag (a version identifier of your choosing) or a Last-Modified header to the response. The next time the client requests the content, it sends If-None-Match or If-Modified-Since respectively, allowing the server to say "Use what you have, your cache is up to date", i.e. return HTTP 304.

    If sending ETag / Last-Modified is not possible, the server always sends the entire content.

    This pattern always requires network calls, so it's not as good as the first pattern, which can do without network requests.

    It is not uncommon that we do not have the infrastructure for the first pattern, but problems with network requests in pattern 2 may also arise. As a result, an intermediate option is used: short max-age and mutable content. This is a bad compromise.

    Using max-age with mutable content is generally the wrong choice

    And, unfortunately, it is common; GitHub Pages is an example.

    Imagine:

    • /article/
    • /styles.css
    • /script.js

    With server header:

    Cache-Control: must-revalidate, max-age=600

    • URL content changes
    • If the browser has a cached version more recent than 10 minutes, it is used without consulting the server
    • If there is no such cache, a network request is used, if possible with If-Modified-Since or If-None-Match

    Page: Hey, I need "/article/", "/script.js" and "/styles.css" 10:21

    Cache: I've got nothing. How about you, Server? 10:21

    Server: No problem, here they are. But remember, Cache: they can be used for the next 10 minutes. 10:22

    Cache: Yes! 10:22

    Page: Thank you! 10:22

    Page: Hey, I need "/article/" , "/script.js" and "/styles.css" again 10:28

    Cache: Oops, sorry, I've lost "/styles.css", but I have everything else, here you go. Server, can you give me "/styles.css"? 10:28

    Server: Easy, and by the way, it has changed since the last time you took it. You can safely use it for the next 10 minutes. 10:29

    Cache: No problem. 10:29

    Page: Thank you! But it seems something went wrong! Everything is broken! What is going on? 10:29

    This pattern has the right to life during testing, but it breaks everything in a real project and is very difficult to track. In the example above, the server has updated the HTML, CSS and JS, but the page is rendered with the old cached HTML and JS, plus the updated CSS from the server. Version mismatch ruins everything.

    Often when we make significant changes to HTML, we also change the CSS to reflect the new structure and the JavaScript to keep up with the content and styling. These resources are interdependent, but caching headers cannot express that. As a result, users may end up with the latest version of one or two resources and an old version of the rest.

    max-age is relative to the response time, so if all the resources are requested as part of the same page load they will expire at roughly the same time, but there is still a small chance of desynchronization. If you have pages that do not include the JavaScript, or that include different styles, their cache expiration dates will get out of sync. Worse, the browser constantly pulls content from the cache without knowing that the HTML, CSS & JS are interdependent, so it can happily take one item from the list and forget about the others. Taking all these factors together, the likelihood of mismatched versions is quite high.

    For the user, the result may be a broken page layout or other problems. From small glitches to completely unusable content.

    Fortunately, users have an emergency exit...

    Refreshing the page sometimes helps

    If the page is loaded via a refresh, browsers always revalidate with the server, ignoring max-age. Therefore, if the user has something broken because of max-age, a simple page refresh can fix everything. But, of course, even after everything works again, a bad aftertaste will remain, and the attitude towards your site will be somewhat different.

    A service worker can extend the life of these bugs

    For example, you have a service worker like this:

    Const version = "2"; self.addEventListener("install", event => ( event.waitUntil(caches.open(`static-$(version)`) .then(cache => cache.addAll([ "/styles.css", "/script .js" ]))); )); self.addEventListener("activate", event => ( // ...delete old caches... )); self.addEventListener("fetch", event => ( event.respondWith(caches.match(event.request) .then(response => response || fetch(event.request))); ));

    This service worker:

    • caches script and styles
    • uses cache if there is a match, otherwise accesses the network

    If we change the CSS/JS, we also increase the version number, which triggers an update. However, since addAll accesses the cache first, we can get into a race condition due to max-age and mismatched CSS & JS versions.

    Once they are cached, we will have incompatible CSS & JS until the next service worker update - and that is unless we get into a race condition again during the update.

    You can skip caching in the service worker:

    Self.addEventListener("install", event => ( event.waitUntil(caches.open(`static-$(version)`) .then(cache => cache.addAll([ new Request("/styles.css", ( cache: "no-cache" )), new Request("/script.js", ( cache: "no-cache" )) ]))); ));

    Unfortunately, options for caching are not supported in Chrome/Opera and have just been added to the nightly build of Firefox, but you can do it yourself:

    Self.addEventListener("install", event => ( event.waitUntil(caches.open(`static-$(version)`) .then(cache => Promise.all([ "/styles.css", "/script .js" ].map(url => ( // cache-bust using a random query string return fetch(`$(url)?$(Math.random())`).then(response => ( // fail on 404, 500 etc if (!response.ok) throw Error("Not ok"); return cache.put(url, response); )) ))))); ));

    In this example, I'm resetting the cache using a random number, but you can go further and add a hash of the content when building (this is similar to what sw-precache does). This is a kind of implementation of the first pattern using JavaScript, but only works with the service worker, not browsers and CDN.

    Service workers and HTTP cache work great together, don't make them fight!

    As you can see, you can work around caching bugs in your service worker, but it's better to solve the root of the problem. Setting caching up correctly not only makes the service worker's job easier, but also helps browsers that don't support service workers (Safari, IE/Edge), and lets you get the most out of your CDN.

    Proper caching headers can also make updating a service worker much easier.

    Const version = "23"; self.addEventListener("install", event => ( event.waitUntil(caches.open(`static-$(version)`) .then(cache => cache.addAll([ "/", "/script-f93bca2c. js", "/styles-a837cb1e.css", "/cats-0e9a2ef4.jpg" ]))); ));

    Here I cached the root page with pattern #2 (server-side revalidation) and all other resources with pattern #1 (immutable content). Each service worker update will trigger a request for the root page, while all other resources are downloaded only if their URL has changed. This is good because it saves traffic and improves performance whether you are updating from the previous version or from a very old one.

    This is a significant advantage over native apps, where even a small change means downloading the entire binary, or requires a complex comparison of binary files. Here we can update a large web application with a comparatively small download.

    Service workers work better as an enhancement rather than a temporary crutch, so work with the cache instead of fighting it.

    When used carefully, max-age with mutable content can be very good

    max-age is very often the wrong choice for mutable content, but not always. For example, the original article has a max-age of three minutes. A race condition is not a problem there, because the page has no dependencies that use the same caching pattern (its CSS, JS & images use pattern #1, immutable content), and nothing that uses this pattern depends on the page.

    This pattern means that I can comfortably write a popular article while my CDN (Cloudflare) takes the load off the server, as long as I am willing to accept that it takes up to three minutes for an updated article to become visible to users.

    This pattern should be used without fanaticism. If I add a new section to an article and link to it from another article, I have created a dependency that must be resolved. The user can click the link and get a copy of the article without the new section. If I want to avoid this, I should update the article, purge its cached copy from Cloudflare, wait three minutes, and only then add the link to the other article. Yes, this pattern requires caution.

    When used correctly, caching provides significant performance improvements and bandwidth savings. Serve immutable content if you can easily change the URL, or use server-side revalidation. Mix max-age and mutable content if you're brave enough and confident that your content doesn't have dependencies that could get out of sync.

    When making changes to websites, we often run into the fact that page content, CSS files and scripts (.js) are cached by the browser and remain unchanged for quite a long time. As a result, for the changes to show up in all browsers, we have to teach clients tricky combinations like F5 or Ctrl + F5, and check from time to time that they are actually pressed.

    The process is quite tedious and inconvenient. You can, of course, get out of the situation by renaming the files each time, but again it’s inconvenient.

    However, there is a way that lets us keep the same file names and reset the browser cache for .css or .js files exactly when we need to, and forget about Ctrl + F5 forever.

    The idea is that we append a pseudo-parameter to the URLs of our .css or .js files, and change it from time to time, thereby resetting the cache in the browser.

    Thus, the entry in the source code will now look like this:
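    A sketch of such markup (the file names are illustrative; the value 186485 is explained below):

    <link rel="stylesheet" href="/css/style.css?186485">
    <script src="/js/script.js?186485"></script>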

    Here 186485 is an arbitrary value: the server returns the same file, but thanks to the pseudo-parameter ?186485 the browser treats the URL as new.

    Now, in order not to change every occurrence of our parameter by hand each time, we will define it in a PHP file, which we will include in all the places we need:
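    A minimal sketch of the idea; version.php and $assets_version are hypothetical names:

    <?php
    // version.php - change this value whenever clients must re-download css/js
    $assets_version = "186485";
    ?>

    <?php require "version.php"; ?>
    <link rel="stylesheet" href="/css/style.css?<?php echo $assets_version; ?>">
    <script src="/js/script.js?<?php echo $assets_version; ?>"></script>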