Graphical HTTP cache

This article was first published on the blog of the Zhengcaiyun front-end team: Graphical HTTP cache, https://www.zoo.team/article/http-cache

Preface

The HTTP caching mechanism is one of the key topics a front-end engineer needs to master. This article walks through the overall HTTP caching process in detail, so that by the end you have a complete picture of how caching works.

There are two types of HTTP caching: strong caching and negotiation caching (revalidation with the server). Both serve to speed up resource loading, improve the user experience, reduce network traffic, and relieve pressure on the server. The overall flow of the cache is shown in this chart:

Http cache.jpg

Strong cache

On a strong cache hit, no request is sent to the server; the browser reads the resource directly from its local cache. Chrome's Network panel shows HTTP status code 200 for such a resource, marked as served from disk cache or memory cache. In Chrome, the strong cache is split into Disk Cache (stored on the hard disk) and Memory Cache (stored in memory), and the browser decides which one is used. Whether a resource is strongly cached is controlled by three header fields: Expires, Cache-Control, and Pragma.

○ Expires

The value of Expires is an HTTP date. When the browser initiates a request, it compares the system time with the value of Expires; if the system time is later than that value, the cached copy is stale. Because the comparison uses the client's system time, the validity period becomes inaccurate whenever the client's clock and the server's clock disagree. Of the three header fields, Expires has the lowest priority.
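
As a rough illustration of the comparison described above, here is a minimal sketch in JavaScript. The helper name isFreshByExpires and the sample date are assumptions made for this example; they are not part of any browser API.

```javascript
// Hypothetical helper: decide freshness from an Expires header value.
// "now" stands for the client's system clock, which is why clock skew
// between client and server makes this check unreliable.
function isFreshByExpires(expiresHeader, now = new Date()) {
  const expiresAt = Date.parse(expiresHeader); // milliseconds, or NaN if unparseable
  if (Number.isNaN(expiresAt)) {
    return false; // treat an unparseable date as already expired
  }
  return now.getTime() < expiresAt;
}

const expires = 'Wed, 21 Oct 2015 07:28:00 GMT';
console.log(isFreshByExpires(expires, new Date('2015-10-21T07:27:59Z'))); // true
console.log(isFreshByExpires(expires, new Date('2015-10-21T07:28:01Z'))); // false
```

Passing the clock in as a parameter makes the skew problem easy to reproduce: a client whose clock runs a few minutes behind would keep treating the same response as fresh.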

○ Cache-Control

Cache-Control was introduced in HTTP/1.1 and can be used in both request headers and response headers. Commonly used directives include:

  • max-age: The unit is seconds. The response is fresh for this many seconds, counted from the time the response was generated; after that the cached copy is stale
  • no-cache: Do not use the strong cache; the browser must check with the server whether the cached copy is still fresh
  • no-store: Do not use any cache (including the negotiation cache); request the latest resource from the server every time
  • private: The response is intended for a single user's cache; intermediaries such as proxies and CDNs must not cache it
  • public: The response can be cached by intermediaries such as proxies and CDNs
  • must-revalidate: The cached copy can be used while it is fresh; once it has expired, it must be validated with the server before it is used again
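
To make the directives above more concrete, here is a minimal sketch of how a cache might parse a Cache-Control value and apply max-age. The function names parseCacheControl and isFreshByMaxAge are hypothetical, and the parser is simplified: it splits on commas and does not handle quoted values that contain commas.

```javascript
// Hypothetical helper: parse a Cache-Control value into a plain object.
function parseCacheControl(headerValue) {
  const directives = {};
  for (const part of headerValue.split(',')) {
    const [rawName, rawValue] = part.trim().split('=');
    if (!rawName) continue; // skip empty segments such as a trailing comma
    // Directives without a value (no-cache, public, ...) are stored as true.
    directives[rawName.toLowerCase()] =
      rawValue === undefined ? true : rawValue.replace(/^"|"$/g, '');
  }
  return directives;
}

// ageSeconds is the time elapsed since the response was generated.
function isFreshByMaxAge(directives, ageSeconds) {
  if (directives['no-store'] || directives['no-cache']) return false;
  const maxAge = Number.parseInt(directives['max-age'], 10);
  return Number.isInteger(maxAge) && ageSeconds < maxAge;
}

const directives = parseCacheControl('public, max-age=10');
console.log(directives);
console.log(isFreshByMaxAge(directives, 5));  // true: 5 s old, fresh for 10 s
console.log(isFreshByMaxAge(directives, 12)); // false: older than max-age
```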

○ Pragma

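Pragma has only one value, no-cache, and its effect is the same as no-cache in Cache-Control: the strong cache is not used, and the browser must check with the server whether the cached copy is still fresh. In the tests in this article, Pragma has the highest priority of the three header fields. Note that Pragma is a legacy HTTP/1.0 field and the HTTP specification does not define its meaning in responses, so Cache-Control is the more reliable choice.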
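
The priority order described in this section (Pragma first, then Cache-Control, then Expires) can be sketched as a single decision function. This is only an illustration of the order as the article describes it, not how any particular browser is implemented; the function name canUseStrongCache is hypothetical, header names are assumed to be lower-cased, and heuristic caching is ignored.

```javascript
// Hypothetical helper: may the cached copy be used without asking the server?
function canUseStrongCache(headers, ageSeconds, now = new Date()) {
  // 1. Pragma: no-cache always forces a check with the server.
  if (headers['pragma'] === 'no-cache') return false;

  // 2. Cache-Control, when present, is consulted before Expires.
  const cacheControl = headers['cache-control'];
  if (cacheControl !== undefined) {
    if (/\bno-(cache|store)\b/i.test(cacheControl)) return false;
    const match = /\bmax-age=(\d+)/i.exec(cacheControl);
    if (match) return ageSeconds < Number(match[1]);
  }

  // 3. Expires is only used when nothing above decided the outcome.
  const expires = headers['expires'];
  if (expires !== undefined) {
    return now.getTime() < Date.parse(expires); // a comparison with NaN is false
  }
  return false; // no explicit freshness information
}

const stored = {
  'cache-control': 'max-age=10',
  'expires': 'Wed, 21 Oct 2015 07:28:00 GMT', // already in the past
};
// max-age decides before the expired Expires value is ever looked at.
console.log(canUseStrongCache(stored, 5)); // true
```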

Start a local service with Express to verify the three strong-cache header fields. The code is as follows:

const express = require('express');
const path = require('path');

const app = express();
const options = {
  etag: false, // disable negotiation caching (ETag)
  lastModified: false, // disable negotiation caching (Last-Modified)
  setHeaders: (res, filePath, stat) => {
    // Strong cache: the response stays fresh for 10 seconds
    res.set('Cache-Control', 'max-age=10');
  },
};
// Serve the files under ./public with the options above
app.use(express.static(path.join(__dirname, 'public'), options));
app.listen(3000);

On the first load, the page requests the resource from the server, and the response header carries Cache-Control with an expiration time of 10 seconds.

Cache 1.jpg

On the second load, the Date response header has not changed, which shows that the browser used the strong cache directly and no request was actually sent.

Cache 2.jpg

After the 10-second timeout period has elapsed, request resources again:

Cache 3.jpg

When Pragma and Cache-Control exist at the same time, Pragma has a higher priority than Cache-Control.

Cache 5.jpg

Negotiation cache

When the browser's strong cache has expired, or the headers specify that the strong cache must not be used, the browser sends the request with If-Modified-Since or If-None-Match set in the request header. The server uses the values of these two fields to check whether the negotiation cache is hit. On a hit, the server returns status 304 with the Last-Modified or ETag field set in the response header, and the browser loads the resource from its cache.
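
From the client's point of view, the exchange described above can be sketched as follows. Both helper names are hypothetical, and the cached headers are assumed to be stored with lower-cased names.

```javascript
// Hypothetical helper: build the validators the client sends when revalidating.
function buildConditionalHeaders(cachedHeaders) {
  const requestHeaders = {};
  if (cachedHeaders['etag']) {
    requestHeaders['If-None-Match'] = cachedHeaders['etag'];
  }
  if (cachedHeaders['last-modified']) {
    requestHeaders['If-Modified-Since'] = cachedHeaders['last-modified'];
  }
  return requestHeaders;
}

// Hypothetical helper: 304 means "reuse the cached body"; any other status
// carries a new body in the response.
function chooseBody(statusCode, cachedBody, responseBody) {
  return statusCode === 304 ? cachedBody : responseBody;
}

const cached = {
  'etag': '"abc123"',
  'last-modified': 'Wed, 21 Oct 2015 07:28:00 GMT',
};
console.log(buildConditionalHeaders(cached));
console.log(chooseBody(304, 'cached body', '')); // cached body
```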

○ ETag/If-None-Match

The value of ETag/If-None-Match is a hash string that identifies a version of the resource. When the file on the server changes, its hash changes with it. The server compares the If-None-Match value from the request header with the hash of the current file; if they are equal, the negotiation cache is hit. ETags come in strong and weak forms. A value that starts with "W/" is a weak validator: the server changes it only when the file differs in a way that its ETag calculation treats as significant. Only then is the resource actually sent again; otherwise the server returns 304 and the browser loads its cached copy.
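
On the server side, the comparison might look like the sketch below. It uses the weak comparison that applies to If-None-Match, where the "W/" prefix is ignored. The function names are hypothetical, and splitting the header on commas is a simplification.

```javascript
// Strip the optional weak prefix so that W/"abc" and "abc" compare equal.
function opaqueTag(etag) {
  return etag.startsWith('W/') ? etag.slice(2) : etag;
}

// Returns true when If-None-Match matches the current ETag, in which case
// the server can answer 304 without a body.
function matchesIfNoneMatch(ifNoneMatch, currentEtag) {
  if (ifNoneMatch.trim() === '*') return true; // "*" matches any existing resource
  return ifNoneMatch
    .split(',')
    .map((tag) => opaqueTag(tag.trim()))
    .includes(opaqueTag(currentEtag));
}

console.log(matchesIfNoneMatch('"abc"', '"abc"'));   // true: 304
console.log(matchesIfNoneMatch('W/"abc"', '"abc"')); // true: weak match, 304
console.log(matchesIfNoneMatch('"old"', '"abc"'));   // false: send the resource
```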

○ Last-Modified/If-Modified-Since

The value of Last-Modified/If-Modified-Since is the time the file was last modified. On the first request, the server puts the resource's last modification time in the Last-Modified response header. On the second request, the browser copies that value into the If-Modified-Since request header. The server compares the file's last modification time with the value of If-Modified-Since; if the file has not been modified since then, it returns 304 and the browser loads its cached copy.

Start a local service with Express to verify the negotiation cache. The code is as follows:
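
A minimal sketch of the server-side check, assuming the file's modification time is available as a Date. The function name notModifiedSince is hypothetical.

```javascript
// Returns true when the file has not changed since the date the client sent,
// so the server may reply 304. HTTP dates have one-second resolution, so the
// file's mtime is truncated to whole seconds before comparing.
function notModifiedSince(ifModifiedSince, fileMtime) {
  const since = Date.parse(ifModifiedSince);
  if (Number.isNaN(since)) return false; // unparseable header: send the full response
  const mtimeSeconds = Math.floor(fileMtime.getTime() / 1000) * 1000;
  return mtimeSeconds <= since;
}

const header = 'Wed, 21 Oct 2015 07:28:00 GMT';
console.log(notModifiedSince(header, new Date('2015-10-21T07:28:00Z'))); // true: 304
console.log(notModifiedSince(header, new Date('2015-10-21T07:30:00Z'))); // false: 200
```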

const express = require('express');
const path = require('path');

const app = express();
const options = {
  etag: true, // enable negotiation caching (ETag)
  lastModified: true, // enable negotiation caching (Last-Modified)
  setHeaders: (res, filePath, stat) => {
    res.set({
      'Cache-Control': 'max-age=0', // the browser does not use the strong cache
      'Pragma': 'no-cache', // the browser does not use the strong cache
    });
  },
};
app.use(express.static(path.join(__dirname, 'public'), options));
app.listen(3001);

Request resources for the first time:

Cache 6.jpg

On the second request, the server uses the If-Modified-Since and If-None-Match values in the request header to check whether the file has been modified.

Cache 7.jpg

Next, let's verify how the ETag behaves under strong validation when a space is added to the file and then removed again. In the code, I compute an MD5 hash of the file content and use it as the ETag.

Note: this is for demonstration only; real servers do not necessarily compute the ETag with MD5. Apache, for example, generates the ETag from the file attributes selected by its FileETag directive (such as FileETag INode MTime Size). Users can also customize how the ETag is generated.

To make sure Last-Modified does not affect the result, I turned it off so that Last-Modified/If-Modified-Since are no longer sent. The source code is as follows:

const express = require('express');
const CryptoJS = require('crypto-js/crypto-js');
const fs = require('fs');
const path = require('path');

const app = express();
const options = {
  etag: true, // validate by ETag only
  lastModified: false, // turn off the other negotiation-cache validator
  setHeaders: (res, filePath, stat) => {
    const data = fs.readFileSync(filePath, 'utf-8'); // read the file content
    const hash = CryptoJS.MD5(data).toString(); // MD5 hash (hex) of the content
    res.set({
      'Cache-Control': 'max-age=0', // the browser does not use the strong cache
      'Pragma': 'no-cache', // the browser does not use the strong cache
      'ETag': `"${hash}"`, // manually set the ETag to the quoted MD5 hash
    });
  },
};
app.use(express.static(path.join(__dirname, 'public'), options));
// Use a new port; otherwise the negotiation cache from the previous test still applies
app.listen(4000);

The first and second requests are as follows:

Cache 10.jpg

Cache 11.jpg

Then I modified test.js by adding a space and deleting it again, so the file content stayed the same while its modification time changed, and sent a third request. Because the ETag is generated from an MD5 hash of the file content, the request still returns 304 and the browser reads its cache, even though the modification time has changed.
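
The same effect can be reproduced without a browser. The sketch below swaps in Node's built-in crypto module in place of crypto-js and works on a temporary file; the helper name contentEtag and the file contents are assumptions made for this example.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: a strong ETag derived only from the file's bytes.
function contentEtag(filePath) {
  const digest = crypto.createHash('md5').update(fs.readFileSync(filePath)).digest('hex');
  return `"${digest}"`;
}

// Create a throwaway file to experiment with.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'etag-demo-'));
const file = path.join(dir, 'test.js');
fs.writeFileSync(file, 'console.log("hello");\n');
const before = contentEtag(file);

// Change only the modification time, leaving the content untouched.
const later = new Date(Date.now() + 60 * 1000);
fs.utimesSync(file, later, later);
const afterTouch = contentEtag(file);

// Change the content itself by appending a space.
fs.appendFileSync(file, ' ');
const afterEdit = contentEtag(file);

console.log(before === afterTouch); // true: same bytes, same ETag
console.log(before === afterEdit);  // false: the content changed
```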

Cache 13.jpg

ETag/If-None-Match was introduced mainly to solve problems that Last-Modified/If-Modified-Since cannot:

  • Last-Modified is only accurate to the second. If a file is modified more than once within the same second, Last-Modified/If-Modified-Since will incorrectly return 304
  • If the file is saved again without any change to its content, the modification time still changes, so Last-Modified/If-Modified-Since will needlessly return 200 and resend the resource. The example above shows that ETag avoids this problem

Summary
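
The sub-second problem in the first point can be seen directly in how an HTTP date is produced. The timestamps below are made up for illustration.

```javascript
// Two modifications 400 ms apart, inside the same second.
const firstWrite = new Date('2021-03-01T10:00:00.100Z');
const secondWrite = new Date('2021-03-01T10:00:00.500Z');

// toUTCString() produces the HTTP date format, which has no milliseconds.
const firstHeader = firstWrite.toUTCString();
const secondHeader = secondWrite.toUTCString();

console.log(firstHeader);
console.log(firstHeader === secondHeader); // true: both writes share one Last-Modified value
```

A client that cached the result of the first write would send this value in If-Modified-Since and receive 304, even though the file changed again 400 ms later.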

In real-world scenarios such as the official Zhengcaiyun website, images and static resources like JS files that rarely change are cached to improve page loading speed. Examples include the navigation bar at the top of the Zhengcaiyun homepage and the event-tracking SDK.

To close, let's return to the flow chart from the beginning of the article. It covers the overall process of HTTP caching. Once you are familiar with that process, you can verify HTTP caching yourself with Node.

Http cache.jpg

Reference: https://cloud.tencent.com/developer/article/1630751 Graphical HTTP Caching-Cloud + Community-Tencent Cloud