
HTTP conditional requests


HTTP has a concept of conditional requests, where the result, and even the success, of a request can be changed by comparing the affected resources with the value of a validator. Such requests are useful for validating the content of a cache (sparing a needless transfer), for verifying the integrity of a document (for example when resuming a download), and for preventing lost updates when uploading or modifying a document on the server.

Principles

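HTTP conditional requests are requests that are executed differently depending on the value of specific headers. These headers define a precondition, and the result of the request differs depending on whether the precondition is matched.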

The different behaviors are determined by the method of the request and by the set of headers that make up the precondition:

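  • For safe methods, like GET, which usually fetches a document, the conditional request can be used to send back the document only if the condition is met. This saves bandwidth.
  • For unsafe methods, like PUT, which usually uploads a document, the conditional request can be used to upload the document only if the original it is based on is the same as the one stored on the server.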

Validators

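All conditional headers try to check whether the resource stored on the server matches a specific version. To achieve this, the conditional request needs to indicate the version of the resource. As comparing the whole resource byte by byte is impracticable, and not always what is wanted, the request transmits a value describing the version. Such values are called validators, and are of two kinds: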

  • the date of last modification of the document, the last-modified date.
  • an opaque string that uniquely identifies each version, called the entity tag, or the etag.

Comparing versions of the same resource is a bit tricky: depending on the context, there are two kinds of equality checks:

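  • Strong validation is used when byte-for-byte identity is expected, for example when resuming a download.
  • Weak validation is used when the user agent only needs to determine that two resources have the same content. They are treated as equal even if they differ in minor ways, such as displaying different ads or a footer with a different date.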

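The kind of validation is independent of the validator used. Both Last-Modified and ETag allow both types of validation, though the complexity of implementing them on the server side may vary. HTTP uses strong validation by default, and it specifies when weak validation can be used.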
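
The two equality checks can be sketched for entity tags as follows. This is a minimal illustration of the strong and weak comparison functions that RFC 7232 defines for etags; the function names are hypothetical and not part of any library.

```python
def parse_etag(value):
    """Split an entity tag into (is_weak, opaque_tag)."""
    value = value.strip()
    if value.startswith("W/"):
        return True, value[2:]
    return False, value


def strong_match(a, b):
    """Strong comparison: both tags must be strong and identical."""
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a, b):
    """Weak comparison: the opaque parts must be identical; W/ is ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]
```

With these helpers, W/"abc" and "abc" match under weak comparison but not under strong comparison.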

Strong validation

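Strong validation consists of guaranteeing that the resource is, byte for byte, identical to the one it is compared with. This is mandatory for some conditional headers and the default for the others. Strong validation is very strict and may be difficult to guarantee at the server level, but it guarantees no data loss at any time, sometimes at the expense of performance.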

It is quite difficult to have a unique identifier for strong validation with Last-Modified. Often this is done using an ETag carrying the MD5 hash of the resource (or a derivative).
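
As a rough sketch of the MD5 approach, a server could derive the etag directly from the bytes it is about to send. The helper name make_strong_etag is an assumption made for this example; real servers often mix in other inputs such as the content encoding.

```python
import hashlib


def make_strong_etag(body: bytes) -> str:
    """Return a quoted entity tag derived from the MD5 digest of the body."""
    return '"' + hashlib.md5(body).hexdigest() + '"'
```

Any change to the body, even a single byte, yields a different tag, which is what strong validation requires.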

Weak validation

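Weak validation differs from strong validation in that it considers two versions of a document identical if their content is equivalent. For example, under weak validation a page that differs from another only by the date in its footer, or by the advertising it displays, is considered identical to it. Under strong validation these same two versions are considered different. Building a system of etags for weak validation may be complex, as it involves knowing the relative importance of the different elements of a page, but it is very useful for optimizing cache performance.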
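
One possible way to build such a weak etag is to hash the page only after removing the parts considered unimportant. The sketch below assumes, purely for illustration, that the volatile footer time is wrapped in a span with the class generated-at; both that markup and the function name are hypothetical.

```python
import hashlib
import re

# Hypothetical marker: the footer timestamp is assumed to be wrapped like this.
FOOTER_TIME = re.compile(r'<span class="generated-at">.*?</span>')


def make_weak_etag(page: str) -> str:
    """Hash the page with the volatile footer timestamp removed."""
    stable = FOOTER_TIME.sub("", page)
    digest = hashlib.md5(stable.encode("utf-8")).hexdigest()
    return 'W/"' + digest + '"'
```

Two renderings that differ only in the footer time then share the same weak etag, while a change to the main content produces a new one.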

Conditional headers

Several HTTP headers, called conditional headers, lead to conditional requests. These are:

If-Match
Succeeds if the ETag of the distant resource is equal to one listed in this header. The comparison is strong: an etag prefixed with 'W/' never matches.
If-None-Match
Succeeds if the ETag of the distant resource is different from each one listed in this header. The comparison is weak: the 'W/' prefix is ignored when comparing etags.
If-Modified-Since
Succeeds if the Last-Modified date of the distant resource is more recent than the one given in this header.
If-Unmodified-Since
Succeeds if the Last-Modified date of the distant resource is older than or the same as the one given in this header.
If-Range
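Similar to If-Match or If-Unmodified-Since, but it can carry only one single etag or one date. If the condition fails, the range request fails, and instead of a 206 Partial Content response, a 200 OK is sent with the complete resource.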
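
A server-side evaluation of the first four headers might look like the sketch below. It is a simplified illustration, not a complete implementation: the function check_preconditions and its parameters are hypothetical, the evaluation order follows RFC 7232 (If-Match before If-Unmodified-Since, If-None-Match before If-Modified-Since), and If-Range is left out.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _tags(header_value):
    """Split a comma-separated list of entity tags."""
    return [t.strip() for t in header_value.split(",")]


def _opaque(tag):
    """Drop the W/ prefix, as the weak comparison does."""
    return tag[2:] if tag.startswith("W/") else tag


def check_preconditions(method, headers, etag, last_modified):
    """Return 412 or 304 if a precondition stops the request, else None."""
    if "If-Match" in headers:
        wanted = _tags(headers["If-Match"])
        # Strong comparison: a weak current etag never matches.
        ok = "*" in wanted or (not etag.startswith("W/") and etag in wanted)
        if not ok:
            return 412
    elif "If-Unmodified-Since" in headers:
        if last_modified > parsedate_to_datetime(headers["If-Unmodified-Since"]):
            return 412

    if "If-None-Match" in headers:
        listed = _tags(headers["If-None-Match"])
        # Weak comparison: only the opaque parts are compared.
        hit = "*" in listed or _opaque(etag) in [_opaque(t) for t in listed]
        if hit:
            return 304 if method in ("GET", "HEAD") else 412
    elif "If-Modified-Since" in headers and method in ("GET", "HEAD"):
        if last_modified <= parsedate_to_datetime(headers["If-Modified-Since"]):
            return 304
    return None


last_modified = datetime(2015, 10, 21, 7, 28, tzinfo=timezone.utc)
print(check_preconditions("GET", {"If-None-Match": '"v1"'}, '"v1"', last_modified))  # 304
```

A return value of None means that no precondition stopped the request and the method can be carried out normally.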

Use cases

Cache update

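The most common use case for conditional requests is updating a cache. With an empty cache, or without a cache, the requested resource is sent back with a status of 200 OK.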

The request issued when the cache is empty triggers the resource to be downloaded, with both validator values sent as headers. The cache is then filled.

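Together with the resource, the validators are sent in the headers. In this example, both Last-Modified and ETag are sent, but it could equally have been just one of them. These validators are cached with the resource (like all headers) and are used to craft conditional requests once the cache becomes stale.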

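As long as the cache is not stale, no requests are issued at all. But once it has become stale (this is mostly controlled by the Cache-Control header), the client does not use the cached value directly but issues a conditional request. The values of the validators are used as parameters of the If-Modified-Since and If-None-Match headers.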

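If the resource has not changed, the server sends back a 304 Not Modified response. This makes the cache fresh again, and the client uses the cached resource. Although there is a request/response round trip that consumes some resources, this is more efficient than transmitting the whole resource over the wire again.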

With a stale cache, the conditional request is sent. The server can determine if the resource changed, and, as in this case, decide not to send it again as it is the same.

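If the resource has changed, the server just sends back a 200 OK response with the new version of the resource, as if the request were not conditional, and the client uses this new resource (and caches it).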

In the case where the resource was changed, it is sent back as if the request wasn't conditional.

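Apart from the setting of the validators on the server side, this mechanism is transparent: all browsers manage a cache and send such conditional requests without any special work by Web developers.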
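
The exchange described in this section can be simulated in memory. The sketch below is only an illustration under simplifying assumptions: origin and Cache are hypothetical names, only the ETag validator is used, and every fetch is treated as a revalidation (freshness and Cache-Control are ignored).

```python
def origin(headers, resource):
    """Hypothetical origin server: answer 304 when the client's etag is current."""
    if headers.get("If-None-Match") == resource["etag"]:
        return 304, {"ETag": resource["etag"]}, b""
    return 200, {"ETag": resource["etag"]}, resource["body"]


class Cache:
    """Single-entry cache that revalidates on every fetch."""

    def __init__(self):
        self.entry = None  # (etag, body) once filled

    def fetch(self, resource):
        headers = {}
        if self.entry is not None:
            # Reuse the stored validator to make the request conditional.
            headers["If-None-Match"] = self.entry[0]
        status, response_headers, body = origin(headers, resource)
        if status == 200:
            self.entry = (response_headers["ETag"], body)
        return status, self.entry[1]
```

The first fetch returns 200 and fills the cache, a second fetch of the unchanged resource returns 304 and serves the stored body, and a fetch after the resource changes returns 200 with the new body.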

Integrity of a partial download

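Partial downloading of files is a feature of HTTP that allows a previous operation to be resumed, saving bandwidth and time by keeping the information already obtained: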

A download has been stopped and only partial content has been retrieved.

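A server supporting partial downloads advertises this by sending the Accept-Ranges header. Once this happens, the client can resume a download by sending a Range header with the missing ranges: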

The client resumes the request by indicating the range it needs, with preconditions checking the validators of the partially obtained resource.

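The principle is simple, but there is one potential problem: if the downloaded resource has been modified between the two downloads, the obtained ranges will correspond to two different versions of the resource, and the final document will be corrupted.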

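To prevent this, conditional requests are used. For ranges, there are two ways of doing this. The more flexible one makes use of If-Unmodified-Since and If-Match: the server returns an error if the precondition fails, and the client then restarts the download from the beginning: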

When the partially downloaded resource has been modified, the preconditions will fail and the resource will have to be downloaded again completely.

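Even though this method works, it adds an extra request/response exchange when the document has changed, which hurts performance. HTTP therefore has a specific header, If-Range, to avoid this scenario: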

The If-Range header allows the server to directly send back the complete resource if it has been modified, with no need to send a 412 error and wait for the client to re-initiate the download.

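This solution is more efficient, but slightly less flexible, as only one etag can be used in the condition. Such additional flexibility is rarely needed.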
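
The If-Range behavior can be sketched on the server side as follows. This is a deliberately small illustration: serve_range is a hypothetical function, only strong etags and the open-ended "bytes=START-" form of the Range header are handled, and date values in If-Range are not supported.

```python
def serve_range(headers, etag, body):
    """Return (status, payload) for a GET that may carry Range and If-Range."""
    range_header = headers.get("Range")
    if range_header is None:
        return 200, body
    if "If-Range" in headers and headers["If-Range"] != etag:
        # The validator is out of date: ignore Range and send everything.
        return 200, body
    # Only the simple "bytes=START-" form is handled in this sketch.
    start = int(range_header[len("bytes="):].rstrip("-"))
    return 206, body[start:]
```

When the etag still matches, the client receives only the missing bytes with a 206 status; when it does not, the full resource arrives with a 200 status in the same exchange.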

Avoiding the lost update problem with optimistic locking

A common operation in Web applications is to update a remote document. This is very common in any file system or source control application, but any application that allows remote resources to be stored needs such a mechanism. Common websites, like wikis and other CMSs, have such a need.

With the PUT method you are able to implement this. The client first reads the original files, modifies them, and finally pushes them to the server:

Updating a file with a PUT is very simple when concurrency is not involved.

Unfortunately, things get more complicated as soon as concurrency is taken into account. While one client is locally modifying its copy of the resource, a second client can fetch the same resource and do the same on its own copy. What happens next is very unfortunate: when they commit back to the server, the modifications from the first client are discarded by the second client's push, as this second client is unaware of the first client's changes to the resource. The outcome is not communicated to the losing party. Which client's changes are kept depends on the order in which they commit, which in turn depends on the performance of the clients, of the server, and even of the humans editing the document. The winner will change from one time to the next. This is a race condition, and it leads to problematic behaviors that are difficult to detect and to debug:

When several clients update the same resource in parallel, we are facing a race condition: the slowest wins, and the others don't even know they lost. Problematic!

There is no way to deal with this problem without annoying one of the two clients. However, lost updates and race conditions are to be avoided. We want predictable results, and we expect clients to be notified when their changes are rejected.

Conditional requests allow the optimistic locking algorithm (used by most wikis and source control systems) to be implemented. The concept is to allow all clients to get copies of the resource and modify them locally, and to control concurrency by accepting only the first client to submit an update. All subsequent updates, based on the now obsolete version of the resource, are rejected:

Conditional requests make it possible to implement optimistic locking: now the quickest wins, and the others get an error.

This is implemented using the If-Match or If-Unmodified-Since headers. If the etag doesn't match the original file, or if the file has been modified since it has been obtained, the change is simply rejected with a 412 Precondition Failed error. It is then up to the client to deal with the error: either by notifying the user to start again (this time on the newest version), or by showing the user a diff of both versions, helping them decide which changes they wish to keep.
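
The If-Match variant of this scheme can be sketched with an in-memory resource. The Document class and its put method are hypothetical, the etag is assumed to be the MD5 digest of the body, and 204 is one reasonable status for an accepted update.

```python
import hashlib


class Document:
    """Hypothetical in-memory resource guarded by If-Match."""

    def __init__(self, body: bytes):
        self.body = body

    @property
    def etag(self) -> str:
        return '"' + hashlib.md5(self.body).hexdigest() + '"'

    def put(self, new_body: bytes, if_match: str) -> int:
        if if_match != self.etag:
            return 412  # Precondition Failed: the client edited a stale copy
        self.body = new_body
        return 204  # No Content: update accepted
```

If two clients read the same etag, the first put succeeds and changes the etag, so the second put, still carrying the old value, gets 412 and has to fetch the newest version before retrying.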

Dealing with the first upload of a resource

The first upload of a resource is an edge case of the previous scenario. Like any update of a resource, it is subject to a race condition if two clients try to perform it at around the same time. To prevent this, conditional requests can be used: adding If-None-Match with the special value '*', which represents any etag, makes the request succeed only if the resource did not exist before:

As with a regular upload, the first upload of a resource is subject to a race condition: If-None-Match can prevent it.

If-None-Match will only work with HTTP/1.1 (and later) compliant servers. If you are unsure whether the server is compliant, you first need to issue a HEAD request to the resource to check this.
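
The first-upload guard can be sketched in the same in-memory style. The create function and the dictionary used as a store are assumptions made for this example, as are the 201 and 204 success codes.

```python
def create(store, path, body, headers):
    """PUT that refuses to overwrite when the client sent If-None-Match: *."""
    if headers.get("If-None-Match") == "*" and path in store:
        return 412  # Precondition Failed: a representation already exists
    created = path not in store
    store[path] = body
    return 201 if created else 204
```

Of two clients racing to create the same path with If-None-Match: *, only the first one receives 201; the second receives 412 and the stored content is left untouched.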

Conclusion

Conditional requests are a key feature of HTTP and allow efficient and complex applications to be built. For caching or resuming downloads, the only work required of webmasters is to configure the server correctly; setting correct etags can be tricky in some environments. Once that is achieved, the browser will issue the expected conditional requests.

For locking mechanisms, it is the opposite: Web developers need to issue a request with the proper headers, while webmasters can mostly rely on the application to carry out the checks for them.

In both cases it is clear that conditional requests are a fundamental feature of the Web.

Document tags and contributors

 Contributors to this page: WayneCui
 Last updated by: WayneCui