Content negotiation

In the Hypertext Transfer Protocol (HTTP), content negotiation is the mechanism used, when several equivalent representations can be served for a given URI, to provide the one best suited to the end user. The best-suited content is determined through one of three mechanisms:

Server-driven negotiation

In this kind of negotiation, the browser (or any other kind of agent) sends several HTTP headers along with the URI. These headers describe the user's preferences. The server uses them as hints, and an internal algorithm lets it choose the best content to serve to the client. The algorithm is server-specific and not defined in the standard. See, for example, the Apache 2.2 negotiation algorithm.

The HTTP/1.1 standard lists the standard headers that may be used in a server-driven negotiation algorithm (Accept:, Accept-Charset:, Accept-Encoding:, Accept-Language: and User-Agent:). Nevertheless, it allows the server to use other aspects in its algorithm, either aspects outside the request itself or extension header fields, i.e., headers not defined in the HTTP/1.1 standard.
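Since the selection algorithm is server-specific, the following is only a minimal sketch of one possible approach, not the algorithm of any particular server. It assumes the server knows the media types it can produce for a URI and picks the one to which the Accept: header gives the highest quality factor, letting the most specific matching range win. The function names `quality_for` and `choose` are hypothetical, and malformed headers are not handled.

```python
from typing import Optional


def quality_for(media_type: str, accept: str) -> float:
    """Quality the agent gives to media_type; the most specific matching range wins."""
    wanted_type = media_type.partition("/")[0]
    best_specificity, best_q = -1, 0.0
    for item in accept.split(","):
        media_range, *params = [p.strip().lower() for p in item.split(";")]
        q = 1.0  # a range without an explicit q has quality 1
        for param in params:
            if param.startswith("q="):
                q = float(param[2:])
        range_type, _, range_subtype = media_range.partition("/")
        if media_range == "*/*":
            specificity = 0
        elif range_subtype == "*" and range_type == wanted_type:
            specificity = 1
        elif media_range == media_type:
            specificity = 2
        else:
            continue  # this range does not cover the media type
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose(available: list, accept: str) -> Optional[str]:
    """Return the available type with the highest quality, or None if none is acceptable."""
    scored = [(quality_for(t, accept), t) for t in available]
    best_q = max((q for q, _ in scored), default=0.0)
    if best_q <= 0:
        return None
    # Assumption: among equally preferred types, the server's own order decides.
    return next(t for q, t in scored if q == best_q)


print(choose(["image/jpeg", "image/png"], "image/png,image/*;q=0.8,*/*;q=0.5"))
```

A real server would typically combine this score with its other criteria (language, encoding, its own preference for each variant) before deciding.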

The Accept: header

Defined in the HTTP/1.1 standard, section 14.1, the Accept: header lists the MIME types of the media that the agent is willing to process. It is a comma-separated list of MIME types, each optionally combined with a quality factor, a parameter giving the relative degree of preference between the different MIME types.
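To make the structure of the header concrete, here is a small sketch that splits an Accept: value into media types and quality factors and orders them by preference. The name `parse_accept` is hypothetical, and the parser is deliberately simplified: it assumes that no parameter value contains a quoted comma or semicolon.

```python
def parse_accept(header: str) -> list:
    """Return (media_type, q) pairs, most preferred first."""
    entries = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # the quality factor defaults to 1 when omitted
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: a malformed q means "not acceptable"
        entries.append((media_type, q, position))
    # Highest quality first; ties keep the order chosen by the agent.
    entries.sort(key=lambda entry: (-entry[1], entry[2]))
    return [(media_type, q) for media_type, q, _ in entries]


for media_type, q in parse_accept("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"):
    print(media_type, q)
```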

The Accept: header is set by the browser, or any other user-agent, and can vary according to the context. It can therefore differ between fetching a document entered in the address bar and fetching a resource linked via an <img>, <video> or <audio> element. Neither the HTTP standard nor the HTML one defines specific MIME types to use in specific contexts.

Default values

This is the value sent when the context doesn't give better information. Note that all browsers add the */* MIME type to cover all cases. This value is typically used for requests initiated via the address bar of a browser, or via an HTML <a> element.

User Agent Value Comment
Firefox text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 This value can be modified using the network.http.accept.default parameter.
Safari, Chrome application/xml,application/xhtml+xml,text/html;q=0.9, text/plain;q=0.8,image/png,*/*;q=0.5 source
Safari 5 text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 This is an improvement over earlier Accept headers as it no longer ranks image/png above text/html.
Internet Explorer 8 image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, application/msword, */* See IE and the Accept Header (IEInternals' MSDN blog).
Opera text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/webp, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1  

Values for an image

When requesting an image, for example through an HTML <img> element, user-agents often send a specific list of preferred media types.

User Agent Value Comment
Firefox image/png,image/*;q=0.8,*/*;q=0.5 This value can be modified using the image.http.accept parameter.
Safari, Chrome */*  
Internet Explorer 8 or earlier */* See IE and the Accept Header (IEInternals' MSDN blog)
Internet Explorer 9 image/png,image/svg+xml,image/*;q=0.8, */*;q=0.5 See Fiddler is better with Internet Explorer 9 (IEInternals' MSDN blog)

Values for a video

When a video is requested, via the <video> HTML element, most browsers use specific values.

User Agent Value Comment
Firefox earlier than 3.6 no support for <video>  
Firefox 3.6 and later video/webm, video/ogg, video/*;q=0.9, application/ogg;q=0.7, audio/*;q=0.6, */*;q=0.5 See bug 489071
Safari, Chrome ?  
Internet Explorer 8 or earlier no support for <video>  
Internet Explorer 9 ?  

Values for some audio

When an audio file is requested, like via the <audio> HTML element, most browsers use specific values.

User Agent Value Comment
Firefox 3.6 and later audio/webm, audio/ogg, audio/wav, audio/*;q=0.9, application/ogg;q=0.7, video/*;q=0.6, */*;q=0.5 See bug 489071
Safari, Chrome ?  
Internet Explorer 8 or earlier no support for <audio>  
Internet Explorer 9    

Values for scripts

When a JS script is requested, like via the <script> HTML element, most browsers use specific values.

User Agent Value Comment
Firefox */* See bug 170789
Safari, Chrome ?  
Internet Explorer 8 or earlier */* See IE and the Accept Header (IEInternals' MSDN blog)
Internet Explorer 9 application/javascript, */*;q=0.8 See Fiddler is better with Internet Explorer 9 (IEInternals' MSDN blog)

Values for a stylesheet

When a CSS stylesheet is requested, via the <link rel="stylesheet"> HTML element, most browsers use specific values.

User Agent Value Comment
Firefox 4 text/css,*/*;q=0.1 See bug 170789
Safari 5 text/css,*/*;q=0.1  
Internet Explorer 8 or earlier */* See IE and the Accept Header (IEInternals' MSDN blog)
Internet Explorer 9 text/css See Fiddler is better with Internet Explorer 9 (IEInternals' MSDN blog)
Chrome 12 text/css,*/*;q=0.1  
Opera 11.10 text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/webp, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1  
Konqueror 4.6 text/css,*/*;q=0.1  

The Accept-Charset: header

Defined in the HTTP/1.1 standard, section 14.2, this header indicates to the server which character encodings are understood by the user-agent. Traditionally, it was set to a different value for each browser locale, e.g. ISO-8859-1,utf-8;q=0.7,*;q=0.7 for a Western European locale.

Considering that:
  • UTF-8 is now well supported by all relevant user-agents,
  • the presence of the header increases the configuration-based entropy exposed,
  • the presence of the header increases the data transmitted for each request,
  • almost no sites use the value of this header to choose content during the negotiation,
browsers have stopped sending this header with each request, starting with Internet Explorer 8, Safari 5, Opera 11 and Firefox 10. In the absence of Accept-Charset:, servers can simply assume that UTF-8 and the most common character sets are understood by the client.

The Accept-Encoding: header

Defined in the HTTP/1.1 standard, section 14.3, this header defines the acceptable content-encodings, mainly the supported compression algorithms. The HTTP/1.1 standard defines the following values:

Value Meaning Standard
gzip A format using the Lempel-Ziv coding (LZ77), with a 32-bit CRC. This is originally the format of the UNIX gzip program. The HTTP/1.1 standard also recommends that servers supporting this content-encoding recognize x-gzip as an alias, for compatibility purposes. RFC 1952
compress A format using the Lempel-Ziv-Welch (LZW) algorithm. The value name was taken from the UNIX compress program, which implemented this algorithm. Like the compress program, which has disappeared from most UNIX distributions, this content-encoding is used by almost no browsers today, partly because of a patent issue (the patent expired in 2003). HTTP/1.1
deflate Using the zlib structure (defined in RFC 1950), with the deflate compression algorithm (defined in RFC 1951). RFC 1950 and RFC 1951
identity Indicates the identity function (i.e. no compression, nor modification). This token, except if explicitly specified, is always deemed acceptable. HTTP/1.1
* This wildcard represents any content-encoding not explicitly specified in the header HTTP/1.1

Notes:

  • An IANA registry maintains a complete list of official content encodings. Non-standard ones can be used, but must carry the x- prefix.
  • Two other content encodings, bzip and bzip2, are sometimes used, though they are not standard. They implement the algorithms used by these two UNIX programs. Note that the first one was discontinued due to patent licensing problems.
  • As long as the identity value is not explicitly forbidden, by an identity;q=0 or by a *;q=0 without another explicitly set value for identity, the server must never send back a 406 Not Acceptable error.
  • Even if both the client and the server support the same compression algorithms, the server may choose not to compress the body of a response if the identity value is also acceptable. Two common cases lead to this:
    • The data to be sent is already compressed and a second compression won't lead to smaller data to be transmitted. This may be the case with some image formats;
    • The server is overloaded and cannot afford the computational overhead induced by the compression requirement. Typically, Microsoft recommends not compressing if a server uses more than 80% of its computational power.
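The rules in these notes can be sketched as follows. This is an illustrative reading, not a reference implementation: the names `parse_accept_encoding` and `choose_encoding` are hypothetical, the `supported` list stands for the compression encodings the server is able to apply, and the choice to prefer any acceptable compression over identity is an assumption of this sketch. A return value of None corresponds to the 406 Not Acceptable case.

```python
from typing import Optional

ALIASES = {"x-gzip": "gzip"}  # alias that HTTP/1.1 recommends servers recognize


def parse_accept_encoding(header: str) -> dict:
    """Map each listed content-encoding (or "*") to its quality value."""
    weights = {}
    for item in header.split(","):
        token, _, params = item.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        name, _, value = params.partition("=")
        q = float(value) if name.strip().lower() == "q" else 1.0
        weights[ALIASES.get(token, token)] = q
    return weights


def choose_encoding(header: str, supported: list) -> Optional[str]:
    """Return the encoding to apply, "identity" for no encoding, or None for a 406."""
    weights = parse_accept_encoding(header)

    def weight(encoding: str) -> float:
        if encoding in weights:
            return weights[encoding]
        if "*" in weights:
            return weights["*"]
        # identity stays acceptable unless it is explicitly excluded
        return 1.0 if encoding == "identity" else 0.0

    best, best_q = None, 0.0
    for encoding in supported:
        q = weight(encoding)
        if q > best_q:
            best, best_q = encoding, q
    if best is not None:
        return best
    return "identity" if weight("identity") > 0 else None


print(choose_encoding("gzip, deflate", ["gzip"]))
print(choose_encoding("identity;q=0", ["gzip"]))
```

A server that is overloaded, or that is serving already compressed data, could simply return "identity" whenever `weight("identity")` is positive instead of looking for a compression encoding first.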

The Accept-Language: header

Defined in the HTTP/1.1 standard, section 14.4, this header is used to indicate the language preference of the user. A different value is set according to the language of the graphical interface, but most browsers allow setting different language preferences.

This header carries a language quality factor. From w3.org:

Each language-range MAY be given an associated quality value which represents an estimate of the user's preference for the languages specified by that range. The quality value defaults to "q=1". For example,

       Accept-Language: da, en-gb;q=0.8, en;q=0.7
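As an illustration of how a server might use such a value, the sketch below matches the language ranges of the header against the languages a site offers. A range is taken to match a language tag when it is equal to the tag or is a prefix of it followed by "-", and the longest matching range decides the quality. The names `parse_accept_language` and `best_language`, the list of available languages and the default are assumptions made for this example.

```python
def parse_accept_language(header: str) -> list:
    """Return (language_range, q) pairs in header order."""
    ranges = []
    for item in header.split(","):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        name, _, value = params.partition("=")
        q = float(value) if name.strip().lower() == "q" else 1.0
        ranges.append((tag, q))
    return ranges


def best_language(header: str, available: list, default: str) -> str:
    """Pick the available language with the highest quality, else the default."""
    ranges = parse_accept_language(header)
    best, best_q = default, 0.0
    for language in available:
        tag = language.lower()
        q, matched_length = 0.0, -1
        for language_range, range_q in ranges:
            if language_range == "*":
                length = 0  # the wildcard is the least specific match
            elif tag == language_range or tag.startswith(language_range + "-"):
                length = len(language_range)
            else:
                continue
            if length > matched_length:
                matched_length, q = length, range_q
        if q > best_q:
            best, best_q = language, q
    return best


header = "da, en-gb;q=0.8, en;q=0.7"
print(best_language(header, ["en-US", "fr"], default="fr"))
```

With this header, a site offering only American English and French would serve American English, because "en-US" is covered by the range "en" while French is not listed at all.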

 

 

Usage notes:
  • This header, especially when user-modified, greatly increases the configuration-based entropy and may be used for HTTP fingerprinting of the user.
  • Site designers must not be over-zealous in using language detection via this header, as it can lead to a poor user experience:
    • They should always provide a way to override the server-chosen language, e.g., by providing small links near the top of the page. Most user-agents provide a default value for the Accept-Language: header, adapted to the user interface language, and end users often do not modify it, either because they do not know how or because they are not able to, as in an Internet café for instance.
    • Once a user has overridden the server-chosen language, a site should no longer use language detection and should stick with the explicitly chosen language. In other words, only the entry pages of a site should select the language using this header.

The User-Agent: header

Defined in the HTTP/1.1 standard, section 14.43, this header identifies the browser sending the request. This string may contain a space-separated list of product tokens and comments.

A product token is a name followed by a '/' and a version number, like Firefox/4.0.1. There may be as many of them as the user-agent wants. A comment is a free string delimited by parentheses; parentheses therefore cannot be used inside that string. The inner format of a comment is not defined by the standard, though several browsers put several tokens in it, separated by ';'.
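The structure described above can be parsed without assuming any particular order of tokens. The following is a minimal sketch: `parse_user_agent` is a hypothetical name, the sample string is only an illustration of what a Firefox 4 build might send, and splitting comments on ';' reflects common browser practice rather than anything the standard guarantees.

```python
import re

# Either a parenthesized comment (group 1) or a run of non-space characters (group 2).
TOKEN_OR_COMMENT = re.compile(r"\(([^()]*)\)|([^\s()]+)")


def parse_user_agent(value: str):
    """Return (products, comments): products as (name, version) pairs, comments as lists of parts."""
    products, comments = [], []
    for match in TOKEN_OR_COMMENT.finditer(value):
        comment, token = match.group(1), match.group(2)
        if comment is not None:
            comments.append([part.strip() for part in comment.split(";")])
        else:
            name, _, version = token.partition("/")
            products.append((name, version or None))  # the version may be absent
    return products, comments


sample = "Mozilla/5.0 (X11; Linux x86_64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"
products, comments = parse_user_agent(sample)
print(dict(products).get("Firefox"))
print(comments)
```

Looking products up by name, as in the last lines, avoids depending on their position in the string; code using the result should still provide a fallback for tokens or comment parts it does not recognize.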

Usage notes:
  • Though there are legitimate uses of this header for selecting content, it is considered bad practice to rely on it to determine which features the user agent supports. Prefer feature-oriented object detection instead.
  • Consider the User-Agent: header as a hint only. It may be altered by third-party tools or by the user. If serving tailored content according to this header, always provide a way to manually switch to the alternative content.
  • Do not expect the product tokens to be served in a specific order, or the format of the comments to be fixed; always split the string into comments and product tokens first, then split product tokens into product name and version number. Always take into account that the format of a comment may change in the future by providing an adequate fallback case.
  • A website should not send a 406 Not Acceptable error code based on the user agent string. It is better to send less suited content than no content at all (see W3C Blog).
  • This article describes the current Gecko user-agent strings.

The Vary: response header

In contrast to the previous Accept-*: headers, which are sent by the client, the Vary: HTTP header is sent by the web server in its response. It lists the headers used by the server during the server-driven content negotiation phase. The header is needed to inform the cache of the decision criteria so that it can reproduce them, allowing the cache to be functional while preventing it from serving erroneous content to the user.

The special value of '*' means that the server-driven content negotiation also uses information not conveyed in a header to choose the appropriate content.

The Vary: header was added in version 1.1 of HTTP and is necessary to allow caches to work appropriately. A cache, in order to work with server-driven content negotiation, needs to know which criteria were used by the server to select the transmitted content. That way, the cache can replay the algorithm and serve acceptable content directly, without further requests to the server. Obviously, the wildcard '*' prevents caching from occurring, as the cache cannot know what element is behind it.
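One simple way for a cache to honour Vary: is to make the listed request headers part of the cache key, so that a stored response is reused only for requests that sent the same values. The sketch below illustrates the idea; `cache_key` is a hypothetical helper, and real caches normalize header values more carefully than the plain string comparison assumed here.

```python
from typing import Optional


def cache_key(url: str, vary: str, request_headers: dict) -> Optional[tuple]:
    """Build a cache key from the URL and the request headers named in Vary.

    Returns None when Vary is "*", meaning the response cannot be reused
    without asking the server again.
    """
    names = [name.strip().lower() for name in vary.split(",") if name.strip()]
    if "*" in names:
        return None
    normalized = {key.lower(): value.strip() for key, value in request_headers.items()}
    # A header absent from the request is recorded as an empty value.
    return (url,) + tuple((name, normalized.get(name, "")) for name in sorted(names))


french = cache_key("/doc", "Accept-Language", {"Accept-Language": "fr", "User-Agent": "A"})
other_agent = cache_key("/doc", "Accept-Language", {"Accept-Language": "fr", "User-Agent": "B"})
english = cache_key("/doc", "Accept-Language", {"Accept-Language": "en", "User-Agent": "A"})
print(french == other_agent, french == english)
```

Because only Accept-Language is listed, the two French requests share a cache entry even though their User-Agent differs, while the English request gets its own.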

Agent-driven negotiation

Server-driven negotiation suffers from a few downsides:

  • It doesn't scale well. There is one header per feature used in the negotiation. If one wants to use screen size, resolution or other dimensions, a new HTTP header must be created.
  • The headers must be sent with every request. This is not too problematic with a few headers, but as they multiply, the growing message size would decrease performance.
  • The more headers are sent, the more entropy is sent, allowing for better HTTP fingerprinting and raising the corresponding privacy concerns.

From the start, HTTP allowed another negotiation type, agent-driven negotiation. In this negotiation, when facing an ambiguous request, the server sends back a page containing links to the available alternative resources. The user is presented with the resources and chooses the one to use.

Unfortunately, the HTTP standard does not specify the format of the page that lets the user choose between the available resources, which prevents the process from being easily automated. Besides serving as a fallback for server-driven negotiation, this method is almost always used in conjunction with scripting, especially with JavaScript redirection: after having checked the negotiation criteria, the script performs the redirection.

A second problem is that one more request is needed in order to fetch the real resource, slowing the availability of the resource to the user.

Also note that the caching of the resource is trivial, as each resource has a different URI.
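As an illustration of the server side of agent-driven negotiation, the sketch below builds a response whose body is a page of links to the alternatives. Since the standard does not define the format of that page, the HTML layout here is purely an assumption, as are the function name `multiple_choices_response`, the use of the 300 Multiple Choices status code for this page, and the example URIs.

```python
from html import escape


def multiple_choices_response(alternatives: dict):
    """Return (status, headers, body) for a page linking to each alternative.

    alternatives maps the URI of each alternative to a human-readable label.
    """
    items = "\n".join(
        f'<li><a href="{escape(uri, quote=True)}">{escape(label)}</a></li>'
        for uri, label in alternatives.items()
    )
    body = f"<!DOCTYPE html>\n<title>Multiple Choices</title>\n<ul>\n{items}\n</ul>\n"
    headers = {"Content-Type": "text/html; charset=utf-8"}
    return 300, headers, body


status, headers, body = multiple_choices_response(
    {"/doc.en.html": "English", "/doc.fr.html": "Français"}
)
print(status)
print(body)
```

Each alternative has its own URI, which is what makes the selected resource straightforward to cache.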

Transparent content negotiation

Both agent-driven and server-driven content negotiation have drawbacks. To get the best of both worlds, the HTTP/1.1 standard allowed for another type of content negotiation, transparent content negotiation, but didn't specify it, leaving it to subsequent standards. Very early on, RFC 2295, Transparent Content Negotiation in HTTP, was proposed as an experiment. This proposal added several headers so that, when the server isn't able to decide which resource to send back using the server-driven content negotiation mechanism, it sends back the most likely resource, along with HTTP headers listing all relevant resources in a machine-readable format.

A transparent content negotiation is initiated by the agent by adding a Negotiate: header. This header tells the server that the client supports this type of negotiation. In its response, the server gives back the result of its server-driven content negotiation, i.e. the most likely page according to the Accept-*: headers sent by the agent in its request, and several HTTP headers (TCN:, Alternates: and Variant-Vary:) indicating which alternative pages exist. If the browser finds one of the alternatives more useful than the default page, it performs the second part of an agent-driven content negotiation and fetches the better page. That way, each time the server-driven content negotiation succeeds, only one HTTP request is needed, which is optimal. A second request is needed only when the Accept-*: headers do not convey enough information to let the server make an optimal choice.

When a server receives a request containing a Negotiate: HTTP header, it sends back one of three kinds of responses:

  • a list response, often coupled with a 300 Multiple Choices status code, which lists the available variants without sending one. This is very similar to agent-driven content negotiation, though the variants are sent via the Alternates: HTTP header in a machine-readable form.
  • a choice response, which may also send an Alternates: HTTP header containing all the available variants, but also directly sends in the message body the most adequate page, according to the server's knowledge. This is the most interesting case, as it removes the need for a second HTTP request to fetch the real resource when the guess is correct.
  • an ad-hoc response, which is a fallback measure, allowing buggy clients or servers to be worked around.
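The difference between a list response and a choice response can be sketched as follows. The header syntax used here, with each variant written as {"uri" quality {attribute value} ...} and the TCN: values "list" and "choice", follows the examples of RFC 2295 as understood by this sketch and should be checked against the RFC before being relied upon. The function names, the variant URIs and the use of Content-Location: to name the chosen variant are assumptions made for illustration.

```python
def alternates_header(variants) -> str:
    """Format variants as an Alternates: value.

    variants is a list of (uri, quality, attributes) tuples, where attributes
    is a list of (name, value) pairs such as ("language", "en").
    """
    entries = []
    for uri, quality, attributes in variants:
        described = " ".join(f"{{{name} {value}}}" for name, value in attributes)
        entries.append(f'{{"{uri}" {quality} {described}}}')
    return ", ".join(entries)


def list_response(variants):
    """List the variants without sending any of them."""
    headers = {"TCN": "list", "Alternates": alternates_header(variants)}
    return 300, headers, b""


def choice_response(variants, chosen_uri: str, body: bytes):
    """Send the server's best guess along with the list of variants."""
    headers = {
        "TCN": "choice",
        "Content-Location": chosen_uri,
        "Alternates": alternates_header(variants),
    }
    return 200, headers, body


VARIANTS = [
    ("paper.1", 0.9, [("type", "text/html"), ("language", "en")]),
    ("paper.2", 0.7, [("type", "text/html"), ("language", "fr")]),
]
print(alternates_header(VARIANTS))
```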

The Negotiate: header

This header is sent by the agent and indicates that it supports transparent content negotiation. It also conveys some information about the kind of answers it wants, and the formats it supports in the response headers. It may have one or several of the following values:

Value Meaning Example
trans The user-agent does support transparent content negotiation for this request. Negotiate: trans
vlist (implies trans) The user-agent wants the response, if the server supports transparent content negotiation, to contain an Alternates: header listing all the pages matching the request. Negotiate: vlist
guess-small (implies trans and vlist)
an RVSA version (implies trans)
* (implies trans)
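The implications listed in the table can be expanded mechanically when a server reads the header. The sketch below does so; `parse_negotiate` is a hypothetical name, and recognizing an RVSA version as two numbers separated by a dot (for example 1.0) is an assumption of this example. Unknown directives are kept but imply nothing.

```python
import re

# Directives and what each one implies, as listed in the table above.
IMPLIES = {
    "trans": set(),
    "vlist": {"trans"},
    "guess-small": {"trans", "vlist"},
    "*": {"trans"},
}

# Assumption: an RVSA version looks like "major.minor", e.g. "1.0".
RVSA_VERSION = re.compile(r"^\d+\.\d+$")


def parse_negotiate(header: str) -> set:
    """Return the set of directives in effect, including implied ones."""
    directives = set()
    for item in header.split(","):
        directive = item.strip().lower()
        if not directive:
            continue
        directives.add(directive)
        if RVSA_VERSION.match(directive):
            directives.add("trans")  # an RVSA version implies trans
        else:
            directives |= IMPLIES.get(directive, set())
    return directives


print(sorted(parse_negotiate("guess-small")))
```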

 

 

The Accept-Features: header

The TCN: header

The Alternates: header

The Variant-Vary: header

The 506 Variant Also Negotiates error code

Acceptance

References

Document Tags and Contributors

Contributors to this page: teoli
Last updated by: teoli,