Hypertext Transfer Protocol (HTTP) is an application-layer protocol for transmitting hypermedia documents, such as HTML. It was designed for communication between web browsers and web servers, though it can be used for other purposes as well. It follows a classical client-server model, with a client opening a connection, making a request, and waiting until it receives a response. It is also a stateless protocol, meaning that the server does not keep any data (state) between two requests.
Though often based on a TCP/IP layer, it can be used on any connection-oriented transport layer.
- HTTP Headers
- HTTP message headers are used to describe precisely the resource or the behavior of the server or the client. Custom proprietary headers can be added using the 'X-' prefix; other headers are listed in an IANA registry, whose original content was defined in RFC 4229. IANA also maintains a registry of proposed new HTTP message headers.
- HTTP cookies
- How cookies work is defined by RFC 6265. On receiving an HTTP request, a server can send a Set-Cookie header with the response. Afterward, the client returns the cookie value with every request to the same server in the form of a Cookie HTTP header. Additionally, an expiration delay can be specified. The cookie can also be restricted to a specific domain and path.
- Basic access authentication
- In the context of an HTTP transaction, basic access authentication is a method by which an HTTP user agent provides a username and password to the server when making a request.
- HTTP Pipelining FAQ
- HTTP/1.1 introduced a well-intentioned but ultimately treacherous feature called HTTP Pipelining. This FAQ addresses common questions and issues about its use.
- HTTP access control (CORS)
- Cross-site HTTP requests are HTTP requests for resources from a domain other than the domain of the resource making the request. For instance, a resource loaded from Domain A (http://domaina.example/), such as an HTML web page, makes a request for a resource on Domain B (http://domainb.foo/), such as an image (http://domainb.foo/image.jpg). This occurs very commonly on the web today: pages load many resources in a cross-site manner, including CSS stylesheets, images, scripts, and other resources.
- Controlling DNS prefetching
- HTTP response codes
- HTTP response codes indicate whether a specific HTTP request has been successfully completed. Responses are grouped in five classes: informational responses, successful responses, redirections, client errors, and server errors.
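The cookie exchange described in the list above can be sketched with Python's standard http.cookies module. This is only an illustration: the helper names build_set_cookie and parse_cookie_header, and the cookie name, value, domain, and path used below, are assumptions and do not come from RFC 6265 or from this page.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name, value, max_age, domain, path):
    """Return the value a server could send in a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["max-age"] = max_age   # expiration delay, in seconds
    cookie[name]["domain"] = domain     # restrict the cookie to a domain
    cookie[name]["path"] = path         # restrict the cookie to a path
    return cookie[name].OutputString()


def parse_cookie_header(header_value):
    """Parse the value of a Cookie request header into a plain dict."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {key: morsel.value for key, morsel in cookie.items()}


if __name__ == "__main__":
    # Hypothetical values, chosen only for the example.
    print("Set-Cookie:", build_set_cookie("session", "abc123", 3600, "example.org", "/"))
    print(parse_cookie_header("session=abc123; theme=dark"))
```

The server side only builds the header value; sending it and storing it in a browser are outside the scope of the sketch.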
A brief history of HTTP
HTTP was originally conceived as a protocol with a single method (GET) that returned only HTML pages, and it has since undergone several revisions. The first documented version was HTTP/0.9 in 1991, corresponding to the original version. Very simple, it has a rudimentary search capability via the HTML <isindex> element and an extension of the URL using the '
In 1992, a version was published that later became, with minor changes, HTTP/1.0 (finalized with RFC 1945 in May 1996). One major improvement was the ability to transmit files of different types, like images, videos, scripts, stylesheets, instead of only HTML files, by using MIME types in conjunction with the
In 1995, the IETF began developing a new version of HTTP, which became HTTP/1.1. It quickly spread into wide usage, and was officially standardized in 1997 with RFC 2068, and received minor fixes in RFC 2616 two years later.
HTTP/1.1 brought persistent connections, signaled with Connection: keep-alive, which allowed reuse of established TCP connections for subsequent requests, greatly improving performance by lowering the latency between them. This is especially useful with complex HTML documents that fetch multiple resources, like images and stylesheets. It also brought the Host header, which allows a single server, listening on a specific port, to serve requests for different websites; this paved the way for collocating numerous websites on one server, greatly reducing the cost of hosting.
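To make the role of the Host header concrete, here is a minimal sketch of how one server could pick a website from the header value. The functions build_request and select_site, the site table, and the host names are hypothetical; a port in the Host value and IPv6 literals are deliberately not handled.

```python
def build_request(host, path="/", keep_alive=True):
    """Build the text of a minimal HTTP/1.1 GET request."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    if keep_alive:
        lines.append("Connection: keep-alive")
    # A blank line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


def select_site(raw_request, sites, default="default site"):
    """Choose a site from the Host header of a raw request (simplified)."""
    header_section = raw_request.split("\r\n\r\n", 1)[0]
    for line in header_section.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return sites.get(value.strip().lower(), default)
    return default


if __name__ == "__main__":
    # Two hypothetical websites hosted on the same server and port.
    sites = {"a.example": "Site A", "b.example": "Site B"}
    print(select_site(build_request("b.example"), sites))
```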
Since then, the HTTP protocol evolved by adding new headers, defining new behaviors without the need to fundamentally change the protocol. Unknown headers are simply ignored by servers or clients.
HTTP/1.1 is currently being revised by the IETF HTTPbis Working Group, and its successor HTTP/2 is currently being deployed worldwide.
HTTP request methods
The request method indicates the action to be performed by the server. The HTTP/1.1 standard defines 7 methods and allows others to be added later, such as those of WebDAV. The IETF HTTPbis Working Group is currently working on an IANA registry to list them all. If a server receives a request method that it does not know, it must return a 501 Not Implemented response. If it knows the method but is configured not to answer it, it must return a 405 Method Not Allowed response. Only the GET and HEAD methods are required to be supported; all others are optional.
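The choice between 501 and 405 described above can be sketched as follows. The set KNOWN_METHODS and the function status_for_method are assumptions made for the example; a real server would take both sets from its own configuration.

```python
from http import HTTPStatus

# Methods this hypothetical server recognizes at all.
KNOWN_METHODS = frozenset({"GET", "HEAD", "POST", "PUT", "DELETE", "OPTIONS", "TRACE"})


def status_for_method(method, allowed_methods):
    """Return the status a server could answer with, based on the method only."""
    if method not in KNOWN_METHODS:
        # The server does not know the method at all.
        return HTTPStatus.NOT_IMPLEMENTED
    if method not in allowed_methods:
        # The method is known but not enabled for this resource.
        return HTTPStatus.METHOD_NOT_ALLOWED
    return HTTPStatus.OK


if __name__ == "__main__":
    allowed = {"GET", "HEAD"}
    for method in ("GET", "DELETE", "BREW"):
        status = status_for_method(method, allowed)
        print(method, int(status), status.phrase)
```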
Two specific semantics are defined in the standard and are crucial for web developers: the safety property and the idempotent property.
A safe method is a method that doesn't have any side-effects on the server. In other words, this property means that the method must be used only for retrieval of data. The safe HTTP methods defined in HTTP/1.1 are:
- GET, used to retrieve information identified by the request URI. This information may be generated on the fly, or the GET may be conditional if any of the conditional HTTP headers, such as If-Range, are set. In that latter case, the information is only sent back if all the conditions are fulfilled.
- HEAD, which is identical to GET but without the message body sent.
- Any safe method is also idempotent.
- Not having any side-effects means, for the GET method, that it must not be used to trigger an action outside the server, like an order in an e-commerce site. If a side-effect is wanted, a non-idempotent method should be used, like POST.
- When a page requested with HEAD is generated on the fly by a script, the script engine may calculate the page as if it had been requested by a GET and then strip the data block. This does not cause a problem as long as the GET as implemented in the script is safe, but if it has any side-effects (like triggering an order on an e-commerce site), the HEAD method will trigger them too. It is up to the web developer to ensure that both the GET and HEAD methods are safely implemented.
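The relationship between HEAD and GET described above can be sketched as a HEAD handler that reuses the GET handler and drops the body. The handler names and the in-memory pages dictionary are hypothetical; the point is that any side-effect placed in handle_get would also run for HEAD.

```python
def handle_get(path, pages):
    """Return (status, headers, body) for a GET; must stay free of side-effects."""
    body = pages.get(path)
    if body is None:
        return 404, {"Content-Length": "0"}, b""
    headers = {"Content-Type": "text/html", "Content-Length": str(len(body))}
    return 200, headers, body


def handle_head(path, pages):
    """Answer a HEAD by running the GET logic and stripping the message body."""
    status, headers, _body = handle_get(path, pages)
    return status, headers, b""


if __name__ == "__main__":
    pages = {"/": b"<p>hello</p>"}  # hypothetical content
    print(handle_get("/", pages))
    print(handle_head("/", pages))
```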
An idempotent method is a method such that the side-effects on the server of several identical requests with the method are the same as the side-effects of one single request.
- HEAD and GET, like any safe method, are also idempotent, as a safe method does not have side-effects on the server.
- PUT uploads a new resource on the server. If the resource already exists and is different, it is replaced; if it does not exist, it is created.
- DELETE removes a resource from the server.
- POST triggers an action on the server. It has side-effects and can be used to trigger an order, modify a database, post a message on a forum, or other actions.
- OPTIONS is a request for communication options available on the chain between the client and the server (through any proxies); this method is typically sent before any preflighted cross-origin request, to know whether it is safe to do it.
Note: Preflighted cross-origin requests cannot be made on servers that don't allow or support the OPTIONS method.
- TRACE is a kind of ping between the client and the server (through any proxies).
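The difference between idempotent methods (PUT, DELETE) and a non-idempotent one (POST) can be illustrated with a toy in-memory store. The ResourceStore class and its URI scheme are assumptions for the example and do not model any real server.

```python
class ResourceStore:
    """Toy in-memory model of server-side resources."""

    def __init__(self):
        self.resources = {}
        self.next_id = 1

    def put(self, uri, body):
        # Idempotent: repeating the same PUT leaves the same final state.
        self.resources[uri] = body

    def delete(self, uri):
        # Idempotent: deleting an already deleted resource changes nothing.
        self.resources.pop(uri, None)

    def post(self, collection, body):
        # Not idempotent: every call creates a new resource.
        uri = f"{collection}/{self.next_id}"
        self.next_id += 1
        self.resources[uri] = body
        return uri


if __name__ == "__main__":
    store = ResourceStore()
    store.put("/notes/a", "draft")
    store.put("/notes/a", "draft")      # same state as after one PUT
    store.post("/orders", "1 book")
    store.post("/orders", "1 book")     # a second, distinct order
    print(sorted(store.resources))
```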
Many more methods, such as PROPFIND or PATCH, are defined in other standards-track RFCs of the IETF, like WebDAV.
The CONNECT method is defined in RFC 2817.
HTTP Request Methods in HTML Forms
In HTML, different HTTP request methods can be specified in the method attribute of the <form> element and the formmethod attribute of the <input> and <button> elements. Only GET and POST methods are standardized to be used in these attributes by the HTML specification. See this StackExchange answer for why other HTTP request methods are not allowed.
HTTP response codes
When answering a client request, the server sends back a three-digit number indicating whether the request was successfully processed. These codes can be grouped into five categories:
- Informational responses (of the form 1xx) are provisional responses. Most of the time neither the end user nor the web developer or webmaster should have to bother with these. The most common is the 100 Continue response, indicating that the client should continue with its request.
Note: No informational response codes were defined in HTTP/1.0; therefore, they must not be sent back when this version of the protocol is used.
- Success responses (of the form 2xx) are for successfully processed requests. The 200 OK response is by far the most common success response, but 206 Partial Content is also seen when fetching a file or media like video or audio.
- Redirection responses (of the form 3xx) indicate that the resource that the client requested has moved, and the server cannot serve it directly. Most redirection responses contain location information describing where to find the requested resource; user-agents can then retrieve it without further user interaction. The most common redirection responses are 301 Moved Permanently, indicating that the URI is no longer valid and the requested resource has been moved elsewhere, and 302 Found, which indicates that the resource has been temporarily moved to another place.
Note: For webmasters, it is recommended to set a 301 Moved Permanently redirection when moving pages to another address. This allows old links to still reach the resource, and teaches search engines and other programs the new location, so they can transfer their metadata to it. It is also important to add cache headers to the 301 Moved Permanently response, so the redirect is cached by the client to prevent it from making unnecessary requests to the original URI.
- Client error responses (of the form 4xx) indicate that the request sent by the client is either invalid, incomplete, or doesn't have enough rights to be performed. The most common such response is 404 Not Found, which is sent back when the requested URI doesn't exist. A few other responses are often presented to the end user, like 400 Bad Request, sent when the request is not a valid HTTP request (this should not happen, and may indicate a bug in the user agent or, less likely, the server), and 403 Forbidden, which is sent when the client requests a resource that does exist but isn't allowed to be transmitted (like a directory content).
- Server error responses (of the form 5xx) indicate that the server had a problem handling the valid client request. The two most common such responses are 500 Internal Server Error, a generic error code indicating a bug in the server, and 503 Service Unavailable, indicating that the server cannot process the request due to a temporary problem, like a service disabled for maintenance or the non-availability of a database.
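Because the category is given by the first digit of the code, the five classes listed above can be derived with an integer division. The function name status_class and the label strings are choices made for this sketch.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Map a three-digit HTTP status code to its category."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


if __name__ == "__main__":
    for code in (100, 200, 206, 301, 404, 503):
        print(code, status_class(code))
```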
More on redirection responses
In Firefox, redirections (such as 301 and 307) that specify a
HTTP headers allow the client and the server to pass additional information with the request or the response. A request header consists of its case-insensitive name followed by a colon ':', then by its value (without line breaks). Leading white space before the value is ignored.
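The header syntax described above (case-insensitive name, colon, value with leading white space ignored) can be sketched as a small parser. The function parse_header_line is hypothetical; lowercasing the name is just one way to make comparisons case-insensitive, and trailing white space is also dropped here as a convenience.

```python
def parse_header_line(line):
    """Split one header line into a (lowercased name, value) pair."""
    name, separator, value = line.partition(":")
    if not separator or not name or name != name.strip():
        raise ValueError(f"malformed header line: {line!r}")
    # Names compare case-insensitively, so normalize to lowercase.
    # White space around the value is not part of the value.
    return name.lower(), value.strip(" \t")


if __name__ == "__main__":
    print(parse_header_line("Content-Type:   text/html"))
    print(parse_header_line("HOST: example.org:8080"))
```

Only the first colon separates the name from the value, so a value that itself contains a colon (such as a host with a port) is preserved.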
Headers are grouped according to their contexts:
- General headers
- These headers apply to both requests and responses but are unrelated to the data eventually transmitted in the body. They, therefore, apply only to the message being transmitted. There are only a few of them, and new ones cannot be added without increasing the version number of the HTTP protocol. The exhaustive list for HTTP/1.1 is
- Request headers
- These headers give more precise information about the resource to be fetched or about the client itself. Among them one can find cache-related headers that transform a GET method into a conditional GET, like If-Modified-Since, user-preference information like Accept-Charset, or plain client information like User-Agent. New request headers cannot officially be added without increasing the version number of the HTTP protocol. However, it is common for new request headers to be added if both the server and the client agree on their meaning. In that case, a client should not assume that they will be handled adequately by the server; unknown request headers are handled as entity headers.
- Response headers
- These headers give additional information about the response, like its real location (Location), or about the server itself, like its name and version (Server). New response headers cannot be officially added without increasing the version number of the HTTP protocol, but new response headers can be used if both the server and the client agree on their meaning. In that case, a server should not assume that they will be handled adequately by the client; unknown response headers are handled as entity headers.
- Entity headers
- These headers give more information about the body of the entity, like its length (Content-Length), an identifying hash (Content-MD5), or its MIME-type (Content-Type). New entity headers can be added without increasing the version number of the HTTP protocol.
Headers can also be grouped according to how proxies handle them:
- End-to-end headers
- These headers must be transmitted to the final recipient of the message; that is, the server for a request or the client for a response. Intermediate proxies must retransmit end-to-end headers unmodified and caches must store them.
- Hop-by-hop headers
- These headers are meaningful only for a single transport-level connection and must not be retransmitted by proxies or cached. Such headers are:
Upgrade. Note that only hop-by-hop headers may be set using the
To learn about the specific semantics of each header, see its entry in the comprehensive list of HTTP headers.
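As a sketch of the proxy grouping described above, a proxy could drop hop-by-hop headers before forwarding a message. Python's standard wsgiref.util.is_hop_by_hop helper recognizes the hop-by-hop headers defined for HTTP/1.1; the function forwardable_headers and the sample header values are assumptions. A real proxy would also have to drop any header named in the Connection header, which this sketch omits.

```python
from wsgiref.util import is_hop_by_hop


def forwardable_headers(headers):
    """Keep only end-to-end headers from a list of (name, value) pairs."""
    return [(name, value) for name, value in headers if not is_hop_by_hop(name)]


if __name__ == "__main__":
    received = [
        ("Content-Type", "text/html"),
        ("Connection", "keep-alive"),   # hop-by-hop: not forwarded
        ("Upgrade", "websocket"),       # hop-by-hop: not forwarded
        ("Content-Length", "42"),
    ]
    print(forwardable_headers(received))
```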
Useful request headers
Among the numerous HTTP request headers, several are especially useful. If you are building your requests by using XMLHttpRequest, or writing an extension to send custom HTTP requests via XPCOM, then it is important to ensure the presence of headers that are often set by browsers based on the preferences of the user.
- Controlling the language of the resource
- Most user-agents, like Firefox, allow the user to set a preference for the language for received resources. Browsers translate this into an Accept-Language header. It is good practice for web developers to include this header when building HTTP requests.
- Using conditional GET
- Caching is a major tool to accelerate the display of web pages. Even when parts of a webpage are refreshed via an XMLHttpRequest, it is a good idea to use the If-Modified-Since header (and other similar ones) to fetch the new content only if it has changed. This approach lowers the burden on the network.
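The server side of a conditional GET can be sketched as a comparison between the If-Modified-Since value and the resource's last modification time. The function conditional_get_status is hypothetical, it only handles If-Modified-Since, and it assumes the last modification time is supplied as a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get_status(if_modified_since, last_modified):
    """Return 304 if the client's copy is still current, else 200."""
    if if_modified_since is None:
        return 200  # unconditional GET
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an unparsable date is ignored
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


if __name__ == "__main__":
    # Hypothetical modification time of a resource.
    modified = datetime(2015, 10, 21, 7, 28, 0, tzinfo=timezone.utc)
    header = format_datetime(modified, usegmt=True)
    print(header, conditional_get_status(header, modified))
```

With a 304 answer the server sends no body, which is where the saving on the network comes from.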
Useful response headers
The configuration of a web server is critical to ensure performance and security of a web site. Several headers should be used correctly to this end.
Some attacks, clickjacking being the usual example, exploit the ability to embed a page inside an <iframe> on a third-party site. To mitigate that risk, modern browsers have introduced the CSP frame-ancestors directive. Setting it to the value 'none' prevents browsers from displaying the resource inside a frame. Using it on critical resources (like those containing financial or private information) will reduce the risk of such attacks. Note that this directive does not by itself protect against cross-site scripting (XSS); other Content Security Policy directives may be helpful there.
Minimizing the amount of data transferred accelerates the display of a web page. Though most techniques, like CSS Sprites, should be applied on the site itself, compression of data must be set at the web server level. If set, resources requested by the client with an Accept-Encoding request header are compressed using the appropriate method and sent back with a Content-Encoding response header. These can be set in Apache 2 servers with the mod_deflate module.
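The negotiation between Accept-Encoding and Content-Encoding can be sketched with Python's gzip module. The function encode_body is hypothetical and deliberately simplified: it only offers gzip and ignores quality values such as "gzip;q=0", which a real server must honor.

```python
import gzip


def encode_body(body, accept_encoding):
    """Compress the body with gzip when the client lists it as acceptable."""
    # Keep only the coding names, dropping any ";q=..." parameters (simplification).
    offered = {token.split(";")[0].strip().lower() for token in accept_encoding.split(",")}
    if "gzip" in offered:
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return gzip.compress(body), headers
    return body, {}


if __name__ == "__main__":
    page = b"<p>hello</p>" * 200  # hypothetical response body
    encoded, headers = encode_body(page, "gzip, deflate")
    print(len(page), len(encoded), headers)
```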
HTTP Caching is a technique that prevents the same resource from being fetched several times if it has not changed. Configuring the server with the correct response headers allows the user-agent to cache the data adequately. To do that, be sure that:
- Any static resource provides an Expires response header that is set far in the future. That way, the resource may stay in the cache until the user-agent flushes it for its own reasons (like reaching its cache size limit).
Note: On Apache, use the ExpiresDefault directive in your .htaccess to define a relative expiry: ExpiresDefault "access plus 1 month".
- Any dynamic resource provides a Cache-Control response header. Theoretically, any HTTP request done through a safe method (GET or HEAD) or even through a solely idempotent one (DELETE, PUT) may be cached; but in practice, careful study is needed to determine if the caching of the response may lead to inappropriate side-effects.
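The two rules above can be sketched as a function that picks caching headers for a response. The function cache_headers, the 30-day lifetime, and the "no-cache" policy for dynamic resources are assumptions chosen for illustration; the right values depend on the site.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def cache_headers(is_static, now, static_lifetime=timedelta(days=30)):
    """Return caching headers: Expires for static, Cache-Control for dynamic."""
    if is_static:
        # Static resources get an expiry date far in the future.
        return {"Expires": format_datetime(now + static_lifetime, usegmt=True)}
    # Dynamic resources: ask caches to revalidate before reuse (assumed policy).
    return {"Cache-Control": "no-cache"}


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(cache_headers(True, now))
    print(cache_headers(False, now))
```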
Setting the correct MIME types
The MIME type is the mechanism to tell the client the kind of document transmitted: the extension of a file name has no meaning on the web. It is, therefore, important that the server is correctly set up so that the correct MIME type is transmitted with each document: user-agents often use this MIME-type to determine what default action to do when a resource is fetched.
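As an illustration of mapping a file to the MIME type a server sends, Python's standard mimetypes module guesses a type from the file name, which is how many servers are configured in practice. The function content_type_for and the file names are hypothetical; falling back to application/octet-stream for unknown types is a common default, assumed here.

```python
import mimetypes


def content_type_for(filename, default="application/octet-stream"):
    """Return the Content-Type a server could send for a file name."""
    guessed_type, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return guessed_type or default


if __name__ == "__main__":
    for name in ("photo.jpg", "index.html", "archive.zzzunknown"):
        print(name, content_type_for(name))
```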
- On Apache, one can match file extensions with a given MIME type in .htaccess using the AddType directive, like AddType image/jpeg jpg.
- Most web servers send unknown-type resources using the default application/octet-stream MIME type; for security reasons, most browsers, like Firefox, do not allow setting a custom default action for such resources and force the user to store them to disk to use them. Servers are often incorrectly configured for the following file types:
RAR-encoded files. The ideal would be to be able to set the real type of the encoded files; this often is not possible (as it may not be known to the server and these files may contain several resources of different types). In that case, configure the server to send the application/x-rar-compressed MIME type or users will not be able to define a useful default action for them.
Proprietary file types. Pay special attention when serving a proprietary file type. Be sure not to forget to add an x-prefixed type for it; otherwise, special handling will not be possible. This is particularly the case with resources using the Keyhole Markup Language, which should be served as