Una panoramica su HTTP

Questa traduzione è incompleta. Aiutaci a tradurre questo articolo dall’inglese

HTTP è un protocollo che permette il recupero di risorse, come le pagine HTML. È il fondamento di ogni scambio di dati sul Web ed è un protocollo client-server,il che significa che le richieste vengono avviate dal destinatario, solitamente il browser Web. Un documento completo viene ricostruito dai diversi sotto-documenti recuperati, ad esempio testo, descrizione del layout, immagini, video, script e altro.

A Web document is the composition of different resources

Client e server comunicano scambiando messaggi individuali (al contrario di un flusso di dati). I messaggi inviati dal client, solitamente un browser Web, sono chiamati richieste (requests) e i messaggi inviati dal server come risposta sono chiamati risposte (responses).

HTTP as an application layer protocol, on top of TCP (transport layer) and IP (network layer) and below the presentation layer.Progettato all'inizio degli anni '90 (1990), HTTP è un protocollo estensibile che si è evoluto nel tempo. È un protocollo a livello di applicazione che viene inviato mediante TCP, o mediante una connessione TCP cifrata TLS, anche se teoricamente potrebbe essere utilizzato qualsiasi protocollo di trasporto affidabile. Grazie alla sua estensibilità, viene utilizzato non solo per recuperare documenti ipertestuali, ma anche immagini e video o per pubblicare contenuti su server, come con i risultati dei moduli HTML. HTTP può essere utilizzato anche per recuperare parti di documenti per aggiornare pagine Web su richiesta.

Componenti di sistemi basati su HTTP

HTTP è un protocollo client-server: le richieste vengono inviate da un'entità, lo user-agent (o un proxy per conto di esso). Il più delle volte lo user-agent è un browser Web, ma può essere qualsiasi cosa, ad esempio un robot che esegue la scansione del Web per popolare e mantenere un indice del motore di ricerca.

Ogni singola richiesta viene inviata a un server, che la gestisce e fornisce una risposta, chiamata response. Tra il client e il server ci sono numerose entità, chiamate collettivamente proxies, che eseguono operazioni diverse e fungono da gateway o caches, per esempio.

Client server chain

In realtà, ci sono più computer tra un browser e il server che gestisce la richiesta: ci sono router, modem e altro. Grazie alla struttura a strati del Web, questi sono nascosti nella rete e nei livelli di trasporto. L'HTTP è in cima, a livello applicazione. Sebbene importanti per diagnosticare i problemi di rete, i livelli sottostanti sono per lo più irrilevanti per la descrizione di HTTP.

Client: lo user-agent

Lo user-agent è qualsiasi strumento che agisce per conto dell'utente. Questo ruolo viene svolto principalmente dal browser Web; altre possibilità sono i programmi utilizzati da ingegneri e sviluppatori Web per eseguire il debug delle loro applicazioni.

Il browser è sempre l'entità che avvia la richiesta. Non è mai il server (sebbene nel corso degli anni siano stati aggiunti alcuni meccanismi per simulare i messaggi avviati dal server).

Per presentare una pagina Web, il browser invia una richiesta iniziale per recuperare il documento HTML che rappresenta la pagina. Quindi analizza questo file, effettuando richieste aggiuntive corrispondenti a script di esecuzione, informazioni sul layout (CSS) da visualizzare e sotto-risorse contenute all'interno della pagina (solitamente immagini e video). Il browser Web quindi compone queste risorse per presentare all'utente un documento completo, la pagina Web. Gli script eseguiti dal browser possono recuperare più risorse nelle fasi successive e il browser aggiorna la pagina Web di conseguenza.

Una pagina Web è un documento ipertestuale. Ciò significa che alcune parti del testo visualizzato sono collegamenti che possono essere attivati (di solito con un clic del mouse) per recuperare una nuova pagina Web, consentendo all'utente di dirigere il proprio user-agent e navigare nel Web. Il browser traduce queste indicazioni in richieste HTTP e interpreta ulteriormente le risposte HTTP per presentare all'utente una risposta chiara.

Il Web server

Sul lato opposto del canale di comunicazione, è il server, che serve (restituisce) il documento come richiesto dal client. Un server appare virtualmente come un'unica macchina: questo perché potrebbe essere un insieme di server, i quali condividono il carico (bilanciamento del carico) o un complesso software che interroga altri computer (come una cache, un DB server o un e-commerce server), generando totalmente o parzialmente il documento su richiesta.

Un server non è necessariamente una singola macchina, ma più istanze di software server possono essere ospitate sulla stessa macchina. Con HTTP / 1.1 e l'intestazione Host, possono persino condividere lo stesso indirizzo IP. HERE!!!

Proxies

Between the Web browser and the server, numerous computers and machines relay the HTTP messages. Due to the layered structure of the Web stack, most of these operate at the transport, network or physical levels, becoming transparent at the HTTP layer and potentially making a significant impact on performance. Those operating at the application layers are generally called proxies. These can be transparent, forwarding on the requests they receive without altering them in any way, or non-transparent, in which case they will change the request in some way before passing it along to the server. Proxies may perform numerous functions:

  • caching (the cache can be public or private, like the browser cache)
  • filtering (like an antivirus scan or parental controls)
  • load balancing (to allow multiple servers to serve the different requests)
  • authentication (to control access to different resources)
  • logging (allowing the storage of historical information)

Basic aspects of HTTP

HTTP is simple

HTTP is generally designed to be simple and human readable, even with the added complexity introduced in HTTP/2 by encapsulating HTTP messages into frames. HTTP messages can be read and understood by humans, providing easier testing for developers, and reduced complexity for newcomers.

HTTP is extensible

Introduced in HTTP/1.0, HTTP headers make this protocol easy to extend and experiment with. New functionality can even be introduced by a simple agreement between a client and a server about a new header's semantics.

HTTP is stateless, but not sessionless

HTTP is stateless: there is no link between two requests being successively carried out on the same connection. This immediately has the prospect of being problematic for users attempting to interact with certain pages coherently, for example, using e-commerce shopping baskets. But while the core of HTTP itself is stateless, HTTP cookies allow the use of stateful sessions. Using header extensibility, HTTP Cookies are added to the workflow, allowing session creation on each HTTP request to share the same context, or the same state.

HTTP and connections

A connection is controlled at the transport layer, and therefore fundamentally out of scope for HTTP. Though HTTP doesn't require the underlying transport protocol to be connection-based; only requiring it to be reliable, or not lose messages (so at minimum presenting an error). Among the two most common transport protocols on the Internet, TCP is reliable and UDP isn't. HTTP therefore relies on the TCP standard, which is connection-based.

Before a client and server can exchange an HTTP request/response pair, they must establish a TCP connection, a process which requires several round-trips. The default behavior of HTTP/1.0 is to open a separate TCP connection for each HTTP request/response pair. This is less efficient than sharing a single TCP connection when multiple requests are sent in close succession.

In order to mitigate this flaw, HTTP/1.1 introduced pipelining (which proved difficult to implement) and persistent connections: the underlying TCP connection can be partially controlled using the Connection header. HTTP/2 went a step further by multiplexing messages over a single connection, helping keep the connection warm and more efficient.

Experiments are in progress to design a better transport protocol more suited to HTTP. For example, Google is experimenting with QUIC which builds on UDP to provide a more reliable and efficient transport protocol.

What can be controlled by HTTP

This extensible nature of HTTP has, over time, allowed for more control and functionality of the Web. Cache or authentication methods were functions handled early in HTTP history. The ability to relax the origin constraint, by contrast, has only been added in the 2010s.

Here is a list of common features controllable with HTTP.

  • Caching
    How documents are cached can be controlled by HTTP. The server can instruct proxies and clients, about what to cache and for how long. The client can instruct intermediate cache proxies to ignore the stored document.
  • Relaxing the origin constraint
    To prevent snooping and other privacy invasions, Web browsers enforce strict separation between Web sites. Only pages from the same origin can access all the information of a Web page. Though such constraint is a burden to the server, HTTP headers can relax this strict separation on the server side, allowing a document to become a patchwork of information sourced from different domains; there could even be security-related reasons to do so.
  • Authentication
    Some pages may be protected so that only specific users can access them. Basic authentication may be provided by HTTP, either using the WWW-Authenticate and similar headers, or by setting a specific session using HTTP cookies.
  • Proxy and tunneling
    Servers or clients are often located on intranets and hide their true IP address from other computers. HTTP requests then go through proxies to cross this network barrier. Not all proxies are HTTP proxies. The SOCKS protocol, for example, operates at a lower level. Other protocols, like ftp, can be handled by these proxies.
  • Sessions
    Using HTTP cookies allows you to link requests with the state of the server. This creates sessions, despite basic HTTP being a state-less protocol. This is useful not only for e-commerce shopping baskets, but also for any site allowing user configuration of the output.

HTTP flow

When a client wants to communicate with a server, either the final server or an intermediate proxy, it performs the following steps:

  1. Open a TCP connection: The TCP connection is used to send a request, or several, and receive an answer. The client may open a new connection, reuse an existing connection, or open several TCP connections to the servers.
  2. Send an HTTP message: HTTP messages (before HTTP/2) are human-readable. With HTTP/2, these simple messages are encapsulated in frames, making them impossible to read directly, but the principle remains the same. For example:
    GET / HTTP/1.1
    Host: developer.mozilla.org
    Accept-Language: fr
  3. Read the response sent by the server, such as:
    HTTP/1.1 200 OK
    Date: Sat, 09 Oct 2010 14:28:02 GMT
    Server: Apache
    Last-Modified: Tue, 01 Dec 2009 20:18:22 GMT
    ETag: "51142bc1-7449-479b075b2891b"
    Accept-Ranges: bytes
    Content-Length: 29769
    Content-Type: text/html
    
    <!DOCTYPE html... (here comes the 29769 bytes of the requested web page)
  4. Close or reuse the connection for further requests.

If HTTP pipelining is activated, several requests can be sent without waiting for the first response to be fully received. HTTP pipelining has proven difficult to implement in existing networks, where old pieces of software coexist with modern versions. HTTP pipelining has been superseded in HTTP/2 with more robust multiplexing requests within a frame.

HTTP Messages

HTTP messages, as defined in HTTP/1.1 and earlier, are human-readable. In HTTP/2, these messages are embedded into a binary structure, a frame, allowing optimizations like compression of headers and multiplexing. Even if only part of the original HTTP message is sent in this version of HTTP, the semantics of each message is unchanged and the client reconstitutes (virtually) the original HTTP/1.1 request. It is therefore useful to comprehend HTTP/2 messages in the HTTP/1.1 format.

There are two types of HTTP messages, requests and responses, each with its own format.

Requests

An example HTTP request:

A basic HTTP request

Requests consists of the following elements:

  • An HTTP method, usually a verb like GET, POST or a noun like OPTIONS or HEAD that defines the operation the client wants to perform. Typically, a client wants to fetch a resource (using GET) or post the value of an HTML form (using POST), though more operations may be needed in other cases.
  • The path of the resource to fetch; the URL of the resource stripped from elements that are obvious from the context, for example without the protocol (http://), the domain (here, developer.mozilla.org), or the TCP port (here, 80).
  • The version of the HTTP protocol.
  • Optional headers that convey additional information for the servers.
  • Or a body, for some methods like POST, similar to those in responses, which contain the resource sent.

Responses

An example response:

Responses consist of the following elements:

  • The version of the HTTP protocol they follow.
  • A status code, indicating if the request was successful, or not, and why.
  • A status message, a non-authoritative short description of the status code.
  • HTTP headers, like those for requests.
  • Optionally, a body containing the fetched resource.

APIs based on HTTP

The most commonly used API based on HTTP is the XMLHttpRequest API, which can be used to exchange data between a user agent and a server. The modern Fetch API provides the same features with a more powerful and flexible feature set.

Another API, server-sent events, is a one-way service that allows a server to send events to the client, using HTTP as a transport mechanism. Using the EventSource interface, the client opens a connection and establishes event handlers. The client browser automatically converts the messages that arrive on the HTTP stream into appropriate Event objects, delivering them to the event handlers that have been registered for the events' type if known, or to the onmessage event handler if no type-specific event handler was established.

Conclusion

HTTP è un protocollo estensibile che è facile da usare. The client-server structure, combined with the ability to simply add headers, allows HTTP to advance along with the extended capabilities of the Web.

Though HTTP/2 adds some complexity, by embedding HTTP messages in frames to improve performance, the basic structure of messages has stayed the same since HTTP/1.0. Session flow remains simple, allowing it to be investigated, and debugged with a simple HTTP message monitor.