这篇翻译不完整。请帮忙从英语翻译这篇文章

WebSocket服务器是一个TCP应用程序,它监听遵循特定协议的服务器的任何端口,因此很简单。创建定制服务器的任务往往会吓倒人们;然而,在你选择的平台上实现一个简单的WebSocket服务器是很容易的。

WebSocket服务器可以用任何实现了Berkeley sockets的服务器端编程语言编写,如C(++)或Python甚至PHP服务器端JavaScript。 这不是任何特定语言的教程,而是作为指导,以方便编写自己的服务器。

您需要知道HTTP的工作原理,并具有中级编程经验。 根据语言帮助(Depending on language support),可能需要TCP套接字的知识。 本指南的范围是介绍编写WebSocket服务器所需的最低知识。

阅读最新的官方WebSockets规范, RFC 6455. 第1节和第4-7节对服务器实现者特别有意思。第10节讨论安全性,你应该在暴露你的服务器之前仔细阅读它。

WebSocket服务器在这里被解释得非常底层。 WebSocket服务器通常是独立的专用服务器(出于负载平衡或其他实际原因),因此您通常会使用反向代理(例如常规HTTP服务器)来检测WebSocket握手,预处理这些握手,并将这些客户端发送给 一个真正的WebSocket服务器。(例如)这意味着您不必使用cookie和身份验证处理程序来扩充服务器代码。

WebSocket 握手

首先,服务器必须使用标准的TCP套接字来监听传入的套接字连接。 根据您的平台,这可能已经为您处理。 例如,假设您的服务器正在监听example.com,端口8000,并且您的套接字服务器响应/chat上的GET请求。 .

警告:服务器可以监听它选择的任何端口,但是如果它选择了80或443以外的端口,防火墙和/或代理服务器可能会有问题。 端口443上的连接往往会更容易成功,但是当然,这需要一个安全的连接(TLS / SSL)。 另外请注意,大多数浏览器(特别是Firefox 8+)不允许从安全页面连接到不安全的WebSocket服务器。

握手是WebSockets中的“Web”。 这是从HTTP到WS的桥梁。 在握手过程中,协商的细节是协商的,如果条件不利,任何一方可以在完成之前退出。 服务器必须小心了解客户要求的一切,否则会引入安全问题。

客户端握手请求

即使您正在构建服务器,客户端仍然必须启动WebSocket握手过程。 所以你必须知道如何解释客户的请求。 客户端将发送一个相当标准的HTTP请求,看起来像这样(HTTP版本必须是1.1或更高,方法必须是GET):

GET /chat HTTP/1.1
Host: example.com:8000
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

客户可以在这里请求扩展和/或子协议; see Miscellaneous for details. Also, common headers like User-Agent, RefererCookie, or authentication headers might be there as well. Do whatever you want with those; they don't directly pertain to the WebSocket. It's also safe to ignore them. In many common setups, a reverse proxy has already dealt with them.

如果任何头文件不被理解或者具有不正确的值,则服务器应该发送“400 Bad Request"”并立即关闭套接字。 像往常一样,它也可能会给出HTTP响应正文中握手失败的原因,但可能永远不会显示消息(浏览器不显示它)。 如果服务器不理解该版本的WebSocket,则应该发送一个Sec-WebSocket-Version头,其中包含它理解的版本。 (本指南解释了最新的v13)。 现在,我们来看看最奇怪的标题Sec-WebSocket-Key

Tip: 所有浏览器将会发送一个带有 Origin的头文件。 你可以 将这个头文件用于安全方面(检查是否是同一个域,白名单/ 黑名单等),如果你不喜欢你所看到的,你可以发送一个403 Forbidden。但是,需要注意的是非浏览器代理只能发送一个伪造的 Origin。大多数应用将会拒绝不带这个Origin 头文件的请求.

Tip: The request-uri (/chat here) has no defined meaning in the spec. So many people cleverly use it to let one server handle multiple WebSocket applications. For example, example.com/chat could invoke a multiuser chat app, while /game on the same server might invoke a multiplayer game.

Note: Regular HTTP status codes 只能在握手之前使用。 握手成功后,您必须使用一组不同的代码(在规范的第7.4节中定义)。

服务器握手响应

当它得到这个请求时,服务器应该发送一个非常奇怪的(但仍然是HTTP)响应,看起来像这样(记住每个头以 \r\n结尾,并在最后一个之后放置一个额外的 \r\n):

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

Additionally, the server can decide on extension/subprotocol requests here; see Miscellaneous for details. The Sec-WebSocket-Accept part is interesting. The server must derive it from the Sec-WebSocket-Key that the client sent. To get it, concatenate the client's Sec-WebSocket-Key and "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" together (it's a "magic string"), take the SHA-1 hash of the result, and return the base64 encoding of the hash.

FYI: This seemingly overcomplicated process exists so that it's obvious to the client whether or not the server supports WebSockets. This is important because security issues might arise if the server accepts a WebSockets connection but interprets the data as a HTTP request.

所以如果Key是“dGhlIHNhbXBsZSBub25jZQ==”,Accept将是“s3pPLMBiTxaQ9kYGzzhZRbK+xOo=”。 一旦服务器发送这些头,握手就完成了,你可以开始交换数据!

The server can send other headers like Set-Cookie, or ask for authentication or redirects via other status codes, before sending the reply handshake.

跟踪客户端

这并不直接与WebSocket协议相关,但是在这里值得一提的是:你的服务器将不得不跟踪客户的套接字,所以你不会再和已经完成握手的客户握手。 同一个客户端IP地址可以尝试连接多次(但是如果客户端尝试过多的连接,服务器可以拒绝它们以免遭Denial-of-Service attacks)。

交换数据帧

Either the client or the server can choose to send a message at any time — that's the magic of WebSockets. However, extracting information from these so-called "frames" of data is a not-so-magical experience. Although all frames follow the same specific format, data going from the client to the server is masked using XOR encryption (with a 32-bit key). Section 5 of the specification describes this in detail.

格式

每个数据帧(从客户端到服务器或反之亦然)遵循相同的格式:

Frame format:  
​​
      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-------+-+-------------+-------------------------------+
     |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
     |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
     |N|V|V|V|       |S|             |   (if payload len==126/127)   |
     | |1|2|3|       |K|             |                               |
     +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
     |     Extended payload length continued, if payload len == 127  |
     + - - - - - - - - - - - - - - - +-------------------------------+
     |                               |Masking-key, if MASK set to 1  |
     +-------------------------------+-------------------------------+
     | Masking-key (continued)       |          Payload Data         |
     +-------------------------------- - - - - - - - - - - - - - - - +
     :                     Payload Data continued ...                :
     + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
     |                     Payload Data continued ...                |
     +---------------------------------------------------------------+

The MASK bit simply tells whether the message is encoded. Messages from the client must be masked, so your server should expect this to be 1. (In fact, section 5.1 of the spec says that your server must disconnect from a client if that client sends an unmasked message.) When sending a frame back to the client, do not mask it and do not set the mask bit. We'll explain masking later. Note: You have to mask messages even when using a secure socket.RSV1-3 can be ignored, they are for extensions.

The opcode field defines how to interpret the payload data: 0x0 for continuation, 0x1 for text (which is always encoded in UTF-8), 0x2 for binary, and other so-called "control codes" that will be discussed later. In this version of WebSockets, 0x3 to 0x7 and 0xB to 0xF have no meaning.

The FIN bit tells whether this is the last message in a series. If it's 0, then the server will keep listening for more parts of the message; otherwise, the server should consider the message delivered. More on this later.

解码有效载荷长度

To read the payload data, you must know when to stop reading. That's why the payload length is important to know. Unfortunately, this is somewhat complicated. To read it, follow these steps:

  1. Read bits 9-15 (inclusive) and interpret that as an unsigned integer. If it's 125 or less, then that's the length; you're done. If it's 126, go to step 2. If it's 127, go to step 3.
  2. Read the next 16 bits and interpret those as an unsigned integer. You're done.
  3. Read the next 64 bits and interpret those as an unsigned integer (The most significant bit MUST be 0). You're done.

读取和解密数据

If the MASK bit was set (and it should be, for client-to-server messages), read the next 4 octets (32 bits); this is the masking key. Once the payload length and masking key is decoded, you can go ahead and read that number of bytes from the socket. Let's call the data ENCODED, and the key MASK. To get DECODED, loop through the octets (bytes a.k.a. characters for text data) of ENCODED and XOR the octet with the (i modulo 4)th octet of MASK. In pseudo-code (that happens to be valid JavaScript):

var DECODED = "";
for (var i = 0; i < ENCODED.length; i++) {
    DECODED[i] = ENCODED[i] ^ MASK[i % 4];
}

Now you can figure out what DECODED means depending on your application.

消息帧

The FIN and opcode fields work together to send a message split up into separate frames.  This is called message fragmentation. Fragmentation is only available on opcodes 0x0 to 0x2.

Recall that the opcode tells what a frame is meant to do. If it's 0x1, the payload is text. If it's 0x2, the payload is binary data. However, if it's 0x0, the frame is a continuation frame. This means the server should concatenate the frame's payload to the last frame it received from that client. Here is a rough sketch, in which a server reacts to a client sending text messages. The first message is sent in a single frame, while the second message is sent across three frames. FIN and opcode details are shown only for the client:

Client: FIN=1, opcode=0x1, msg="hello"
Server: (process complete message immediately) Hi.
Client: FIN=0, opcode=0x1, msg="and a"
Server: (listening, new message containing text started)
Client: FIN=0, opcode=0x0, msg="happy new"
Server: (listening, payload concatenated to previous message)
Client: FIN=1, opcode=0x0, msg="year!"
Server: (process complete message) Happy new year to you too!

Notice the first frame contains an entire message (has FIN=1 and opcode!=0x0), so the server can process or respond as it sees fit. The second frame sent by the client has a text payload (opcode=0x1), but the entire message has not arrived yet (FIN=0). All remaining parts of that message are sent with continuation frames (opcode=0x0), and the final frame of the message is marked by FIN=1. Section 5.4 of the spec describes message fragmentation.

Pings和Pongs:WebSockets的心跳

在经过握手之后的任意时刻里,无论客户端还是服务端都可以选择发送一个ping给另一方。 当ping消息收到的时候,接受的一方必须尽快回复一个pong消息。 例如,可以使用这种方式来确保客户端还是连接状态。

一个ping消息或者一个pong消息都只是一个常规的帧, 是这个帧是一个控制帧。Ping消息的opcode字段值为 0x9,pong消息的opcode值为  0xA 。当你获取到一个ping消息的时候,回复一个跟ping消息有相同载荷数据的pong消息 (对于ping和pong,最大载荷长度位125)。 你也有可能在没有发送ping消息的情况下,获取一个pong消息,当这种情况发生的时候忽略它。

如果在你有机会发送一个pong消息之前,你已经获取了超过一个的ping消息,那么你只发送一个pong消息。

关闭连接

To close a connection either the client or server can send a control frame with data containing a specified control sequence to begin the closing handshake (detailed in Section 5.5.1). Upon receiving such a frame, the other peer sends a Close frame in response. The first peer then closes the connection. Any further data received after closing of connection is then discarded. 

杂项

WebSocket codes, extensions, subprotocols, etc. are registered at the IANA WebSocket Protocol Registry.

WebSocket extensions and subprotocols are negotiated via headers during the handshake. Sometimes extensions and subprotocols seem too similar to be different things, but there is a clear distinction. Extensions control the WebSocket frame and modify the payload, while subprotocols structure the WebSocket payload and never modify anything. Extensions are optional and generalized (like compression); subprotocols are mandatory and localized (like ones for chat and for MMORPG games).

扩展

This section needs expansion. Please edit if you are equipped to do so.

Think of an extension as compressing a file before e-mailing it to someone. Whatever you do, you're sending the same data in different forms. The recipient will eventually be able to get the same data as your local copy, but it is sent differently. That's what an extension does. WebSockets defines a protocol and a simple way to send data, but an extension such as compression could allow sending the same data but in a shorter format.

Extensions are explained in sections 5.8, 9, 11.3.2, and 11.4 of the spec.

TODO

子协议

Think of a subprotocol as a custom XML schema or doctype declaration. You're still using XML and its syntax, but you're additionally restricted by a structure you agreed on. WebSocket subprotocols are just like that. They do not introduce anything fancy, they just establish structure. Like a doctype or schema, both parties must agree on the subprotocol; unlike a doctype or schema, the subprotocol is implemented on the server and cannot be externally refered to by the client.

Subprotocols are explained in sections 1.9, 4.2, 11.3.4, and 11.5 of the spec.

A client has to ask for a specific subprotocol. To do so, it will send something like this as part of the original handshake:

GET /chat HTTP/1.1
...
Sec-WebSocket-Protocol: soap, wamp

or, equivalently:

...
Sec-WebSocket-Protocol: soap
Sec-WebSocket-Protocol: wamp

Now the server must pick one of the protocols that the client suggested and it supports. If there are more than one, send the first one the client sent. Imagine our server can use both soap and wamp. Then, in the response handshake, it'll send:

Sec-WebSocket-Protocol: soap

The server can't send more than one Sec-Websocket-Protocol header.
If the server doesn't want to use any subprotocol, it shouldn't send any Sec-WebSocket-Protocol header. Sending a blank header is incorrect.
The client may close the connection if it doesn't get the subprotocol it wants.

If you want your server to obey certain subprotocols, then naturally you'll need extra code on the server. Let's imagine we're using a subprotocol json. In this subprotocol, all data is passed as JSON. If the client solicits this protocol and the server wants to use it, the server will need to have a JSON parser. Practically speaking, this will be part of a library, but the server will need to pass the data around.

Tip: To avoid name conflict, it's recommended to make your subprotocol name part of a domain string. If you are building a custom chat app that uses a proprietary format exclusive to Example Inc., then you might use this: Sec-WebSocket-Protocol: chat.example.com. Note that this isn't required, it's just an optional convention, and you can use any string you wish.

关联

文档标签和贡献者

此页面的贡献者: zyt, yuyx91, yydzxz
最后编辑者: zyt,