What is a URL?

This article discusses Uniform Resource Locators (URLs), explaining what they are and how they are structured.

Prerequisites: You first need to know how the Internet works, what a Web server is, and the concepts behind links on the web.
Objective: You will learn what a URL is and how it works on the Web.

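Summary

Along with Hypertext and HTTP, the URL is one of the key concepts of the Web. It is the mechanism browsers use to retrieve any published resource on the web.

URL stands for Uniform Resource Locator. A URL is nothing more than the address of a given unique resource on the Web. In theory, each valid URL points to a unique resource. Such a resource can be an HTML page, a CSS document, an image, and so on. In practice, there are some exceptions, the most common being a URL that points to a resource that no longer exists or that has moved. As the resource represented by the URL and the URL itself are handled by the Web server, it is up to the owner of the web server to carefully manage that resource and its associated URL.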

Active Learning

There is no active learning available yet. Please, consider contributing.

Deeper dive

Basics: anatomy of a URL

Here are some examples of URLs:

https://developer.mozilla.org
https://developer.mozilla.org/en-US/docs/Learn/
https://developer.mozilla.org/en-US/search?q=URL

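Any of those URLs can be typed into your browser's address bar to tell it to load the associated page (or resource).

A URL is composed of different parts, some mandatory and others optional. Let's see the most important parts using the following URL: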

http://www.example.com:80/path/to/myfile.html?key1=value1&key2=value2#SomewhereInTheDocument
Protocol
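http:// is the protocol. It indicates which protocol the browser must use. Usually it is the HTTP protocol or its secured version, HTTPS. The Web requires one of these two, but browsers also know how to handle other protocols such as mailto: (to open a mail client) or ftp: (to handle file transfers), so don't be surprised if you see such protocols.
Domain Name
www.example.com is the domain name. It indicates which Web server is being requested. Alternatively, it is possible to directly use an IP address, but because it is less convenient, it is not often used on the Web.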
Port
:80 is the port. It indicates the technical "gate" used to access the resources on the web server. It is usually omitted if the web server uses the standard ports of the HTTP protocol (80 for HTTP and 443 for HTTPS) to grant access to its resources. Otherwise it is mandatory.
Path to the file
/path/to/myfile.html is the path to the resource on the Web server. In the early days of the Web, a path like this represented a physical file location on the Web server. Nowadays, it is mostly an abstraction handled by Web servers without any physical reality.
Parameters
?key1=value1&key2=value2 are extra parameters provided to the Web server. Those parameters are a list of key/value pairs separated with the & symbol. The Web server can use those parameters to do extra processing before returning the resource. Each Web server has its own rules regarding parameters, and the only reliable way to know if a specific Web server is handling parameters is by asking the Web server owner.
Anchor
#SomewhereInTheDocument is an anchor to another part of the resource itself. An anchor represents a sort of "bookmark" inside the resource, giving the browser the directions to show the content located at that "bookmarked" spot. On an HTML document, for example, the browser will scroll to the point where the anchor is defined; on a video or audio document, the browser will try to go to the time the anchor represents. It is worth noting that the part after the #, also known as fragment identifier, is never sent to the server with the request.
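
The parts described above can be pulled out of the example URL programmatically. The following is a minimal sketch using Python's standard urllib.parse module; it is only one way to split a URL, and the variable names are chosen for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# The example URL from this section.
url = ("http://www.example.com:80/path/to/myfile.html"
       "?key1=value1&key2=value2#SomewhereInTheDocument")

parts = urlsplit(url)

print(parts.scheme)    # protocol: http
print(parts.hostname)  # domain name: www.example.com
print(parts.port)      # port: 80
print(parts.path)      # path: /path/to/myfile.html
print(parts.query)     # parameters: key1=value1&key2=value2
print(parts.fragment)  # anchor: SomewhereInTheDocument

# The parameters are key/value pairs separated by "&".
# parse_qs returns a list of values for each key.
params = parse_qs(parts.query)
print(params)
```

Note that a client would keep the fragment to itself: only the path and the parameters are part of what is requested from the server.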

Note: There are some extra parts and some extra rules regarding URLs, but they are not relevant for regular users or Web developers. Don't worry about this, you don't need to know them to build and use fully functional URLs.

You might think of a URL like a regular postal mail address: the protocol represents the postal service you want to use, the domain name is the city or town, and the port is like the zip code; the path represents the building where your mail should be delivered; the parameters represent extra information such as the number of the apartment in the building; and, finally, the anchor represents the actual person to whom you've addressed your mail.

How to use URLs

Any URL can be typed right inside the browser's address bar to get to the resource behind it. But this is only the tip of the iceberg!

The HTML language — which will be discussed later on — makes extensive use of URLs:

  • to create links to other documents with the <a> element;
  • to link a document with its related resources through various elements such as <link> or <script>;
  • to display media such as images (with the <img> element), videos (with the <video> element), sounds and music (with the <audio> element), etc.;
  • to display other HTML documents with the <iframe> element.
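
To make the list above concrete, here is a small sketch that collects the URLs found in a piece of HTML. It uses Python's standard html.parser module; the URLCollector class, the sample markup, and the choice to look only at href and src attributes are assumptions made for this illustration, not a complete inventory of where HTML can carry URLs.

```python
from html.parser import HTMLParser

# Attributes that commonly carry a URL (illustrative, not exhaustive).
URL_ATTRIBUTES = {"href", "src"}


class URLCollector(HTMLParser):
    """Records (tag, url) pairs for every href/src attribute seen."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URL_ATTRIBUTES and value:
                self.found.append((tag, value))


# Hypothetical markup using the elements named in the list above.
sample_html = """
<link rel="stylesheet" href="/static/site.css">
<script src="/static/app.js"></script>
<a href="https://developer.mozilla.org/en-US/docs/Learn/">Learn</a>
<img src="images/logo.png" alt="logo">
<iframe src="https://www.example.com/embed.html"></iframe>
"""

collector = URLCollector()
collector.feed(sample_html)
for tag, url in collector.found:
    print(tag, url)
```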

Other technologies, such as CSS or JavaScript, use URLs extensively, and these are really the heart of the Web.

Absolute URLs vs relative URLs

What we saw above is called an absolute URL, but there is also something called a relative URL. Let's examine what that distinction means in more detail.

The required parts of a URL depend to a great extent on the context in which the URL is used. In your browser's address bar, a URL doesn't have any context, so you must provide a full (or absolute) URL, like the ones we saw above. You don't need to include the protocol (the browser uses HTTP by default) or the port (which is only required when the targeted Web server is using some unusual port), but all the other parts of the URL are necessary.
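
The way an address bar fills in an omitted protocol or port can be sketched roughly as follows. This is a simplified illustration in Python, not how any particular browser is implemented: the function name normalize_typed_url and the DEFAULT_PORTS table are hypothetical, and the HTTP default simply follows the paragraph above.

```python
from urllib.parse import urlsplit, urlunsplit

# Standard ports for the two Web protocols.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_typed_url(typed: str) -> str:
    """Complete a URL typed without a protocol and drop a redundant port.

    Simplification: user names, passwords, and IPv6 hosts are not handled.
    """
    # No protocol given: assume HTTP, as described in the text above.
    if "://" not in typed:
        typed = "http://" + typed

    parts = urlsplit(typed)
    netloc = parts.netloc
    # A port equal to the protocol's standard port carries no information.
    if parts.port is not None and parts.port == DEFAULT_PORTS.get(parts.scheme):
        netloc = parts.hostname

    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))


print(normalize_typed_url("developer.mozilla.org/en-US/docs/Learn"))
print(normalize_typed_url("http://www.example.com:80/path/to/myfile.html"))
print(normalize_typed_url("https://www.example.com:8443/"))
```

The third call keeps its port, because 8443 is not the standard port for HTTPS and therefore cannot be omitted.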

When a URL is used within a document, such as in an HTML page, things are a bit different. Because the browser already has the document's own URL, it can use this information to fill in the missing parts of any URL available inside that document. We can differentiate between an absolute URL and a relative URL by looking only at the path part of the URL. If the path part of the URL starts with the "/" character, the browser will fetch that resource from the top root of the server, without reference to the context given by the current document.

Let's look at some examples to make this clearer.

Examples of absolute URLs

Full URL (the same as the one we used before)
https://developer.mozilla.org/en-US/docs/Learn
Implicit protocol
//developer.mozilla.org/en-US/docs/Learn

In this case, the browser will call that URL with the same protocol as the one used to load the document hosting that URL.

Implicit domain name
/en-US/docs/Learn

This is the most common use case for an absolute URL within an HTML document. The browser will use the same protocol and the same domain name as the one used to load the document hosting that URL. Note: it isn't possible to omit the domain name without omitting the protocol as well.
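
The two implicit forms above can be tried out with urljoin from Python's standard urllib.parse module, which resolves a URL against the URL of the hosting document. The hosting document URLs below are made up for the illustration.

```python
from urllib.parse import urljoin

# Implicit protocol: the protocol is taken from the hosting document.
# The hosting page here is a hypothetical one served over plain HTTP.
hosting_page = "http://www.example.com/some/page.html"
implicit_protocol = urljoin(hosting_page,
                            "//developer.mozilla.org/en-US/docs/Learn")
print(implicit_protocol)
# http://developer.mozilla.org/en-US/docs/Learn

# Implicit domain name: protocol and domain name are both taken from the
# hosting document, and the path replaces the document's own path.
hosting_page = "https://developer.mozilla.org/en-US/some/other/page"
implicit_domain = urljoin(hosting_page, "/en-US/docs/Learn")
print(implicit_domain)
# https://developer.mozilla.org/en-US/docs/Learn
```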

Examples of relative URLs

To better understand the following examples, let's assume that the URLs are called from within the document located at the following URL, with Learn treated as a directory (as if the URL ended with a trailing slash): https://developer.mozilla.org/en-US/docs/Learn

Sub-resources
Skills/Infrastructure/Understanding_URLs
Because that URL does not start with /, the browser will attempt to find the document in a sub-directory of the one containing the current resource. So in this example, we really want to reach this URL: https://developer.mozilla.org/en-US/docs/Learn/Skills/Infrastructure/Understanding_URLs
Going back in the directory tree
../CSS/display

In this case, we use the ../ writing convention, inherited from the UNIX file system world, to tell the browser we want to go up one directory. Here we want to reach this URL: https://developer.mozilla.org/en-US/docs/Learn/../CSS/display, which can be simplified to: https://developer.mozilla.org/en-US/docs/CSS/display
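
The relative examples above can be checked with urljoin from Python's standard urllib.parse module. One assumption matters here: the sketch writes the current document's URL with a trailing slash, so that Learn counts as a directory. Without that slash, standard URL resolution treats Learn as a file name and resolves relative URLs against its parent, /en-US/docs/, which the last call demonstrates.

```python
from urllib.parse import urljoin

# Assumption: trailing slash, so "Learn" is treated as a directory.
current = "https://developer.mozilla.org/en-US/docs/Learn/"

# Sub-resource: resolved inside the current directory.
sub_resource = urljoin(current, "Skills/Infrastructure/Understanding_URLs")
print(sub_resource)
# https://developer.mozilla.org/en-US/docs/Learn/Skills/Infrastructure/Understanding_URLs

# Going back in the directory tree: "../" removes one directory level.
parent_relative = urljoin(current, "../CSS/display")
print(parent_relative)
# https://developer.mozilla.org/en-US/docs/CSS/display

# Same sub-resource, but resolved against the URL without a trailing slash.
no_slash = urljoin("https://developer.mozilla.org/en-US/docs/Learn",
                   "Skills/Infrastructure/Understanding_URLs")
print(no_slash)
# https://developer.mozilla.org/en-US/docs/Skills/Infrastructure/Understanding_URLs
```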

Semantic URLs

Despite their very technical flavor, URLs represent a human-readable entry point for a Web site. They can be memorized, and anyone can enter them into a browser's address bar. People are at the core of the Web, and so it is considered best practice to build what are called semantic URLs. Semantic URLs use words with inherent meaning that can be understood by anyone, regardless of their technical know-how.

Linguistic semantics are of course irrelevant to computers. You've probably often seen URLs that look like mashups of random characters. But there are many advantages to creating human-readable URLs:

  • It is easier for you to manipulate them.
  • It clarifies things for users in terms of where they are, what they're doing, what they're reading or interacting with on the Web.
  • Some search engines can use those semantics to improve the classification of the associated pages.
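
One common way to obtain readable paths is to derive them from page titles. The sketch below is only an illustration of that idea: the slugify function and the /docs/ prefix are hypothetical, and real sites apply their own rules (for example for non-ASCII titles, which this simple version drops).

```python
import re


def slugify(title: str) -> str:
    """Turn a page title into a lowercase, hyphen-separated path segment."""
    # Replace every run of characters other than a-z and 0-9 with one hyphen,
    # then trim hyphens left over at either end.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


# Hypothetical site layout: articles live under /docs/.
path = "/docs/" + slugify("What is a URL?")
print(path)  # /docs/what-is-a-url
```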

Next steps

Document Tags and Contributors

 Contributors to this page: ziyunfei, wth
 Last updated by: ziyunfei