Internationalized Domain Names (IDN) Support in Mozilla Browsers
From MDC
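Introduction
Netscape 7.1 is the first commercial browser with built-in support for Internationalized Domain Names as defined by the new IETF RFCs established in 2003.
An Internationalized Domain Name is a domain/host name that uses non-ASCII characters. Until very recently, only a subset of 7-bit ASCII characters could be used in domain names. As the Internet spread to non-English-speaking people around the world, it became increasingly clear that forcing domain names to use only part of the Latin alphabet was not ideal.
Many European languages are written with accented characters in addition to the basic Latin letters, yet those characters could not be used in domain names. There are also many languages whose scripts are completely different from the Latin alphabet. Speakers of these languages could not use familiar names from their native languages as part of Internet domain/host names.
Over the last few years there was a flurry of activity at the IETF to standardize a protocol for handling non-ASCII characters in domain names. In March 2003, three important RFCs were approved by the IETF (see RFC 3490, 3491, and 3492). These new RFCs now make it possible for domain name servers to register non-ASCII domain names, and for application/client vendors to implement standardized support for non-ASCII characters in domain names.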
How IDN Works
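When a browser sees a host name such as http://developer.mozilla.org, a request goes through the DNS resolver service (usually built into the OS), which in turn queries the nearest domain name server, and the IP address for that host name is returned. This IP address is then used to connect to the requested Web server.
IDN allows users to type host/domain names containing non-ASCII characters into the browser's location bar, or to use them in URLs embedded in Web pages. At the network protocol level, there is no change at all to the restriction that URLs/URIs use a subset of ASCII characters. If a user types non-ASCII characters as part of a domain name, or if a Web page contains a link that uses a non-ASCII domain name, the application must convert that name into a special encoded format that uses only the usual subset of ASCII characters. RFC 3490 (Internationalizing Domain Names in Applications (IDNA)) specifies that the characters used in IDN are drawn from the Unicode Standard 3.2. It also specifies how applications make non-ASCII characters conform to the existing restrictions on host name characters.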
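As a rough illustration of the application-side conversion described above, the following Python sketch rewrites only the host part of a URL into its ASCII form before it would be handed to a resolver. It relies on Python's built-in "idna" codec, which implements RFC 3490; the function name url_to_ascii_host and the sample host bücher.example are assumptions made for this example, and this is not how Mozilla itself is implemented.

```python
from urllib.parse import urlsplit, urlunsplit


def url_to_ascii_host(url: str) -> str:
    """Return `url` with its host name converted to the ASCII (ACE) form.

    Hypothetical helper for illustration only. Any user:password part of
    the URL is dropped to keep the sketch short.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # The built-in "idna" codec applies Nameprep and Punycode per label
    # and leaves pure-ASCII labels unchanged.
    ascii_host = host.encode("idna").decode("ascii")
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(url_to_ascii_host("http://bücher.example/path?q=1"))
    print(url_to_ascii_host("http://developer.mozilla.org/"))
```

Only the host name changes; the path and query string are passed through untouched, and a host that is already pure ASCII comes back as it went in.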
How Mozilla Browsers Handle Non-ASCII Domain Names
Unicode and Nameprep
When Mozilla receives IDN input from the user via the location bar, or a request to process a link with a non-ASCII host name, it first converts the input to Unicode and then normalizes the string so that it conforms to general URI requirements.
The process converts uppercase characters to lowercase ones (case folding), unifies characters that have multiple representations, e.g. conversion of half-width Kana characters in Japanese into full-width ones (normalization), eliminates prohibited characters (e.g. space), eliminates ambiguities in bi-directional text (e.g. Arabic and Hebrew), and checks whether unassigned characters in the Unicode repertoire are used, allowing them for "query strings" but disallowing them for "stored strings" such as the data input for domain name registration.
This process is called "Nameprep" and is performed according to RFC 3491 (Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)) and RFC 3454 (Preparation of Internationalized Strings ("stringprep")).
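The mapping, normalization, and prohibition steps can be sketched with Python's standard stringprep and unicodedata modules, which expose the RFC 3454 tables. This is a simplified sketch, not Mozilla's code: the name nameprep_sketch is made up for this example, and the bi-directional text check and the unassigned-character check are deliberately left out.

```python
import stringprep
import unicodedata

# Nameprep is defined against Unicode 3.2, so use that snapshot of the database.
_ucd32 = unicodedata.ucd_3_2_0

# Prohibited-output tables listed by the Nameprep profile (RFC 3491, section 5).
_PROHIBITED = (
    stringprep.in_table_c12,  # non-ASCII space characters
    stringprep.in_table_c22,  # non-ASCII control characters
    stringprep.in_table_c3,   # private use
    stringprep.in_table_c4,   # non-character code points
    stringprep.in_table_c5,   # surrogate codes
    stringprep.in_table_c6,   # inappropriate for plain text
    stringprep.in_table_c7,   # inappropriate for canonical representation
    stringprep.in_table_c8,   # change display properties or are deprecated
    stringprep.in_table_c9,   # tagging characters
)


def nameprep_sketch(label: str) -> str:
    """Simplified Nameprep: map, normalize, prohibit (no bidi/unassigned checks)."""
    # 1. Map: drop characters "mapped to nothing" (table B.1) and
    #    case-fold everything else (table B.2).
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in label
        if not stringprep.in_table_b1(ch)
    )
    # 2. Normalize with NFKC, which among other things turns half-width
    #    Kana into full-width Kana.
    normalized = _ucd32.normalize("NFKC", mapped)
    # 3. Prohibit: reject characters from the prohibited-output tables.
    for ch in normalized:
        if any(check(ch) for check in _PROHIBITED):
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
    return normalized


if __name__ == "__main__":
    print(nameprep_sketch("Example"))       # case folding
    print(nameprep_sketch("\uff76\uff85"))  # half-width Kana input
```

A complete implementation would also apply the bidi rules of RFC 3454 and, for stored strings, reject unassigned code points.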
ASCII-Compatible Encoding (ACE)
The next step is to convert the normalized Unicode string into a form that uses only a restricted set of 7-bit ASCII characters. During the discussion phase of the IDN protocol development, several competing ASCII-compatible encoding (ACE) schemes were proposed, but agreement was eventually reached to standardize on a type of ACE called "Punycode". This is defined in RFC 3492 (Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)).
Punycode uses only the ASCII letters and digits (a-z, 0-9) and the hyphen (-). It was shown to be language independent, superior in compression, compact in code size, round-trip safe, and superior for encoding Chinese/Japanese/Korean characters.
The final step of the process is to affix the ACE prefix to the output string from the Nameprep/stringprep and Punycode processing. Since Punycode output contains only ASCII characters, it is possible, though unlikely, that an output string coincides with an existing domain name. To avoid such a complication, RFC 3490 defines a special prefix, "xn--", for the ACE (Punycode) output. Other encodings used different prefixes, e.g. "bq--" for RACE, but all except the standard ACE prefix "xn--" are now disallowed in IDN.
As an example, an output string to be sent to a DNS server for a Japanese domain name, "http://ジェーピーニック.jp", will look like the following in ACE form:
http://xn--hckqz9bzb1cyrb.jp
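The two steps above, Punycode encoding followed by affixing the ACE prefix, can be sketched in Python using the standard "punycode" codec. The helper names label_to_ace, label_from_ace, and host_to_ace are assumptions made for this illustration, and the sketch assumes each label has already been through Nameprep.

```python
ACE_PREFIX = "xn--"


def label_to_ace(label: str) -> str:
    """Encode one already-normalized label; pure-ASCII labels pass through."""
    if label.isascii():
        return label
    # Step 1: Punycode (RFC 3492). Step 2: affix the ACE prefix (RFC 3490).
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def label_from_ace(label: str) -> str:
    """Reverse of label_to_ace, for showing the name to the user."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def host_to_ace(host: str) -> str:
    """Apply label_to_ace to every dot-separated label of a host name."""
    return ".".join(label_to_ace(part) for part in host.split("."))


if __name__ == "__main__":
    ace = host_to_ace("ジェーピーニック.jp")
    print(ace)  # xn--hckqz9bzb1cyrb.jp
    print(".".join(label_from_ace(part) for part in ace.split(".")))
```

Python's built-in "idna" codec combines Nameprep, Punycode, and the prefix in one call, so "ジェーピーニック.jp".encode("idna") gives the same ACE form; the steps are separated here only to mirror the description.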
Domain Name Registration
After the technical standards were established by the IETF, the last remaining issue was for domain name registrars to agree on an international guideline on the use of IDN characters. This was accomplished when ICANN (the Internet Corporation for Assigned Names and Numbers) published its guideline for IDN in June 2003. Because the Unicode repertoire contains characters no longer used in any living language, and most languages also have characters in current use that are not suitable for URIs/URLs, the guideline allows the governing body of each country's domain registrars to set appropriate limitations on the characters used in domain names.
With this last obstacle to standardization out of the way, domain name registrars are expected to move quickly to implement the new RFCs for existing and future IDN registrations.
JPRS (Japan Registry Services) decided to move to the new RFC implementation on July 10, 2003, only a few weeks after the ICANN guideline was published. This makes it possible for Netscape 7.1/Mozilla 1.4 users to access Japanese host names under the .jp top domain without any additional setup, using just the built-in IDN functionality.
Real World Examples
Punycode
There are real-world examples of IDN that you can test with Netscape 7.1, which uses Punycode as the default IDN encoding. For example, most sample links on the following test pages can be used without any further setting:
- http://www.nunames.nu/eu-lang-test.htm (Domain names with Latin 1 accented characters)
- http://www.nunames.nu/lldemo/default.htm (Domain names in other languages)
From July 10, 2003 onward, you can access a large number of Japanese domain name sites under the .jp top domain with no further setting in Netscape 7.1/Mozilla 1.4.
RACE (Row-based ASCII Compatible Encoding)
Almost all IDN registration data are expected to change to Punycode by the end of 2003. Some countries will complete the conversion quickly, e.g. Japan as mentioned above, but others, such as the registries for the .com and .net top domains, may take longer.
Most of the existing sites currently use the ASCII-compatible encoding known as RACE or Row-based ASCII Compatible Encoding, which was not accepted as a standard by IETF. If you find IDN test sites under the .com and .net top domains, and if you cannot successfully access these sites, you can use the following workaround until the conversion to Punycode is completed for these top domains:
Using Netscape 7.1 or Mozilla 1.4:
- Type about:config into the location/URL bar. This will list all the preferences for your current profile. These preferences can be modified or new ones can be created without quitting the browser using the steps described below.
- Create a new preference item using the menu New > String via a right-mouse click. The name of the preference is: network.IDN_prefix. The value should be "bq--". This will change the default from Punycode to RACE.
- Next create another new preference item using the right-mouse click menu New > Boolean. The name of the preference is: network.IDN_testbed. The value should be "true".
- Now access IDN sites under the .com and .net top domains. You should succeed in reaching the sample sites.
- Don't forget to reset these preferences to their default values once you are finished with testing!
Caveats and Conclusions
Netscape 7.1/Mozilla 1.4 has solid support for Internationalized Domain Names and is the first browser with built-in support for the new RFCs for IDN established by the IETF. This means that there is no longer any need to use a plug-in to process non-ASCII domain names.
Netscape/Mozilla's support for IDN is not without bugs. One notable bug is that non-ASCII names are not always displayed correctly in some UI areas such as the Preferences panels, Bookmarks, and History. Non-ASCII names are also not always displayed correctly in the location bar, because ACE-to-Unicode conversion is not yet implemented. Of particular concern for Japanese users is a bug in which full-width Japanese Roman characters are not normalized to ASCII Roman characters (see bug 210734). This forces Japanese users to switch out of Japanese input mode to type top domain names such as .jp, which is inconvenient. For other bugs, see this link.
IDN is a global trend and is likely to be adopted by a large number of sites making it easier for average Internet users to find web sites. Many web sites around the world are expected to register native language host names with the appropriate domain name registrars for their top domains. Netscape 7.1 and Mozilla 1.4 are playing a significant role in aiding the development of IDN further.