Internationalized Domain Names (IDNs) were made possible by the development of a standardized way of encoding names containing the characters required for a large number of the world's written languages, using nothing more than the twenty-six letters of the Latin alphabet (a-z), the ten digits (0-9), and the hyphen (-). The Domain Name System (DNS) was originally devised to handle names that were limited to these 37 characters (plus the upper-case A-Z), and it can now serve unaltered for names using the far larger IDN character repertoire.
Every IDN has a displayed form that presents all of its characters on a computer screen as a user would expect to see them. The corresponding encoded form is not directly intelligible to users and is only intended to be visible in limited situations where IDN-aware software is not available. Unfortunately, many commonplace application programs do not yet have support for IDN. A basic understanding of the differences between the displayed and encoded forms of an IDN is therefore useful for anyone who actively wishes to use domain names containing characters from the extended range.
Before proceeding with the discussion of the two forms, readers may wish to test the IDN capacities of their own Web browsers. The ability to deal properly with IDN can easily be determined with any IDN URL, for example:
If the browser is IDN-aware, this will lead to a proper Web page (in this case, one providing Swedish information about registering names in .museum) and will otherwise result in an error message. There is also an intermediate state where the connection to the target site is correctly established but its identifier appears in an unexpected manner in the browser's address line, displaying the encoded rather than the legible form of the IDN. The purpose of the present text is to suggest means for structuring a resource presented in an IDN domain in a manner that maximizes its accessibility from the full range of browsing environments. The reader whose only interest is in being able to visit IDN sites will not need to proceed beyond the instructions about how to acquire the requisite software.
Users of Internet Explorer will need either the i-Nav or the echIDNA plug-in module. The first adds full transparent support for IDN, and the second optionally also displays the encoded and legible forms of an IDN in parallel. Recent versions of several other browsers have integral IDN support (which is also anticipated in Internet Explorer 7, currently in its first beta release, with IDN support to be introduced in the second beta release). These include Netscape and Opera. The current releases of Mozilla and Firefox also support IDN but impose the constraint on its correct display referred to in the preceding paragraph. (They can be reconfigured to display the expected characters by entering 'about:config' in the address line, scrolling to 'network.IDN_show_punycode', and changing its value to 'false'.) The way in which Safari displays IDN is determined by a customizable list of supported scripts. (Opera restricts IDN display on a domain-by-domain basis but fully supports .museum, which is similarly exempt from further restrictions in Mozilla and Firefox that are not described here.) These restrictions were introduced in immediate response to a security concern that is discussed in a separate IDN issues list, and their further revision in a manner that restores the proper display of IDN is expected. (The reader should note that this development is in active progress and its present state may not always be correctly described in the material provided here.)
The differences between the two IDN forms will be illustrated with the hypothetical skånska.lättöls.museum, displayed here using characters taken from the Unicode code charts. The corresponding encoded form (which is the one that would actually be registered and used in the Domain Name System) is xn--sknska-jua.xn--lttls-gra2k.museum. This is referred to alternatively as the ACE (ASCII Compatible Encoding) or Punycode form of the name. A Punycode label is invariably prefixed with xn-- and the actual encoded sequence follows thereafter. There are two IDN labels in the example used here, each of which is encoded as a separate xn-- sequence.
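The relationship between the two forms can be checked with a short Python sketch. It assumes the "idna" codec in Python's standard library, which implements the original IDNA specification (RFC 3490); the helper name to_ace is hypothetical and chosen only for this illustration.

```python
def to_ace(name: str) -> str:
    """Convert a displayed IDN to its ACE (Punycode) form, label by label."""
    # The built-in "idna" codec applies the RFC 3490 ToASCII operation.
    # Labels that contain non-ASCII characters come back with the xn-- prefix;
    # plain ASCII labels such as "museum" are returned unchanged.
    labels = name.split(".")
    return ".".join(label.encode("idna").decode("ascii") for label in labels)


displayed = "skånska.lättöls.museum"
encoded = to_ace(displayed)
print(encoded)  # xn--sknska-jua.xn--lttls-gra2k.museum

# Show each displayed label next to its encoded counterpart.
for shown, ace in zip(displayed.split("."), encoded.split(".")):
    print(shown, "->", ace)
```

Encoding each label separately mirrors the point made above: every IDN label becomes its own xn-- sequence, while the top-level label is left as it is.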
There are conversion services that may be used to determine the encoded form of an IDN, and for the corresponding decoding. The holders of names in .museum who require this information should note that it will be provided to them during the registration process without any need for conducting the technical exercise described here.
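For readers who prefer to perform the decoding locally rather than through a conversion service, the following Python sketch may serve. It again relies on the standard library's "idna" codec (RFC 3490); the function names to_displayed and has_ace_label are hypothetical.

```python
def to_displayed(ace_name: str) -> str:
    """Convert an ACE (Punycode) domain name back to its displayed form."""
    # The "idna" codec decodes every xn-- label and leaves other labels alone.
    return ace_name.encode("ascii").decode("idna")


def has_ace_label(name: str) -> bool:
    """Report whether any label of the name carries the xn-- ACE prefix."""
    return any(label.lower().startswith("xn--") for label in name.split("."))


ace = "xn--sknska-jua.xn--lttls-gra2k.museum"
if has_ace_label(ace):
    print(to_displayed(ace))  # skånska.lättöls.museum
```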
The obvious way to indicate an IDN in an HTML source document might appear to be:
This will probably function as intended if the reader of that document is using an IDN-aware browser but, in fact, is not a legal form for a URL. In order to ensure that a URL will both lead to the desired target in any browsing environment and conform to all technical specifications, the encoded representation of the name can be used as the target:
Alternatively, if the target Web site is also designated by a name that does not include IDN characters, the latter form may be used in the URL as:
A user seeing this URL but otherwise knowing nothing about IDN would be likely to 'flatten' the name by removing the ring and umlauts from skånska.lättöls.museum, and would certainly do so if using the IDN form resulted in an error message. The holder of any IDN that may be seen as a sequence of Latin letters with added markings should therefore consider acquiring both forms of the name, and using them in one of the alternate manners described here.
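The 'flattened' form that such a user would be likely to type can be approximated in Python. This is only a sketch: it assumes that flattening means dropping combining marks after Unicode decomposition, which works for the ring and umlauts in the example but does not cover letters that have no decomposition (ø, for instance). The function name flatten is hypothetical.

```python
import unicodedata


def flatten(name: str) -> str:
    """Strip diacritical marks from a name, keeping the base letters."""
    # NFD splits a letter such as "å" into "a" plus a combining ring.
    decomposed = unicodedata.normalize("NFD", name)
    # Combining marks have a non-zero combining class; drop them.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


print(flatten("skånska.lättöls.museum"))  # skanska.lattols.museum
```

A name holder could use a check of this kind to decide which non-IDN name to register alongside the IDN.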
It might seem as though writing a URL in this manner does not even require the IDN form of the name to have been registered. Although the clickable link will function as presented here, if the displayed text is entered via a keyboard or cut and pasted, the intended result will only be achieved if the IDN has been correctly registered in the DNS.
Until such time as IDN-aware software is in general use, a Web page containing IDN hyperlinks should probably contain a brief note about software requirements and the potential need for installing upgrades or add-ons. This is common practice with a variety of proprietary file formats and is frequently necessary when new facilities are introduced.
Displayed Unicode characters can be represented in different ways in an HTML source document. They can be stated literally (ä as ä) or be indicated as 'character entity references' (ä as &auml;). To avoid any risk of confusion with specific regard to IDN, it is better still to use 'numeric character references' (ä as &#xE4;). Every IDN character is uniquely identified by its numerical position in the Unicode charts. These are published as hexadecimal values (that is, numbers using sixteen different digits, represented with the 10 decimal digits plus the letters A through F) and can be included directly in numeric character references in HTML source. The numerical 'code points' permitted in .museum names are listed at http://about.museum/idn/language.html. In IDN documentation, a Unicode character with the hexadecimal value E4 is indicated as U+00E4. In HTML this is changed to &#xE4;. When written with numeric character references skånska.lättöls.museum appears as sk&#xE5;nska.l&#xE4;tt&#xF6;ls.museum. Indicating characters in the latter form eliminates the need for concern about whether a browser correctly notes that Unicode, and not one of the alternate encoding schemes that are still in widespread use, is intended.
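The conversion between the three notations is mechanical, as the following Python sketch suggests. The helper names to_ncr and code_point are hypothetical; html.unescape is part of the standard library and is used here only to confirm that the entity and numeric references denote the same character.

```python
import html


def to_ncr(text: str) -> str:
    """Replace each non-ASCII character with a hexadecimal numeric character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


def code_point(ch: str) -> str:
    """Write a character's code point in the U+XXXX style used in IDN documentation."""
    return f"U+{ord(ch):04X}"


print(code_point("ä"))                   # U+00E4
print(to_ncr("skånska.lättöls.museum"))  # sk&#xE5;nska.l&#xE4;tt&#xF6;ls.museum

# Both reference styles resolve to the same literal character.
print(html.unescape("&auml;") == html.unescape("&#xE4;") == "ä")  # True
```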
Using the numerical character references in the display portion of a URL and Punycode in the target makes it possible to embed support for IDN directly in an HTML document without requiring any facility in the reader's software environment beyond the ability to display the indicated Unicode characters. In fact, a hyperlink that is presented in this manner will function correctly even if a font containing all the necessary display characters is not available to the browser.
If included in an HTML source document in this browser-insensitive detail, the example used above would appear as:
The positive aspect of this is that the display text in the hyperlink will be exactly as intended - http://skånska.lättöls.museum. The disadvantage is that the encoded form will be revealed to anyone clicking on this link. In light of recent security concerns about the deliberate exploitation of visually confusable characters, signaling the presence of an IDN in this manner can also be seen as a useful feature. The result, in any case, is identical to the one attained by the use of a browser with the restricted IDN support described in Section 2 above.
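A hyperlink of this kind, with Punycode in the target and numeric character references in the display text, could be generated along the following lines. This is a minimal Python sketch and not necessarily the markup originally shown on this page: the function name idn_link is hypothetical, the standard library's "idna" codec (RFC 3490) is assumed, and the host name is assumed to contain no characters that need further HTML escaping.

```python
def idn_link(displayed_host: str) -> str:
    """Build an anchor whose target is Punycode and whose text uses numeric character references."""
    # Target: the encoded (ACE) form, usable by any browser.
    target = displayed_host.encode("idna").decode("ascii")
    # Display text: non-ASCII characters written as hexadecimal references,
    # so the source document itself stays pure ASCII.
    shown = "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in displayed_host
    )
    return f'<a href="http://{target}/">http://{shown}</a>'


print(idn_link("skånska.lättöls.museum"))
```

For the example name this prints an anchor whose href is http://xn--sknska-jua.xn--lttls-gra2k.museum/ and whose visible text is http://sk&#xE5;nska.l&#xE4;tt&#xF6;ls.museum.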
If users are expected to have IDN-aware browsers, numeric character references may be indicated in both the target and display positions of the URL. Repeating the previous example in that form gives:
The next significant question is how the target Web site should identify itself. The appearance of an IDN in the address line by injection from the server could easily cause difficulty if an attempt were then made to reload the page and the browser was not IDN-aware. The echoing of the encoded representation would avoid that problem but would hardly be a meaningful thing to do otherwise. One safe alternative would be simply to permit the Web site to identify itself precisely as it did prior to the availability of IDN. A further alternative would be for the Web site to identify itself using an intuitively close non-IDN equivalent to the full IDN representation, as illustrated above.
The ultimate purpose of IDN is, of course, to enable Web sites to identify themselves using localized representations. Configuring a Web site to identify itself by echoing the form in which it was called supports all of the devices described here. It should also be noted that any browser will be able to deal correctly with the Punycode form of an IDN being entered directly into its address line. It is, however, extremely unlikely that a user would attempt to access a Web site in this manner unless there were no other alternative, nor is Punycode intended for such use. In the most common situation, a name will be typed into an address line using Unicode characters. However, keyboards normally only support the languages of the locale in which they are used. Names containing unfamiliar characters will therefore frequently be cut and pasted into address lines rather than typed directly.
The application of IDN to e-mail involves the same basic considerations. No special software is needed for the use of e-mail addresses containing Punycode representations of IDNs. An e-mail address presented as a URL can also use a Unicode display, for example, as:
The Punycode form will also be revealed to anyone clicking on a mailto: link indicated in this manner. If the resulting display of Punycode in the respondent's e-mail composer is unacceptable, the alternate form used with the http URL can be applied here as well, giving:
An IDN e-mail address inserted into the 'From:' or 'Reply-to:' lines of the return message can only be dealt with correctly by IDN-aware e-mail software. A reply would otherwise only be possible if the Punycode form of the address had also been communicated and was manually entered into the outgoing address line. The latter procedure was originally expected to have no practical value but may prove more common in actual practice than first anticipated. It must also be noted that IDN only applies to the domain name portion of an e-mail address, to the right of the '@', thus placing a significant constraint on the utility of IDN for e-mail addressing.
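The restriction to the domain portion can be made concrete with a short Python sketch that prepares the target of a mailto: link. The address info@skånska.lättöls.museum and the function name ace_mailto are hypothetical, and the standard library's "idna" codec (RFC 3490) is assumed.

```python
def ace_mailto(address: str) -> str:
    """Return a mailto: target with the domain in Punycode and the local part untouched."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("expected an address of the form local@domain")
    # IDN covers only the part to the right of the '@'.
    if not local.isascii():
        raise ValueError("the local part must be ASCII; IDN does not apply to it")
    return f"mailto:{local}@{domain.encode('idna').decode('ascii')}"


print(ace_mailto("info@skånska.lättöls.museum"))
# mailto:info@xn--sknska-jua.xn--lttls-gra2k.museum
```

Rejecting a non-ASCII local part, rather than attempting to encode it, reflects the constraint described above.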
The normative basis for IDN support in e-mail is still being developed but some programs are able to process IDN in the headers of e-mail messages. Significant issues remain about the way both e-mail addresses and URLs containing IDNs should be treated when they appear in the bodies of e-mail messages. No general procedures can be suggested here beyond noting that an e-mail program packaged with an IDN-aware browser may provide support for IDN in e-mail headers.
There are many ways in which the potential of IDN can be harnessed. The main mode of their expression is, however, likely to remain by inclusion in hyperlinks. The ubiquitous availability of IDN-aware browsers will eliminate the need for crafting URLs with the artifice described above, and will broaden the range of languages that can be used with equal ease, but will do little to surmount the problems inherent in the language specificity of keyboards. Despite these limitations, there are situations in which IDN can already be used to good advantage in truly multilingual contexts with no requirement for IDN-aware software. A useful illustration of this is provided by the International Council of Museums (ICOM) at http://icom.museum/idn/.
Latest update: 2005-12-22