
This document is a preparatory step toward the expansion of IDN support in .museum. It is intended to provide a basis for a table of characters that can safely be permitted in domain identifiers derived from the Yiddish language, and represented with its traditional extended Hebrew alphabet. An initial reference list is presented in the character table in Section 2. The implementation of any detail put forward during its discussion must be in full accordance with the general policies and terms stated at http://about.museum/idn/idnpolicy.html.
Subsequent to the initial posting of these two policy documents, the development of support for Yiddish in .museum was coordinated with the introduction of the same facility in the Swedish country-code domain, .SE. A detailed description of the joint action is given in http://about.museum/idn/museum-se-yiddish-35.pdf. Any discrepancies that remain between the policies and procedures stated there and the way they are described in the text immediately below will be resolved in the final version of the present document, which will fully and clearly reflect the more recent material.
The repertoire appearing in the reference table was taken from The Standardized Yiddish Orthography, Rules of Yiddish Spelling, 6th ed., 1999, published by the YIVO Institute for Jewish Research in New York (ISBN 0-914512-25-0). This is the most frequently cited reference orthography for modern Standard Yiddish, but it does not describe the full extent of orthographic variation encountered either in literary practice or in common usage. In addition to the facets of this pre-existing range, the present exercise may be seen as the development of a specific IDN-safe variant. For reasons discussed in Section 4, this IDN-safe orthography may exclude several of the characters that appear in the current reference table. It may be noted, however, that all of the characters that are candidates for deletion are also regularly absent in other established contemporary Yiddish orthographic contexts.
In the following table a single code point is denoted as U+XXXX and a continuous range of code points is indicated as XXXX..YYYY. Two code points appearing in succession as U+XXXX U+YYYY indicate a base character followed by a combining character, which together form a single displayed character. The first column in each row indicates one or more permitted code points. The second column illustrates the corresponding characters. (Their correct display requires the use of a font in which all are included.) The third column provides the Unicode names for the characters, with the names of the individual components of a combining sequence separated by a "+" sign. The fourth column lists the Yiddish names for the characters. The fifth column refers to explanatory notes as numbered in Section 3.
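The notation described above can be read mechanically. The following is a minimal sketch, not part of the registry's tooling; the function names `parse_code_points` and `render` are hypothetical and chosen only for illustration.

```python
import re

def parse_code_points(cell: str) -> list[int]:
    """Expand a table cell such as 'U+05D0 U+05B7' or '0030..0039' into code points."""
    cell = cell.strip()
    # A continuous range is written XXXX..YYYY (no "U+" prefix).
    range_match = re.fullmatch(r"([0-9A-Fa-f]{4,6})\.\.([0-9A-Fa-f]{4,6})", cell)
    if range_match:
        first, last = (int(h, 16) for h in range_match.groups())
        return list(range(first, last + 1))
    # Otherwise expect one or more U+XXXX tokens separated by spaces.
    points = []
    for token in cell.split():
        if not re.fullmatch(r"U\+[0-9A-Fa-f]{4,6}", token):
            raise ValueError(f"unrecognized code point notation: {token!r}")
        points.append(int(token[2:], 16))
    return points

def render(cell: str) -> str:
    """Return the character or characters that a table cell denotes."""
    return "".join(chr(cp) for cp in parse_code_points(cell))
```

For example, `render("U+05D0 U+05B7")` yields the two-code-point sequence for pasekh alef, and `parse_code_points("0030..0039")` yields the ten digit code points.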
All Unicode character names and code points are indicated below as they appear in the Unicode Hebrew Code Chart, listed in the normal sorting order of the Yiddish alphabet.
| Code Point | Symbol | Unicode Name | Yiddish Name | Notes |
|---|---|---|---|---|
| U+05D0 | א | HEBREW LETTER ALEF | shtumer alef | |
| U+05D0 U+05B7 | אַ | HEBREW LETTER ALEF + HEBREW POINT PATAH | pasekh alef | |
| U+05D0 U+05B8 | אָ | HEBREW LETTER ALEF + HEBREW POINT QAMATS | komets alef | |
| U+05D1 | ב | HEBREW LETTER BET | beys | |
| U+05D1 U+05BF | בֿ | HEBREW LETTER BET + HEBREW POINT RAFE | veys | 1 |
| U+05D2 | ג | HEBREW LETTER GIMEL | giml | |
| U+05D3 | ד | HEBREW LETTER DALET | daled | |
| U+05D4 | ה | HEBREW LETTER HE | hey | |
| U+05D5 | ו | HEBREW LETTER VAV | vov | |
| U+05D5 U+05BC | וּ | HEBREW LETTER VAV + HEBREW POINT DAGESH OR MAPIQ | melupm vov | 2 |
| U+05D6 | ז | HEBREW LETTER ZAYIN | zayen | |
| U+05D7 | ח | HEBREW LETTER HET | khes | |
| U+05D8 | ט | HEBREW LETTER TET | tes | |
| U+05D9 | י | HEBREW LETTER YOD | yud | |
| U+05D9 U+05B4 | יִ | HEBREW LETTER YOD + HEBREW POINT HIRIQ | khirik yud | 3 |
| U+05F2 U+05B7 | ײַ | HEBREW LIGATURE YIDDISH DOUBLE YOD + HEBREW POINT PATAH | pasekh tsvey yudn | |
| U+05DB U+05BC | כּ | HEBREW LETTER KAF + HEBREW POINT DAGESH OR MAPIQ | kof | 1 |
| U+05DB | כ | HEBREW LETTER KAF | khof | |
| U+05DA | ך | HEBREW LETTER FINAL KAF | langer khof | 4 |
| U+05DC | ל | HEBREW LETTER LAMED | lamed | |
| U+05DE | מ | HEBREW LETTER MEM | mem | |
| U+05DD | ם | HEBREW LETTER FINAL MEM | shlos mem | 4 |
| U+05E0 | נ | HEBREW LETTER NUN | nun | |
| U+05DF | ן | HEBREW LETTER FINAL NUN | langer nun | 4 |
| U+05E1 | ס | HEBREW LETTER SAMEKH | samekh | |
| U+05E2 | ע | HEBREW LETTER AYIN | ayen | |
| U+05E4 U+05BC | פּ | HEBREW LETTER PE + HEBREW POINT DAGESH OR MAPIQ | pey | |
| U+05E4 | פ | HEBREW LETTER PE | fey | 5 |
| U+05E4 U+05BF | פֿ | HEBREW LETTER PE + HEBREW POINT RAFE | fey | 5 |
| U+05E3 | ף | HEBREW LETTER FINAL PE | langer fey | 4 |
| U+05E6 | צ | HEBREW LETTER TSADI | tsadek | |
| U+05E5 | ץ | HEBREW LETTER FINAL TSADI | langer tsadek | 4 |
| U+05E7 | ק | HEBREW LETTER QOF | kuf | |
| U+05E8 | ר | HEBREW LETTER RESH | reysh | |
| U+05E9 | ש | HEBREW LETTER SHIN | shin | |
| U+05E9 U+05C2 | שׂ | HEBREW LETTER SHIN + HEBREW POINT SIN DOT | sin | 1 |
| U+05EA U+05BC | תּ | HEBREW LETTER TAV + HEBREW POINT DAGESH OR MAPIQ | tof | 1 |
| U+05EA | ת | HEBREW LETTER TAV | sof | |
In addition to the characters and code points specified above, Yiddish names may include the following characters and code points from the Unicode Basic Latin Code Chart:
| Code Point | Symbol | Unicode Name |
|---|---|---|
| U+002D | - | HYPHEN-MINUS |
| 0030..0039 | 0 - 9 | DIGIT ZERO - DIGIT NINE |
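As an illustration of how the two tables might be applied, the sketch below splits a candidate label into the units listed in them and rejects anything else. It is an assumption-laden example rather than the registry's validation procedure: the names `PERMITTED_PAIRS`, `PERMITTED_SINGLES`, and `split_into_table_units` are hypothetical, and the positional restrictions in the notes are not checked here.

```python
# Base character + combining point sequences listed in the reference table.
PERMITTED_PAIRS = {
    "\u05D0\u05B7",  # pasekh alef
    "\u05D0\u05B8",  # komets alef
    "\u05D1\u05BF",  # veys
    "\u05D5\u05BC",  # melupm vov
    "\u05D9\u05B4",  # khirik yud
    "\u05F2\u05B7",  # pasekh tsvey yudn
    "\u05DB\u05BC",  # kof
    "\u05E4\u05BC",  # pey
    "\u05E4\u05BF",  # fey with rafe
    "\u05E9\u05C2",  # sin
    "\u05EA\u05BC",  # tof
}

# Every letter from U+05D0 to U+05EA appears unmarked in the table,
# together with HYPHEN-MINUS and the digits from Basic Latin.
PERMITTED_SINGLES = {chr(cp) for cp in range(0x05D0, 0x05EA + 1)} | set("-0123456789")

def split_into_table_units(label: str) -> list[str]:
    """Split a label into table entries, raising ValueError on anything else."""
    units = []
    i = 0
    while i < len(label):
        pair = label[i:i + 2]
        if pair in PERMITTED_PAIRS:
            units.append(pair)
            i += 2
        elif label[i] in PERMITTED_SINGLES:
            units.append(label[i])
            i += 1
        else:
            raise ValueError(
                f"U+{ord(label[i]):04X} at index {i} is not in the reference table"
            )
    return units
```

Trying the pair first means that a combining point is accepted only when it directly follows the base character with which the table lists it. A point in any other position, or a digraph ligature such as U+05F0, is reported as an error.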
1 This character may be excluded from the production version of this table, as discussed in Section 4.6 below.
2 This character may only be used adjacent to a vov or preceding a yud. Also note the discussion in Section 4.5 below.
3 This character may only be used adjacent to a vowel. Also note the discussion in Section 4.5 below.
4 This character may only be used in the final position in a label or preceding a DIGIT or HYPHEN-MINUS.
5 The fey with rafe is included here because it is used in the YIVO reference. It is, however, redundant even in that context and is likely to be excluded from the production version of this table. The unpointed fey, which is a common alternative (although absent from the YIVO repertoire), has been added to this table and may be the sole option available in the production environment.
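The positional rule in note 4 lends itself to a simple check. The following sketch covers that note only, leaving notes 2 and 3 aside; the function name `misplaced_final_forms` is hypothetical, and the code is an illustration rather than the registry's implementation.

```python
# Final forms flagged with note 4: langer khof, shlos mem, langer nun,
# langer fey, langer tsadek.
FINAL_FORMS = set("\u05DA\u05DD\u05DF\u05E3\u05E5")

# Characters that may follow a final form inside a label (note 4).
MAY_FOLLOW_FINAL = set("-0123456789")

def misplaced_final_forms(label: str) -> list[int]:
    """Return the indexes of final forms that violate note 4."""
    bad = []
    for i, ch in enumerate(label):
        if ch not in FINAL_FORMS:
            continue
        at_end = i == len(label) - 1
        if not at_end and label[i + 1] not in MAY_FOLLOW_FINAL:
            bad.append(i)
    return bad
```

An empty result means that every final form in the label is either the last character or is directly followed by a digit or a hyphen-minus.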
4.1 Each character in the table above that is represented as a base character with a combining point is also available as a single precomposed character in the Unicode Alphabetic Presentation Forms Code Chart. The code points listed there are automatically converted to the ones in the table here by the IDN protocol. They therefore cannot be included in a registered name, but they can appear in a query string and will properly match the stored form.
4.2 There is a general problem with the use of combining characters in the final position in a label, which currently makes their appearance there impossible. This cannot currently be overridden in the registration process, but rectification of the underlying technical difficulty is expected.
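The conversion described in 4.1 can be observed with Unicode normalization. The sketch below assumes that the relevant step of the IDN protocol corresponds to NFKC normalization, which is the form that Nameprep applies. It uses Python's `unicodedata` module, whose Unicode version differs from the one the protocol references, so it should be read as an approximation. The names `to_stored_form` and `PRESENTATION_FORMS` are hypothetical.

```python
import unicodedata

def to_stored_form(text: str) -> str:
    """Approximate the protocol's normalization step with NFKC."""
    return unicodedata.normalize("NFKC", text)

# A few precomposed presentation forms and the table sequences they map to.
PRESENTATION_FORMS = {
    "\uFB2E": "\u05D0\u05B7",  # ALEF WITH PATAH  -> pasekh alef
    "\uFB2F": "\u05D0\u05B8",  # ALEF WITH QAMATS -> komets alef
    "\uFB1D": "\u05D9\u05B4",  # YOD WITH HIRIQ   -> khirik yud
    "\uFB1F": "\u05F2\u05B7",  # YIDDISH YOD YOD PATAH -> pasekh tsvey yudn
    "\uFB4E": "\u05E4\u05BF",  # PE WITH RAFE     -> fey with rafe
    "\uFB4A": "\u05EA\u05BC",  # TAV WITH DAGESH  -> tof
}

def matches_table_sequences() -> bool:
    """Check that each presentation form normalizes to its table sequence."""
    return all(
        to_stored_form(precomposed) == sequence
        for precomposed, sequence in PRESENTATION_FORMS.items()
    )
```

A query string containing U+FB2E would, under this assumption, be compared against a stored label containing U+05D0 U+05B7.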
4.3 The Unicode code chart includes the digraphs tsvey vovn, vov yud, and tsvey yudn as separate single-character ligatures. There are therefore two different ways in which each can be entered from a Yiddish keyboard. If there are separate single keys for the digraphs, it is likely that they will generate the ligatures as U+05F0, U+05F1, and U+05F2. Whether or not that option is available (and it is frequently absent), some users may enter them as two-key combinations giving U+05D5 U+05D5, U+05D5 U+05D9, and U+05D9 U+05D9. The IDN protocol does not normalize the one form to the other, and the table here therefore only supports the latter alternative. The use of ligatures has been restricted to the single case of the pasekh tsvey yudn (U+05F2 U+05B7), which cannot be represented in any other way.
4.4 Some applications incorrectly display a blank space adjacent to a Hebrew character with a combining mark. The combining mark may be displayed in that space rather than properly together with the base character. Some combining marks and fonts are more prone to this than others, and the way they are handled also varies between applications and operating systems. The only completely reliable way to avoid this risk is to refrain from the use of combining marks altogether. It is therefore recommended that .museum names in Hebrew script use unmarked characters wherever they are orthographically acceptable, even if they are only marginally so. An explicit policy constraint may be applied to this, and some combinations presently appearing in the reference table may be excluded from the production version, in addition to the restriction already noted in Section 3, point 5.
4.5 The dot marking can be eliminated either simply by refraining from its use, or by using a shtumer alef to disambiguate the consonantal use of the base character from the vocalic, as in װאונדער rather than װוּנדער, and פּרואװן rather than פּרוּװן. Although the elimination of the latter device was one of the objectives of the YIVO reform, it is nonetheless commonly encountered in contemporary practice, as is the use of unmarked vovn and yudn as both consonants and vowels.
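Because the protocol does not map the digraph ligatures to their two-letter equivalents, software that accepts keyboard input may wish to do so before a label is submitted. The following is a minimal sketch of such a client-side step, which is not described in the policy itself. The names `LIGATURE_EXPANSIONS` and `expand_ligatures` are hypothetical.

```python
# Digraph ligatures and the two-letter sequences supported by the table.
LIGATURE_EXPANSIONS = {
    "\u05F0": "\u05D5\u05D5",  # tsvey vovn
    "\u05F1": "\u05D5\u05D9",  # vov yud
    "\u05F2": "\u05D9\u05D9",  # tsvey yudn
}
PATAH = "\u05B7"

def expand_ligatures(text: str) -> str:
    """Replace digraph ligatures with two letters, keeping pasekh tsvey yudn."""
    out = []
    for i, ch in enumerate(text):
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        if ch == "\u05F2" and next_ch == PATAH:
            # U+05F2 U+05B7 is the one ligature sequence the table permits.
            out.append(ch)
        else:
            out.append(LIGATURE_EXPANSIONS.get(ch, ch))
    return "".join(out)
```

The exception for U+05F2 followed by U+05B7 mirrors the single ligature case retained in the reference table; all other characters pass through unchanged.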
4.6 Several of the characters in the reference table only appear in words of Hebrew and Aramaic origin. When such words appear in Yiddish text they are normally written using the shared native orthography of the parent languages, which would not include any form of marking in a context such as IDN. The YIVO orthography marks several of these characters, nonetheless. They therefore appear in the current reference table, in which they also appear in unmarked form, but are candidates for exclusion from the production version.
4.7 Potential need is recognized for the HEBREW PUNCTUATION GERESH (U+05F3) and the HEBREW PUNCTUATION GERSHAYIM (U+05F4). These characters are, however, not directly available on many standard Hebrew keyboards and are commonly replaced by the APOSTROPHE (U+0027) and the QUOTATION MARK (U+0022). This substitution is not permissible in a domain name. Although the correct Unicode characters can be included in an IDN label, a keyboarded transcription of that label is likely to fail without the reason being apparent to the non-specialist user. Pending the determination of compelling need for the GERESH and GERSHAYIM despite this intricacy (and possibly not even then), these characters have not been included in the reference table.
Latest update: 2008-09-03