Internet-Draft Kurt D. Zeilenga
Intended Category: Standards Track OpenLDAP Foundation
Expires in six months 27 October 2003

LDAP: Internationalized String Preparation
<draft-ietf-ldapbis-strprep-02.txt>

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Distribution of this memo is unlimited. Technical discussion of this
document will take place on the IETF LDAP Revision Working Group
mailing list <ietf-ldapbis@openldap.org>. Please send editorial
comments directly to the author <Kurt@OpenLDAP.org>.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

The list of current Internet-Drafts can be accessed at
<http://www.ietf.org/ietf/1id-abstracts.txt>. The list of
Internet-Draft Shadow Directories can be accessed at
<http://www.ietf.org/shadow.html>.

Copyright (C) The Internet Society (2003). All Rights Reserved.

Please see the Full Copyright section near the end of this document
for more information.

The previous Lightweight Directory Access Protocol (LDAP) technical
specifications did not precisely define how character string matching
is to be performed. This led to a number of usability and
interoperability problems. This document defines string preparation
algorithms for character-based matching rules defined for use in LDAP.

Zeilenga LDAPprep [Page 1]

Internet-Draft draft-ietf-ldapbis-strprep-02 27 October 2003

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14 [RFC2119].

Character names in this document use the notation for code points and
names from the Unicode Standard [Unicode]. For example, the letter
"a" may be represented as either <U+0061> or <LATIN SMALL LETTER A>.
In the lists of mappings and the prohibited characters, the "U+" is
left off to make the lists easier to read. The comments for character
ranges are shown in square brackets (such as "[CONTROL CHARACTERS]")
and do not come from the standard.

Note: a glossary of terms used in Unicode can be found in [Glossary].
Information on the Unicode character encoding model can be found in
[CharModel].

A Lightweight Directory Access Protocol (LDAP) [Roadmap] matching rule
[Syntaxes] defines an algorithm for determining whether a presented
value matches an attribute value in accordance with the criteria
defined for the rule. The proposition may be evaluated to True,
False, or Undefined:

True - the attribute contains a matching value,

False - the attribute contains no matching value,

Undefined - it cannot be determined whether the attribute contains
a matching value or not.

For instance, the caseIgnoreMatch matching rule may be used to compare
whether the commonName attribute contains a particular value without
regard for case and insignificant spaces.


1.2. X.500 String Matching Rules

"X.520: Selected attribute types" [X.520] provides (amongst other
things) value syntaxes and matching rules for comparing values
commonly used in the Directory. These specifications are inadequate
for strings composed of characters from the Universal Character Set
(UCS) [ISO10646], a superset of Unicode [Unicode].

The caseIgnoreMatch matching rule [X.520], for example, is simply
defined as being a case insensitive comparison where insignificant
spaces are ignored. For PrintableString, there is only one space
character and case mapping is bijective, hence this definition is
sufficient. However, for UCS-based string types such as
UniversalString, this is not sufficient. For example, a case
insensitive matching implementation which folded lower case characters
to upper case would yield different results than an implementation
which used upper case to lower case folding. Similarly, one
implementation may treat only SPACE (U+0020) as a space, a second
implementation may treat any character with the space separator (Zs)
property as a space, and a third implementation may treat any
character with the whitespace (WS) category as a space.
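
To make both ambiguities concrete, the following Python sketch compares the two case-folding directions and the three candidate definitions of "space". It reads "whitespace (WS) category" as the bidirectional class WS. The helper names are hypothetical, chosen for illustration. The sketch uses the default character database of Python's `unicodedata` module, which is newer than Unicode 3.2; the properties it relies on are assumed unchanged for these characters.

```python
import unicodedata

# Two plausible "case insensitive" comparisons (hypothetical helpers).
def equal_fold_upper(a: str, b: str) -> bool:
    return a.upper() == b.upper()

def equal_fold_lower(a: str, b: str) -> bool:
    return a.lower() == b.lower()

# Three plausible definitions of "space" (hypothetical helpers).
def is_space_ascii(ch: str) -> bool:
    return ch == "\u0020"

def is_space_zs(ch: str) -> bool:
    return unicodedata.category(ch) == "Zs"

def is_space_ws(ch: str) -> bool:
    return unicodedata.bidirectional(ch) == "WS"

if __name__ == "__main__":
    # U+00DF upper-cases to "SS" but lower-cases to itself.
    print(equal_fold_upper("stra\u00dfe", "STRASSE"))  # True
    print(equal_fold_lower("stra\u00dfe", "STRASSE"))  # False
    # NO-BREAK SPACE is Zs, but its bidi class is CS rather than WS.
    # LINE SEPARATOR is Zl rather than Zs, but its bidi class is WS.
    for ch in ("\u00a0", "\u2028"):
        print(f"U+{ord(ch):04X}", is_space_ascii(ch), is_space_zs(ch), is_space_ws(ch))
```

Each pair of implementations disagrees on at least one input, which is the interoperability problem described above.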

The lack of precise specification for character string matching has
led to significant interoperability problems. When such matching is
used in certificate chain validation, security vulnerabilities can
arise. To address these problems, this document defines precise
algorithms for preparing character strings for matching.


1.3. Relationship to "stringprep"

The character string preparation algorithms described in this document
are based upon the "stringprep" approach [StringPrep]. In
"stringprep", presented and stored values are first prepared for
comparison so that a character-by-character comparison yields the
"correct" result.

The approach used here is a refinement of the "stringprep"
[StringPrep] approach. Each algorithm involves two additional
preparation steps:

a) prior to applying the Unicode string preparation steps outlined in
"stringprep", the string is transcoded to Unicode;

b) after applying the Unicode string preparation steps outlined in
"stringprep", characters insignificant to the matching rules are
removed.

Hence, preparation of character strings for X.500 matching involves
the following steps:

1) Transcode
2) Map
3) Normalize
4) Prohibit
5) Check Bidi (Bidirectional)
6) Insignificant Character Removal

These steps are described in Section 2.


1.4. Relationship to the LDAP Technical Specification

This document is an integral part of the LDAP technical specification
[Roadmap] which obsoletes the previously defined LDAP technical
specification [RFC3377] in its entirety.

This document details new LDAP internationalized character string
preparation algorithms used by [Syntaxes] and possibly other technical
specifications defining LDAP syntaxes and/or matching rules.


1.5. Relationship to X.500

LDAP is defined [Roadmap] in X.500 terms as an X.500 access mechanism.
As such, there is a strong desire for alignment between LDAP and X.500
syntax and semantics. The character string preparation algorithms
described in this document are based upon the "Internationalized String
Matching Rules for X.500" [XMATCH] proposal to ITU/ISO Joint Study


2. String Preparation

The following six-step process SHALL be applied to each presented and
attribute value in preparation for character string matching rule
evaluation:

1) Transcode
2) Map
3) Normalize
4) Prohibit
5) Check Bidi (Bidirectional)
6) Insignificant Character Removal

Failure in any step causes the assertion to evaluate to Undefined.

This process is intended to act upon non-empty character strings. If
the string to prepare is empty, this process is not applied and the
assertion is evaluated to Undefined.

The character repertoire of this process is Unicode 3.2 [Unicode].
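
The overall control flow can be sketched in Python as a pipeline of step functions. This is a minimal sketch under stated assumptions, not a definitive implementation:

- Each step is a hypothetical callable that returns the prepared string or raises `ValueError` on failure.
- `None` stands in for Undefined.
- A failure while preparing either the presented value or the stored value is assumed to make the assertion Undefined.

```python
from typing import Callable, List, Optional

# A preparation step returns the prepared string, or raises ValueError
# when the step fails (for example, on a prohibited code point).
Step = Callable[[str], str]

def prepare(value: str, steps: List[Step]) -> Optional[str]:
    """Hypothetical driver: run the steps in order; None means Undefined."""
    if value == "":
        return None              # empty strings are not prepared: Undefined
    try:
        for step in steps:
            value = step(value)
    except ValueError:
        return None              # failure in any step: Undefined
    return value

def match_equality(assertion: str, stored: str, steps: List[Step]) -> Optional[bool]:
    """Three-valued equality: True, False, or None for Undefined."""
    a = prepare(assertion, steps)
    b = prepare(stored, steps)
    if a is None or b is None:
        return None
    return a == b

if __name__ == "__main__":
    # str.casefold is only a stand-in step, not the Map step of this document.
    print(match_equality("Foo", "FOO", [str.casefold]))  # True
```

Concrete Transcode, Map, Normalize, Prohibit, and Insignificant Character Removal functions would be passed in `steps` in that order. Check Bidi reduces to the identity, since no bidirectional restrictions are imposed.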

Each non-Unicode string value is transcoded to Unicode.

TeletexString [X.680][T.61] values are transcoded to Unicode as
described in Appendix A.

PrintableString [X.680] values are transcoded directly to Unicode.

UniversalString, UTF8String, and BMPString [X.680] values need not be
transcoded as they are Unicode-based strings (in the case of
BMPString, a subset of Unicode).

The output is the transcoded string.
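
In an implementation, "need not be transcoded" still means decoding the encoded content octets into code points. Below is a minimal Python sketch. It assumes the usual fixed-width big-endian encodings: UCS-2 for BMPString and UCS-4 for UniversalString. `transcode` is a hypothetical name, and TeletexString is left to Appendix A.

```python
# Character set of ASN.1 PrintableString (X.680).
_PRINTABLE = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 '()+,-./:=?"
)

def transcode(string_type: str, octets: bytes) -> str:
    """Hypothetical helper: decode ASN.1 string content octets to Unicode."""
    if string_type == "PrintableString":
        text = octets.decode("ascii")
        if not set(text) <= _PRINTABLE:
            raise ValueError("character outside the PrintableString set")
        return text                           # ASCII maps directly to Unicode
    if string_type == "UTF8String":
        return octets.decode("utf-8")
    if string_type == "BMPString":
        return octets.decode("utf-16-be")     # 2 octets per character, big-endian
    if string_type == "UniversalString":
        return octets.decode("utf-32-be")     # 4 octets per character, big-endian
    if string_type == "TeletexString":
        raise NotImplementedError("TeletexString: see Appendix A")
    raise ValueError(f"unsupported string type: {string_type}")
```

Malformed input raises `UnicodeDecodeError`, which is a subclass of `ValueError`, so a caller can treat it as a step failure.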

SOFT HYPHEN (U+00AD) and MONGOLIAN TODO SOFT HYPHEN (U+1806) code
points are mapped to nothing. COMBINING GRAPHEME JOINER (U+034F) and
VARIATION SELECTORs (U+180B-180D, FE00-FE0F) code points are also
mapped to nothing. The OBJECT REPLACEMENT CHARACTER (U+FFFC) is
mapped to nothing.

CHARACTER TABULATION (U+0009), LINE FEED (LF) (U+000A), LINE
TABULATION (U+000B), FORM FEED (FF) (U+000C), CARRIAGE RETURN (CR)
(U+000D), and NEXT LINE (NEL) (U+0085) are mapped to SPACE (U+0020).

All other control code points (e.g., Cc) or code points with a control
function (e.g., Cf) are mapped to nothing.

ZERO WIDTH SPACE (U+200B) is mapped to nothing. All other code points
with Separator (space, line, or paragraph) property (e.g., Zs, Zl, or
Zp) are mapped to SPACE (U+0020).

For case ignore, numeric, and stored prefix string matching rules,
characters are case folded per B.2 of [StringPrep].

The output is the mapped string.
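
These mapping rules can be sketched with the Python standard library. It ships a frozen Unicode 3.2 database (`unicodedata.ucd_3_2_0`) and the RFC 3454 tables that [StringPrep] revises (the `stringprep` module, including `map_table_b2`).

Assumptions in this sketch:

- `map_chars` is a hypothetical name.
- "control code points" is read narrowly as general categories Cc and Cf; the text's "e.g." leaves room for a broader reading.
- The explicit rules are checked before the category-based ones, so they cannot be shadowed.

```python
import stringprep
import unicodedata

_ucd = unicodedata.ucd_3_2_0  # frozen Unicode 3.2 character database

# Code points explicitly mapped to nothing.
_TO_NOTHING = {0x00AD, 0x1806, 0x034F, 0x180B, 0x180C, 0x180D, 0xFFFC, 0x200B}
_TO_NOTHING.update(range(0xFE00, 0xFE10))     # VARIATION SELECTORs FE00-FE0F

# Control code points explicitly mapped to SPACE.
_TO_SPACE = {0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0085}

def map_chars(s: str, case_fold: bool) -> str:
    """Hypothetical Map step, applied character by character."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp in _TO_NOTHING:                 # explicit rules first
            continue
        if cp in _TO_SPACE:
            out.append(" ")
            continue
        category = _ucd.category(ch)
        if category in ("Cc", "Cf"):          # remaining control/format code points
            continue
        if category in ("Zs", "Zl", "Zp"):    # remaining separators
            out.append(" ")
            continue
        # Case folding per Table B.2; may expand (U+00DF becomes "ss").
        out.append(stringprep.map_table_b2(ch) if case_fold else ch)
    return "".join(out)
```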

The input string is to be normalized to Unicode Form KC (compatibility
composed) as described in [UAX15]. The output is the normalized
string.
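
Python exposes Form KC normalization against the same Unicode 3.2 data through `unicodedata.ucd_3_2_0.normalize`. The module-level `unicodedata.normalize` uses a newer Unicode version. Here is a one-line sketch; the wrapper name `normalize_kc` is hypothetical.

```python
import unicodedata

def normalize_kc(s: str) -> str:
    """Hypothetical Normalize step: Unicode Form KC with Unicode 3.2 data."""
    return unicodedata.ucd_3_2_0.normalize("NFKC", s)

if __name__ == "__main__":
    print(normalize_kc("\ufb01le") == "file")         # ligature decomposed
    print(normalize_kc("\uff21") == "A")              # fullwidth form folded
    print(normalize_kc("e\u0301") == "\u00e9")        # base + accent composed
```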

All Unassigned code points are prohibited. Unassigned code points are
listed in Table A.1 of [StringPrep].

Private Use (U+E000-F8FF, F0000-FFFFD, 100000-10FFFD) code points are
prohibited.

All non-character code points (U+FDD0-FDEF, FFFE-FFFF, 1FFFE-1FFFF,
2FFFE-2FFFF, 3FFFE-3FFFF, 4FFFE-4FFFF, 5FFFE-5FFFF, 6FFFE-6FFFF,
7FFFE-7FFFF, 8FFFE-8FFFF, 9FFFE-9FFFF, AFFFE-AFFFF, BFFFE-BFFFF,
CFFFE-CFFFF, DFFFE-DFFFF, EFFFE-EFFFF, FFFFE-FFFFF, 10FFFE-10FFFF) are
prohibited.

Surrogate codes (U+D800-DFFF) are prohibited.

The REPLACEMENT CHARACTER (U+FFFD) code point is prohibited.

The first code point of a string is prohibited from being a combining
character.

The step fails if the input string contains any prohibited code point.
The output is the input string.
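
A minimal Python sketch of the Prohibit step follows. It uses the `stringprep` module, whose tables A.1, C.3, C.4, and C.5 cover unassigned, private use, non-character, and surrogate code points in Unicode 3.2. `prohibit` is a hypothetical name. "Combining character" is assumed to mean general category M (Mn, Mc, or Me).

```python
import stringprep
import unicodedata

_ucd = unicodedata.ucd_3_2_0

def prohibit(s: str) -> str:
    """Hypothetical Prohibit step: raise ValueError, else return s unchanged."""
    for ch in s:
        if (stringprep.in_table_a1(ch)          # unassigned in Unicode 3.2
                or stringprep.in_table_c3(ch)   # private use
                or stringprep.in_table_c4(ch)   # non-character code points
                or stringprep.in_table_c5(ch)   # surrogate codes
                or ch == "\ufffd"):             # REPLACEMENT CHARACTER
            raise ValueError(f"prohibited code point U+{ord(ch):04X}")
    # Assumption: a combining character is any general category M code point.
    if s and _ucd.category(s[0]) in ("Mn", "Mc", "Me"):
        raise ValueError("string starts with a combining character")
    return s
```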

There are no bidirectional restrictions. The output is the input
string.


2.5. Insignificant Character Removal

In this step, characters insignificant to the matching rule are to be
removed. The characters to be removed differ from matching rule to
matching rule.

Section 2.5.1 applies to case ignore and exact string matching.
Section 2.5.2 applies to numericString matching.
Section 2.5.3 applies to telephoneNumber matching.


2.5.1. Insignificant Space Removal

For the purposes of this section, a space is defined to be the SPACE
(U+0020) code point followed by no combining marks.

NOTE - The previous steps ensure that the string cannot contain any
code points in the separator class, other than SPACE (U+0020).

If the input string consists entirely of spaces or is empty, the
output is a string consisting of exactly one space (i.e., " ").

Otherwise, the following spaces are removed:
- leading spaces (i.e., those preceding the first character that is
not a space);
- trailing spaces (i.e., those following the last character that is
not a space);
- all but one space of each run of multiple consecutive spaces (such
a run is taken as equivalent to a single space character).

For example, removal of spaces from the Form KC string:
"<SPACE><SPACE>foo<SPACE><SPACE>bar<SPACE><SPACE>"
would result in the output string:
"foo<SPACE>bar"
and the Form KC string:
"<SPACE><SPACE><SPACE>"
would result in the output string:
"<SPACE>".
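
The space-removal rule can be sketched in Python as a single left-to-right pass. The function names are hypothetical. As an assumption, "combining mark" is taken to mean any general category M code point in the Unicode 3.2 database. A SPACE followed by such a mark is therefore kept as an ordinary character.

```python
import unicodedata

_ucd = unicodedata.ucd_3_2_0

def _is_space(s: str, i: int) -> bool:
    """SPACE (U+0020) that is not followed by a combining mark."""
    if s[i] != " ":
        return False
    nxt = s[i + 1] if i + 1 < len(s) else ""
    return not (nxt and _ucd.category(nxt).startswith("M"))

def remove_insignificant_spaces(s: str) -> str:
    """Hypothetical helper for case ignore and exact string matching."""
    out = []
    pending = False                  # a space was seen since the last non-space
    for i, ch in enumerate(s):
        if _is_space(s, i):
            pending = True
            continue
        if pending and out:          # an interior run collapses to one space
            out.append(" ")
        pending = False
        out.append(ch)
    # Leading and trailing runs are never emitted; an all-space or empty
    # input yields exactly one space.
    return "".join(out) if out else " "
```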


2.5.2. numericString Insignificant Character Removal

For the purposes of this section, a space is defined to be the SPACE
(U+0020) code point followed by no combining marks.

All spaces are regarded as not significant. If the input string
consists entirely of spaces or is empty, the output is a string
consisting of exactly one space (i.e., " "). Otherwise, all spaces are
removed.

For example, removal of spaces from the Form KC string:
"<SPACE><SPACE>123<SPACE><SPACE>456<SPACE><SPACE>"
would result in the output string:
"123456"
and the Form KC string:
"<SPACE><SPACE><SPACE>"
would result in the output string:
"<SPACE>".
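
For numericString, every space is dropped, so no run tracking is needed. Below is a minimal Python sketch with a hypothetical function name. It makes the same assumption that a combining mark is any general category M code point.

```python
import unicodedata

_ucd = unicodedata.ucd_3_2_0

def remove_numeric_spaces(s: str) -> str:
    """Hypothetical helper: drop every SPACE not followed by a combining mark."""
    kept = []
    for i, ch in enumerate(s):
        nxt = s[i + 1] if i + 1 < len(s) else ""
        is_space = ch == " " and not (nxt and _ucd.category(nxt).startswith("M"))
        if not is_space:
            kept.append(ch)
    return "".join(kept) or " "      # all-space or empty input -> one space
```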


2.5.3. telephoneNumber Insignificant Character Removal

For the purposes of this section, a hyphen is defined to be
HYPHEN-MINUS (U+002D), ARMENIAN HYPHEN (U+058A), HYPHEN (U+2010),
NON-BREAKING HYPHEN (U+2011), MINUS SIGN (U+2212), SMALL HYPHEN-MINUS
(U+FE63), or FULLWIDTH HYPHEN-MINUS (U+FF0D) code point followed by no
combining marks, and a space is defined to be the SPACE (U+0020) code
point followed by no combining marks.

All hyphens and spaces are considered insignificant. If the string
contains only spaces and hyphens or is empty, then the output is a
string consisting of exactly one space. Otherwise, all hyphens and
spaces are removed.

For example, removal of hyphens and spaces from the Form KC string:
"<SPACE><HYPHEN>123<SPACE><SPACE>456<SPACE><HYPHEN>"
would result in the output string:
"123456"
and the Form KC string:
"<HYPHEN><HYPHEN><HYPHEN>"
would result in the output string:
"<SPACE>".
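
A minimal Python sketch of the telephoneNumber rule follows; the function name is hypothetical. It keeps all seven hyphen characters from the definition above, even though Form KC already folds SMALL HYPHEN-MINUS and FULLWIDTH HYPHEN-MINUS to HYPHEN-MINUS and NON-BREAKING HYPHEN to HYPHEN. Listing them all makes the function safe to call on unnormalized input too.

```python
import unicodedata

_ucd = unicodedata.ucd_3_2_0

# SPACE plus the hyphen characters defined in this section.
_INSIGNIFICANT = {
    "\u0020",  # SPACE
    "\u002d",  # HYPHEN-MINUS
    "\u058a",  # ARMENIAN HYPHEN
    "\u2010",  # HYPHEN
    "\u2011",  # NON-BREAKING HYPHEN
    "\u2212",  # MINUS SIGN
    "\ufe63",  # SMALL HYPHEN-MINUS
    "\uff0d",  # FULLWIDTH HYPHEN-MINUS
}

def remove_phone_insignificant(s: str) -> str:
    """Hypothetical helper: drop hyphens and spaces not followed by a mark."""
    kept = []
    for i, ch in enumerate(s):
        nxt = s[i + 1] if i + 1 < len(s) else ""
        followed_by_mark = bool(nxt) and _ucd.category(nxt).startswith("M")
        if ch in _INSIGNIFICANT and not followed_by_mark:
            continue
        kept.append(ch)
    return "".join(kept) or " "      # nothing left -> exactly one space
```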


3. Security Considerations

"Preparation of Internationalized Strings ('stringprep')" [StringPrep]
security considerations generally apply to the algorithms described
here.

Appendices A and B of this document were authored by Howard Chu
<hyc@symas.com> of Symas Corporation (based upon information provided

The approach used in this document is based upon design principles and
algorithms described in "Preparation of Internationalized Strings
('stringprep')" [StringPrep] by Paul Hoffman and Marc Blanchet. Some
additional guidance was drawn from Unicode Technical Standards,
Technical Reports, and Notes.

This document is a product of the IETF LDAP Revision (LDAPBIS) Working
Group.

E-mail: <kurt@openldap.org>


7.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14 (also RFC 2119), March 1997.

[Roadmap] Zeilenga, K. (editor), "LDAP: Technical Specification
Road Map", draft-ietf-ldapbis-roadmap-xx.txt, a work in
progress.

[StringPrep] Hoffman, P. and M. Blanchet, "Preparation of
Internationalized Strings ('stringprep')",
draft-hoffman-rfc3454bis-xx.txt, a work in progress.

[Syntaxes] Legg, S. (editor), "LDAP: Syntaxes and Matching Rules",
draft-ietf-ldapbis-syntaxes-xx.txt, a work in progress.

[ISO10646] International Organization for Standardization,
"Universal Multiple-Octet Coded Character Set (UCS) -
Architecture and Basic Multilingual Plane", ISO/IEC

[Unicode] The Unicode Consortium, "The Unicode Standard, Version
3.2.0" is defined by "The Unicode Standard, Version 3.0"
(Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5),
as amended by the "Unicode Standard Annex #27: Unicode
3.1" (http://www.unicode.org/reports/tr27/) and by the
"Unicode Standard Annex #28: Unicode 3.2"
(http://www.unicode.org/reports/tr28/).

[UAX15] Davis, M. and M. Duerst, "Unicode Standard Annex #15:
Unicode Normalization Forms, Version 3.2.0",
<http://www.unicode.org/unicode/reports/tr15/tr15-22.html>,

[X.680] International Telecommunication Union -
Telecommunication Standardization Sector, "Abstract
Syntax Notation One (ASN.1) - Specification of Basic
Notation", X.680(1997) (also ISO/IEC 8824-1:1998).

[T.61] CCITT (now ITU), "Character Repertoire and Coded
Character Sets for the International Teletex Service",

7.2. Informative References

[X.500] International Telecommunication Union -
Telecommunication Standardization Sector, "The Directory
-- Overview of concepts, models and services,"
X.500(1993) (also ISO/IEC 9594-1:1994).

[X.501] International Telecommunication Union -
Telecommunication Standardization Sector, "The Directory
-- Models," X.501(1993) (also ISO/IEC 9594-2:1994).

[X.520] International Telecommunication Union -
Telecommunication Standardization Sector, "The
Directory: Selected Attribute Types", X.520(1993) (also
ISO/IEC 9594-6:1994).

[Glossary] The Unicode Consortium, "Unicode Glossary",
<http://www.unicode.org/glossary/>.

[CharModel] Whistler, K. and M. Davis, "Unicode Technical Report
#17, Character Encoding Model", UTR17,
<http://www.unicode.org/unicode/reports/tr17/>, August

[XMATCH] Zeilenga, K., "Internationalized String Matching Rules
for X.500", draft-zeilenga-ldapbis-strmatch-xx.txt, a
work in progress.

[RFC1345] Simonsen, K., "Character Mnemonics & Character Sets",


Appendix A. Teletex (T.61) to Unicode

This appendix defines an algorithm for transcoding [T.61] characters
to [Unicode] characters for use in string preparation for LDAP
matching rules. This appendix is normative.

The transcoding algorithm is derived from the T.61-8bit definition
provided in [RFC1345]. With a few exceptions, the T.61 character
codes from x00 to x7f are equivalent to the corresponding [Unicode]
code points, and their values are left unchanged by this algorithm.
For example, the T.61 code x20 is identical to SPACE (U+0020). The
exceptions are these T.61 codes, which are undefined: x23, x24, x5c,
x5e, x60, x7b,

The codes from x80 to x9f are also equivalent to the corresponding
Unicode code points. This is specified for completeness only, as
these codes are control characters, and will be mapped to nothing in
the LDAP String Preparation Mapping step.

The remaining T.61 codes are mapped below in Table A.1. Table
positions marked "??" are undefined.

Input strings containing undefined T.61 codes SHALL produce an
Undefined matching result. For diagnostic purposes, this algorithm
does not fail for undefined input codes. Instead, undefined codes in
the input are mapped to the Unicode REPLACEMENT CHARACTER (U+FFFD).
As the LDAP String Preparation Prohibit step disallows the REPLACEMENT
CHARACTER from appearing in its output, this transcoding yields the
required Undefined matching result.

Note: RFC 1345 listed the non-spacing accent code points as residing
in the range starting at (U+E000). In the current Unicode standard,
the (U+E000) range is reserved for Private Use, and the non-spacing
accents are in the range starting at (U+0300). The tables here use
the (U+0300) range for these accents.

  |   0  |   1  |   2  |   3  |   4  |   5  |   6  |   7  |
--+------+------+------+------+------+------+------+------+
a0| 00a0 | 00a1 | 00a2 | 00a3 | 0024 | 00a5 | 0023 | 00a7 |
a8| 00a8 |  ??  |  ??  | 00ab |  ??  |  ??  |  ??  |  ??  |
b0| 00b0 | 00b1 | 00b2 | 00b3 | 00d7 | 00b5 | 00b6 | 00b7 |
b8| 00f7 |  ??  |  ??  | 00bb | 00bc | 00bd | 00be | 00bf |
c0|  ??  | 0300 | 0301 | 0302 | 0303 | 0304 | 0306 | 0307 |
c8| 0308 |  ??  | 030a | 0327 | 0332 | 030b | 0328 | 030c |
d0|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
d8|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
e0| 2126 | 00c6 | 00d0 | 00aa |  ??  | 0126 | 0132 | 013f |
e8| 0141 | 00d8 | 0152 | 00ba | 00de | 0166 | 014a | 0149 |
f0| 0138 | 00e6 | 0111 | 00f0 | 0127 | 0131 | 0133 | 0140 |
f8| 0142 | 00f8 | 0153 | 00df | 00fe | 0167 | 014b |  ??  |
--+------+------+------+------+------+------+------+------+
Table A.1: Mapping of 8-bit T.61 codes to Unicode

T.61 also defines a number of accented characters that are formed by
combining an accent prefix followed by a base character. These
prefixes are in the code range xc1 to xcf. If a prefix character
appears at the end of a string, the result is undefined. Otherwise,
these sequences are mapped to Unicode by substituting the
corresponding non-spacing accent code (as listed in Table A.1) for the
accent prefix, and exchanging the order so that the base character
precedes the non-spacing accent.
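
The transcoding can be sketched in Python as a small table-driven decoder. This is a sketch under stated assumptions, not a complete implementation:

- Table A.1 is copied verbatim, with its "??" positions written as "----".
- The list of undefined codes in x00-x7f is only as complete as the text above, which is cut off after x7b. The hypothetical `_LOW_UNDEFINED` set must be completed from [RFC1345] before real use.
- An accent prefix at the end of the string is mapped to U+FFFD, so that the Prohibit step turns it into an Undefined result.
- `transcode_t61` is a hypothetical name.

```python
# Table A.1, rows a0..f8; "----" marks the undefined ("??") positions.
_TABLE_A1 = """
a0 00a0 00a1 00a2 00a3 0024 00a5 0023 00a7
a8 00a8 ---- ---- 00ab ---- ---- ---- ----
b0 00b0 00b1 00b2 00b3 00d7 00b5 00b6 00b7
b8 00f7 ---- ---- 00bb 00bc 00bd 00be 00bf
c0 ---- 0300 0301 0302 0303 0304 0306 0307
c8 0308 ---- 030a 0327 0332 030b 0328 030c
d0 ---- ---- ---- ---- ---- ---- ---- ----
d8 ---- ---- ---- ---- ---- ---- ---- ----
e0 2126 00c6 00d0 00aa ---- 0126 0132 013f
e8 0141 00d8 0152 00ba 00de 0166 014a 0149
f0 0138 00e6 0111 00f0 0127 0131 0133 0140
f8 0142 00f8 0153 00df 00fe 0167 014b ----
"""

REPLACEMENT = "\ufffd"

def _build_high_table() -> dict:
    table = {}
    for line in _TABLE_A1.strip().splitlines():
        row, *cells = line.split()
        for offset, cell in enumerate(cells):
            if cell != "----":
                table[int(row, 16) + offset] = chr(int(cell, 16))
    return table

_HIGH = _build_high_table()

# Hypothetical: undefined codes in x00-x7f as far as the text lists them.
_LOW_UNDEFINED = {0x23, 0x24, 0x5C, 0x5E, 0x60, 0x7B}

def _single(code: int) -> str:
    if code < 0xA0:                  # x00-x9f: identity, except undefined codes
        return REPLACEMENT if code in _LOW_UNDEFINED else chr(code)
    return _HIGH.get(code, REPLACEMENT)

def transcode_t61(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        code = data[i]
        if 0xC1 <= code <= 0xCF:     # accent prefix
            if i + 1 == len(data):   # prefix at end of string
                out.append(REPLACEMENT)
                break
            out.append(_single(data[i + 1]))  # base character first ...
            out.append(_single(code))         # ... then the non-spacing accent
            i += 2
            continue
        out.append(_single(code))
        i += 1
    return "".join(out)
```

The output is in decomposed form; the later Form KC normalization step composes sequences such as "e" followed by U+0301 into a single code point.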


Appendix B. Additional Teletex (T.61) to Unicode Tables

All of the accented characters in T.61 have a corresponding code point
in Unicode. For the sake of completeness, the combined character
codes are presented in the following tables. This is informational
only; for matching purposes it is sufficient to map the non-spacing
accent and exchange the order of the character pair as specified in
Appendix A. This appendix is informative.
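
Mapping the accent and swapping the pair is sufficient because of the Normalize step. Form KC composition turns a base character followed by a combining accent back into the precomposed code point listed in these tables. The Python check below uses Unicode 3.2 data and samples one entry from each of Tables B.2 through B.14. The sample selection and the function name are illustrative only.

```python
import unicodedata

_ucd = unicodedata.ucd_3_2_0

# (base, combining accent from Table A.1, precomposed entry from Appendix B)
SAMPLES = [
    ("A", "\u0300", "\u00c0"),       # B.2  grave
    ("N", "\u0300", "\u01f8"),       # B.2  grave (Unicode-only combination)
    ("K", "\u0301", "\u1e30"),       # B.3  acute
    ("Z", "\u0302", "\u1e90"),       # B.4  circumflex
    ("V", "\u0303", "\u1e7c"),       # B.5  tilde
    ("\u00c6", "\u0304", "\u01e2"),  # B.6  macron on AE
    ("g", "\u0306", "\u011f"),       # B.7  breve
    ("x", "\u0307", "\u1e8b"),       # B.8  dot above
    ("t", "\u0308", "\u1e97"),       # B.9  diaeresis
    ("y", "\u030a", "\u1e99"),       # B.10 ring above
    ("H", "\u0327", "\u1e28"),       # B.11 cedilla
    ("u", "\u030b", "\u0171"),       # B.12 double acute
    ("O", "\u0328", "\u01ea"),       # B.13 ogonek
    ("j", "\u030c", "\u01f0"),       # B.14 caron
]

def composes_to_table_entry(base: str, accent: str, expected: str) -> bool:
    """Hypothetical check: does NFKC(base + accent) equal the table entry?"""
    return _ucd.normalize("NFKC", base + accent) == expected

if __name__ == "__main__":
    for base, accent, expected in SAMPLES:
        print(f"U+{ord(expected):04X}", composes_to_table_entry(base, accent, expected))
```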


B.1. Combinations with SPACE

Accents may be combined with a <SPACE> to generate the accent by
itself. For each accent code, the result of combining with <SPACE> is
shown in Table B.1.

  |   0  |   1  |   2  |   3  |   4  |   5  |   6  |   7  |
--+------+------+------+------+------+------+------+------+
c0|  ??  | 0060 | 00b4 | 005e | 007e | 00af | 02d8 | 02d9 |
c8| 00a8 |  ??  | 02da | 00b8 |  ??  | 02dd | 02db | 02c7 |
--+------+------+------+------+------+------+------+------+
Table B.1: Mapping of T.61 Accents with <SPACE> to Unicode
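
Whether a transcoded accent-plus-SPACE pair matches the standalone accent character listed in Table B.1 also depends on Form KC. The Python check below uses Unicode 3.2 data and a hypothetical helper name.

- For accents such as ACUTE ACCENT (U+00B4), the pair does match: the compatibility decomposition of U+00B4 is SPACE followed by the combining accent.
- For GRAVE ACCENT (U+0060), it does not, because U+0060 has no decomposition.

Implementers may want to check each Table B.1 entry this way. In the transcoded form, the SPACE is followed by a combining mark, so Section 2.5 does not treat it as an insignificant space.

```python
import unicodedata

_ucd = unicodedata.ucd_3_2_0

def same_after_nfkc(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after Form KC (Unicode 3.2)."""
    return _ucd.normalize("NFKC", a) == _ucd.normalize("NFKC", b)

if __name__ == "__main__":
    # T.61 xc2 + SPACE transcodes to SPACE, U+0301.
    print(same_after_nfkc(" \u0301", "\u00b4"))  # True
    # T.61 xc1 + SPACE transcodes to SPACE, U+0300; U+0060 stays distinct.
    print(same_after_nfkc(" \u0300", "\u0060"))  # False
```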


B.2. Combinations for xc1: (Grave accent)

T.61 has predefined characters for combinations with A, E, I, O, and
U. Unicode also defines combinations for N, W, and Y. All of these
combinations are present in Table B.2.

  |   0  |   1  |   2  |   3  |   4  |   5  |   6  |   7  |
--+------+------+------+------+------+------+------+------+
40|  ??  | 00c0 |  ??  |  ??  |  ??  | 00c8 |  ??  |  ??  |
48|  ??  | 00cc |  ??  |  ??  |  ??  |  ??  | 01f8 | 00d2 |
50|  ??  |  ??  |  ??  |  ??  |  ??  | 00d9 |  ??  | 1e80 |
58|  ??  | 1ef2 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
60|  ??  | 00e0 |  ??  |  ??  |  ??  | 00e8 |  ??  |  ??  |
68|  ??  | 00ec |  ??  |  ??  |  ??  |  ??  | 01f9 | 00f2 |
70|  ??  |  ??  |  ??  |  ??  |  ??  | 00f9 |  ??  | 1e81 |
78|  ??  | 1ef3 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
--+------+------+------+------+------+------+------+------+
Table B.2: Mapping of T.61 Grave Accent Combinations


B.3. Combinations for xc2: (Acute accent)

T.61 has predefined characters for combinations with A, E, I, O, U, Y,
C, L, N, R, S, and Z. Unicode also defines G, K, M, P, and W. All of
these combinations are present in Table B.3.

  |   0  |   1  |   2  |   3  |   4  |   5  |   6  |   7  |
--+------+------+------+------+------+------+------+------+
40|  ??  | 00c1 |  ??  | 0106 |  ??  | 00c9 |  ??  | 01f4 |
48|  ??  | 00cd |  ??  | 1e30 | 0139 | 1e3e | 0143 | 00d3 |
50| 1e54 |  ??  | 0154 | 015a |  ??  | 00da |  ??  | 1e82 |
58|  ??  | 00dd | 0179 |  ??  |  ??  |  ??  |  ??  |  ??  |
60|  ??  | 00e1 |  ??  | 0107 |  ??  | 00e9 |  ??  | 01f5 |
68|  ??  | 00ed |  ??  | 1e31 | 013a | 1e3f | 0144 | 00f3 |
70| 1e55 |  ??  | 0155 | 015b |  ??  | 00fa |  ??  | 1e83 |
78|  ??  | 00fd | 017a |  ??  |  ??  |  ??  |  ??  |  ??  |
--+------+------+------+------+------+------+------+------+
Table B.3: Mapping of T.61 Acute Accent Combinations


B.4. Combinations for xc3: (Circumflex)

T.61 has predefined characters for combinations with A, E, I, O, U, Y,
C, G, H, J, S, and W. Unicode also defines the combination for Z.
All of these combinations are present in Table B.4.

  |   0  |   1  |   2  |   3  |   4  |   5  |   6  |   7  |
--+------+------+------+------+------+------+------+------+
40|  ??  | 00c2 |  ??  | 0108 |  ??  | 00ca |  ??  | 011c |
48| 0124 | 00ce | 0134 |  ??  |  ??  |  ??  |  ??  | 00d4 |
50|  ??  |  ??  |  ??  | 015c |  ??  | 00db |  ??  | 0174 |
58|  ??  | 0176 | 1e90 |  ??  |  ??  |  ??  |  ??  |  ??  |
60|  ??  | 00e2 |  ??  | 0109 |  ??  | 00ea |  ??  | 011d |
68| 0125 | 00ee | 0135 |  ??  |  ??  |  ??  |  ??  | 00f4 |
70|  ??  |  ??  |  ??  | 015d |  ??  | 00fb |  ??  | 0175 |
78|  ??  | 0177 | 1e91 |  ??  |  ??  |  ??  |  ??  |  ??  |
--+------+------+------+------+------+------+------+------+
Table B.4: Mapping of T.61 Circumflex Accent Combinations


B.5. Combinations for xc4: (Tilde)

T.61 has predefined characters for combinations with A, I, O, U, and
N. Unicode also defines E, V, and Y. All of these combinations are
present in Table B.5.

  |   0  |   1  |   2  |   3  |   4  |   5  |   6  |   7  |
--+------+------+------+------+------+------+------+------+
40|  ??  | 00c3 |  ??  |  ??  |  ??  | 1ebc |  ??  |  ??  |
48|  ??  | 0128 |  ??  |  ??  |  ??  |  ??  | 00d1 | 00d5 |
50|  ??  |  ??  |  ??  |  ??  |  ??  | 0168 | 1e7c |  ??  |
58|  ??  | 1ef8 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
60|  ??  | 00e3 |  ??  |  ??  |  ??  | 1ebd |  ??  |  ??  |
68|  ??  | 0129 |  ??  |  ??  |  ??  |  ??  | 00f1 | 00f5 |
70|  ??  |  ??  |  ??  |  ??  |  ??  | 0169 | 1e7d |  ??  |
78|  ??  | 1ef9 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
--+------+------+------+------+------+------+------+------+
Table B.5: Mapping of T.61 Tilde Accent Combinations


B.6. Combinations for xc5: (Macron)

T.61 has predefined characters for combinations with A, E, I, O, and
U. Unicode also defines Y, G, and AE. All of these combinations are
present in Table B.6.

  |   0  |   1  |   2  |   3  |   4  |   5  |   6  |   7  |
--+------+------+------+------+------+------+------+------+
40|  ??  | 0100 |  ??  |  ??  |  ??  | 0112 |  ??  | 1e20 |
48|  ??  | 012a |  ??  |  ??  |  ??  |  ??  |  ??  | 014c |
50|  ??  |  ??  |  ??  |  ??  |  ??  | 016a |  ??  |  ??  |
58|  ??  | 0232 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
60|  ??  | 0101 |  ??  |  ??  |  ??  | 0113 |  ??  | 1e21 |
68|  ??  | 012b |  ??  |  ??  |  ??  |  ??  |  ??  | 014d |
70|  ??  |  ??  |  ??  |  ??  |  ??  | 016b |  ??  |  ??  |
78|  ??  | 0233 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
e0|  ??  | 01e2 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
f0|  ??  | 01e3 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
--+------+------+------+------+------+------+------+------+
Table B.6: Mapping of T.61 Macron Accent Combinations


B.7. Combinations for xc6: (Breve)

T.61 has predefined characters for combinations with A, U, and G.
Unicode also defines E, I, and O. All of these combinations are
present in Table B.7.

  |   0  |   1  |   2  |   3  |   4  |   5  |   6  |   7  |
--+------+------+------+------+------+------+------+------+
40|  ??  | 0102 |  ??  |  ??  |  ??  | 0114 |  ??  | 011e |
48|  ??  | 012c |  ??  |  ??  |  ??  |  ??  |  ??  | 014e |
50|  ??  |  ??  |  ??  |  ??  |  ??  | 016c |  ??  |  ??  |
58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
60|  ??  | 0103 |  ??  |  ??  |  ??  | 0115 |  ??  | 011f |
68|  ??  | 012d |  ??  |  ??  |  ??  |  ??  |  ??  | 014f |
70|  ??  |  ??  |  ??  |  ??  |  ??  | 016d |  ??  |  ??  |
78|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
--+------+------+------+------+------+------+------+------+
Table B.7: Mapping of T.61 Breve Accent Combinations


B.8. Combinations for xc7: (Dot Above)

T.61 has predefined characters for C, E, G, I, and Z. Unicode also
defines A, O, B, D, F, H, M, N, P, R, S, T, W, X, and Y. All of these
combinations are present in Table B.8.

  |   0  |   1  |   2  |   3  |   4  |   5  |   6  |   7  |
--+------+------+------+------+------+------+------+------+
40|  ??  | 0226 | 1e02 | 010a | 1e0a | 0116 | 1e1e | 0120 |
48| 1e22 | 0130 |  ??  |  ??  |  ??  | 1e40 | 1e44 | 022e |
50| 1e56 |  ??  | 1e58 | 1e60 | 1e6a |  ??  |  ??  | 1e86 |
58| 1e8a | 1e8e | 017b |  ??  |  ??  |  ??  |  ??  |  ??  |
60|  ??  | 0227 | 1e03 | 010b | 1e0b | 0117 | 1e1f | 0121 |
68| 1e23 |  ??  |  ??  |  ??  |  ??  | 1e41 | 1e45 | 022f |
70| 1e57 |  ??  | 1e59 | 1e61 | 1e6b |  ??  |  ??  | 1e87 |
78| 1e8b | 1e8f | 017c |  ??  |  ??  |  ??  |  ??  |  ??  |
--+------+------+------+------+------+------+------+------+
Table B.8: Mapping of T.61 Dot Above Accent Combinations


B.9. Combinations for xc8: (Diaeresis)

T.61 has predefined characters for A, E, I, O, U, and Y. Unicode also
defines H, W, X, and t. All of these combinations are present in
Table B.9.

  |   0  |   1  |   2  |   3  |   4  |   5  |   6  |   7  |
--+------+------+------+------+------+------+------+------+
40|  ??  | 00c4 |  ??  |  ??  |  ??  | 00cb |  ??  |  ??  |
48| 1e26 | 00cf |  ??  |  ??  |  ??  |  ??  |  ??  | 00d6 |
50|  ??  |  ??  |  ??  |  ??  |  ??  | 00dc |  ??  | 1e84 |
58| 1e8c | 0178 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
60|  ??  | 00e4 |  ??  |  ??  |  ??  | 00eb |  ??  |  ??  |
68| 1e27 | 00ef |  ??  |  ??  |  ??  |  ??  |  ??  | 00f6 |
70|  ??  |  ??  |  ??  |  ??  | 1e97 | 00fc |  ??  | 1e85 |
78| 1e8d | 00ff |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
--+------+------+------+------+------+------+------+------+
Table B.9: Mapping of T.61 Diaeresis Accent Combinations


B.10. Combinations for xca: (Ring Above)

T.61 has predefined characters for A and U. Unicode also defines w
and y. All of these combinations are present in Table B.10.

  |   0  |   1  |   2  |   3  |   4  |   5  |   6  |   7  |
--+------+------+------+------+------+------+------+------+
40|  ??  | 00c5 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
48|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
50|  ??  |  ??  |  ??  |  ??  |  ??  | 016e |  ??  |  ??  |
58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
60|  ??  | 00e5 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
68|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
70|  ??  |  ??  |  ??  |  ??  |  ??  | 016f |  ??  | 1e98 |
78|  ??  | 1e99 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
--+------+------+------+------+------+------+------+------+
Table B.10: Mapping of T.61 Ring Above Accent Combinations


B.11. Combinations for xcb: (Cedilla)

T.61 has predefined characters for C, G, K, L, N, R, S, and T.
Unicode also defines E, D, and H. All of these combinations are
present in Table B.11.

  |   0  |   1  |   2  |   3  |   4  |   5  |   6  |   7  |
--+------+------+------+------+------+------+------+------+
40|  ??  |  ??  |  ??  | 00c7 | 1e10 | 0228 |  ??  | 0122 |
48| 1e28 |  ??  |  ??  | 0136 | 013b |  ??  | 0145 |  ??  |
50|  ??  |  ??  | 0156 | 015e | 0162 |  ??  |  ??  |  ??  |
58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
60|  ??  |  ??  |  ??  | 00e7 | 1e11 | 0229 |  ??  | 0123 |
68| 1e29 |  ??  |  ??  | 0137 | 013c |  ??  | 0146 |  ??  |
70|  ??  |  ??  | 0157 | 015f | 0163 |  ??  |  ??  |  ??  |
78|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
--+------+------+------+------+------+------+------+------+
Table B.11: Mapping of T.61 Cedilla Accent Combinations


B.12. Combinations for xcd: (Double Acute Accent)

T.61 has predefined characters for O and U. These combinations are
present in Table B.12.

  |   0  |   1  |   2  |   3  |   4  |   5  |   6  |   7  |
--+------+------+------+------+------+------+------+------+
48|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  | 0150 |
50|  ??  |  ??  |  ??  |  ??  |  ??  | 0170 |  ??  |  ??  |
68|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  | 0151 |
70|  ??  |  ??  |  ??  |  ??  |  ??  | 0171 |  ??  |  ??  |
--+------+------+------+------+------+------+------+------+
Table B.12: Mapping of T.61 Double Acute Accent Combinations


B.13. Combinations for xce: (Ogonek)

T.61 has predefined characters for A, E, I, and U. Unicode also
defines the combination for O. All of these combinations are present
in Table B.13.

  |   0  |   1  |   2  |   3  |   4  |   5  |   6  |   7  |
--+------+------+------+------+------+------+------+------+
40|  ??  | 0104 |  ??  |  ??  |  ??  | 0118 |  ??  |  ??  |
48|  ??  | 012e |  ??  |  ??  |  ??  |  ??  |  ??  | 01ea |
50|  ??  |  ??  |  ??  |  ??  |  ??  | 0172 |  ??  |  ??  |
58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
60|  ??  | 0105 |  ??  |  ??  |  ??  | 0119 |  ??  |  ??  |
68|  ??  | 012f |  ??  |  ??  |  ??  |  ??  |  ??  | 01eb |
70|  ??  |  ??  |  ??  |  ??  |  ??  | 0173 |  ??  |  ??  |
78|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
--+------+------+------+------+------+------+------+------+
Table B.13: Mapping of T.61 Ogonek Accent Combinations


B.14. Combinations for xcf: (Caron)

T.61 has predefined characters for C, D, E, L, N, R, S, T, and Z.
Unicode also defines A, I, O, U, G, H, j, and K. All of these
combinations are present in Table B.14.

  |   0  |   1  |   2  |   3  |   4  |   5  |   6  |   7  |
--+------+------+------+------+------+------+------+------+
40|  ??  | 01cd |  ??  | 010c | 010e | 011a |  ??  | 01e6 |
48| 021e | 01cf |  ??  | 01e8 | 013d |  ??  | 0147 | 01d1 |
50|  ??  |  ??  | 0158 | 0160 | 0164 | 01d3 |  ??  |  ??  |
58|  ??  |  ??  | 017d |  ??  |  ??  |  ??  |  ??  |  ??  |
60|  ??  | 01ce |  ??  | 010d | 010f | 011b |  ??  | 01e7 |
68| 021f | 01d0 | 01f0 | 01e9 | 013e |  ??  | 0148 | 01d2 |
70|  ??  |  ??  | 0159 | 0161 | 0165 | 01d4 |  ??  |  ??  |
78|  ??  |  ??  | 017e |  ??  |  ??  |  ??  |  ??  |  ??  |
--+------+------+------+------+------+------+------+------+
Table B.14: Mapping of T.61 Caron Accent Combinations


Intellectual Property Rights

The IETF takes no position regarding the validity or scope of any
intellectual property or other rights that might be claimed to pertain
to the implementation or use of the technology described in this
document or the extent to which any license under such rights might or
might not be available; neither does it represent that it has made any
effort to identify any such rights. Information on the IETF's
procedures with respect to rights in standards-track and
standards-related documentation can be found in BCP-11. Copies of
claims of rights made available for publication and any assurances of
licenses to be made available, or the result of an attempt made to
obtain a general license or permission for the use of such proprietary
rights by implementors or users of this specification can be obtained
from the IETF Secretariat.

The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights which may cover technology that may be required to practice
this standard. Please address the information to the IETF Executive
Director.


Copyright (C) The Internet Society (2003). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be followed,
or as required to translate it into languages other than English.