7 Internet-Draft Kurt D. Zeilenga
Intended Category: Standards Track OpenLDAP Foundation
9 Expires in six months 15 February 2004
13 LDAP: Internationalized String Preparation
14 <draft-ietf-ldapbis-strprep-03.txt>
19 This document is an Internet-Draft and is in full conformance with all
20 provisions of Section 10 of RFC2026.
22 Distribution of this memo is unlimited. Technical discussion of this
23 document will take place on the IETF LDAP Revision Working Group
24 mailing list <ietf-ldapbis@openldap.org>. Please send editorial
25 comments directly to the author <Kurt@OpenLDAP.org>.
27 Internet-Drafts are working documents of the Internet Engineering Task
28 Force (IETF), its areas, and its working groups. Note that other
29 groups may also distribute working documents as Internet-Drafts.
30 Internet-Drafts are draft documents valid for a maximum of six months
31 and may be updated, replaced, or obsoleted by other documents at any
32 time. It is inappropriate to use Internet-Drafts as reference
33 material or to cite them other than as ``work in progress.''
35 The list of current Internet-Drafts can be accessed at
36 <http://www.ietf.org/ietf/1id-abstracts.txt>. The list of
37 Internet-Draft Shadow Directories can be accessed at
38 <http://www.ietf.org/shadow.html>.
40 Copyright (C) The Internet Society (2004). All Rights Reserved.
Please see the Full Copyright section near the end of this document
for more information.
48 The previous Lightweight Directory Access Protocol (LDAP) technical
49 specifications did not precisely define how character string matching
50 is to be performed. This led to a number of usability and
51 interoperability problems. This document defines string preparation
52 algorithms for character-based matching rules defined for use in LDAP.
58 Zeilenga LDAPprep [Page 1]
60 Internet-Draft draft-ietf-ldapbis-strprep-03 15 February 2004
65 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
66 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
67 document are to be interpreted as described in BCP 14 [RFC2119].
69 Character names in this document use the notation for code points and
70 names from the Unicode Standard [Unicode]. For example, the letter
71 "a" may be represented as either <U+0061> or <LATIN SMALL LETTER A>.
72 In the lists of mappings and the prohibited characters, the "U+" is
73 left off to make the lists easier to read. The comments for character
74 ranges are shown in square brackets (such as "[CONTROL CHARACTERS]")
75 and do not come from the standard.
77 Note: a glossary of terms used in Unicode can be found in [Glossary].
Information on the Unicode character encoding model can be found in
[CharModel].
86 A Lightweight Directory Access Protocol (LDAP) [Roadmap] matching rule
87 [Syntaxes] defines an algorithm for determining whether a presented
88 value matches an attribute value in accordance with the criteria
defined for the rule. The proposition may be evaluated to True,
False, or Undefined.
92 True - the attribute contains a matching value,
94 False - the attribute contains no matching value,
96 Undefined - it cannot be determined whether the attribute contains
97 a matching value or not.
99 For instance, the caseIgnoreMatch matching rule may be used to compare
100 whether the commonName attribute contains a particular value without
101 regard for case and insignificant spaces.
104 1.2. X.500 String Matching Rules
106 "X.520: Selected attribute types" [X.520] provides (amongst other
107 things) value syntaxes and matching rules for comparing values
108 commonly used in the Directory. These specifications are inadequate
109 for strings composed of Unicode [Unicode] characters.
119 The caseIgnoreMatch matching rule [X.520], for example, is simply
120 defined as being a case insensitive comparison where insignificant
121 spaces are ignored. For printableString, there is only one space
122 character and case mapping is bijective, hence this definition is
123 sufficient. However, for Unicode string types such as
124 universalString, this is not sufficient. For example, a case
125 insensitive matching implementation which folded lower case characters
to upper case would yield different results than an
127 implementation which used upper case to lower case folding. Or one
128 implementation may view space as referring to only SPACE (U+0020), a
129 second implementation may view any character with the space separator
130 (Zs) property as a space, and another implementation may view any
131 character with the whitespace (WS) category as a space.
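
The folding-direction problem can be seen with a short Python sketch. The sample values and variable names are illustrative only and are not taken from [X.520]; the sketch relies on Python's built-in str.upper() and str.lower(), which follow current Unicode case mappings rather than any LDAP-specific rule.

```python
# Two values that a user might expect caseIgnoreMatch to treat as equal.
a = "stra\u00dfe"   # "strasse" spelled with LATIN SMALL LETTER SHARP S
b = "STRASSE"

# Implementation 1 folds everything to upper case before comparing.
upper_match = a.upper() == b.upper()

# Implementation 2 folds everything to lower case before comparing.
lower_match = a.lower() == b.lower()

# The two implementations disagree about the same pair of values.
print(upper_match, lower_match)
```

Folding to upper case expands the sharp s to "SS", so the first comparison succeeds, while folding to lower case leaves it unchanged, so the second fails.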
133 The lack of precise specification for character string matching has
134 led to significant interoperability problems. When used in
135 certificate chain validation, security vulnerabilities can arise. To
136 address these problems, this document defines precise algorithms for
137 preparing character strings for matching.
140 1.3. Relationship to "stringprep"
142 The character string preparation algorithms described in this document
are based upon the "stringprep" approach [StringPrep]. In
"stringprep", presented and stored values are first prepared for
comparison so that a character-by-character comparison yields the
"correct" result.

The approach used here is a refinement of the "stringprep"
[StringPrep] approach. Each algorithm involves two additional steps:
152 a) prior to applying the Unicode string preparation steps outlined in
153 "stringprep", the string is transcoded to Unicode;
b) after applying the Unicode string preparation steps outlined in
"stringprep", characters insignificant to the matching rules are
removed.
Hence, preparation of character strings for X.500 matching involves
the following steps:

  1) Transcode
  2) Map
  3) Normalize
  4) Prohibit
  5) Check Bidi (Bidirectional)
  6) Insignificant Character Removal
177 These steps are described in Section 2.
180 1.4. Relationship to the LDAP Technical Specification
This document is an integral part of the LDAP technical specification
183 [Roadmap] which obsoletes the previously defined LDAP technical
184 specification [RFC3377] in its entirety.
186 This document details new LDAP internationalized character string
preparation algorithms used by [Syntaxes] and possibly other technical
188 specifications defining LDAP syntaxes and/or matching rules.
191 1.5. Relationship to X.500
193 LDAP is defined [Roadmap] in X.500 terms as an X.500 access mechanism.
194 As such, there is a strong desire for alignment between LDAP and X.500
195 syntax and semantics. The character string preparation algorithms
196 described in this document are based upon "Internationalized String
197 Matching Rules for X.500" [XMATCH] proposal to ITU/ISO Joint Study
201 2. String Preparation
The following six-step process SHALL be applied to each presented and
attribute value in preparation for character string matching rule
evaluation:

  1) Transcode
  2) Map
  3) Normalize
  4) Prohibit
  5) Check Bidi (Bidirectional)
  6) Insignificant Character Removal
214 Failure in any step causes the assertion to evaluate to Undefined.
216 This process is intended to act upon non-empty character strings. If
217 the string to prepare is empty, this process is not applied and the
218 assertion is evaluated to Undefined.
220 The character repertoire of this process is Unicode 3.2 [Unicode].
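
The control flow of this process might be sketched as follows. This is a minimal illustration, not part of the specification: the names prepare and Step are hypothetical, None is used to stand for an Undefined result, and each step is assumed to signal failure by raising ValueError.

```python
from typing import Callable, Optional, Sequence

# A step takes a string and returns the transformed string,
# raising ValueError on failure (a convention assumed by this sketch).
Step = Callable[[str], str]


def prepare(value: str, steps: Sequence[Step]) -> Optional[str]:
    """Run the preparation steps in order; None stands for Undefined."""
    if value == "":
        # Empty input: the process is not applied, result is Undefined.
        return None
    try:
        for step in steps:
            # The output of each step is the input of the next.
            value = step(value)
    except ValueError:
        # Failure in any step causes evaluation to Undefined.
        return None
    return value
```

A caller would pass the six step functions in order and compare two prepared values character by character only when neither is None.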
2.1. Transcode

Each non-Unicode string value is transcoded to Unicode.
235 TeletexString [X.680][T.61] values are transcoded to Unicode as
236 described in Appendix A.
PrintableString [X.680] values are transcoded directly to Unicode.
240 UniversalString, UTF8String, and bmpString [X.680] values need not be
241 transcoded as they are Unicode-based strings (in the case of
242 bmpString, a subset of Unicode).
244 The output is the transcoded string.
2.2. Map

SOFT HYPHEN (U+00AD) and MONGOLIAN TODO SOFT HYPHEN (U+1806) code
points are mapped to nothing. COMBINING GRAPHEME JOINER (U+034F) and
VARIATION SELECTORs (U+180B-180D, FE00-FE0F) code points are also
mapped to nothing. The OBJECT REPLACEMENT CHARACTER (U+FFFC) is
mapped to nothing.
255 CHARACTER TABULATION (U+0009), LINE FEED (LF) (U+000A), LINE
256 TABULATION (U+000B), FORM FEED (FF) (U+000C), CARRIAGE RETURN (CR)
257 (U+000D), and NEXT LINE (NEL) (U+0085) are mapped to SPACE (U+0020).
259 All other control code points (e.g., Cc) or code points with a control
260 function (e.g., Cf) are mapped to nothing.
262 ZERO WIDTH SPACE (U+200B) is mapped to nothing. All other code points
with Separator (space, line, or paragraph) property (e.g., Zs, Zl, or
264 Zp) are mapped to SPACE (U+0020).
266 Appendix B provides a table detailing the above mappings.
268 For case ignore, numeric, and stored prefix string matching rules,
269 characters are case folded per B.2 of [StringPrep].
271 The output is the mapped string.
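
The mapping rules above might be sketched in Python as follows. This is an illustration under stated assumptions, not a normative table: the function name map_step and the two constant sets are hypothetical, general categories are read from the Unicode 3.2 database exposed as unicodedata.ucd_3_2_0, and case folding is delegated to stringprep.map_table_b2 from the Python standard library, which implements Table B.2 of RFC 3454.

```python
import stringprep
import unicodedata

_ucd = unicodedata.ucd_3_2_0  # Unicode 3.2 character database

# Control characters that are mapped to SPACE (U+0020).
TO_SPACE = {0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0085}

# Code points that are explicitly mapped to nothing.
TO_NOTHING = (
    {0x00AD, 0x1806, 0x034F, 0x200B}
    | {0xFFFC}                      # OBJECT REPLACEMENT CHARACTER (treated as mapped to nothing)
    | set(range(0x180B, 0x180E))    # U+180B-180D
    | set(range(0xFE00, 0xFE10))    # U+FE00-FE0F
)


def map_step(s: str, case_fold: bool = False) -> str:
    out = []
    for ch in s:
        cp = ord(ch)
        cat = _ucd.category(ch)
        if cp in TO_SPACE:
            out.append(" ")
        elif cp in TO_NOTHING or cat in ("Cc", "Cf"):
            continue                 # mapped to nothing
        elif cat in ("Zs", "Zl", "Zp"):
            out.append(" ")          # remaining separators become SPACE
        elif case_fold:
            out.append(stringprep.map_table_b2(ch))
        else:
            out.append(ch)
    return "".join(out)
```

The explicit sets are tested before the general categories so that, for example, ZERO WIDTH SPACE is dropped rather than turned into SPACE.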
2.3. Normalize

The input string is to be normalized to Unicode Form KC (compatibility
composed) as described in [UAX15]. The output is the normalized
string.
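
As an illustration, Form KC normalization against the Unicode 3.2 data can be obtained from the Python standard library. The function name normalize_step is hypothetical; unicodedata.ucd_3_2_0 is the library's Unicode 3.2 database object.

```python
import unicodedata


def normalize_step(s: str) -> str:
    # Form KC: compatibility decomposition followed by canonical composition,
    # using the Unicode 3.2 database rather than the interpreter's default.
    return unicodedata.ucd_3_2_0.normalize("NFKC", s)
```

For instance, LATIN SMALL LIGATURE FI (U+FB01) becomes the two letters "fi", and "e" followed by COMBINING ACUTE ACCENT (U+0301) becomes the single code point U+00E9.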
2.4. Prohibit

All Unassigned code points are prohibited. Unassigned code points are
listed in Table A.1 of [StringPrep].

Characters which, per Section 5.8 of [StringPrep], change display
properties or are deprecated are prohibited. These characters are
listed in Table C.8 of [StringPrep].
Private Use (U+E000-F8FF, F0000-FFFFD, 100000-10FFFD) code points are
prohibited.
299 All non-character code points (U+FDD0-FDEF, FFFE-FFFF, 1FFFE-1FFFF,
300 2FFFE-2FFFF, 3FFFE-3FFFF, 4FFFE-4FFFF, 5FFFE-5FFFF, 6FFFE-6FFFF,
301 7FFFE-7FFFF, 8FFFE-8FFFF, 9FFFE-9FFFF, AFFFE-AFFFF, BFFFE-BFFFF,
CFFFE-CFFFF, DFFFE-DFFFF, EFFFE-EFFFF, FFFFE-FFFFF, 10FFFE-10FFFF) are
prohibited.

Surrogate codes (U+D800-DFFF) are prohibited.
307 The REPLACEMENT CHARACTER (U+FFFD) code point is prohibited.
309 The step fails if the input string contains any prohibited code point.
310 Otherwise, the output is the input string.
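
A possible check for this step, sketched with the Python standard library's stringprep module, which provides membership tests for the RFC 3454 tables. The function name prohibit_step is hypothetical, and the sketch assumes that Tables C.3, C.4, and C.5 of RFC 3454 cover the private use, non-character, and surrogate ranges listed above.

```python
import stringprep


def prohibit_step(s: str) -> str:
    for ch in s:
        if (
            stringprep.in_table_a1(ch)       # unassigned in Unicode 3.2
            or stringprep.in_table_c8(ch)    # change display properties / deprecated
            or stringprep.in_table_c3(ch)    # private use
            or stringprep.in_table_c4(ch)    # non-character code points
            or stringprep.in_table_c5(ch)    # surrogate codes
            or ch == "\ufffd"                # REPLACEMENT CHARACTER
        ):
            raise ValueError("prohibited code point U+%04X" % ord(ch))
    # No prohibited code point found: the output is the input string.
    return s
```

Raising an exception models the step failing; a caller would translate it into an Undefined result.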
2.5. Check bidi

This step fails if the input string does not conform to the
bidirectional character restrictions detailed in Section 6 of
[StringPrep]. Otherwise, the output is the input string.
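
The bidirectional restrictions might be checked as sketched below. The sketch reflects the requirements of RFC 3454, Section 6, as understood here: a string containing any right-to-left (RandALCat) character must contain no left-to-right (LCat) character and must begin and end with a RandALCat character. The function name check_bidi is hypothetical; the table tests come from Python's stringprep module. Characters from Table C.8 are assumed to have been rejected already by the preceding step.

```python
import stringprep


def check_bidi(s: str) -> str:
    # Table D.1 holds RandALCat characters, Table D.2 holds LCat characters.
    if not any(stringprep.in_table_d1(ch) for ch in s):
        return s  # no right-to-left characters: nothing to check
    if any(stringprep.in_table_d2(ch) for ch in s):
        raise ValueError("mixes RandALCat and LCat characters")
    if not (stringprep.in_table_d1(s[0]) and stringprep.in_table_d1(s[-1])):
        raise ValueError("RandALCat string must start and end with RandALCat")
    return s
```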
320 2.6. Insignificant Character Removal
322 In this step, characters insignificant to the matching rule are to be
removed. The characters to be removed differ from matching rule to
matching rule.
326 Section 2.6.1 applies to case ignore and exact string matching.
327 Section 2.6.2 applies to numericString matching.
328 Section 2.6.3 applies to telephoneNumber matching.
331 2.6.1. Insignificant Space Removal
333 For the purposes of this section, a space is defined to be the SPACE
334 (U+0020) code point followed by no combining marks.
343 NOTE - The previous steps ensure that the string cannot contain any
344 code points in the separator class, other than SPACE (U+0020).
346 If the input string consists entirely of spaces or is empty, the
347 output is a string consisting of exactly one space (e.g. " ").
349 Otherwise, the following spaces are removed:
  - leading spaces (i.e. those preceding the first character that is
    not a space),
  - trailing spaces (i.e. those following the last character that is
    not a space),
  - multiple consecutive spaces (these are taken as equivalent to a
    single space character).
For example, removal of spaces from the Form KC string:
  "<SPACE><SPACE>foo<SPACE><SPACE>bar<SPACE><SPACE>"
would result in the output string:
  "foo<SPACE>bar"
and the Form KC string:
  "<SPACE><SPACE><SPACE>"
would result in the output string:
  "<SPACE>".
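
The rules of this section might be sketched as follows. The function names are hypothetical, and "combining mark" is assumed here to mean any character whose Unicode 3.2 general category begins with "M" (Mn, Mc, or Me).

```python
import unicodedata

_ucd = unicodedata.ucd_3_2_0  # Unicode 3.2 character database


def _is_combining(ch: str) -> bool:
    # Assumption: combining marks are the general categories Mn, Mc, and Me.
    return _ucd.category(ch).startswith("M")


def remove_insignificant_spaces(s: str) -> str:
    out = []
    pending = False  # an interior run of spaces is waiting to be emitted
    for i, ch in enumerate(s):
        nxt = s[i + 1] if i + 1 < len(s) else ""
        # A "space" is SPACE that is not followed by a combining mark.
        is_space = ch == " " and not (nxt and _is_combining(nxt))
        if is_space:
            # Leading spaces are dropped; interior runs are remembered once.
            pending = bool(out)
        else:
            if pending:
                out.append(" ")  # collapse the run to a single space
                pending = False
            out.append(ch)
    # Trailing spaces were never flushed. All-space or empty input
    # yields exactly one space.
    return "".join(out) if out else " "
```

A SPACE that carries a combining mark is kept as ordinary content, because by the definition above it is not a space.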
367 2.6.2. numericString Insignificant Character Removal
369 For the purposes of this section, a space is defined to be the SPACE
370 (U+0020) code point followed by no combining marks.
372 All spaces are regarded as not significant. If the input string
373 consists entirely of spaces or is empty, the output is a string
consisting of exactly one space (e.g. " "). Otherwise, all spaces are
removed.
For example, removal of spaces from the Form KC string:
  "<SPACE><SPACE>123<SPACE><SPACE>456<SPACE><SPACE>"
would result in the output string:
  "123456"
and the Form KC string:
  "<SPACE><SPACE><SPACE>"
would result in the output string:
  "<SPACE>".
387 2.6.3. telephoneNumber Insignificant Character Removal
389 For the purposes of this section, a hyphen is defined to be
390 HYPHEN-MINUS (U+002D), ARMENIAN HYPHEN (U+058A), HYPHEN (U+2010),
399 NON-BREAKING HYPHEN (U+2011), MINUS SIGN (U+2212), SMALL HYPHEN-MINUS
400 (U+FE63), or FULLWIDTH HYPHEN-MINUS (U+FF0D) code point followed by no
401 combining marks and a space is defined to be the SPACE (U+0020) code
402 point followed by no combining marks.
404 All hyphens and spaces are considered insignificant. If the string
405 contains only spaces and hyphens or is empty, then the output is a
string consisting of one space. Otherwise, all hyphens and spaces are
removed.
For example, removal of hyphens and spaces from the Form KC string:
  "<SPACE><HYPHEN>123<SPACE><SPACE>456<SPACE><HYPHEN>"
would result in the output string:
  "123456"
and the Form KC string:
  "<HYPHEN><HYPHEN><HYPHEN>"
would result in the output string:
  "<SPACE>".
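
Sections 2.6.2 and 2.6.3 differ only in which code points are insignificant, so both might be sketched with one helper. All names below are hypothetical, and "combining mark" is again assumed to mean a Unicode 3.2 general category beginning with "M".

```python
import unicodedata

_ucd = unicodedata.ucd_3_2_0  # Unicode 3.2 character database

SPACE = {0x0020}
HYPHENS = {0x002D, 0x058A, 0x2010, 0x2011, 0x2212, 0xFE63, 0xFF0D}


def remove_insignificant(s: str, insignificant: set) -> str:
    kept = []
    for i, ch in enumerate(s):
        followed_by_mark = (
            i + 1 < len(s) and _ucd.category(s[i + 1]).startswith("M")
        )
        # Only candidates NOT followed by a combining mark are removed.
        if ord(ch) in insignificant and not followed_by_mark:
            continue
        kept.append(ch)
    # If nothing significant remains, the output is exactly one space.
    return "".join(kept) or " "


def prep_numeric_string(s: str) -> str:
    return remove_insignificant(s, SPACE)            # Section 2.6.2


def prep_telephone_number(s: str) -> str:
    return remove_insignificant(s, SPACE | HYPHENS)  # Section 2.6.3
```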
419 3. Security Considerations
421 "Preparation for International Strings ('stringprep')" [StringPrep]
security considerations generally apply to the algorithms described
in this document.
428 Appendix A and B of this document were authored by Howard Chu
429 <hyc@symas.com> of Symas Corporation (based upon information provided
435 The approach used in this document is based upon design principles and
436 algorithms described in "Preparation of Internationalized Strings
437 ('stringprep')" [StringPrep] by Paul Hoffman and Marc Blanchet. Some
438 additional guidance was drawn from Unicode Technical Standards,
439 Technical Reports, and Notes.
This document is a product of the IETF LDAP Revision (LDAPBIS) Working
Group.
458 Email: Kurt@OpenLDAP.org
463 7.1. Normative References
465 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
466 Requirement Levels", BCP 14 (also RFC 2119), March 1997.
468 [Roadmap] Zeilenga, K. (editor), "LDAP: Technical Specification
Road Map", draft-ietf-ldapbis-roadmap-xx.txt, a work in
progress.
[StringPrep] Hoffman, P. and M. Blanchet, "Preparation of
473 Internationalized Strings ('stringprep')",
474 draft-hoffman-rfc3454bis-xx.txt, a work in progress.
476 [Syntaxes] Legg, S. (editor), "LDAP: Syntaxes and Matching Rules",
477 draft-ietf-ldapbis-syntaxes-xx.txt, a work in progress.
479 [Unicode] The Unicode Consortium, "The Unicode Standard, Version
480 3.2.0" is defined by "The Unicode Standard, Version 3.0"
481 (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5),
482 as amended by the "Unicode Standard Annex #27: Unicode
483 3.1" (http://www.unicode.org/reports/tr27/) and by the
484 "Unicode Standard Annex #28: Unicode 3.2"
485 (http://www.unicode.org/reports/tr28/).
487 [UAX15] Davis, M. and M. Duerst, "Unicode Standard Annex #15:
488 Unicode Normalization Forms, Version 3.2.0".
489 <http://www.unicode.org/unicode/reports/tr15/tr15-22.html>,
492 [X.680] International Telecommunication Union -
493 Telecommunication Standardization Sector, "Abstract
494 Syntax Notation One (ASN.1) - Specification of Basic
495 Notation", X.680(1997) (also ISO/IEC 8824-1:1998).
497 [T.61] CCITT (now ITU), "Character Repertoire and Coded
498 Character Sets for the International Teletex Service",
501 7.2. Informative References
511 [X.500] International Telecommunication Union -
512 Telecommunication Standardization Sector, "The Directory
513 -- Overview of concepts, models and services,"
514 X.500(1993) (also ISO/IEC 9594-1:1994).
516 [X.501] International Telecommunication Union -
517 Telecommunication Standardization Sector, "The Directory
518 -- Models," X.501(1993) (also ISO/IEC 9594-2:1994).
520 [X.520] International Telecommunication Union -
521 Telecommunication Standardization Sector, "The
522 Directory: Selected Attribute Types", X.520(1993) (also
523 ISO/IEC 9594-6:1994).
525 [Glossary] The Unicode Consortium, "Unicode Glossary",
526 <http://www.unicode.org/glossary/>.
528 [CharModel] Whistler, K. and M. Davis, "Unicode Technical Report
529 #17, Character Encoding Model", UTR17,
530 <http://www.unicode.org/unicode/reports/tr17/>, August
533 [XMATCH] Zeilenga, K., "Internationalized String Matching Rules
for X.500", draft-zeilenga-ldapbis-strmatch-xx.txt, a
work in progress.
537 [RFC1345] Simonsen, K., "Character Mnemonics & Character Sets",
541 Appendix A. Teletex (T.61) to Unicode
543 This appendix defines an algorithm for transcoding [T.61] characters
544 to [Unicode] characters for use in string preparation for LDAP
545 matching rules. This appendix is normative.
547 The transcoding algorithm is derived from the T.61-8bit definition
548 provided in [RFC1345]. With a few exceptions, the T.61 character
549 codes from x00 to x7f are equivalent to the corresponding [Unicode]
550 code points, and their values are left unchanged by this algorithm.
551 E.g. the T.61 code x20 is identical to (U+0020). The exceptions are
552 for these T.61 codes that are undefined: x23, x24, x5c, x5e, x60, x7b,
555 The codes from x80 to x9f are also equivalent to the corresponding
556 Unicode code points. This is specified for completeness only, as
557 these codes are control characters, and will be mapped to nothing in
558 the LDAP String Preparation Mapping step.
567 The remaining T.61 codes are mapped below in Table A.1. Table
568 positions marked "??" are undefined.
570 Input strings containing undefined T.61 codes SHALL produce an
571 Undefined matching result. For diagnostic purposes, this algorithm
572 does not fail for undefined input codes. Instead, undefined codes in
573 the input are mapped to the Unicode REPLACEMENT CHARACTER (U+FFFD).
574 As the LDAP String Preparation Prohibit step disallows the REPLACEMENT
575 CHARACTER from appearing in its output, this transcoding yields the
578 Note: RFC 1345 listed the non-spacing accent codepoints as residing in
579 the range starting at (U+E000). In the current Unicode
580 standard, the (U+E000) range is reserved for Private Use, and
581 the non-spacing accents are in the range starting at (U+0300).
582 The tables here use the (U+0300) range for these accents.
584 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
585 --+------+------+------+------+------+------+------+------+
586 a0| 00a0 | 00a1 | 00a2 | 00a3 | 0024 | 00a5 | 0023 | 00a7 |
587 a8| 00a8 | ?? | ?? | 00ab | ?? | ?? | ?? | ?? |
588 b0| 00b0 | 00b1 | 00b2 | 00b3 | 00d7 | 00b5 | 00b6 | 00b7 |
589 b8| 00f7 | ?? | ?? | 00bb | 00bc | 00bd | 00be | 00bf |
590 c0| ?? | 0300 | 0301 | 0302 | 0303 | 0304 | 0306 | 0307 |
591 c8| 0308 | ?? | 030a | 0327 | 0332 | 030b | 0328 | 030c |
592 d0| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
593 d8| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
594 e0| 2126 | 00c6 | 00d0 | 00aa | ?? | 0126 | 0132 | 013f |
595 e8| 0141 | 00d8 | 0152 | 00ba | 00de | 0166 | 014a | 0149 |
596 f0| 0138 | 00e6 | 0111 | 00f0 | 0127 | 0131 | 0133 | 0140 |
597 f8| 0142 | 00f8 | 0153 | 00df | 00fe | 0167 | 014b | ?? |
598 --+------+------+------+------+------+------+------+------+
599 Table A.1: Mapping of 8-bit T.61 codes to Unicode
601 T.61 also defines a number of accented characters that are formed by
602 combining an accent prefix followed by a base character. These
603 prefixes are in the code range xc1 to xcf. If a prefix character
604 appears at the end of a string, the result is undefined. Otherwise
605 these sequences are mapped to Unicode by substituting the
606 corresponding non-spacing accent code (as listed in Table A.1) for the
accent prefix, and exchanging the order so that the base character
precedes the accent.
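
The prefix handling might be sketched as follows. This is a partial illustration only: the names are hypothetical, the accent values are copied from rows c0 and c8 of Table A.1, bytes below x80 are passed through unchanged (the undefined low codes are not checked), other bytes at or above x80 are outside the sketch, and a prefix at the end of the string is mapped to the REPLACEMENT CHARACTER (U+FFFD) on the assumption that it should be handled like an undefined code.

```python
# T.61 accent prefix -> Unicode non-spacing accent (from Table A.1).
ACCENT_PREFIX = {
    0xC1: 0x0300, 0xC2: 0x0301, 0xC3: 0x0302, 0xC4: 0x0303,
    0xC5: 0x0304, 0xC6: 0x0306, 0xC7: 0x0307, 0xC8: 0x0308,
    0xCA: 0x030A, 0xCB: 0x0327, 0xCC: 0x0332, 0xCD: 0x030B,
    0xCE: 0x0328, 0xCF: 0x030C,
}


def transcode_accents(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        code = data[i]
        if code in ACCENT_PREFIX:
            if i + 1 == len(data):
                out.append("\ufffd")  # dangling prefix (assumed handling)
                break
            base = data[i + 1]
            if base >= 0x80:
                raise NotImplementedError("base character outside this sketch")
            # Exchange the order: base character first, then the accent.
            out.append(chr(base))
            out.append(chr(ACCENT_PREFIX[code]))
            i += 2
        elif code < 0x80:
            out.append(chr(code))     # assumed identical to Unicode
            i += 1
        else:
            raise NotImplementedError("T.61 code outside this sketch")
    return "".join(out)
```

For example, the T.61 sequence xc2 x65 (acute accent prefix, then "e") becomes "e" followed by U+0301, which Form KC normalization later composes into U+00E9.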
611 Appendix B. Additional Teletex (T.61) to Unicode Tables
613 All of the accented characters in T.61 have a corresponding code point
614 in Unicode. For the sake of completeness, the combined character
623 codes are presented in the following tables. This is informational
624 only; for matching purposes it is sufficient to map the non-spacing
625 accent and exchange the order of the character pair as specified in
626 Appendix A. This appendix is informative.
629 B.1. Combinations with SPACE
631 Accents may be combined with a <SPACE> to generate the accent by
itself. For each accent code, the result of combining with <SPACE> is
shown in Table B.1.
635 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
636 --+------+------+------+------+------+------+------+------+
637 c0| ?? | 0060 | 00b4 | 005e | 007e | 00af | 02d8 | 02d9 |
638 c8| 00a8 | ?? | 02da | 00b8 | ?? | 02dd | 02db | 02c7 |
639 --+------+------+------+------+------+------+------+------+
640 Table B.1: Mapping of T.61 Accents with <SPACE> to Unicode
643 B.2. Combinations for xc1: (Grave accent)
645 T.61 has predefined characters for combinations with A, E, I, O, and
646 U. Unicode also defines combinations for N, W, and Y. All of these
647 combinations are present in Table B.2.
649 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
650 --+------+------+------+------+------+------+------+------+
651 40| ?? | 00c0 | ?? | ?? | ?? | 00c8 | ?? | ?? |
652 48| ?? | 00cc | ?? | ?? | ?? | ?? | 01f8 | 00d2 |
653 50| ?? | ?? | ?? | ?? | ?? | 00d9 | ?? | 1e80 |
654 58| ?? | 1ef2 | ?? | ?? | ?? | ?? | ?? | ?? |
655 60| ?? | 00e0 | ?? | ?? | ?? | 00e8 | ?? | ?? |
656 68| ?? | 00ec | ?? | ?? | ?? | ?? | 01f9 | 00f2 |
657 70| ?? | ?? | ?? | ?? | ?? | 00f9 | ?? | 1e81 |
658 78| ?? | 1ef3 | ?? | ?? | ?? | ?? | ?? | ?? |
659 --+------+------+------+------+------+------+------+------+
660 Table B.2: Mapping of T.61 Grave Accent Combinations
663 B.3. Combinations for xc2: (Acute accent)
665 T.61 has predefined characters for combinations with A, E, I, O, U, Y,
666 C, L, N, R, S, and Z. Unicode also defines G, K, M, P, and W. All of
667 these combinations are present in Table B.3.
669 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
670 --+------+------+------+------+------+------+------+------+
679 40| ?? | 00c1 | ?? | 0106 | ?? | 00c9 | ?? | 01f4 |
680 48| ?? | 00cd | ?? | 1e30 | 0139 | 1e3e | 0143 | 00d3 |
681 50| 1e54 | ?? | 0154 | 015a | ?? | 00da | ?? | 1e82 |
682 58| ?? | 00dd | 0179 | ?? | ?? | ?? | ?? | ?? |
683 60| ?? | 00e1 | ?? | 0107 | ?? | 00e9 | ?? | 01f5 |
684 68| ?? | 00ed | ?? | 1e31 | 013a | 1e3f | 0144 | 00f3 |
685 70| 1e55 | ?? | 0155 | 015b | ?? | 00fa | ?? | 1e83 |
686 78| ?? | 00fd | 017a | ?? | ?? | ?? | ?? | ?? |
687 --+------+------+------+------+------+------+------+------+
688 Table B.3: Mapping of T.61 Acute Accent Combinations
691 B.4. Combinations for xc3: (Circumflex)
693 T.61 has predefined characters for combinations with A, E, I, O, U, Y,
694 C, G, H, J, S, and W. Unicode also defines the combination for Z.
695 All of these combinations are present in Table B.4.
697 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
698 --+------+------+------+------+------+------+------+------+
699 40| ?? | 00c2 | ?? | 0108 | ?? | 00ca | ?? | 011c |
700 48| 0124 | 00ce | 0134 | ?? | ?? | ?? | ?? | 00d4 |
701 50| ?? | ?? | ?? | 015c | ?? | 00db | ?? | 0174 |
702 58| ?? | 0176 | 1e90 | ?? | ?? | ?? | ?? | ?? |
703 60| ?? | 00e2 | ?? | 0109 | ?? | 00ea | ?? | 011d |
704 68| 0125 | 00ee | 0135 | ?? | ?? | ?? | ?? | 00f4 |
705 70| ?? | ?? | ?? | 015d | ?? | 00fb | ?? | 0175 |
706 78| ?? | 0177 | 1e91 | ?? | ?? | ?? | ?? | ?? |
707 --+------+------+------+------+------+------+------+------+
708 Table B.4: Mapping of T.61 Circumflex Accent Combinations
711 B.5. Combinations for xc4: (Tilde)
713 T.61 has predefined characters for combinations with A, I, O, U, and
714 N. Unicode also defines E, V, and Y. All of these combinations are
715 present in Table B.5.
717 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
718 --+------+------+------+------+------+------+------+------+
719 40| ?? | 00c3 | ?? | ?? | ?? | 1ebc | ?? | ?? |
720 48| ?? | 0128 | ?? | ?? | ?? | ?? | 00d1 | 00d5 |
721 50| ?? | ?? | ?? | ?? | ?? | 0168 | 1e7c | ?? |
722 58| ?? | 1ef8 | ?? | ?? | ?? | ?? | ?? | ?? |
723 60| ?? | 00e3 | ?? | ?? | ?? | 1ebd | ?? | ?? |
724 68| ?? | 0129 | ?? | ?? | ?? | ?? | 00f1 | 00f5 |
725 70| ?? | ?? | ?? | ?? | ?? | 0169 | 1e7d | ?? |
726 78| ?? | 1ef9 | ?? | ?? | ?? | ?? | ?? | ?? |
735 --+------+------+------+------+------+------+------+------+
736 Table B.5: Mapping of T.61 Tilde Accent Combinations
739 B.6. Combinations for xc5: (Macron)
741 T.61 has predefined characters for combinations with A, E, I, O, and
742 U. Unicode also defines Y, G, and AE. All of these combinations are
743 present in Table B.6.
745 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
746 --+------+------+------+------+------+------+------+------+
747 40| ?? | 0100 | ?? | ?? | ?? | 0112 | ?? | 1e20 |
748 48| ?? | 012a | ?? | ?? | ?? | ?? | ?? | 014c |
749 50| ?? | ?? | ?? | ?? | ?? | 016a | ?? | ?? |
750 58| ?? | 0232 | ?? | ?? | ?? | ?? | ?? | ?? |
751 60| ?? | 0101 | ?? | ?? | ?? | 0113 | ?? | 1e21 |
752 68| ?? | 012b | ?? | ?? | ?? | ?? | ?? | 014d |
753 70| ?? | ?? | ?? | ?? | ?? | 016b | ?? | ?? |
754 78| ?? | 0233 | ?? | ?? | ?? | ?? | ?? | ?? |
755 e0| ?? | 01e2 | ?? | ?? | ?? | ?? | ?? | ?? |
756 f0| ?? | 01e3 | ?? | ?? | ?? | ?? | ?? | ?? |
757 --+------+------+------+------+------+------+------+------+
758 Table B.6: Mapping of T.61 Macron Accent Combinations
761 B.7. Combinations for xc6: (Breve)
763 T.61 has predefined characters for combinations with A, U, and G.
764 Unicode also defines E, I, and O. All of these combinations are
765 present in Table B.7.
  |  0   |  1   |  2   |  3   |  4   |  5   |  6   |  7   |
--+------+------+------+------+------+------+------+------+
40|  ??  | 0102 |  ??  |  ??  |  ??  | 0114 |  ??  | 011e |
48|  ??  | 012c |  ??  |  ??  |  ??  |  ??  |  ??  | 014e |
50|  ??  |  ??  |  ??  |  ??  |  ??  | 016c |  ??  |  ??  |
58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
60|  ??  | 0103 |  ??  |  ??  |  ??  | 0115 |  ??  | 011f |
68|  ??  | 012d |  ??  |  ??  |  ??  |  ??  |  ??  | 014f |
70|  ??  |  ??  |  ??  |  ??  |  ??  | 016d |  ??  |  ??  |
78|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
--+------+------+------+------+------+------+------+------+
Table B.7: Mapping of T.61 Breve Accent Combinations
781 B.8. Combinations for xc7: (Dot Above)
791 T.61 has predefined characters for C, E, G, I, and Z. Unicode also
792 defines A, O, B, D, F, H, M, N, P, R, S, T, W, X, and Y. All of these
793 combinations are present in Table B.8.
795 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
796 --+------+------+------+------+------+------+------+------+
797 40| ?? | 0226 | 1e02 | 010a | 1e0a | 0116 | 1e1e | 0120 |
798 48| 1e22 | 0130 | ?? | ?? | ?? | 1e40 | 1e44 | 022e |
799 50| 1e56 | ?? | 1e58 | 1e60 | 1e6a | ?? | ?? | 1e86 |
800 58| 1e8a | 1e8e | 017b | ?? | ?? | ?? | ?? | ?? |
801 60| ?? | 0227 | 1e03 | 010b | 1e0b | 0117 | 1e1f | 0121 |
802 68| 1e23 | ?? | ?? | ?? | ?? | 1e41 | 1e45 | 022f |
803 70| 1e57 | ?? | 1e59 | 1e61 | 1e6b | ?? | ?? | 1e87 |
804 78| 1e8b | 1e8f | 017c | ?? | ?? | ?? | ?? | ?? |
805 --+------+------+------+------+------+------+------+------+
806 Table B.8: Mapping of T.61 Dot Above Accent Combinations
809 B.9. Combinations for xc8: (Diaeresis)
811 T.61 has predefined characters for A, E, I, O, U, and Y. Unicode also
defines H, W, X, and t. All of these combinations are present in
Table B.9.
815 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
816 --+------+------+------+------+------+------+------+------+
817 40| ?? | 00c4 | ?? | ?? | ?? | 00cb | ?? | ?? |
818 48| 1e26 | 00cf | ?? | ?? | ?? | ?? | ?? | 00d6 |
819 50| ?? | ?? | ?? | ?? | ?? | 00dc | ?? | 1e84 |
820 58| 1e8c | 0178 | ?? | ?? | ?? | ?? | ?? | ?? |
821 60| ?? | 00e4 | ?? | ?? | ?? | 00eb | ?? | ?? |
822 68| 1e27 | 00ef | ?? | ?? | ?? | ?? | ?? | 00f6 |
823 70| ?? | ?? | ?? | ?? | 1e97 | 00fc | ?? | 1e85 |
824 78| 1e8d | 00ff | ?? | ?? | ?? | ?? | ?? | ?? |
825 --+------+------+------+------+------+------+------+------+
Table B.9: Mapping of T.61 Diaeresis Accent Combinations
829 B.10. Combinations for xca: (Ring Above)
T.61 has predefined characters for A and U. Unicode also defines w
832 and y. All of these combinations are present in Table B.10.
834 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
835 --+------+------+------+------+------+------+------+------+
836 40| ?? | 00c5 | ?? | ?? | ?? | ?? | ?? | ?? |
837 48| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
838 50| ?? | ?? | ?? | ?? | ?? | 016e | ?? | ?? |
847 58| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
848 60| ?? | 00e5 | ?? | ?? | ?? | ?? | ?? | ?? |
849 68| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
850 70| ?? | ?? | ?? | ?? | ?? | 016f | ?? | 1e98 |
851 78| ?? | 1e99 | ?? | ?? | ?? | ?? | ?? | ?? |
852 --+------+------+------+------+------+------+------+------+
853 Table B.10: Mapping of T.61 Ring Above Accent Combinations
856 B.11. Combinations for xcb: (Cedilla)
858 T.61 has predefined characters for C, G, K, L, N, R, S, and T.
859 Unicode also defines E, D, and H. All of these combinations are
860 present in Table B.11.
862 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
863 --+------+------+------+------+------+------+------+------+
864 40| ?? | ?? | ?? | 00c7 | 1e10 | 0228 | ?? | 0122 |
865 48| 1e28 | ?? | ?? | 0136 | 013b | ?? | 0145 | ?? |
866 50| ?? | ?? | 0156 | 015e | 0162 | ?? | ?? | ?? |
867 58| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
868 60| ?? | ?? | ?? | 00e7 | 1e11 | 0229 | ?? | 0123 |
869 68| 1e29 | ?? | ?? | 0137 | 013c | ?? | 0146 | ?? |
870 70| ?? | ?? | 0157 | 015f | 0163 | ?? | ?? | ?? |
871 78| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
872 --+------+------+------+------+------+------+------+------+
873 Table B.11: Mapping of T.61 Cedilla Accent Combinations
876 B.12. Combinations for xcd: (Double Acute Accent)
T.61 has predefined characters for O and U. These combinations are
879 present in Table B.12.
881 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
882 --+------+------+------+------+------+------+------+------+
883 48| ?? | ?? | ?? | ?? | ?? | ?? | ?? | 0150 |
884 50| ?? | ?? | ?? | ?? | ?? | 0170 | ?? | ?? |
885 68| ?? | ?? | ?? | ?? | ?? | ?? | ?? | 0151 |
886 70| ?? | ?? | ?? | ?? | ?? | 0171 | ?? | ?? |
887 --+------+------+------+------+------+------+------+------+
888 Table B.12: Mapping of T.61 Double Acute Accent Combinations
891 B.13. Combinations for xce: (Ogonek)
893 T.61 has predefined characters for A, E, I, and U. Unicode also
defines the combination for O. All of these combinations are present
in Table B.13.
905 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
906 --+------+------+------+------+------+------+------+------+
907 40| ?? | 0104 | ?? | ?? | ?? | 0118 | ?? | ?? |
908 48| ?? | 012e | ?? | ?? | ?? | ?? | ?? | 01ea |
909 50| ?? | ?? | ?? | ?? | ?? | 0172 | ?? | ?? |
910 58| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
911 60| ?? | 0105 | ?? | ?? | ?? | 0119 | ?? | ?? |
912 68| ?? | 012f | ?? | ?? | ?? | ?? | ?? | 01eb |
913 70| ?? | ?? | ?? | ?? | ?? | 0173 | ?? | ?? |
914 78| ?? | ?? | ?? | ?? | ?? | ?? | ?? | ?? |
915 --+------+------+------+------+------+------+------+------+
916 Table B.13: Mapping of T.61 Ogonek Accent Combinations
919 B.14. Combinations for xcf: (Caron)
921 T.61 has predefined characters for C, D, E, L, N, R, S, T, and Z.
Unicode also defines A, I, O, U, G, H, j, and K. All of these
923 combinations are present in Table B.14.
925 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
926 --+------+------+------+------+------+------+------+------+
927 40| ?? | 01cd | ?? | 010c | 010e | 011a | ?? | 01e6 |
928 48| 021e | 01cf | ?? | 01e8 | 013d | ?? | 0147 | 01d1 |
929 50| ?? | ?? | 0158 | 0160 | 0164 | 01d3 | ?? | ?? |
930 58| ?? | ?? | 017d | ?? | ?? | ?? | ?? | ?? |
931 60| ?? | 01ce | ?? | 010d | 010f | 011b | ?? | 01e7 |
932 68| 021f | 01d0 | 01f0 | 01e9 | 013e | ?? | 0148 | 01d2 |
933 70| ?? | ?? | 0159 | 0161 | 0165 | 01d4 | ?? | ?? |
934 78| ?? | ?? | 017e | ?? | ?? | ?? | ?? | ?? |
935 --+------+------+------+------+------+------+------+------+
936 Table B.14: Mapping of T.61 Caron Accent Combinations
939 Appendix B -- Mapping Table
982 Intellectual Property Rights
984 The IETF takes no position regarding the validity or scope of any
985 intellectual property or other rights that might be claimed to pertain
986 to the implementation or use of the technology described in this
987 document or the extent to which any license under such rights might or
988 might not be available; neither does it represent that it has made any
989 effort to identify any such rights. Information on the IETF's
990 procedures with respect to rights in standards-track and
991 standards-related documentation can be found in BCP-11. Copies of
992 claims of rights made available for publication and any assurances of
993 licenses to be made available, or the result of an attempt made to
994 obtain a general license or permission for the use of such proprietary
995 rights by implementors or users of this specification can be obtained
996 from the IETF Secretariat.
998 The IETF invites any interested party to bring to its attention any
999 copyrights, patents or patent applications, or other proprietary
1000 rights which may cover technology that may be required to practice
this standard. Please address the information to the IETF Executive
Director.
1015 Copyright (C) The Internet Society (2004). All Rights Reserved.
1017 This document and translations of it may be copied and furnished to
1018 others, and derivative works that comment on or otherwise explain it
1019 or assist in its implementation may be prepared, copied, published and
1020 distributed, in whole or in part, without restriction of any kind,
1021 provided that the above copyright notice and this paragraph are
1022 included on all such copies and derivative works. However, this
1023 document itself may not be modified in any way, such as by removing
1024 the copyright notice or references to the Internet Society or other
1025 Internet organizations, except as needed for the purpose of
1026 developing Internet standards in which case the procedures for
1027 copyrights defined in the Internet Standards process must be followed,
1028 or as required to translate it into languages other than English.