   1
   2
   3
   4
   5
   6
   7 Internet-Draft                                      Kurt D. Zeilenga
   8 Intended Category: Standard Track                OpenLDAP Foundation
   9 Expires in six months                                27 October 2003
  10
  11
  12
  13                 LDAP: Internationalized String Preparation
  14                    <draft-ietf-ldapbis-strprep-02.txt>
  15
  16
  17 Status of this Memo
  18
  19   This document is an Internet-Draft and is in full conformance with all
  20   provisions of Section 10 of RFC2026.
  21
  22   Distribution of this memo is unlimited.  Technical discussion of this
  23   document will take place on the IETF LDAP Revision Working Group
  24   mailing list <ietf-ldapbis@openldap.org>.  Please send editorial
  25   comments directly to the author <Kurt@OpenLDAP.org>.
  26
  27   Internet-Drafts are working documents of the Internet Engineering Task
  28   Force (IETF), its areas, and its working groups.  Note that other
  29   groups may also distribute working documents as Internet-Drafts.
  30   Internet-Drafts are draft documents valid for a maximum of six months
  31   and may be updated, replaced, or obsoleted by other documents at any
  32   time.  It is inappropriate to use Internet-Drafts as reference
  33   material or to cite them other than as "work in progress."
  34
  35   The list of current Internet-Drafts can be accessed at
  36   <http://www.ietf.org/ietf/1id-abstracts.txt>. The list of
  37   Internet-Draft Shadow Directories can be accessed at
  38   <http://www.ietf.org/shadow.html>.
  39
  40   Copyright (C) The Internet Society (2003).  All Rights Reserved.
  41
  42   Please see the Full Copyright section near the end of this document
  43   for more information.
  44
  45
  46 Abstract
  47
  48   The previous Lightweight Directory Access Protocol (LDAP) technical
  49   specifications did not precisely define how character string matching
  50   is to be performed.  This led to a number of usability and
  51   interoperability problems.  This document defines string preparation
  52   algorithms for character-based matching rules defined for use in LDAP.
  53
  54
  55
  56
  57
  58 Zeilenga                        LDAPprep                        [Page 1]
  59 \f
  60 Internet-Draft        draft-ietf-ldapbis-strprep-02      27 October 2003
  61
  62
  63 Conventions
  64
  65   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
  66   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
  67   document are to be interpreted as described in BCP 14 [RFC2119].
  68
  69   Character names in this document use the notation for code points and
  70   names from the Unicode Standard [Unicode].  For example, the letter
  71   "a" may be represented as either <U+0061> or <LATIN SMALL LETTER A>.
  72   In the lists of mappings and the prohibited characters, the "U+" is
  73   left off to make the lists easier to read.  The comments for character
  74   ranges are shown in square brackets (such as "[CONTROL CHARACTERS]")
  75   and do not come from the standard.
  76
  77   Note: a glossary of terms used in Unicode can be found in [Glossary].
  78   Information on the Unicode character encoding model can be found in
  79   [CharModel].
  80
  81
  82 1. Introduction
  83
  84 1.1. Background
  85
  86   A Lightweight Directory Access Protocol (LDAP) [Roadmap] matching rule
  87   [Syntaxes] defines an algorithm for determining whether a presented
  88   value matches an attribute value in accordance with the criteria
  89   defined for the rule.  The proposition may be evaluated to True,
  90   False, or Undefined.
  91
  92       True      - the attribute contains a matching value,
  93
  94       False     - the attribute contains no matching value,
  95
  96       Undefined - it cannot be determined whether the attribute contains
  97                   a matching value or not.
  98
  99   For instance, the caseIgnoreMatch matching rule may be used to compare
 100   whether the commonName attribute contains a particular value without
 101   regard for case and insignificant spaces.
 102
 103
 104 1.2. X.500 String Matching Rules
 105
 106   "X.520: Selected attribute types" [X.520] provides (amongst other
 107   things) value syntaxes and matching rules for comparing values
 108   commonly used in the Directory.  These specifications are inadequate
 109   for strings composed of characters from the Universal Character Set
 110   (UCS) [ISO10646], a superset of Unicode [Unicode].
 111
 119   The caseIgnoreMatch matching rule [X.520], for example, is simply
 120   defined as being a case insensitive comparison where insignificant
 121   spaces are ignored.  For printableString, there is only one space
 122   character and case mapping is bijective, hence this definition is
 123   sufficient.  However, for UCS-based string types such as
 124   universalString, this is not sufficient.  For example, a case
 125   insensitive matching implementation which folded lower case characters
 126   to upper case would yield different results than an
 127   implementation which used upper case to lower case folding.  Or one
 128   implementation may view space as referring to only SPACE (U+0020), a
 129   second implementation may view any character with the space separator
 130   (Zs) property as a space, and another implementation may view any
 131   character with the whitespace (WS) category as a space.
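
The ambiguity described above can be reproduced in a few lines.  The
following Python sketch is illustrative only: the sample values are
assumptions chosen for the demonstration, Python's built-in case
conversions stand in for whatever folding an implementation might
use, and str.isspace() is only an approximation of the white space
property.

```python
import unicodedata

# Hypothetical sample values: one word spelled with SHARP S and with "SS".
a = "stra\u00dfe"
b = "STRASSE"

# Folding to upper case expands SHARP S to "SS", so the values compare equal.
upper_equal = a.upper() == b.upper()

# Folding to lower case leaves SHARP S unchanged, so the values differ.
lower_equal = a.lower() == b.lower()


# Three possible readings of "space", as listed in the text above.
def is_space_u0020(ch):
    return ch == "\u0020"


def is_space_zs(ch):
    return unicodedata.category(ch) == "Zs"


def is_space_ws(ch):
    return ch.isspace()
```

Under these assumptions NO-BREAK SPACE (U+00A0) is a space for the
second and third readings but not the first, and CHARACTER TABULATION
(U+0009) is a space only for the third.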
 132
 133   The lack of precise specification for character string matching has
 134   led to significant interoperability problems.  When used in
 135   certificate chain validation, security vulnerabilities can arise.  To
 136   address these problems, this document defines precise algorithms for
 137   preparing character strings for matching.
 138
 139
 140 1.3. Relationship to "stringprep"
 141
 142   The character string preparation algorithms described in this document
 143   are based upon the "stringprep" approach [StringPrep].  In
 144   "stringprep", presented and stored values are first prepared for
 145   comparison so that a character-by-character comparison yields the
 146   "correct" result.
 147
 148   The approach used here is a refinement of the "stringprep"
 149   [StringPrep] approach.  Each algorithm involves two additional
 150   preparation steps.
 151
 152   a) prior to applying the Unicode string preparation steps outlined in
 153      "stringprep", the string is transcoded to Unicode;
 154
 155   b) after applying the Unicode string preparation steps outlined in
 156      "stringprep", characters insignificant to the matching rules are
 157      removed.
 158
 159   Hence, preparation of character strings for X.500 matching involves
 160   the following steps:
 161
 162       1) Transcode
 163       2) Map
 164       3) Normalize
 165       4) Prohibit
 166       5) Check Bidi (Bidirectional)
 175       6) Insignificant Character Removal
 176
 177   These steps are described in Section 2.
 178
 179
 180 1.4. Relationship to the LDAP Technical Specification
 181
 182   This document is an integral part of the LDAP technical specification
 183   [Roadmap] which obsoletes the previously defined LDAP technical
 184   specification [RFC3377] in its entirety.
 185
 186   This document details new LDAP internationalized character string
 187   preparation algorithms used by [Syntaxes] and possibly other technical
 188   specifications defining LDAP syntaxes and/or matching rules.
 189
 190
 191 1.5. Relationship to X.500
 192
 193   LDAP is defined [Roadmap] in X.500 terms as an X.500 access mechanism.
 194   As such, there is a strong desire for alignment between LDAP and X.500
 195   syntax and semantics.  The character string preparation algorithms
 196   described in this document are based upon the "Internationalized
 197   String Matching Rules for X.500" [XMATCH] proposal to ITU/ISO Joint
 198   Study Group 2.
 199
 200
 201 2. String Preparation
 202
 203   The following six-step process SHALL be applied to each presented and
 204   attribute value in preparation for character string matching rule
 205   evaluation.
 206
 207       1) Transcode
 208       2) Map
 209       3) Normalize
 210       4) Prohibit
 211       5) Check bidi
 212       6) Insignificant Character Removal
 213
 214   Failure in any step causes the assertion to evaluate to Undefined.
 215
 216   This process is intended to act upon non-empty character strings.  If
 217   the string to prepare is empty, this process is not applied and the
 218   assertion is evaluated to Undefined.
 219
 220   The character repertoire of this process is Unicode 3.2 [Unicode].
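
The control flow of this process can be sketched as follows.  This is
a minimal Python illustration, not a definitive implementation: the
names Match, PrepError, prepare, and evaluate are hypothetical, the
individual steps are passed in as placeholder callables, and a plain
equality comparison of prepared values is assumed as the matching
rule.  As a simplification, a failure while preparing any stored
value makes the whole assertion Undefined.

```python
from enum import Enum


class Match(Enum):
    TRUE = "True"
    FALSE = "False"
    UNDEFINED = "Undefined"


class PrepError(Exception):
    """Raised by a preparation step that fails (hypothetical name)."""


def prepare(value, steps):
    # The process is not applied to empty strings.
    if value == "":
        raise PrepError("empty string")
    # Apply the steps in order; each step receives the previous output.
    for step in steps:
        value = step(value)
    return value


def evaluate(presented, attribute_values, steps):
    try:
        wanted = prepare(presented, steps)
        stored = [prepare(v, steps) for v in attribute_values]
    except PrepError:
        # Failure in any step causes the assertion to be Undefined.
        return Match.UNDEFINED
    return Match.TRUE if wanted in stored else Match.FALSE
```

A caller would pass the six steps, in order, as the steps argument.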
 221
 222
 231 2.1. Transcode
 232
 233   Each non-Unicode string value is transcoded to Unicode.
 234
 235   TeletexString [X.680][T.61] values are transcoded to Unicode as
 236   described in Appendix A.
 237
 238   PrintableString [X.680] values are transcoded directly to Unicode.
 239
 240   UniversalString, UTF8String, and bmpString [X.680] values need not be
 241   transcoded as they are Unicode-based strings (in the case of
 242   bmpString, a subset of Unicode).
 243
 244   The output is the transcoded string.
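
For the string types other than TeletexString, obtaining Unicode code
points from the encoded octets might look like the Python sketch
below.  The CODECS table and the transcode function are hypothetical;
the sketch assumes the usual encodings of these ASN.1 types (two
octets per character, big-endian, for BMPString and four octets per
character, big-endian, for UniversalString).

```python
# Assumed mapping from ASN.1 string type to a Python codec name.
CODECS = {
    "PrintableString": "ascii",
    "UTF8String": "utf-8",
    "BMPString": "utf-16-be",
    "UniversalString": "utf-32-be",
}


def transcode(raw: bytes, asn1_type: str) -> str:
    """Return the Unicode string for the given encoded value."""
    if asn1_type == "TeletexString":
        # T.61 needs the table-driven algorithm of Appendix A.
        raise NotImplementedError("see Appendix A")
    return raw.decode(CODECS[asn1_type])
```

A decoding error raised by the codec would be treated as a failure of
this step.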
 245
 246
 247 2.2. Map
 248
 249   SOFT HYPHEN (U+00AD) and MONGOLIAN TODO SOFT HYPHEN (U+1806) code
 250   points are mapped to nothing.  COMBINING GRAPHEME JOINER (U+034F) and
 251   VARIATION SELECTORs (U+180B-180D, FE00-FE0F) code points are also
 252   mapped to nothing.  The OBJECT REPLACEMENT CHARACTER (U+FFFC) is
 253   mapped to nothing.
 254
 255   CHARACTER TABULATION (U+0009), LINE FEED (LF) (U+000A), LINE
 256   TABULATION (U+000B), FORM FEED (FF) (U+000C), CARRIAGE RETURN (CR)
 257   (U+000D), and NEXT LINE (NEL) (U+0085) are mapped to SPACE (U+0020).
 258
 259   All other control code points (e.g., Cc) or code points with a control
 260   function (e.g., Cf) are mapped to nothing.
 261
 262   ZERO WIDTH SPACE (U+200B) is mapped to nothing.  All other code points
 263   with Separator (space, line, or paragraph) property (e.g., Zs, Zl, or
 264   Zp) are mapped to SPACE (U+0020).
 265
 266   For case ignore, numeric, and stored prefix string matching rules,
 267   characters are case folded per B.2 of [StringPrep].
 268
 269   The output is the mapped string.
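
A minimal Python sketch of this step is given below, under stated
assumptions.  The function name ldap_map and its case_fold parameter
are hypothetical.  General categories are taken from the Unicode 3.2
database that Python exposes as unicodedata.ucd_3_2_0, and case
folding uses stringprep.map_table_b2 from the Python standard
library.  The second variation selector range is read as FE00-FE0F,
the variation selector block.

```python
import stringprep
import unicodedata

ucd = unicodedata.ucd_3_2_0  # Unicode 3.2 character database

# Code points explicitly mapped to nothing.
MAP_TO_NOTHING = (
    {0x00AD, 0x1806, 0x034F, 0xFFFC, 0x200B}
    | set(range(0x180B, 0x180E))   # U+180B-180D
    | set(range(0xFE00, 0xFE10))   # U+FE00-FE0F
)

# Control characters explicitly mapped to SPACE.
MAP_TO_SPACE = {0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0085}


def ldap_map(s: str, case_fold: bool = False) -> str:
    out = []
    for ch in s:
        cp = ord(ch)
        if cp in MAP_TO_NOTHING:
            continue
        if cp in MAP_TO_SPACE:
            out.append(" ")
            continue
        category = ucd.category(ch)
        if category in ("Cc", "Cf"):
            continue                      # other controls: nothing
        if category in ("Zs", "Zl", "Zp"):
            out.append(" ")               # other separators: SPACE
            continue
        out.append(stringprep.map_table_b2(ch) if case_fold else ch)
    return "".join(out)
```

The explicit sets are tested before the general categories so that the
named exceptions take precedence over the category-based rules.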
 270
 271
 272 2.3. Normalize
 273
 274   The input string is to be normalized to Unicode Form KC (compatibility
 275   composed) as described in [UAX15].  The output is the normalized
 276   string.
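
In Python, for instance, this step could be sketched with the
standard unicodedata module.  The function name ldap_normalize is
hypothetical; the ucd_3_2_0 object is used so that the Unicode 3.2
data is applied rather than whatever newer version the interpreter
ships.

```python
import unicodedata


def ldap_normalize(s: str) -> str:
    """Normalize to Form KC using the Unicode 3.2 database."""
    return unicodedata.ucd_3_2_0.normalize("NFKC", s)
```

For example, LATIN SMALL LIGATURE FI (U+FB01) becomes the two letters
"fi", and "e" followed by COMBINING ACUTE ACCENT (U+0301) becomes the
single code point U+00E9.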
 277
 278
 287 2.4. Prohibit
 288
 289   All Unassigned code points are prohibited.  Unassigned code points are
 290   listed in Table A.1 of [StringPrep].
 291
 292   Private Use (U+E000-F8FF, F0000-FFFFD, 100000-10FFFD) code points are
 293   prohibited.
 294
 295   All non-character code points (U+FDD0-FDEF, FFFE-FFFF, 1FFFE-1FFFF,
 296   2FFFE-2FFFF, 3FFFE-3FFFF, 4FFFE-4FFFF, 5FFFE-5FFFF, 6FFFE-6FFFF,
 297   7FFFE-7FFFF, 8FFFE-8FFFF, 9FFFE-9FFFF, AFFFE-AFFFF, BFFFE-BFFFF,
 298   CFFFE-CFFFF, DFFFE-DFFFF, EFFFE-EFFFF, FFFFE-FFFFF, 10FFFE-10FFFF) are
 299   prohibited.
 300
 301   Surrogate codes (U+D800-DFFF) are prohibited.
 302
 303   The REPLACEMENT CHARACTER (U+FFFD) code point is prohibited.
 304
 305   The first code point of a string is prohibited from being a combining
 306   character.
 307
 308   The step fails if the input string contains any prohibited code point.
 309   The output is the input string.
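
The checks of this step can be sketched in Python using the table
lookups of the standard stringprep module (tables A.1, C.3, C.4, and
C.5 of "stringprep").  The names ProhibitedError and ldap_prohibit are
hypothetical, and a combining character is assumed here to be any
code point whose general category begins with "M".

```python
import stringprep
import unicodedata

ucd = unicodedata.ucd_3_2_0  # Unicode 3.2 character database


class ProhibitedError(ValueError):
    """Raised when the input contains a prohibited code point."""


def ldap_prohibit(s: str) -> str:
    # The first code point must not be a combining character.
    if s and ucd.category(s[0]).startswith("M"):
        raise ProhibitedError("leading combining character")
    for ch in s:
        if (
            stringprep.in_table_a1(ch)      # unassigned in Unicode 3.2
            or stringprep.in_table_c3(ch)   # private use
            or stringprep.in_table_c4(ch)   # non-character code points
            or stringprep.in_table_c5(ch)   # surrogate codes
            or ch == "\ufffd"               # REPLACEMENT CHARACTER
        ):
            raise ProhibitedError("prohibited code point U+%04X" % ord(ch))
    # On success the output is the input, unchanged.
    return s
```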
 310
 311
 312 2.5. Check bidi
 313
 314   There are no bidirectional restrictions.  The output is the input
 315   string.
 316
 317
 318 2.6. Insignificant Character Removal
 319
 320   In this step, characters insignificant to the matching rule are to be
 321   removed.  The characters to be removed differ from matching rule to
 322   matching rule.
 323
 324   Section 2.6.1 applies to case ignore and exact string matching.
 325   Section 2.6.2 applies to numericString matching.
 326   Section 2.6.3 applies to telephoneNumber matching.
 327
 328
 329 2.6.1. Insignificant Space Removal
 330
 331   For the purposes of this section, a space is defined to be the SPACE
 332   (U+0020) code point followed by no combining marks.
 333
 334   NOTE - The previous steps ensure that the string cannot contain any
 343          code points in the separator class, other than SPACE (U+0020).
 344
 345   If the input string consists entirely of spaces or is empty, the
 346   output is a string consisting of exactly one space (e.g. " ").
 347
 348   Otherwise, the following spaces are removed:
 349     - leading spaces (i.e. those preceding the first character that is
 350       not a space);
 351     - trailing spaces (i.e. those following the last character that is
 352       not a space);
 353     - multiple consecutive spaces (these are taken as equivalent to a
 354       single space character).
 355
 356   For example, removal of spaces from the Form KC string:
 357       "<SPACE><SPACE>foo<SPACE><SPACE>bar<SPACE><SPACE>"
 358   would result in the output string:
 359       "foo<SPACE>bar"
 360   and the Form KC string:
 361       "<SPACE><SPACE><SPACE>"
 362   would result in the output string:
 363       "<SPACE>".
 364
 365
 366 2.6.2. numericString Insignificant Character Removal
 367
 368   For the purposes of this section, a space is defined to be the SPACE
 369   (U+0020) code point followed by no combining marks.
 370
 371   All spaces are regarded as not significant.  If the input string
 372   consists entirely of spaces or is empty, the output is a string
 373   consisting of exactly one space (e.g. " ").  Otherwise, all spaces are
 374   to be removed.
 375
 376   For example, removal of spaces from the Form KC string:
 377       "<SPACE><SPACE>123<SPACE><SPACE>456<SPACE><SPACE>"
 378   would result in the output string:
 379       "123456"
 380   and the Form KC string:
 381       "<SPACE><SPACE><SPACE>"
 382   would result in the output string:
 383       "<SPACE>".
 384
 385
 386 2.6.3. telephoneNumber Insignificant Character Removal
 387
 388   For the purposes of this section, a hyphen is defined to be
 389   HYPHEN-MINUS (U+002D), ARMENIAN HYPHEN (U+058A), HYPHEN (U+2010),
 390   NON-BREAKING HYPHEN (U+2011), MINUS SIGN (U+2212), SMALL HYPHEN-MINUS
 399   (U+FE63), or FULLWIDTH HYPHEN-MINUS (U+FF0D) code point followed by no
 400   combining marks and a space is defined to be the SPACE (U+0020) code
 401   point followed by no combining marks.
 402
 403   All hyphens and spaces are considered insignificant.  If the string
 404   contains only spaces and hyphens or is empty, then the output is a
 405   string consisting of one space.  Otherwise, all hyphens and spaces are
 406   removed.
 407
 408   For example, removal of hyphens and spaces from the Form KC string:
 409       "<SPACE><HYPHEN>123<SPACE><SPACE>456<SPACE><HYPHEN>"
 410   would result in the output string:
 411       "123456"
 412   and the Form KC string:
 413       "<HYPHEN><HYPHEN><HYPHEN>"
 414   would result in the output string:
 415       "<SPACE>".
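
The three removal procedures (insignificant space removal,
numericString, and telephoneNumber) can be sketched in Python as
follows.  All function names are hypothetical.  A combining mark is
assumed to be any code point whose general category begins with "M"
in the Unicode 3.2 database.

```python
import unicodedata

ucd = unicodedata.ucd_3_2_0  # Unicode 3.2 character database

HYPHENS = {"\u002d", "\u058a", "\u2010", "\u2011", "\u2212", "\ufe63", "\uff0d"}


def _flags(s, chars):
    """Mark positions holding one of chars not followed by a combining mark."""
    flags = []
    for i, ch in enumerate(s):
        nxt = s[i + 1:i + 2]
        combining_next = bool(nxt) and ucd.category(nxt).startswith("M")
        flags.append(ch in chars and not combining_next)
    return flags


def remove_insignificant_spaces(s):
    flags = _flags(s, {" "})
    if all(flags):                 # empty, or spaces only
        return " "
    out, pending = [], False
    for ch, is_space in zip(s, flags):
        if is_space:
            pending = bool(out)    # leading spaces are never emitted
        else:
            if pending:
                out.append(" ")    # a run of spaces collapses to one
            pending = False
            out.append(ch)
    return "".join(out)            # trailing spaces are never emitted


def _remove_all(s, chars):
    flags = _flags(s, chars)
    if all(flags):                 # empty, or insignificant characters only
        return " "
    return "".join(ch for ch, drop in zip(s, flags) if not drop)


def remove_numeric_insignificant(s):
    return _remove_all(s, {" "})


def remove_telephone_insignificant(s):
    return _remove_all(s, HYPHENS | {" "})
```

Applied to the examples given in the text, these functions yield
"foo bar", "123456", and a single space for inputs consisting only of
insignificant characters.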
 416
 417
 418 3. Security Considerations
 419
 420   The security considerations of "Preparation of Internationalized
 421   Strings ('stringprep')" [StringPrep] generally apply to the
 422   algorithms described here.
 423
 424
 425 4. Contributors
 426
 427   Appendices A and B of this document were authored by Howard Chu
 428   <hyc@symas.com> of Symas Corporation (based upon information provided
 429   in RFC 1345).
 430
 431
 432 5. Acknowledgments
 433
 434   The approach used in this document is based upon design principles and
 435   algorithms described in "Preparation of Internationalized Strings
 436   ('stringprep')" [StringPrep] by Paul Hoffman and Marc Blanchet.  Some
 437   additional guidance was drawn from Unicode Technical Standards,
 438   Technical Reports, and Notes.
 439
 440   This document is a product of the IETF LDAP Revision (LDAPBIS) Working
 441   Group.
 442
 443
 444 6. Author's Address
 445
 446   Kurt Zeilenga
 455   E-mail: <kurt@openldap.org>
 456
 457
 458 7. References
 459
 460 7.1. Normative References
 461
 462   [RFC2119]     Bradner, S., "Key words for use in RFCs to Indicate
 463                 Requirement Levels", BCP 14 (also RFC 2119), March 1997.
 464
 465   [Roadmap]     Zeilenga, K. (editor), "LDAP: Technical Specification
 466                 Road Map", draft-ietf-ldapbis-roadmap-xx.txt, a work in
 467                 progress.
 468
 469   [StringPrep]  Hoffman P. and M. Blanchet, "Preparation of
 470                 Internationalized Strings ('stringprep')",
 471                 draft-hoffman-rfc3454bis-xx.txt, a work in progress.
 472
 473   [Syntaxes]    Legg, S. (editor), "LDAP: Syntaxes and Matching Rules",
 474                 draft-ietf-ldapbis-syntaxes-xx.txt, a work in progress.
 475
 476   [ISO10646]    International Organization for Standardization,
 477                 "Universal Multiple-Octet Coded Character Set (UCS) -
 478                 Architecture and Basic Multilingual Plane", ISO/IEC
 479                 10646-1 : 1993.
 480
 481   [Unicode]     The Unicode Consortium, "The Unicode Standard, Version
 482                 3.2.0" is defined by "The Unicode Standard, Version 3.0"
 483                 (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5),
 484                 as amended by the "Unicode Standard Annex #27: Unicode
 485                 3.1" (http://www.unicode.org/reports/tr27/) and by the
 486                 "Unicode Standard Annex #28: Unicode 3.2"
 487                 (http://www.unicode.org/reports/tr28/).
 488
 489   [UAX15]       Davis, M. and M. Duerst, "Unicode Standard Annex #15:
 490                 Unicode Normalization Forms, Version 3.2.0".
 491                 <http://www.unicode.org/unicode/reports/tr15/tr15-22.html>,
 492                 March 2002.
 493
 494   [X.680]       International Telecommunication Union -
 495                 Telecommunication Standardization Sector, "Abstract
 496                 Syntax Notation One (ASN.1) - Specification of Basic
 497                 Notation", X.680(1997) (also ISO/IEC 8824-1:1998).
 498
 499   [T.61]        CCITT (now ITU), "Character Repertoire and Coded
 500                 Character Sets for the International Teletex Service",
 501                 T.61, 1988.
 502
 511 7.2. Informative References
 512
 513   [X.500]       International Telecommunication Union -
 514                 Telecommunication Standardization Sector, "The Directory
 515                 -- Overview of concepts, models and services,"
 516                 X.500(1993) (also ISO/IEC 9594-1:1994).
 517
 518   [X.501]       International Telecommunication Union -
 519                 Telecommunication Standardization Sector, "The Directory
 520                 -- Models," X.501(1993) (also ISO/IEC 9594-2:1994).
 521
 522   [X.520]       International Telecommunication Union -
 523                 Telecommunication Standardization Sector, "The
 524                 Directory: Selected Attribute Types", X.520(1993) (also
 525                 ISO/IEC 9594-6:1994).
 526
 527   [Glossary]    The Unicode Consortium, "Unicode Glossary",
 528                 <http://www.unicode.org/glossary/>.
 529
 530   [CharModel]   Whistler, K. and M. Davis, "Unicode Technical Report
 531                 #17, Character Encoding Model", UTR17,
 532                 <http://www.unicode.org/unicode/reports/tr17/>, August
 533                 2000.
 534
 535   [XMATCH]      Zeilenga, K., "Internationalized String Matching Rules
 536                 for X.500", draft-zeilenga-ldapbis-strmatch-xx.txt, a
 537                 work in progress.
 538
 539   [RFC1345]     Simonsen, K., "Character Mnemonics & Character Sets",
 540                 RFC 1345, June 1992.
 541
 542
 543 Appendix A. Teletex (T.61) to Unicode
 544
 545   This appendix defines an algorithm for transcoding [T.61] characters
 546   to [Unicode] characters for use in string preparation for LDAP
 547   matching rules.  This appendix is normative.
 548
 549   The transcoding algorithm is derived from the T.61-8bit definition
 550   provided in [RFC1345].  With a few exceptions, the T.61 character
 551   codes from x00 to x7f are equivalent to the corresponding [Unicode]
 552   code points, and their values are left unchanged by this algorithm.
 553   E.g. the T.61 code x20 is identical to (U+0020).  The exceptions are
 554   for these T.61 codes that are undefined: x23, x24, x5c, x5e, x60, x7b,
 555   x7d, and x7e.
 556
 557   The codes from x80 to x9f are also equivalent to the corresponding
 558   Unicode code points.  This is specified for completeness only, as
 567   these codes are control characters, and will be mapped to nothing in
 568   the LDAP String Preparation Mapping step.
 569
 570   The remaining T.61 codes are mapped below in Table A.1.  Table
 571   positions marked "??" are undefined.
 572
 573   Input strings containing undefined T.61 codes SHALL produce an
 574   Undefined matching result. For diagnostic purposes, this algorithm
 575   does not fail for undefined input codes.  Instead, undefined codes in
 576   the input are mapped to the Unicode REPLACEMENT CHARACTER (U+FFFD).
 577   As the LDAP String Preparation Prohibit step disallows the REPLACEMENT
 578   CHARACTER from appearing in its output, this transcoding yields the
 579   desired effect.
 580
 581   Note: RFC 1345 listed the non-spacing accent codepoints as residing in
 582         the range starting at (U+E000).  In the current Unicode
 583         standard, the (U+E000) range is reserved for Private Use, and
 584         the non-spacing accents are in the range starting at (U+0300).
 585         The tables here use the (U+0300) range for these accents.
 586
 587      |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
 588    --+------+------+------+------+------+------+------+------+
 589    a0| 00a0 | 00a1 | 00a2 | 00a3 | 0024 | 00a5 | 0023 | 00a7 |
 590    a8| 00a8 |  ??  |  ??  | 00ab |  ??  |  ??  |  ??  |  ??  |
 591    b0| 00b0 | 00b1 | 00b2 | 00b3 | 00d7 | 00b5 | 00b6 | 00b7 |
 592    b8| 00f7 |  ??  |  ??  | 00bb | 00bc | 00bd | 00be | 00bf |
 593    c0|  ??  | 0300 | 0301 | 0302 | 0303 | 0304 | 0306 | 0307 |
 594    c8| 0308 |  ??  | 030a | 0327 | 0332 | 030b | 0328 | 030c |
 595    d0|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 596    d8|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 597    e0| 2126 | 00c6 | 00d0 | 00aa |  ??  | 0126 | 0132 | 013f |
 598    e8| 0141 | 00d8 | 0152 | 00ba | 00de | 0166 | 014a | 0149 |
 599    f0| 0138 | 00e6 | 0111 | 00f0 | 0127 | 0131 | 0133 | 0140 |
 600    f8| 0142 | 00f8 | 0153 | 00df | 00fe | 0167 | 014b |  ??  |
 601    --+------+------+------+------+------+------+------+------+
 602             Table A.1:  Mapping of 8-bit T.61 codes to Unicode
 603
 604   T.61 also defines a number of accented characters that are formed by
 605   combining an accent prefix followed by a base character.  These
 606   prefixes are in the code range xc1 to xcf. If a prefix character
 607   appears at the end of a string, the result is undefined.  Otherwise
 608   these sequences are mapped to Unicode by substituting the
 609   corresponding non-spacing accent code (as listed in Table A.1) for the
 610   accent prefix, and exchanging the order so that the base character
 611   precedes the accent.
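
The transcoding described in this appendix can be sketched in Python
as follows.  The names are hypothetical.  Two points are assumptions
not settled by the text: an accent prefix at the end of the string is
mapped to the REPLACEMENT CHARACTER, in the same way as other
undefined input, and the octet following a prefix is transcoded as an
ordinary single code.

```python
REPLACEMENT = "\ufffd"
UNDEFINED_LOW = {0x23, 0x24, 0x5C, 0x5E, 0x60, 0x7B, 0x7D, 0x7E}

# Table A.1: row label, then eight cells; "??" marks an undefined code.
TABLE_A1 = """
a0 00a0 00a1 00a2 00a3 0024 00a5 0023 00a7
a8 00a8 ??   ??   00ab ??   ??   ??   ??
b0 00b0 00b1 00b2 00b3 00d7 00b5 00b6 00b7
b8 00f7 ??   ??   00bb 00bc 00bd 00be 00bf
c0 ??   0300 0301 0302 0303 0304 0306 0307
c8 0308 ??   030a 0327 0332 030b 0328 030c
d0 ??   ??   ??   ??   ??   ??   ??   ??
d8 ??   ??   ??   ??   ??   ??   ??   ??
e0 2126 00c6 00d0 00aa ??   0126 0132 013f
e8 0141 00d8 0152 00ba 00de 0166 014a 0149
f0 0138 00e6 0111 00f0 0127 0131 0133 0140
f8 0142 00f8 0153 00df 00fe 0167 014b ??
"""

HIGH = {}
for row in TABLE_A1.split("\n"):
    cells = row.split()
    if cells:
        base = int(cells[0], 16)
        for offset, cell in enumerate(cells[1:]):
            if cell != "??":
                HIGH[base + offset] = chr(int(cell, 16))


def _single(code):
    """Transcode one T.61 code, using U+FFFD for undefined codes."""
    if code < 0xA0:
        return REPLACEMENT if code in UNDEFINED_LOW else chr(code)
    return HIGH.get(code, REPLACEMENT)


def t61_to_unicode(data: bytes) -> str:
    out, i = [], 0
    while i < len(data):
        code = data[i]
        if 0xC1 <= code <= 0xCF:            # accent prefix
            if i + 1 == len(data):
                out.append(REPLACEMENT)     # prefix at end of string
                i += 1
            else:
                # Base character first, then the non-spacing accent.
                out.append(_single(data[i + 1]) + _single(code))
                i += 2
        else:
            out.append(_single(code))
            i += 1
    return "".join(out)
```

For example, the T.61 sequence xc2 x65 (acute accent prefix, then
"e") yields <U+0065> followed by <U+0301>.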
 612
 613
 614 Appendix B. Additional Teletex (T.61) to Unicode Tables
 615
 623   All of the accented characters in T.61 have a corresponding code point
 624   in Unicode.  For the sake of completeness, the combined character
 625   codes are presented in the following tables.  This is informational
 626   only; for matching purposes it is sufficient to map the non-spacing
 627   accent and exchange the order of the character pair as specified in
 628   Appendix A.   This appendix is informative.
 629
 630
 631 B.1. Combinations with SPACE
 632
 633   Accents may be combined with a <SPACE> to generate the accent by
 634   itself.  For each accent code, the result of combining with <SPACE> is
 635   listed in Table B.1.
 636
 637      |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
 638    --+------+------+------+------+------+------+------+------+
 639    c0|  ??  | 0060 | 00b4 | 005e | 007e | 00af | 02d8 | 02d9 |
 640    c8| 00a8 |  ??  | 02da | 00b8 |  ??  | 02dd | 02db | 02c7 |
 641    --+------+------+------+------+------+------+------+------+
 642        Table B.1:  Mapping of T.61 Accents with <SPACE> to Unicode
 643
 644
 645 B.2. Combinations for xc1: (Grave accent)
 646
 647   T.61 has predefined characters for combinations with A, E, I, O, and
 648   U.  Unicode also defines combinations for N, W, and Y.  All of these
 649   combinations are present in Table B.2.
 650
 651      |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
 652    --+------+------+------+------+------+------+------+------+
 653    40|  ??  | 00c0 |  ??  |  ??  |  ??  | 00c8 |  ??  |  ??  |
 654    48|  ??  | 00cc |  ??  |  ??  |  ??  |  ??  | 01f8 | 00d2 |
 655    50|  ??  |  ??  |  ??  |  ??  |  ??  | 00d9 |  ??  | 1e80 |
 656    58|  ??  | 1ef2 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 657    60|  ??  | 00e0 |  ??  |  ??  |  ??  | 00e8 |  ??  |  ??  |
 658    68|  ??  | 00ec |  ??  |  ??  |  ??  |  ??  | 01f9 | 00f2 |
 659    70|  ??  |  ??  |  ??  |  ??  |  ??  | 00f9 |  ??  | 1e81 |
 660    78|  ??  | 1ef3 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 661    --+------+------+------+------+------+------+------+------+
 662            Table B.2: Mapping of T.61 Grave Accent Combinations
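
The entries of a table such as Table B.2 can be reproduced by
composing the base character with the non-spacing accent, which is
why mapping the accent and exchanging the order is sufficient for
matching.  The Python sketch below illustrates this for the grave
accent; the function name compose_with_grave is hypothetical, and
Form KC normalization from the Unicode 3.2 database is assumed.

```python
import unicodedata

GRAVE = "\u0300"  # COMBINING GRAVE ACCENT, from Table A.1 (code xc1)


def compose_with_grave(base: str) -> str:
    """Compose base + grave accent into a single code point if one exists."""
    return unicodedata.ucd_3_2_0.normalize("NFKC", base + GRAVE)
```

A base character with no precomposed form, such as "B", is returned
as the two-code-point sequence unchanged.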
 663
 664
 665 B.3. Combinations for xc2: (Acute accent)
 666
 667   T.61 has predefined characters for combinations with A, E, I, O, U, Y,
 668   C, L, N, R, S, and Z.  Unicode also defines G, K, M, P, and W.  All of
 669   these combinations are present in Table B.3.
 670
 679      |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
 680    --+------+------+------+------+------+------+------+------+
 681    40|  ??  | 00c1 |  ??  | 0106 |  ??  | 00c9 |  ??  | 01f4 |
 682    48|  ??  | 00cd |  ??  | 1e30 | 0139 | 1e3e | 0143 | 00d3 |
 683    50| 1e54 |  ??  | 0154 | 015a |  ??  | 00da |  ??  | 1e82 |
 684    58|  ??  | 00dd | 0179 |  ??  |  ??  |  ??  |  ??  |  ??  |
 685    60|  ??  | 00e1 |  ??  | 0107 |  ??  | 00e9 |  ??  | 01f5 |
 686    68|  ??  | 00ed |  ??  | 1e31 | 013a | 1e3f | 0144 | 00f3 |
 687    70| 1e55 |  ??  | 0155 | 015b |  ??  | 00fa |  ??  | 1e83 |
 688    78|  ??  | 00fd | 017a |  ??  |  ??  |  ??  |  ??  |  ??  |
 689    --+------+------+------+------+------+------+------+------+
 690            Table B.3: Mapping of T.61 Acute Accent Combinations
 691
 692
 693 B.4. Combinations for xc3: (Circumflex)
 694
 695   T.61 has predefined characters for combinations with A, E, I, O, U, Y,
 696   C, G, H, J, S, and W.  Unicode also defines the combination for Z.
 697   All of these combinations are present in Table B.4.
 698
 699      |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
 700    --+------+------+------+------+------+------+------+------+
 701    40|  ??  | 00c2 |  ??  | 0108 |  ??  | 00ca |  ??  | 011c |
 702    48| 0124 | 00ce | 0134 |  ??  |  ??  |  ??  |  ??  | 00d4 |
 703    50|  ??  |  ??  |  ??  | 015c |  ??  | 00db |  ??  | 0174 |
 704    58|  ??  | 0176 | 1e90 |  ??  |  ??  |  ??  |  ??  |  ??  |
 705    60|  ??  | 00e2 |  ??  | 0109 |  ??  | 00ea |  ??  | 011d |
 706    68| 0125 | 00ee | 0135 |  ??  |  ??  |  ??  |  ??  | 00f4 |
 707    70|  ??  |  ??  |  ??  | 015d |  ??  | 00fb |  ??  | 0175 |
 708    78|  ??  | 0177 | 1e91 |  ??  |  ??  |  ??  |  ??  |  ??  |
 709    --+------+------+------+------+------+------+------+------+
 710         Table B.4: Mapping of T.61 Circumflex Accent Combinations
 711
 712
 713 B.5. Combinations for xc4: (Tilde)
 714
 715   T.61 has predefined characters for combinations with A, I, O, U, and
 716   N.  Unicode also defines E, V, and Y.  All of these combinations are
 717   present in Table B.5.
 718
 719      |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
 720    --+------+------+------+------+------+------+------+------+
 721    40|  ??  | 00c3 |  ??  |  ??  |  ??  | 1ebc |  ??  |  ??  |
 722    48|  ??  | 0128 |  ??  |  ??  |  ??  |  ??  | 00d1 | 00d5 |
 723    50|  ??  |  ??  |  ??  |  ??  |  ??  | 0168 | 1e7c |  ??  |
 724    58|  ??  | 1ef8 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 725    60|  ??  | 00e3 |  ??  |  ??  |  ??  | 1ebd |  ??  |  ??  |
 726    68|  ??  | 0129 |  ??  |  ??  |  ??  |  ??  | 00f1 | 00f5 |
 735    70|  ??  |  ??  |  ??  |  ??  |  ??  | 0169 | 1e7d |  ??  |
 736    78|  ??  | 1ef9 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 737    --+------+------+------+------+------+------+------+------+
 738            Table B.5: Mapping of T.61 Tilde Accent Combinations
 739
 740
 741 B.6. Combinations for xc5: (Macron)
 742
 743   T.61 has predefined characters for combinations with A, E, I, O, and
 744   U.  Unicode also defines Y, G, and AE.  All of these combinations are
 745   present in Table B.6.
 746
 747      |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
 748    --+------+------+------+------+------+------+------+------+
 749    40|  ??  | 0100 |  ??  |  ??  |  ??  | 0112 |  ??  | 1e20 |
 750    48|  ??  | 012a |  ??  |  ??  |  ??  |  ??  |  ??  | 014c |
 751    50|  ??  |  ??  |  ??  |  ??  |  ??  | 016a |  ??  |  ??  |
 752    58|  ??  | 0232 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 753    60|  ??  | 0101 |  ??  |  ??  |  ??  | 0113 |  ??  | 1e21 |
 754    68|  ??  | 012b |  ??  |  ??  |  ??  |  ??  |  ??  | 014d |
 755    70|  ??  |  ??  |  ??  |  ??  |  ??  | 016b |  ??  |  ??  |
 756    78|  ??  | 0233 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 757    e0|  ??  | 01e2 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 758    f0|  ??  | 01e3 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 759    --+------+------+------+------+------+------+------+------+
 760           Table B.6: Mapping of T.61 Macron Accent Combinations
 761
 762
 763 B.7. Combinations for xc6: (Breve)
 764
 765   T.61 has predefined characters for combinations with A, U, and G.
 766   Unicode also defines E, I, and O.  All of these combinations are
 767   present in Table B.7.
 768
 769      |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
 770    --+------+------+------+------+------+------+------+------+
 771    40|  ??  | 0102 |  ??  |  ??  |  ??  | 0114 |  ??  | 011e |
 772    48|  ??  | 012c |  ??  |  ??  |  ??  |  ??  |  ??  | 014e |
 773    50|  ??  |  ??  |  ??  |  ??  |  ??  | 016c |  ??  |  ??  |
 774    58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 775    60|  ??  | 0103 |  ??  |  ??  |  ??  | 0115 |  ??  | 011f |
 776    68|  ??  | 012d |  ??  |  ??  |  ??  |  ??  |  ??  | 014f |
 777    70|  ??  |  ??  |  ??  |  ??  |  ??  | 016d |  ??  |  ??  |
 778    78|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 779    --+------+------+------+------+------+------+------+------+
 780            Table B.7: Mapping of T.61 Breve Accent Combinations
 781
 782
 791 B.8. Combinations for xc7: (Dot Above)
 792
 793   T.61 has predefined characters for C, E, G, I, and Z.  Unicode also
 794   defines A, O, B, D, F, H, M, N, P, R, S, T, W, X, and Y.  All of these
 795   combinations are present in Table B.8.
 796
 797      |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
 798    --+------+------+------+------+------+------+------+------+
 799    40|  ??  | 0226 | 1e02 | 010a | 1e0a | 0116 | 1e1e | 0120 |
 800    48| 1e22 | 0130 |  ??  |  ??  |  ??  | 1e40 | 1e44 | 022e |
 801    50| 1e56 |  ??  | 1e58 | 1e60 | 1e6a |  ??  |  ??  | 1e86 |
 802    58| 1e8a | 1e8e | 017b |  ??  |  ??  |  ??  |  ??  |  ??  |
 803    60|  ??  | 0227 | 1e03 | 010b | 1e0b | 0117 | 1e1f | 0121 |
 804    68| 1e23 |  ??  |  ??  |  ??  |  ??  | 1e41 | 1e45 | 022f |
 805    70| 1e57 |  ??  | 1e59 | 1e61 | 1e6b |  ??  |  ??  | 1e87 |
 806    78| 1e8b | 1e8f | 017c |  ??  |  ??  |  ??  |  ??  |  ??  |
 807    --+------+------+------+------+------+------+------+------+
 808          Table B.8: Mapping of T.61 Dot Above Accent Combinations
 809
 810
 811 B.9. Combinations for xc8: (Diaeresis)
 812
 813   T.61 has predefined characters for A, E, I, O, U, and Y.  Unicode also
 814   defines H, W, X, and t.  All of these combinations are present in
 815   Table B.9.
 816
 817      |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
 818    --+------+------+------+------+------+------+------+------+
 819    40|  ??  | 00c4 |  ??  |  ??  |  ??  | 00cb |  ??  |  ??  |
 820    48| 1e26 | 00cf |  ??  |  ??  |  ??  |  ??  |  ??  | 00d6 |
 821    50|  ??  |  ??  |  ??  |  ??  |  ??  | 00dc |  ??  | 1e84 |
 822    58| 1e8c | 0178 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 823    60|  ??  | 00e4 |  ??  |  ??  |  ??  | 00eb |  ??  |  ??  |
 824    68| 1e27 | 00ef |  ??  |  ??  |  ??  |  ??  |  ??  | 00f6 |
 825    70|  ??  |  ??  |  ??  |  ??  | 1e97 | 00fc |  ??  | 1e85 |
 826    78| 1e8d | 00ff |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 827    --+------+------+------+------+------+------+------+------+
 828          Table B.9: Mapping of T.61 Diaeresis Accent Combinations
 829
 830
 831 B.10. Combinations for xca: (Ring Above)
 832
 833   T.61 has predefined characters for A and U.  Unicode also defines w
 834   and y.  All of these combinations are present in Table B.10.
 835
 836      |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
 837    --+------+------+------+------+------+------+------+------+
 838    40|  ??  | 00c5 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 847    48|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 848    50|  ??  |  ??  |  ??  |  ??  |  ??  | 016e |  ??  |  ??  |
 849    58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 850    60|  ??  | 00e5 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 851    68|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 852    70|  ??  |  ??  |  ??  |  ??  |  ??  | 016f |  ??  | 1e98 |
 853    78|  ??  | 1e99 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 854    --+------+------+------+------+------+------+------+------+
 855         Table B.10: Mapping of T.61 Ring Above Accent Combinations
 856
 857
 858 B.11. Combinations for xcb: (Cedilla)
 859
 860   T.61 has predefined characters for C, G, K, L, N, R, S, and T.
 861   Unicode also defines E, D, and H.  All of these combinations are
 862   present in Table B.11.
 863
 864      |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
 865    --+------+------+------+------+------+------+------+------+
 866    40|  ??  |  ??  |  ??  | 00c7 | 1e10 | 0228 |  ??  | 0122 |
 867    48| 1e28 |  ??  |  ??  | 0136 | 013b |  ??  | 0145 |  ??  |
 868    50|  ??  |  ??  | 0156 | 015e | 0162 |  ??  |  ??  |  ??  |
 869    58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 870    60|  ??  |  ??  |  ??  | 00e7 | 1e11 | 0229 |  ??  | 0123 |
 871    68| 1e29 |  ??  |  ??  | 0137 | 013c |  ??  | 0146 |  ??  |
 872    70|  ??  |  ??  | 0157 | 015f | 0163 |  ??  |  ??  |  ??  |
 873    78|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 874    --+------+------+------+------+------+------+------+------+
 875          Table B.11: Mapping of T.61 Cedilla Accent Combinations
 876
 877
 878 B.12. Combinations for xcd: (Double Acute Accent)
 879
 880   T.61 has predefined characters for O and U.  These combinations are
 881   present in Table B.12.
 882
 883      |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
 884    --+------+------+------+------+------+------+------+------+
 885    48|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  | 0150 |
 886    50|  ??  |  ??  |  ??  |  ??  |  ??  | 0170 |  ??  |  ??  |
 887    68|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  | 0151 |
 888    70|  ??  |  ??  |  ??  |  ??  |  ??  | 0171 |  ??  |  ??  |
 889    --+------+------+------+------+------+------+------+------+
 890        Table B.12: Mapping of T.61 Double Acute Accent Combinations
 891
 892
 893 B.13. Combinations for xce: (Ogonek)
 894
 903   T.61 has predefined characters for A, E, I, and U.  Unicode also
 904   defines the combination for O.  All of these combinations are present
 905   in Table B.13.
 906
 907      |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
 908    --+------+------+------+------+------+------+------+------+
 909    40|  ??  | 0104 |  ??  |  ??  |  ??  | 0118 |  ??  |  ??  |
 910    48|  ??  | 012e |  ??  |  ??  |  ??  |  ??  |  ??  | 01ea |
 911    50|  ??  |  ??  |  ??  |  ??  |  ??  | 0172 |  ??  |  ??  |
 912    58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 913    60|  ??  | 0105 |  ??  |  ??  |  ??  | 0119 |  ??  |  ??  |
 914    68|  ??  | 012f |  ??  |  ??  |  ??  |  ??  |  ??  | 01eb |
 915    70|  ??  |  ??  |  ??  |  ??  |  ??  | 0173 |  ??  |  ??  |
 916    78|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
 917    --+------+------+------+------+------+------+------+------+
 918           Table B.13: Mapping of T.61 Ogonek Accent Combinations
 919
 920
 921 B.14. Combinations for xcf: (Caron)
 922
 923   T.61 has predefined characters for C, D, E, L, N, R, S, T, and Z.
 924   Unicode also defines A, I, O, U, G, H, j, and K.  All of these
 925   combinations are present in Table B.14.
 926
 927      |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
 928    --+------+------+------+------+------+------+------+------+
 929    40|  ??  | 01cd |  ??  | 010c | 010e | 011a |  ??  | 01e6 |
 930    48| 021e | 01cf |  ??  | 01e8 | 013d |  ??  | 0147 | 01d1 |
 931    50|  ??  |  ??  | 0158 | 0160 | 0164 | 01d3 |  ??  |  ??  |
 932    58|  ??  |  ??  | 017d |  ??  |  ??  |  ??  |  ??  |  ??  |
 933    60|  ??  | 01ce |  ??  | 010d | 010f | 011b |  ??  | 01e7 |
 934    68| 021f | 01d0 | 01f0 | 01e9 | 013e |  ??  | 0148 | 01d2 |
 935    70|  ??  |  ??  | 0159 | 0161 | 0165 | 01d4 |  ??  |  ??  |
 936    78|  ??  |  ??  | 017e |  ??  |  ??  |  ??  |  ??  |  ??  |
 937    --+------+------+------+------+------+------+------+------+
 938           Table B.14: Mapping of T.61 Caron Accent Combinations
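
  Entries in the tables above can be cross-checked against Unicode
  canonical composition.  The sketch below is illustrative only.  It
  assumes that each T.61 non-spacing accent octet corresponds to the
  Unicode combining character listed in COMBINING; that correspondence,
  and the names COMBINING and precomposed, are assumptions of this
  example rather than definitions from this document.  Note that T.61
  places the accent before the base character, while Unicode places the
  combining character after it.

```python
import unicodedata

# Assumed correspondence between T.61 accent octets (the first octet of
# a combination) and Unicode combining characters.
COMBINING = {
    0xC4: "\u0303",  # tilde
    0xC5: "\u0304",  # macron
    0xC6: "\u0306",  # breve
    0xC7: "\u0307",  # dot above
    0xC8: "\u0308",  # diaeresis
    0xCA: "\u030A",  # ring above
    0xCB: "\u0327",  # cedilla
    0xCD: "\u030B",  # double acute accent
    0xCE: "\u0328",  # ogonek
    0xCF: "\u030C",  # caron
}


def precomposed(accent_octet, base):
    """Return the precomposed code point for base + accent, or None.

    The pair is composed with NFC; a result of a single character means
    Unicode defines a precomposed form for the combination.
    """
    composed = unicodedata.normalize("NFC", base + COMBINING[accent_octet])
    return ord(composed) if len(composed) == 1 else None


print(hex(precomposed(0xC4, "A")))  # 0xc3, as in Table B.5
print(hex(precomposed(0xCF, "j")))  # 0x1f0, as in Table B.14
print(precomposed(0xC4, "B"))       # None: no precomposed B with tilde
```

  A combination for which this check returns None would be expected to
  appear as "??" in the corresponding table.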
 939
 940
 941
 942 Intellectual Property Rights
 943
 944   The IETF takes no position regarding the validity or scope of any
 945   intellectual property or other rights that might be claimed to pertain
 946   to the implementation or use of the technology described in this
 947   document or the extent to which any license under such rights might or
 948   might not be available; neither does it represent that it has made any
 949   effort to identify any such rights.  Information on the IETF's
 950   procedures with respect to rights in standards-track and
 959   standards-related documentation can be found in BCP-11.  Copies of
 960   claims of rights made available for publication and any assurances of
 961   licenses to be made available, or the result of an attempt made to
 962   obtain a general license or permission for the use of such proprietary
 963   rights by implementors or users of this specification can be obtained
 964   from the IETF Secretariat.
 965
 966   The IETF invites any interested party to bring to its attention any
 967   copyrights, patents or patent applications, or other proprietary
 968   rights which may cover technology that may be required to practice
 969   this standard.  Please address the information to the IETF Executive
 970   Director.
 971
 972
 973
 974 Full Copyright
 975
 976   Copyright (C) The Internet Society (2003). All Rights Reserved.
 977
 978   This document and translations of it may be copied and furnished to
 979   others, and derivative works that comment on or otherwise explain it
 980   or assist in its implementation may be prepared, copied, published and
 981   distributed, in whole or in part, without restriction of any kind,
 982   provided that the above copyright notice and this paragraph are
 983   included on all such copies and derivative works.  However, this
 984   document itself may not be modified in any way, such as by removing
 985   the copyright notice or references to the Internet Society or other
 986   Internet organizations, except as needed for the purpose of
 987   developing Internet standards in which case the procedures for
 988   copyrights defined in the Internet Standards process must be followed,
 989   or as required to translate it into languages other than English.