Internet-Draft                                      Kurt D. Zeilenga
Intended Category: Standard Track                OpenLDAP Foundation
Expires in six months                               15 February 2004



                LDAP: Internationalized String Preparation
                   <draft-ietf-ldapbis-strprep-03.txt>


Status of this Memo

  This document is an Internet-Draft and is in full conformance with all
  provisions of Section 10 of RFC2026.

  Distribution of this memo is unlimited.  Technical discussion of this
  document will take place on the IETF LDAP Revision Working Group
  mailing list <ietf-ldapbis@openldap.org>.  Please send editorial
  comments directly to the author <Kurt@OpenLDAP.org>.

  Internet-Drafts are working documents of the Internet Engineering Task
  Force (IETF), its areas, and its working groups.  Note that other
  groups may also distribute working documents as Internet-Drafts.
  Internet-Drafts are draft documents valid for a maximum of six months
  and may be updated, replaced, or obsoleted by other documents at any
  time.  It is inappropriate to use Internet-Drafts as reference
  material or to cite them other than as ``work in progress.''

  The list of current Internet-Drafts can be accessed at
  <http://www.ietf.org/ietf/1id-abstracts.txt>. The list of
  Internet-Draft Shadow Directories can be accessed at
  <http://www.ietf.org/shadow.html>.

  Copyright (C) The Internet Society (2004).  All Rights Reserved.

  Please see the Full Copyright section near the end of this document
  for more information.


Abstract

  The previous Lightweight Directory Access Protocol (LDAP) technical
  specifications did not precisely define how character string matching
  is to be performed.  This led to a number of usability and
  interoperability problems.  This document defines string preparation
  algorithms for character-based matching rules defined for use in LDAP.


Zeilenga                        LDAPprep                        [Page 1]

Internet-Draft        draft-ietf-ldapbis-strprep-03     15 February 2004


Conventions

  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
  document are to be interpreted as described in BCP 14 [RFC2119].

  Character names in this document use the notation for code points and
  names from the Unicode Standard [Unicode].  For example, the letter
  "a" may be represented as either <U+0061> or <LATIN SMALL LETTER A>.
  In the lists of mappings and the prohibited characters, the "U+" is
  left off to make the lists easier to read.  The comments for character
  ranges are shown in square brackets (such as "[CONTROL CHARACTERS]")
  and do not come from the standard.

  Note: a glossary of terms used in Unicode can be found in [Glossary].
  Information on the Unicode character encoding model can be found in
  [CharModel].


1. Introduction

1.1. Background

  A Lightweight Directory Access Protocol (LDAP) [Roadmap] matching rule
  [Syntaxes] defines an algorithm for determining whether a presented
  value matches an attribute value in accordance with the criteria
  defined for the rule.  The proposition may be evaluated to True,
  False, or Undefined.

      True      - the attribute contains a matching value,

      False     - the attribute contains no matching value,

      Undefined - it cannot be determined whether the attribute contains
                  a matching value or not.

  For instance, the caseIgnoreMatch matching rule may be used to compare
  whether the commonName attribute contains a particular value without
  regard for case and insignificant spaces.


1.2. X.500 String Matching Rules

  "X.520: Selected attribute types" [X.520] provides (amongst other
  things) value syntaxes and matching rules for comparing values
  commonly used in the Directory.  These specifications are inadequate
  for strings composed of Unicode [Unicode] characters.

  The caseIgnoreMatch matching rule [X.520], for example, is simply
  defined as being a case insensitive comparison where insignificant
  spaces are ignored.  For printableString, there is only one space
  character and case mapping is bijective, hence this definition is
  sufficient.  However, for Unicode string types such as
  universalString, this is not sufficient.  For example, a case
  insensitive matching implementation which folded lower case characters
  to upper case would yield different results than an implementation
  which used upper case to lower case folding.  Likewise, one
  implementation may view space as referring to only SPACE (U+0020), a
  second implementation may view any character with the space separator
  (Zs) property as a space, and another implementation may view any
  character with the bidirectional whitespace (WS) category as a space.

  The lack of precise specification for character string matching has
  led to significant interoperability problems.  When used in
  certificate chain validation, security vulnerabilities can arise.  To
  address these problems, this document defines precise algorithms for
  preparing character strings for matching.


1.3. Relationship to "stringprep"

  The character string preparation algorithms described in this document
  are based upon the "stringprep" approach [StringPrep].  In
  "stringprep", presented and stored values are first prepared for
  comparison so that a character-by-character comparison yields the
  "correct" result.

  The approach used here is a refinement of the "stringprep"
  [StringPrep] approach.  Each algorithm involves two additional
  preparation steps.

  a) prior to applying the Unicode string preparation steps outlined in
     "stringprep", the string is transcoded to Unicode;

  b) after applying the Unicode string preparation steps outlined in
     "stringprep", characters insignificant to the matching rules are
     removed.

  Hence, preparation of character strings for X.500 matching involves
  the following steps:

      1) Transcode
      2) Map
      3) Normalize
      4) Prohibit
      5) Check Bidi (Bidirectional)
      6) Insignificant Character Removal

  These steps are described in Section 2.


1.4. Relationship to the LDAP Technical Specification

  This document is an integral part of the LDAP technical specification
  [Roadmap] which obsoletes the previously defined LDAP technical
  specification [RFC3377] in its entirety.

  This document details new LDAP internationalized character string
  preparation algorithms used by [Syntaxes] and possibly other technical
  specifications defining LDAP syntaxes and/or matching rules.


1.5. Relationship to X.500

  LDAP is defined [Roadmap] in X.500 terms as an X.500 access mechanism.
  As such, there is a strong desire for alignment between LDAP and X.500
  syntax and semantics.  The character string preparation algorithms
  described in this document are based upon the "Internationalized
  String Matching Rules for X.500" [XMATCH] proposal to ITU/ISO Joint
  Study Group 2.


2. String Preparation

  The following six-step process SHALL be applied to each presented
  value and each attribute value in preparation for character string
  matching rule evaluation.

      1) Transcode
      2) Map
      3) Normalize
      4) Prohibit
      5) Check bidi
      6) Insignificant Character Removal

  Failure in any step causes the assertion to evaluate to Undefined.

  This process is intended to act upon non-empty character strings.  If
  the string to prepare is empty, this process is not applied and the
  assertion is evaluated to Undefined.

  The character repertoire of this process is Unicode 3.2 [Unicode].
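
  The following Python sketch is non-normative and only illustrates how
  the six steps combine with the three-valued outcome of Section 1.1.
  The names PrepError, prepare, and evaluate are hypothetical; each step
  is assumed to be a function from str to str that raises PrepError on
  failure, and None stands for Undefined.  The treatment of a
  multi-valued attribute (True if any value matches, otherwise Undefined
  if any value could not be prepared) is an assumption of this sketch,
  not a requirement of this document.

```python
from typing import Callable, Optional, Sequence


class PrepError(Exception):
    """Raised by a preparation step that fails (hypothetical name)."""


Step = Callable[[str], str]


def prepare(value: str, steps: Sequence[Step]) -> Optional[str]:
    """Apply the steps in order; None stands for Undefined."""
    if value == "":
        return None  # empty strings are not prepared: Undefined
    try:
        for step in steps:
            value = step(value)
    except PrepError:
        return None  # failure in any step: Undefined
    return value


def evaluate(assertion: str, values: Sequence[str],
             steps: Sequence[Step]) -> Optional[bool]:
    """Return True, False, or None (Undefined) for an equality assertion."""
    prepared_assertion = prepare(assertion, steps)
    if prepared_assertion is None:
        return None
    undefined = False
    for stored in values:
        prepared_stored = prepare(stored, steps)
        if prepared_stored is None:
            undefined = True  # this stored value cannot be compared
        elif prepared_stored == prepared_assertion:
            return True  # prepared strings compare code point by code point
    return None if undefined else False
```

  With a toy step list such as [str.lower] standing in for the real
  steps, evaluate("Foo", ["bar", "FOO"], [str.lower]) returns True.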


2.1. Transcode

  Each non-Unicode string value is transcoded to Unicode.

  TeletexString [X.680][T.61] values are transcoded to Unicode as
  described in Appendix A.

  PrintableString [X.680] values are transcoded directly to Unicode.

  UniversalString, UTF8String, and bmpString [X.680] values need not be
  transcoded as they are Unicode-based strings (in the case of
  bmpString, a subset of Unicode).

  The output is the transcoded string.
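
  As a non-normative illustration, the Python sketch below dispatches on
  the ASN.1 string type, assuming the value has already been BER-decoded
  into its content octets.  The function name, the type-name strings,
  and the teletex callable (standing in for an Appendix A transcoder)
  are hypothetical.  Although UTF8String, BMPString, and UniversalString
  need no transcoding in the sense above, their octets still have to be
  decoded into code points.  The sketch does not check the restricted
  PrintableString repertoire, and Python's utf-16-be codec also accepts
  surrogate pairs, which goes slightly beyond UCS-2.

```python
from typing import Callable, Optional


class PrepError(Exception):
    """Raised when a value cannot be transcoded (hypothetical name)."""


def transcode(kind: str, data: bytes,
              teletex: Optional[Callable[[bytes], str]] = None) -> str:
    """Transcode the octets of an ASN.1 string value to a Unicode str."""
    codecs = {
        "PrintableString": "ascii",      # a subset of ASCII: maps directly
        "UTF8String": "utf-8",
        "BMPString": "utf-16-be",        # two big-endian octets per character
        "UniversalString": "utf-32-be",  # four big-endian octets per character
    }
    if kind in codecs:
        try:
            return data.decode(codecs[kind])
        except UnicodeDecodeError as exc:
            raise PrepError(f"malformed {kind} value") from exc
    if kind == "TeletexString" and teletex is not None:
        return teletex(data)  # e.g. a T.61 transcoder per Appendix A
    raise PrepError(f"no transcoder for {kind}")
```

  Malformed octets raise PrepError, which under Section 2 makes the
  assertion Undefined.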


2.2. Map

  SOFT HYPHEN (U+00AD) and MONGOLIAN TODO SOFT HYPHEN (U+1806) code
  points are mapped to nothing.  COMBINING GRAPHEME JOINER (U+034F) and
  VARIATION SELECTORs (U+180B-180D, FE00-FE0F) code points are also
  mapped to nothing.  The OBJECT REPLACEMENT CHARACTER (U+FFFC) is
  mapped to nothing.

  CHARACTER TABULATION (U+0009), LINE FEED (LF) (U+000A), LINE
  TABULATION (U+000B), FORM FEED (FF) (U+000C), CARRIAGE RETURN (CR)
  (U+000D), and NEXT LINE (NEL) (U+0085) are mapped to SPACE (U+0020).

  All other control code points (e.g., Cc) or code points with a control
  function (e.g., Cf) are mapped to nothing.

  ZERO WIDTH SPACE (U+200B) is mapped to nothing.  All other code points
  with Separator (space, line, or paragraph) property (e.g., Zs, Zl, or
  Zp) are mapped to SPACE (U+0020).

  Appendix B provides a table detailing the above mappings.

  For case ignore, numeric, and stored prefix string matching rules,
  characters are case folded per B.2 of [StringPrep].

  The output is the mapped string.
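
  A non-normative Python sketch of this step follows.  It assumes
  Python's unicodedata.ucd_3_2_0 (Unicode 3.2 data) for general
  categories and the standard stringprep module's map_table_b2 for the
  B.2 case folding; map_char and map_string are hypothetical names.  The
  variation selectors are taken to be U+180B-180D and U+FE00-FE0F.  The
  explicit mappings are tested before the category rules so that, for
  example, CHARACTER TABULATION (a Cc code point) becomes SPACE rather
  than nothing.

```python
import stringprep
import unicodedata

UCD = unicodedata.ucd_3_2_0  # Unicode 3.2 character data

# Code points that are mapped to nothing regardless of category.
MAP_TO_NOTHING = {0x00AD, 0x1806, 0x034F, 0x180B, 0x180C, 0x180D,
                  0xFFFC, 0x200B} | set(range(0xFE00, 0xFE10))
# Control code points that are mapped to SPACE (U+0020).
MAP_TO_SPACE = {0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0085}


def map_char(ch: str) -> str:
    """Map one code point to "", " ", or itself."""
    cp = ord(ch)
    if cp in MAP_TO_NOTHING:
        return ""
    if cp in MAP_TO_SPACE:
        return " "
    category = UCD.category(ch)
    if category in ("Cc", "Cf"):
        return ""    # all other control and format code points
    if category in ("Zs", "Zl", "Zp"):
        return " "   # all other separators
    return ch


def map_string(s: str, case_fold: bool = False) -> str:
    """Apply the Map step; case_fold selects B.2 case folding."""
    out = []
    for ch in s:
        mapped = map_char(ch)
        if case_fold and mapped:
            mapped = stringprep.map_table_b2(mapped)  # may yield several chars
        out.append(mapped)
    return "".join(out)
```

  For instance, map_string("Stra\u00dfe", case_fold=True) yields
  "strasse", since B.2 folds SHARP S to "ss".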


2.3. Normalize

  The input string is to be normalized to Unicode Form KC (compatibility
  composed) as described in [UAX15].  The output is the normalized
  string.
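
  In Python, Form KC normalization can be sketched with the standard
  library as below; the wrapper name is hypothetical.  Using
  unicodedata.ucd_3_2_0 rather than the module-level
  unicodedata.normalize keeps the character data consistent with the
  Unicode 3.2 repertoire named in Section 2.

```python
import unicodedata


def normalize(s: str) -> str:
    """Normalize to Form KC using the Unicode 3.2 tables."""
    return unicodedata.ucd_3_2_0.normalize("NFKC", s)
```

  For example, normalize("\u2126") returns "\u03a9", so the OHM SIGN
  produced by T.61 code xe0 (Table A.1) compares equal to GREEK CAPITAL
  LETTER OMEGA.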


2.4. Prohibit

  All Unassigned code points are prohibited.  Unassigned code points are
  listed in Table A.1 of [StringPrep].

  Characters which, per Section 5.8 of [StringPrep], change display
  properties or are deprecated are prohibited.  These characters are
  listed in Table C.8 of [StringPrep].

  Private Use (U+E000-F8FF, F0000-FFFFD, 100000-10FFFD) code points are
  prohibited.

  All non-character code points (U+FDD0-FDEF, FFFE-FFFF, 1FFFE-1FFFF,
  2FFFE-2FFFF, 3FFFE-3FFFF, 4FFFE-4FFFF, 5FFFE-5FFFF, 6FFFE-6FFFF,
  7FFFE-7FFFF, 8FFFE-8FFFF, 9FFFE-9FFFF, AFFFE-AFFFF, BFFFE-BFFFF,
  CFFFE-CFFFF, DFFFE-DFFFF, EFFFE-EFFFF, FFFFE-FFFFF, 10FFFE-10FFFF) are
  prohibited.

  Surrogate codes (U+D800-DFFF) are prohibited.

  The REPLACEMENT CHARACTER (U+FFFD) code point is prohibited.

  The step fails if the input string contains any prohibited code point.
  Otherwise, the output is the input string.
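
  The Python sketch below is non-normative.  It assumes that the
  RFC 3454 tables in Python's standard stringprep module are acceptable
  stand-ins for [StringPrep]: Table A.1, Table C.8, and Tables C.3, C.4,
  and C.5, which list exactly the private use, non-character, and
  surrogate ranges enumerated above.  PrepError, is_prohibited, and
  prohibit are hypothetical names.

```python
import stringprep


class PrepError(Exception):
    """Raised when a prohibited code point is found (hypothetical name)."""


def is_prohibited(ch: str) -> bool:
    return (
        stringprep.in_table_a1(ch)     # unassigned in Unicode 3.2
        or stringprep.in_table_c8(ch)  # change display properties/deprecated
        or stringprep.in_table_c3(ch)  # private use
        or stringprep.in_table_c4(ch)  # non-character code points
        or stringprep.in_table_c5(ch)  # surrogate codes
        or ch == "\ufffd"              # REPLACEMENT CHARACTER
    )


def prohibit(s: str) -> str:
    """Fail on any prohibited code point; otherwise return the input."""
    for ch in s:
        if is_prohibited(ch):
            raise PrepError(f"prohibited code point U+{ord(ch):04X}")
    return s
```

  Checking U+FFFD here is what makes undefined T.61 codes, which
  Appendix A maps to U+FFFD, produce an Undefined result.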


2.5. Check bidi

  This step fails if the input string does not conform to the
  bidirectional character restrictions detailed in Section 6 of
  [StringPrep].  Otherwise, the output is the input string.
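
  Section 6 of [StringPrep] requires that a string containing any
  RandALCat character (Table D.1) contain no LCat character (Table D.2)
  and that it both begin and end with a RandALCat character; its Table
  C.8 requirement is already covered by the Prohibit step.  A
  non-normative Python sketch using the standard stringprep module
  follows; PrepError and check_bidi are hypothetical names.

```python
import stringprep


class PrepError(Exception):
    """Raised when the bidirectional checks fail (hypothetical name)."""


def check_bidi(s: str) -> str:
    """Apply the RandALCat/LCat rules; return the input unchanged."""
    # RandALCat: Table D.1 (R or AL); LCat: Table D.2 (L).
    if any(stringprep.in_table_d1(ch) for ch in s):
        if any(stringprep.in_table_d2(ch) for ch in s):
            raise PrepError("mixes RandALCat and LCat characters")
        # s is non-empty here because it contains a RandALCat character.
        if not (stringprep.in_table_d1(s[0]) and stringprep.in_table_d1(s[-1])):
            raise PrepError("must start and end with a RandALCat character")
    return s
```

  A purely left-to-right string such as "abc" passes unchanged, as does
  a Hebrew string whose first and last characters are Hebrew letters.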


2.6. Insignificant Character Removal

  In this step, characters insignificant to the matching rule are to be
  removed.  The characters to be removed differ from matching rule to
  matching rule.

  Section 2.6.1 applies to case ignore and exact string matching.
  Section 2.6.2 applies to numericString matching.
  Section 2.6.3 applies to telephoneNumber matching.


2.6.1. Insignificant Space Removal

  For the purposes of this section, a space is defined to be the SPACE
  (U+0020) code point followed by no combining marks.

  NOTE - The previous steps ensure that the string cannot contain any
         code points in the separator class, other than SPACE (U+0020).

  If the input string consists entirely of spaces or is empty, the
  output is a string consisting of exactly one space (i.e., " ").

  Otherwise, the following spaces are removed:
    - leading spaces (i.e., those preceding the first character that is
      not a space);
    - trailing spaces (i.e., those following the last character that is
      not a space);
    - multiple consecutive spaces (these are taken as equivalent to a
      single space character).

  For example, removal of spaces from the Form KC string:
      "<SPACE><SPACE>foo<SPACE><SPACE>bar<SPACE><SPACE>"
  would result in the output string:
      "foo<SPACE>bar"
  and the Form KC string:
      "<SPACE><SPACE><SPACE>"
  would result in the output string:
      "<SPACE>".
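
  The rules above can be sketched in Python as follows (non-normative;
  the function names are hypothetical).  A SPACE that is followed by a
  combining mark, taken here to mean any character of general category
  Mn, Mc, or Me in the Unicode 3.2 data, is treated as an ordinary
  character and kept.

```python
import unicodedata

UCD = unicodedata.ucd_3_2_0


def is_combining(ch: str) -> bool:
    """Assumed test for a combining mark: general category Mn, Mc, or Me."""
    return UCD.category(ch) in ("Mn", "Mc", "Me")


def is_space(s: str, i: int) -> bool:
    """True if s[i] is SPACE (U+0020) not followed by a combining mark."""
    return s[i] == " " and (i + 1 == len(s) or not is_combining(s[i + 1]))


def remove_insignificant_spaces(s: str) -> str:
    out = []
    pending_space = False  # a run of spaces seen after non-space text
    for i, ch in enumerate(s):
        if is_space(s, i):
            # Leading spaces are dropped; later runs collapse to one space.
            pending_space = bool(out)
            continue
        if pending_space:
            out.append(" ")
            pending_space = False
        out.append(ch)
    # A pending run at the end is trailing and is dropped.  An input made
    # only of spaces (or an empty input) yields a single space.
    return "".join(out) if out else " "
```

  Applied to the first example above, the sketch turns
  "  foo  bar  " into "foo bar".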


2.6.2. numericString Insignificant Character Removal

  For the purposes of this section, a space is defined to be the SPACE
  (U+0020) code point followed by no combining marks.

  All spaces are regarded as not significant.  If the input string
  consists entirely of spaces or is empty, the output is a string
  consisting of exactly one space (i.e., " ").  Otherwise, all spaces
  are to be removed.

  For example, removal of spaces from the Form KC string:
      "<SPACE><SPACE>123<SPACE><SPACE>456<SPACE><SPACE>"
  would result in the output string:
      "123456"
  and the Form KC string:
      "<SPACE><SPACE><SPACE>"
  would result in the output string:
      "<SPACE>".
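
  A corresponding non-normative Python sketch for numericString follows.
  It uses the same assumed definition of a combining mark (general
  category Mn, Mc, or Me in the Unicode 3.2 data), and its function
  names are hypothetical.

```python
import unicodedata

UCD = unicodedata.ucd_3_2_0


def is_space(s: str, i: int) -> bool:
    """SPACE (U+0020) not followed by a combining mark (Mn, Mc, or Me)."""
    return s[i] == " " and (
        i + 1 == len(s) or UCD.category(s[i + 1]) not in ("Mn", "Mc", "Me"))


def remove_numeric_spaces(s: str) -> str:
    """Drop every space; an all-space or empty input becomes one space."""
    kept = "".join(ch for i, ch in enumerate(s) if not is_space(s, i))
    return kept if kept else " "
```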


2.6.3. telephoneNumber Insignificant Character Removal

  For the purposes of this section, a hyphen is defined to be a
  HYPHEN-MINUS (U+002D), ARMENIAN HYPHEN (U+058A), HYPHEN (U+2010),
  NON-BREAKING HYPHEN (U+2011), MINUS SIGN (U+2212), SMALL HYPHEN-MINUS
  (U+FE63), or FULLWIDTH HYPHEN-MINUS (U+FF0D) code point followed by no
  combining marks, and a space is defined to be the SPACE (U+0020) code
  point followed by no combining marks.

  All hyphens and spaces are considered insignificant.  If the string
  contains only spaces and hyphens or is empty, then the output is a
  string consisting of one space.  Otherwise, all hyphens and spaces are
  removed.

  For example, removal of hyphens and spaces from the Form KC string:
      "<SPACE><HYPHEN>123<SPACE><SPACE>456<SPACE><HYPHEN>"
  would result in the output string:
      "123456"
  and the Form KC string:
      "<HYPHEN><HYPHEN><HYPHEN>"
  would result in the output string:
      "<SPACE>".
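
  A non-normative Python sketch of the telephoneNumber rule follows.
  The set of hyphens is copied from the definition above; the
  combining-mark test (general category Mn, Mc, or Me in the Unicode 3.2
  data) and the function names are assumptions of the sketch.

```python
import unicodedata

UCD = unicodedata.ucd_3_2_0

HYPHENS = {"\u002d", "\u058a", "\u2010", "\u2011", "\u2212", "\ufe63", "\uff0d"}


def is_insignificant(s: str, i: int) -> bool:
    """A hyphen or SPACE counts only when no combining mark follows it."""
    if s[i] != " " and s[i] not in HYPHENS:
        return False
    return i + 1 == len(s) or UCD.category(s[i + 1]) not in ("Mn", "Mc", "Me")


def remove_phone_insignificant(s: str) -> str:
    """Drop hyphens and spaces; if nothing remains, return one space."""
    kept = "".join(ch for i, ch in enumerate(s) if not is_insignificant(s, i))
    return kept if kept else " "
```

  For example, remove_phone_insignificant("+1 (555) 010\u20112000")
  returns "+1(555)0102000"; characters other than hyphens and spaces,
  such as "+" and parentheses, are left in place.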


3. Security Considerations

  "Preparation of Internationalized Strings ('stringprep')" [StringPrep]
  security considerations generally apply to the algorithms described
  here.


4. Contributors

  Appendices A and B of this document were authored by Howard Chu
  <hyc@symas.com> of Symas Corporation (based upon information provided
  in RFC 1345 [RFC1345]).


5. Acknowledgments

  The approach used in this document is based upon design principles and
  algorithms described in "Preparation of Internationalized Strings
  ('stringprep')" [StringPrep] by Paul Hoffman and Marc Blanchet.  Some
  additional guidance was drawn from Unicode Technical Standards,
  Technical Reports, and Notes.

  This document is a product of the IETF LDAP Revision (LDAPBIS) Working
  Group.


6. Author's Address

  Kurt D. Zeilenga
  OpenLDAP Foundation

  Email: Kurt@OpenLDAP.org


7. References

7.1. Normative References

  [RFC2119]     Bradner, S., "Key words for use in RFCs to Indicate
                Requirement Levels", BCP 14 (also RFC 2119), March 1997.

  [Roadmap]     Zeilenga, K. (editor), "LDAP: Technical Specification
                Road Map", draft-ietf-ldapbis-roadmap-xx.txt, a work in
                progress.

  [StringPrep]  Hoffman, P. and M. Blanchet, "Preparation of
                Internationalized Strings ('stringprep')",
                draft-hoffman-rfc3454bis-xx.txt, a work in progress.

  [Syntaxes]    Legg, S. (editor), "LDAP: Syntaxes and Matching Rules",
                draft-ietf-ldapbis-syntaxes-xx.txt, a work in progress.

  [Unicode]     The Unicode Consortium, "The Unicode Standard, Version
                3.2.0" is defined by "The Unicode Standard, Version 3.0"
                (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5),
                as amended by the "Unicode Standard Annex #27: Unicode
                3.1" (http://www.unicode.org/reports/tr27/) and by the
                "Unicode Standard Annex #28: Unicode 3.2"
                (http://www.unicode.org/reports/tr28/).

  [UAX15]       Davis, M. and M. Duerst, "Unicode Standard Annex #15:
                Unicode Normalization Forms, Version 3.2.0",
                <http://www.unicode.org/unicode/reports/tr15/tr15-22.html>,
                March 2002.

  [X.680]       International Telecommunication Union -
                Telecommunication Standardization Sector, "Abstract
                Syntax Notation One (ASN.1) - Specification of Basic
                Notation", X.680(1997) (also ISO/IEC 8824-1:1998).

  [T.61]        CCITT (now ITU), "Character Repertoire and Coded
                Character Sets for the International Teletex Service",
                T.61, 1988.

7.2. Informative References

  [X.500]       International Telecommunication Union -
                Telecommunication Standardization Sector, "The Directory
                -- Overview of concepts, models and services,"
                X.500(1993) (also ISO/IEC 9594-1:1994).

  [X.501]       International Telecommunication Union -
                Telecommunication Standardization Sector, "The Directory
                -- Models," X.501(1993) (also ISO/IEC 9594-2:1994).

  [X.520]       International Telecommunication Union -
                Telecommunication Standardization Sector, "The
                Directory: Selected Attribute Types", X.520(1993) (also
                ISO/IEC 9594-6:1994).

  [Glossary]    The Unicode Consortium, "Unicode Glossary",
                <http://www.unicode.org/glossary/>.

  [CharModel]   Whistler, K. and M. Davis, "Unicode Technical Report
                #17, Character Encoding Model", UTR17,
                <http://www.unicode.org/unicode/reports/tr17/>, August
                2000.

  [XMATCH]      Zeilenga, K., "Internationalized String Matching Rules
                for X.500", draft-zeilenga-ldapbis-strmatch-xx.txt, a
                work in progress.

  [RFC1345]     Simonsen, K., "Character Mnemonics & Character Sets",
                RFC 1345, June 1992.


Appendix A. Teletex (T.61) to Unicode

  This appendix defines an algorithm for transcoding [T.61] characters
  to [Unicode] characters for use in string preparation for LDAP
  matching rules.  This appendix is normative.

  The transcoding algorithm is derived from the T.61-8bit definition
  provided in [RFC1345].  With a few exceptions, the T.61 character
  codes from x00 to x7f are equivalent to the corresponding [Unicode]
  code points, and their values are left unchanged by this algorithm.
  E.g., the T.61 code x20 is identical to (U+0020).  The exceptions are
  the following T.61 codes, which are undefined: x23, x24, x5c, x5e,
  x60, x7b, x7d, and x7e.

  The codes from x80 to x9f are also equivalent to the corresponding
  Unicode code points.  This is specified for completeness only, as
  these codes are control characters, and will be mapped to nothing in
  the LDAP String Preparation Map step.

  The remaining T.61 codes are mapped below in Table A.1.  Table
  positions marked "??" are undefined.

  Input strings containing undefined T.61 codes SHALL produce an
  Undefined matching result.  For diagnostic purposes, this algorithm
  does not fail for undefined input codes.  Instead, undefined codes in
  the input are mapped to the Unicode REPLACEMENT CHARACTER (U+FFFD).
  As the LDAP String Preparation Prohibit step disallows the REPLACEMENT
  CHARACTER from appearing in its output, this transcoding yields the
  desired effect.

  Note: RFC 1345 listed the non-spacing accent code points as residing
        in the range starting at (U+E000).  In the current Unicode
        standard, the (U+E000) range is reserved for Private Use, and
        the non-spacing accents are in the range starting at (U+0300).
        The tables here use the (U+0300) range for these accents.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   a0| 00a0 | 00a1 | 00a2 | 00a3 | 0024 | 00a5 | 0023 | 00a7 |
   a8| 00a8 |  ??  |  ??  | 00ab |  ??  |  ??  |  ??  |  ??  |
   b0| 00b0 | 00b1 | 00b2 | 00b3 | 00d7 | 00b5 | 00b6 | 00b7 |
   b8| 00f7 |  ??  |  ??  | 00bb | 00bc | 00bd | 00be | 00bf |
   c0|  ??  | 0300 | 0301 | 0302 | 0303 | 0304 | 0306 | 0307 |
   c8| 0308 |  ??  | 030a | 0327 | 0332 | 030b | 0328 | 030c |
   d0|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   d8|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   e0| 2126 | 00c6 | 00d0 | 00aa |  ??  | 0126 | 0132 | 013f |
   e8| 0141 | 00d8 | 0152 | 00ba | 00de | 0166 | 014a | 0149 |
   f0| 0138 | 00e6 | 0111 | 00f0 | 0127 | 0131 | 0133 | 0140 |
   f8| 0142 | 00f8 | 0153 | 00df | 00fe | 0167 | 014b |  ??  |
   --+------+------+------+------+------+------+------+------+
            Table A.1:  Mapping of 8-bit T.61 codes to Unicode

  T.61 also defines a number of accented characters that are formed by
  combining an accent prefix followed by a base character.  These
  prefixes are in the code range xc1 to xcf.  If a prefix character
  appears at the end of a string, the result is undefined.  Otherwise,
  these sequences are mapped to Unicode by substituting the
  corresponding non-spacing accent code (as listed in Table A.1) for the
  accent prefix, and exchanging the order so that the base character
  precedes the accent.
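
  The algorithm above can be sketched in Python as follows.  The sketch
  is non-normative and its names are hypothetical.  Table A.1 is copied
  into a compact text form, with "----" for undefined positions and the
  fully undefined rows xd0 and xd8 omitted.  A prefix at the end of the
  input is mapped to U+FFFD, consistent with the treatment of undefined
  codes.  How to handle a prefix followed by another prefix is not
  specified above, so the sketch simply transcodes the following octet
  on its own.

```python
REPLACEMENT = "\ufffd"

# Table A.1 (codes xa0-xff); "----" marks an undefined position.
_TABLE_A1 = """
a0 00a0 00a1 00a2 00a3 0024 00a5 0023 00a7
a8 00a8 ---- ---- 00ab ---- ---- ---- ----
b0 00b0 00b1 00b2 00b3 00d7 00b5 00b6 00b7
b8 00f7 ---- ---- 00bb 00bc 00bd 00be 00bf
c0 ---- 0300 0301 0302 0303 0304 0306 0307
c8 0308 ---- 030a 0327 0332 030b 0328 030c
e0 2126 00c6 00d0 00aa ---- 0126 0132 013f
e8 0141 00d8 0152 00ba 00de 0166 014a 0149
f0 0138 00e6 0111 00f0 0127 0131 0133 0140
f8 0142 00f8 0153 00df 00fe 0167 014b ----
"""


def _parse(table: str) -> dict:
    mapping = {}
    for line in table.strip().splitlines():
        row, *cells = line.split()
        for offset, cell in enumerate(cells):
            if cell != "----":
                mapping[int(row, 16) + offset] = chr(int(cell, 16))
    return mapping


HIGH = _parse(_TABLE_A1)
UNDEFINED_LOW = {0x23, 0x24, 0x5C, 0x5E, 0x60, 0x7B, 0x7D, 0x7E}
# Accent prefixes: the defined codes in xc1-xcf (xc9 is undefined).
PREFIXES = {code for code in range(0xC1, 0xD0) if code in HIGH}


def _single(code: int) -> str:
    """Transcode one octet on its own, ignoring prefix semantics."""
    if code < 0x80:
        return REPLACEMENT if code in UNDEFINED_LOW else chr(code)
    if code < 0xA0:
        return chr(code)  # C1 controls; removed later by the Map step
    return HIGH.get(code, REPLACEMENT)


def t61_to_unicode(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        code = data[i]
        if code in PREFIXES:
            if i + 1 == len(data):
                out.append(REPLACEMENT)  # trailing prefix: undefined
                break
            # Emit the base character first, then the non-spacing accent.
            out.append(_single(data[i + 1]))
            out.append(HIGH[code])
            i += 2
        else:
            out.append(_single(code))
            i += 1
    return "".join(out)
```

  For example, t61_to_unicode(b"caf\xc2e") yields "cafe\u0301", which the
  Normalize step then composes to "caf\u00e9".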


Appendix B. Additional Teletex (T.61) to Unicode Tables

  All of the accented characters in T.61 have a corresponding code point
  in Unicode.  For the sake of completeness, the combined character
  codes are presented in the following tables.  This is informational
  only; for matching purposes it is sufficient to map the non-spacing
  accent and exchange the order of the character pair as specified in
  Appendix A.  This appendix is informative.
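
  Because the Normalize step composes a base character and a following
  non-spacing accent, the Appendix A approach should reach the
  precomposed code points listed in the tables below.  The non-normative
  Python sketch that follows spot-checks a few entries against the
  Unicode 3.2 data in Python's unicodedata.ucd_3_2_0; the sample list
  and function name are hypothetical, and only a handful of table
  entries are covered.

```python
import unicodedata

UCD = unicodedata.ucd_3_2_0

# (base character, non-spacing accent from Table A.1, expected code point
#  from Appendix B); a small hand-picked sample, not the full tables.
SAMPLES = [
    ("A", 0x0300, 0x00C0),       # Table B.2, grave
    ("W", 0x0300, 0x1E80),       # Table B.2, Unicode-only combination
    ("C", 0x0301, 0x0106),       # Table B.3, acute
    ("Z", 0x0302, 0x1E90),       # Table B.4, circumflex
    ("N", 0x0303, 0x00D1),       # Table B.5, tilde
    ("G", 0x0304, 0x1E20),       # Table B.6, macron
    ("\u00c6", 0x0304, 0x01E2),  # Table B.6, AE with macron
    ("g", 0x0306, 0x011F),       # Table B.7, breve
    ("A", 0x0307, 0x0226),       # Table B.8, dot above
]


def mismatches():
    """Return the samples whose Form KC composition differs from the table."""
    bad = []
    for base, accent, expected in SAMPLES:
        composed = UCD.normalize("NFKC", base + chr(accent))
        if composed != chr(expected):
            bad.append((base, hex(accent), composed))
    return bad
```

  An empty result means the sampled entries agree with the Normalize
  step; Table B.1 (accents combined with <SPACE>) is deliberately not
  sampled.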


B.1. Combinations with SPACE

  Accents may be combined with a <SPACE> to generate the accent by
  itself.  For each accent code, the result of combining with <SPACE> is
  listed in Table B.1.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   c0|  ??  | 0060 | 00b4 | 005e | 007e | 00af | 02d8 | 02d9 |
   c8| 00a8 |  ??  | 02da | 00b8 |  ??  | 02dd | 02db | 02c7 |
   --+------+------+------+------+------+------+------+------+
       Table B.1:  Mapping of T.61 Accents with <SPACE> to Unicode


B.2. Combinations for xc1: (Grave accent)

  T.61 has predefined characters for combinations with A, E, I, O, and
  U.  Unicode also defines combinations for N, W, and Y.  All of these
  combinations are present in Table B.2.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 00c0 |  ??  |  ??  |  ??  | 00c8 |  ??  |  ??  |
   48|  ??  | 00cc |  ??  |  ??  |  ??  |  ??  | 01f8 | 00d2 |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 00d9 |  ??  | 1e80 |
   58|  ??  | 1ef2 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 00e0 |  ??  |  ??  |  ??  | 00e8 |  ??  |  ??  |
   68|  ??  | 00ec |  ??  |  ??  |  ??  |  ??  | 01f9 | 00f2 |
   70|  ??  |  ??  |  ??  |  ??  |  ??  | 00f9 |  ??  | 1e81 |
   78|  ??  | 1ef3 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
           Table B.2: Mapping of T.61 Grave Accent Combinations


B.3. Combinations for xc2: (Acute accent)

  T.61 has predefined characters for combinations with A, E, I, O, U, Y,
  C, L, N, R, S, and Z.  Unicode also defines G, K, M, P, and W.  All of
  these combinations are present in Table B.3.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 00c1 |  ??  | 0106 |  ??  | 00c9 |  ??  | 01f4 |
   48|  ??  | 00cd |  ??  | 1e30 | 0139 | 1e3e | 0143 | 00d3 |
   50| 1e54 |  ??  | 0154 | 015a |  ??  | 00da |  ??  | 1e82 |
   58|  ??  | 00dd | 0179 |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 00e1 |  ??  | 0107 |  ??  | 00e9 |  ??  | 01f5 |
   68|  ??  | 00ed |  ??  | 1e31 | 013a | 1e3f | 0144 | 00f3 |
   70| 1e55 |  ??  | 0155 | 015b |  ??  | 00fa |  ??  | 1e83 |
   78|  ??  | 00fd | 017a |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
           Table B.3: Mapping of T.61 Acute Accent Combinations


B.4. Combinations for xc3: (Circumflex)

  T.61 has predefined characters for combinations with A, E, I, O, U, Y,
  C, G, H, J, S, and W.  Unicode also defines the combination for Z.
  All of these combinations are present in Table B.4.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 00c2 |  ??  | 0108 |  ??  | 00ca |  ??  | 011c |
   48| 0124 | 00ce | 0134 |  ??  |  ??  |  ??  |  ??  | 00d4 |
   50|  ??  |  ??  |  ??  | 015c |  ??  | 00db |  ??  | 0174 |
   58|  ??  | 0176 | 1e90 |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 00e2 |  ??  | 0109 |  ??  | 00ea |  ??  | 011d |
   68| 0125 | 00ee | 0135 |  ??  |  ??  |  ??  |  ??  | 00f4 |
   70|  ??  |  ??  |  ??  | 015d |  ??  | 00fb |  ??  | 0175 |
   78|  ??  | 0177 | 1e91 |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
        Table B.4: Mapping of T.61 Circumflex Accent Combinations


B.5. Combinations for xc4: (Tilde)

  T.61 has predefined characters for combinations with A, I, O, U, and
  N.  Unicode also defines E, V, and Y.  All of these combinations are
  present in Table B.5.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 00c3 |  ??  |  ??  |  ??  | 1ebc |  ??  |  ??  |
   48|  ??  | 0128 |  ??  |  ??  |  ??  |  ??  | 00d1 | 00d5 |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 0168 | 1e7c |  ??  |
   58|  ??  | 1ef8 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 00e3 |  ??  |  ??  |  ??  | 1ebd |  ??  |  ??  |
   68|  ??  | 0129 |  ??  |  ??  |  ??  |  ??  | 00f1 | 00f5 |
   70|  ??  |  ??  |  ??  |  ??  |  ??  | 0169 | 1e7d |  ??  |
   78|  ??  | 1ef9 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
           Table B.5: Mapping of T.61 Tilde Accent Combinations


B.6. Combinations for xc5: (Macron)

  T.61 has predefined characters for combinations with A, E, I, O, and
  U.  Unicode also defines Y, G, and AE.  All of these combinations are
  present in Table B.6.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 0100 |  ??  |  ??  |  ??  | 0112 |  ??  | 1e20 |
   48|  ??  | 012a |  ??  |  ??  |  ??  |  ??  |  ??  | 014c |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 016a |  ??  |  ??  |
   58|  ??  | 0232 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 0101 |  ??  |  ??  |  ??  | 0113 |  ??  | 1e21 |
   68|  ??  | 012b |  ??  |  ??  |  ??  |  ??  |  ??  | 014d |
   70|  ??  |  ??  |  ??  |  ??  |  ??  | 016b |  ??  |  ??  |
   78|  ??  | 0233 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   e0|  ??  | 01e2 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   f0|  ??  | 01e3 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
          Table B.6: Mapping of T.61 Macron Accent Combinations


B.7. Combinations for xc6: (Breve)

  T.61 has predefined characters for combinations with A, U, and G.
  Unicode also defines E, I, and O.  All of these combinations are
  present in Table B.7.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 0102 |  ??  |  ??  |  ??  | 0114 |  ??  | 011e |
   48|  ??  | 012c |  ??  |  ??  |  ??  |  ??  |  ??  | 014e |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 016c |  ??  |  ??  |
   58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 0103 |  ??  |  ??  |  ??  | 0115 |  ??  | 011f |
   68|  ??  | 012d |  ??  |  ??  |  ??  |  ??  |  ??  | 014f |
   70|  ??  |  ??  |  ??  |  ??  |  ??  | 016d |  ??  |  ??  |
   78|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
           Table B.7: Mapping of T.61 Breve Accent Combinations


B.8. Combinations for xc7: (Dot Above)

  T.61 has predefined characters for C, E, G, I, and Z.  Unicode also
  defines A, O, B, D, F, H, M, N, P, R, S, T, W, X, and Y.  All of these
  combinations are present in Table B.8.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 0226 | 1e02 | 010a | 1e0a | 0116 | 1e1e | 0120 |
   48| 1e22 | 0130 |  ??  |  ??  |  ??  | 1e40 | 1e44 | 022e |
   50| 1e56 |  ??  | 1e58 | 1e60 | 1e6a |  ??  |  ??  | 1e86 |
   58| 1e8a | 1e8e | 017b |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 0227 | 1e03 | 010b | 1e0b | 0117 | 1e1f | 0121 |
   68| 1e23 |  ??  |  ??  |  ??  |  ??  | 1e41 | 1e45 | 022f |
   70| 1e57 |  ??  | 1e59 | 1e61 | 1e6b |  ??  |  ??  | 1e87 |
   78| 1e8b | 1e8f | 017c |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
         Table B.8: Mapping of T.61 Dot Above Accent Combinations
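  In T.61 a non-spacing diacritic is encoded as a prefix octet (for
  example, xc7 for the dot above) followed by the base character, so a
  decoder can look up each prefix/base pair in the corresponding table.
  The Python sketch below is a minimal illustration only.  It assumes a
  small hypothetical excerpt of Tables B.6 through B.8 (ACCENT_TABLES)
  and, as a simplification, treats octets below 0x80 as ASCII.  A
  complete transcoder would need all of Appendix B and the full T.61
  repertoire.

```python
# Sketch of a T.61 decoder for prefix-diacritic sequences.
# ACCENT_TABLES is a hypothetical excerpt of Tables B.6-B.8, keyed by the
# diacritic prefix octet and then by the base-character octet.
ACCENT_TABLES = {
    0xC5: {0x41: 0x0100, 0x61: 0x0101},  # macron: A, a (Table B.6)
    0xC6: {0x47: 0x011E, 0x67: 0x011F},  # breve: G, g (Table B.7)
    0xC7: {0x5A: 0x017B, 0x7A: 0x017C},  # dot above: Z, z (Table B.8)
}


def decode_t61(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        octet = data[i]
        if octet in ACCENT_TABLES:
            # A diacritic prefix must be followed by a base character.
            if i + 1 >= len(data):
                raise ValueError("diacritic prefix at end of input")
            base = data[i + 1]
            mapped = ACCENT_TABLES[octet].get(base)
            if mapped is None:  # a '??' cell: no mapping defined
                raise ValueError(f"no mapping for {octet:#04x} {base:#04x}")
            out.append(chr(mapped))
            i += 2
        elif octet < 0x80:
            # Simplification: treat the lower half as ASCII.
            out.append(chr(octet))
            i += 1
        else:
            raise ValueError(f"octet {octet:#04x} not handled by this sketch")
    return "".join(out)


print(ascii(decode_t61(b"\xc5A\xc6g")))  # prints '\u0100\u011f'
```

  Rejecting a pair that maps to a "??" cell, rather than silently
  dropping the diacritic, keeps the sketch from producing a string that
  differs from the original value.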


B.9. Combinations for xc8: (Diaeresis)

  T.61 has predefined characters for A, E, I, O, U, and Y.  Unicode also
  defines H, W, X, and lowercase t.  All of these combinations are
  present in Table B.9.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 00c4 |  ??  |  ??  |  ??  | 00cb |  ??  |  ??  |
   48| 1e26 | 00cf |  ??  |  ??  |  ??  |  ??  |  ??  | 00d6 |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 00dc |  ??  | 1e84 |
   58| 1e8c | 0178 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 00e4 |  ??  |  ??  |  ??  | 00eb |  ??  |  ??  |
   68| 1e27 | 00ef |  ??  |  ??  |  ??  |  ??  |  ??  | 00f6 |
   70|  ??  |  ??  |  ??  |  ??  | 1e97 | 00fc |  ??  | 1e85 |
   78| 1e8d | 00ff |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
         Table B.9: Mapping of T.61 Diaeresis Accent Combinations


B.10. Combinations for xca: (Ring Above)

  T.61 has predefined characters for A and U.  Unicode also defines
  lowercase w and lowercase y.  All of these combinations are present
  in Table B.10.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 00c5 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   48|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 016e |  ??  |  ??  |
   58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 00e5 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   68|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   70|  ??  |  ??  |  ??  |  ??  |  ??  | 016f |  ??  | 1e98 |
   78|  ??  | 1e99 |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
        Table B.10: Mapping of T.61 Ring Above Accent Combinations


B.11. Combinations for xcb: (Cedilla)

  T.61 has predefined characters for C, G, K, L, N, R, S, and T.
  Unicode also defines E, D, and H.  All of these combinations are
  present in Table B.11.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   40|  ??  |  ??  |  ??  | 00c7 | 1e10 | 0228 |  ??  | 0122 |
   48| 1e28 |  ??  |  ??  | 0136 | 013b |  ??  | 0145 |  ??  |
   50|  ??  |  ??  | 0156 | 015e | 0162 |  ??  |  ??  |  ??  |
   58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  |  ??  |  ??  | 00e7 | 1e11 | 0229 |  ??  | 0123 |
   68| 1e29 |  ??  |  ??  | 0137 | 013c |  ??  | 0146 |  ??  |
   70|  ??  |  ??  | 0157 | 015f | 0163 |  ??  |  ??  |  ??  |
   78|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
         Table B.11: Mapping of T.61 Cedilla Accent Combinations


B.12. Combinations for xcd: (Double Acute Accent)

  T.61 has predefined characters for O and U.  These combinations are
  present in Table B.12.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   48|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  | 0150 |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 0170 |  ??  |  ??  |
   68|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  | 0151 |
   70|  ??  |  ??  |  ??  |  ??  |  ??  | 0171 |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
       Table B.12: Mapping of T.61 Double Acute Accent Combinations


B.13. Combinations for xce: (Ogonek)

  T.61 has predefined characters for A, E, I, and U.  Unicode also
  defines the combination for O.  All of these combinations are present
  in Table B.13.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 0104 |  ??  |  ??  |  ??  | 0118 |  ??  |  ??  |
   48|  ??  | 012e |  ??  |  ??  |  ??  |  ??  |  ??  | 01ea |
   50|  ??  |  ??  |  ??  |  ??  |  ??  | 0172 |  ??  |  ??  |
   58|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 0105 |  ??  |  ??  |  ??  | 0119 |  ??  |  ??  |
   68|  ??  | 012f |  ??  |  ??  |  ??  |  ??  |  ??  | 01eb |
   70|  ??  |  ??  |  ??  |  ??  |  ??  | 0173 |  ??  |  ??  |
   78|  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
          Table B.13: Mapping of T.61 Ogonek Accent Combinations


B.14. Combinations for xcf: (Caron)

  T.61 has predefined characters for C, D, E, L, N, R, S, T, and Z.
  Unicode also defines A, I, O, U, G, H, lowercase j, and K.  All of
  these combinations are present in Table B.14.

     |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
   --+------+------+------+------+------+------+------+------+
   40|  ??  | 01cd |  ??  | 010c | 010e | 011a |  ??  | 01e6 |
   48| 021e | 01cf |  ??  | 01e8 | 013d |  ??  | 0147 | 01d1 |
   50|  ??  |  ??  | 0158 | 0160 | 0164 | 01d3 |  ??  |  ??  |
   58|  ??  |  ??  | 017d |  ??  |  ??  |  ??  |  ??  |  ??  |
   60|  ??  | 01ce |  ??  | 010d | 010f | 011b |  ??  | 01e7 |
   68| 021f | 01d0 | 01f0 | 01e9 | 013e |  ??  | 0148 | 01d2 |
   70|  ??  |  ??  | 0159 | 0161 | 0165 | 01d4 |  ??  |  ??  |
   78|  ??  |  ??  | 017e |  ??  |  ??  |  ??  |  ??  |  ??  |
   --+------+------+------+------+------+------+------+------+
          Table B.14: Mapping of T.61 Caron Accent Combinations
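  For the entries spot-checked here, the mapped code point coincides
  with the canonical composition (NFC) of the base letter followed by
  the corresponding Unicode combining mark.  The Python sketch below uses
  the standard unicodedata module to cross-check one entry from each of
  Tables B.5 through B.14.  The prefix-to-combining-mark table (for
  example, U+0303 COMBINING TILDE for the tilde prefix xc4) is an
  assumption of this sketch, not part of this specification.

```python
import unicodedata

# Assumed correspondence between T.61 diacritic prefixes and Unicode
# combining marks (not defined by this specification).
COMBINING = {
    0xC4: "\u0303",  # tilde
    0xC5: "\u0304",  # macron
    0xC6: "\u0306",  # breve
    0xC7: "\u0307",  # dot above
    0xC8: "\u0308",  # diaeresis
    0xCA: "\u030A",  # ring above
    0xCB: "\u0327",  # cedilla
    0xCD: "\u030B",  # double acute
    0xCE: "\u0328",  # ogonek
    0xCF: "\u030C",  # caron
}

# (prefix, base letter, table entry) triples copied from Tables B.5-B.14.
SAMPLES = [
    (0xC4, "V", 0x1E7C),
    (0xC5, "G", 0x1E20),
    (0xC6, "o", 0x014F),
    (0xC7, "I", 0x0130),
    (0xC8, "t", 0x1E97),
    (0xCA, "w", 0x1E98),
    (0xCB, "H", 0x1E28),
    (0xCD, "u", 0x0171),
    (0xCE, "O", 0x01EA),
    (0xCF, "j", 0x01F0),
]


def check(samples):
    """Return the samples whose entry differs from NFC(base + mark)."""
    mismatches = []
    for prefix, base, expected in samples:
        composed = unicodedata.normalize("NFC", base + COMBINING[prefix])
        if composed != chr(expected):
            mismatches.append((prefix, base, expected))
    return mismatches


print(check(SAMPLES))  # prints []
```

  A deliberately wrong entry such as (0xC6, "n", 0x00F1), which pairs
  the breve prefix with U+00F1 (n with tilde), is reported as a
  mismatch, because Unicode has no precomposed n with breve.  The check
  says nothing about "??" cells.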


  Appendix B -- Mapping Table

  Code points listed without an Output value are mapped to nothing.

  Input       Output
  -----       ------
  0000-0008
  0009-000D   0020
  000E-001F
  007F-0084
  0085        0020
  0086-009F
  00A0        0020
  00AD
  034F
  06DD
  070F
  1680        0020
  1806
  180B-180E
  2000-200A   0020
  200B-200F
  2028-2029   0020
  202A-202E
  202F        0020
  205F        0020
  2060-2063
  206A-206F
  3000        0020
  FE00-FE0F
  FEFF
  FFF9-FFFC
  1D173-1D17A
  E0001
  E0020-E007F
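  The mapping table can be applied one code point at a time.  The
  Python sketch below covers only this mapping step, not the other
  preparation steps, and encodes the table as inclusive ranges.  The
  names TO_SPACE, TO_NOTHING, and map_string are hypothetical.  Ranges
  mapped to SPACE are tested first, so NEXT LINE (U+0085) becomes SPACE
  even though it lies within the C1 control range 007F-009F.  The
  variation selectors are taken to be U+FE00-FE0F.

```python
# Inclusive code point ranges from the mapping table (hypothetical names).
TO_SPACE = [
    (0x0009, 0x000D), (0x0085, 0x0085), (0x00A0, 0x00A0),
    (0x1680, 0x1680), (0x2000, 0x200A), (0x2028, 0x2029),
    (0x202F, 0x202F), (0x205F, 0x205F), (0x3000, 0x3000),
]
TO_NOTHING = [
    (0x0000, 0x0008), (0x000E, 0x001F), (0x007F, 0x009F),
    (0x00AD, 0x00AD), (0x034F, 0x034F), (0x06DD, 0x06DD),
    (0x070F, 0x070F), (0x1806, 0x1806), (0x180B, 0x180E),
    (0x200B, 0x200F), (0x202A, 0x202E), (0x2060, 0x2063),
    (0x206A, 0x206F), (0xFE00, 0xFE0F), (0xFEFF, 0xFEFF),
    (0xFFF9, 0xFFFC), (0x1D173, 0x1D17A), (0xE0001, 0xE0001),
    (0xE0020, 0xE007F),
]


def _in(ranges, cp):
    """True if code point cp falls inside any inclusive range."""
    return any(lo <= cp <= hi for lo, hi in ranges)


def map_string(s: str) -> str:
    out = []
    for ch in s:
        cp = ord(ch)
        if _in(TO_SPACE, cp):  # checked first, so U+0085 maps to SPACE
            out.append(" ")
        elif _in(TO_NOTHING, cp):
            continue  # mapped to nothing: drop the code point
        else:
            out.append(ch)
    return "".join(out)


print(repr(map_string("a\u00adb\tc\u3000d\u200be")))  # prints 'ab c de'
```

  A linear scan is enough for illustration.  An implementation handling
  large volumes might instead keep the range starts sorted and use a
  binary search.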



Intellectual Property Rights

  The IETF takes no position regarding the validity or scope of any
  intellectual property or other rights that might be claimed to pertain
  to the implementation or use of the technology described in this
  document or the extent to which any license under such rights might or
  might not be available; neither does it represent that it has made any
  effort to identify any such rights.  Information on the IETF's
  procedures with respect to rights in standards-track and
  standards-related documentation can be found in BCP-11.  Copies of
  claims of rights made available for publication and any assurances of
  licenses to be made available, or the result of an attempt made to
  obtain a general license or permission for the use of such proprietary
  rights by implementors or users of this specification can be obtained
  from the IETF Secretariat.

  The IETF invites any interested party to bring to its attention any
  copyrights, patents or patent applications, or other proprietary
  rights which may cover technology that may be required to practice
  this standard.  Please address the information to the IETF Executive
  Director.



Full Copyright

  Copyright (C) The Internet Society (2004). All Rights Reserved.

  This document and translations of it may be copied and furnished to
  others, and derivative works that comment on or otherwise explain it
  or assist in its implementation may be prepared, copied, published and
  distributed, in whole or in part, without restriction of any kind,
  provided that the above copyright notice and this paragraph are
  included on all such copies and derivative works.  However, this
  document itself may not be modified in any way, such as by removing
  the copyright notice or references to the Internet Society or other
  Internet organizations, except as needed for the purpose of
  developing Internet standards in which case the procedures for
  copyrights defined in the Internet Standards process must be followed,
  or as required to translate it into languages other than English.