   1 # $OpenLDAP$
   2 # Copyright 1999-2007 The OpenLDAP Foundation, All Rights Reserved.
   3 # COPYING RESTRICTIONS APPLY, see COPYRIGHT.
   4
   5 H1: Replication
   6
   7 Replicated directories are a fundamental requirement for delivering a
   8 resilient enterprise deployment.
   9
  10 OpenLDAP has various configuration options for creating a replicated
  11 directory. The following sections will discuss these.
  12
  13 H2: Replication Strategies
  14
  15 H3: Pull Based
  16
  17
  18 H4: syncrepl replication
  19
  20
  21 H4: delta-syncrepl replication
  22
  23
  24 H3: Push Based
  25
  26
H4: Replacing Slurpd
  28
Slurpd replication has been deprecated in favor of Syncrepl replication and
has been completely removed from OpenLDAP 2.4.
  31
  32 {{Why was it replaced?}}
  33
The slurpd daemon was the original replication mechanism inherited from
UMich's LDAP and operates in push mode: the master pushes changes to the
slaves. It has been replaced for many reasons, in brief:
  37
  38  - It is not reliable
  39  - It is extremely sensitive to the ordering of records in the replog
  40  - It can easily go out of sync, at which point manual intervention is
  41    required to resync the slave database with the master directory
  42  - It isn't very tolerant of unavailable servers. If a slave goes down
  43    for a long time, the replog may grow to a size that's too large for
  44    slurpd to process
  45
  46 {{What was it replaced with?}}
  47
  48 Syncrepl is self-synchronizing; you can start with a database in any
  49 state from totally empty to fully sync'd and it will automatically do
  50 the right thing to achieve and maintain synchronization.
  51
  52
* Replication via syncrepl, the LDAP Content Synchronization operation (LDAP Sync, RFC 4533). Introduced in OpenLDAP 2.2, it operates in pull mode: the consumer pulls updates from the provider. In refreshOnly mode the provider barely knows it is acting as a master, while refreshAndPersist mode requires the provider to support persistent searches. Either mode requires both the provider and the consumer to support the controls related to the Sync operation.
  54
A common objection is that syncrepl requires the replica to have access
to the master, whereas slurpd allowed a pure push method in which the
replicas have no right to connect to the master (the master can even be
firewalled).
  64
  65 Syncrepl can operate in either direction. In the pure push/firewall
  66 case, just set up a proxy backend as the syncrepl consumer. test045 and
  67 test048 in the test suite both demonstrate how to configure this. Those
  68 tests are in OpenLDAP 2.4, but you can do something similar in 2.3. You
  69 just need to use a separate slapd instance for the consumer in 2.3.
  70
  71 Just because the protocol was defined a particular way (consumer
  72 initiated single master replication) doesn't mean it can't be used in
  73 other ways. OpenLDAP is far more flexible than that. We've enhanced the
  74 basic syncrepl functionality a number of different ways (delta-syncrepl,
  75 proxied syncrepl, mirrormode, and multimaster) all without altering any
  76 of the syncrepl protocol definition. All it takes is a little creativity
  77 to assemble the pieces in the proper order.
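
As a rough illustration of the proxied arrangement described above, the
following {{slapd.conf}}(5) fragment sketches a {{back-ldap}} database
acting as the syncrepl consumer on the master's side of the firewall:
it pulls changes from the provider and forwards the resulting writes to
the remote replica. This is only a sketch modelled loosely on the test
suite configurations, not a verified setup. The host names, port, DNs
and credentials are hypothetical, the assumption that {{EX:acl-bind}}
supplies the identity used towards the replica should be checked
against {{slapd-ldap}}(5), and the exact set of directives needed may
differ between releases (in 2.3 this database would live in a separate
slapd instance). Consult test045 and test048 for working configurations.

>       # Proxy database: writes made by syncrepl are forwarded to the replica
>       database ldap
>       suffix dc=example,dc=com
>       rootdn cn=replica,dc=example,dc=com
>       # Remote replica that never connects back to the master (hypothetical host)
>       uri ldap://replica.example.com:389/
>       # Identity assumed to be used when talking to the replica (hypothetical)
>       acl-bind bindmethod=simple
>               binddn="cn=replica,dc=example,dc=com"
>               credentials=secret
>
>       # Pull from the provider on the master's side (hypothetical port)
>       syncrepl rid=001
>               provider=ldap://localhost:9011/
>               type=refreshAndPersist
>               retry="5 5 300 +"
>               searchbase="dc=example,dc=com"
>               bindmethod=simple
>               binddn="cn=syncuser,dc=example,dc=com"
>               credentials=secret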
  78
  79
  80
  81
  84 Why is Syncrepl better?
  85
How do I implement a push-based replication system using Syncrepl?
  87
  88 H4: Working with Firewalls
  89
  90
  91 H2: Replication Types
  92
  93
  94 H3: syncrepl replication
  95
  96
  97 H3: delta-syncrepl replication
  98
  99
 100 H3: N-Way Multi-Master
 101
 102 http://www.connexitor.com/blog/pivot/entry.php?id=105#body
 103 http://www.openldap.org/lists/openldap-software/200702/msg00006.html
 104 http://www.openldap.org/lists/openldap-software/200602/msg00064.html
 105
 106
 107 H3: MirrorMode
 108
 109
 110 H2: LDAP Sync Replication
 111
The {{TERM:LDAP Sync}} Replication engine, {{TERM:syncrepl}} for
short, is a consumer-side replication engine that enables the
consumer {{TERM:LDAP}} server to maintain a shadow copy of a
{{TERM:DIT}} fragment. A syncrepl engine resides on the consumer
side as one of the {{slapd}}(8) threads. It creates and maintains a
consumer replica by connecting to the replication provider to perform
the initial DIT content load, followed either by periodic content
polling or by timely updates upon content changes.
 120
 121 Syncrepl uses the LDAP Content Synchronization (or LDAP Sync for
 122 short) protocol as the replica synchronization protocol.  It provides
 123 a stateful replication which supports both pull-based and push-based
 124 synchronization and does not mandate the use of a history store.
 125
Syncrepl keeps track of the status of the replication content by
maintaining and exchanging synchronization cookies. Because the
syncrepl consumer and provider maintain their content status, the
consumer can poll the provider content to perform incremental
synchronization by asking for the entries required to bring the
consumer replica up to date with the provider content. Syncrepl
also enables convenient management of replicas by maintaining replica
status.  The consumer replica can be constructed from a consumer-side
or a provider-side backup at any synchronization status. Syncrepl
can then automatically bring the consumer replica up to date
with the current provider content.
 137
 138 Syncrepl supports both pull-based and push-based synchronization.
 139 In its basic refreshOnly synchronization mode, the provider uses
 140 pull-based synchronization where the consumer servers need not be
 141 tracked and no history information is maintained.  The information
 142 required for the provider to process periodic polling requests is
 143 contained in the synchronization cookie of the request itself.  To
 144 optimize the pull-based synchronization, syncrepl utilizes the
 145 present phase of the LDAP Sync protocol as well as its delete phase,
 146 instead of falling back on frequent full reloads. To further optimize
 147 the pull-based synchronization, the provider can maintain a per-scope
 148 session log as a history store. In its refreshAndPersist mode of
 149 synchronization, the provider uses a push-based synchronization.
 150 The provider keeps track of the consumer servers that have requested
 151 a persistent search and sends them necessary updates as the provider
 152 replication content gets modified.
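
On the consumer, the two modes correspond to the {{EX:type}} parameter
of the syncrepl directive. The fragment below is a minimal sketch of
the two alternatives for a single consumer database; only one of them
would be used. The host name, bind DN and credentials are placeholders,
and the interval and retry values are arbitrary choices for
illustration.

>       # Alternative 1 (pull): poll the provider once an hour (dd:hh:mm:ss)
>       syncrepl rid=001
>               provider=ldap://provider.example.com:389
>               type=refreshOnly
>               interval=00:01:00:00
>               searchbase="dc=example,dc=com"
>               bindmethod=simple
>               binddn="cn=syncuser,dc=example,dc=com"
>               credentials=secret
>
>       # Alternative 2 (push): keep the search open; if the connection drops,
>       # retry every 60 seconds 10 times, then every 300 seconds indefinitely
>       syncrepl rid=001
>               provider=ldap://provider.example.com:389
>               type=refreshAndPersist
>               retry="60 10 300 +"
>               searchbase="dc=example,dc=com"
>               bindmethod=simple
>               binddn="cn=syncuser,dc=example,dc=com"
>               credentials=secret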
 153
With syncrepl, a consumer server can create a replica without
changing the provider's configuration and without restarting the
provider server, provided the consumer server has appropriate access
privileges for the DIT fragment to be replicated. The consumer
server can likewise stop the replication without requiring any
provider-side changes or restart.
 160
Syncrepl supports both partial and sparse replications.  The shadow
DIT fragment is defined by general search criteria consisting of
base, scope, filter, and attribute list.  The replica content is
also subject to the access privileges of the bind identity of the
syncrepl replication connection.
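
As an illustration, a consumer that shadows only part of the provider
content might use search criteria like the following. This is a sketch
only: the {{EX:ou=People}} container, the chosen object class and
attributes, and the bind identity are hypothetical.

>       # Replicate only the entries directly below ou=People, and only
>       # a few attributes of those entries
>       syncrepl rid=002
>               provider=ldap://provider.example.com:389
>               type=refreshOnly
>               interval=00:00:30:00
>               searchbase="ou=People,dc=example,dc=com"
>               scope=one
>               filter="(objectClass=inetOrgPerson)"
>               attrs="cn,sn,mail"
>               schemachecking=off
>               bindmethod=simple
>               binddn="cn=syncuser,dc=example,dc=com"
>               credentials=secret

Schema checking is switched off here because entries stripped of some
attributes may no longer satisfy their object classes.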
 166
 167
 168 H3: The LDAP Content Synchronization Protocol
 169
 170 The LDAP Sync protocol allows a client to maintain a synchronized
 171 copy of a DIT fragment. The LDAP Sync operation is defined as a set
 172 of controls and other protocol elements which extend the LDAP search
 173 operation. This section introduces the LDAP Content Sync protocol
 174 only briefly.  For more information, refer to {{REF:RFC4533}}.
 175
The LDAP Sync protocol supports both polling and listening for
changes by defining two respective synchronization operations:
{{refreshOnly}} and {{refreshAndPersist}}.  Polling is implemented
by the {{refreshOnly}} operation.  The client copy is synchronized
to the server copy at the time of polling.  The server finishes the
search operation by returning {{SearchResultDone}}, as in a normal
search.  Listening is implemented by the {{refreshAndPersist}}
operation.  Instead of finishing the search after returning all
entries currently matching the search criteria, the synchronization
search remains persistent in the server. Subsequent updates to the
synchronization content in the server cause additional entry updates
to be sent to the client.
 189
 190 The {{refreshOnly}} operation and the refresh stage of the
 191 {{refreshAndPersist}} operation can be performed with a present
 192 phase or a delete phase.
 193
In the present phase, the server sends the client the entries updated
within the search scope since the last synchronization. The server
sends all requested attributes, whether changed or not, of the updated
entries.  For each unchanged entry which remains in the scope, the
server sends a present message consisting only of the name of the
entry and the synchronization control representing state present.
The present message does not contain any attributes of the entry.
After the client receives all update and present entries, it can
reliably determine the new client copy by adding the entries added
to the server, by replacing the entries modified at the server, and
by deleting entries in the client copy which have been neither
updated nor specified as being present at the server.
 206
 207 The transmission of the updated entries in the delete phase is the
 208 same as in the present phase. The server sends all the requested
 209 attributes of the entries updated within the search scope since the
 210 last synchronization to the client. In the delete phase, however,
 211 the server sends a delete message for each entry deleted from the
 212 search scope, instead of sending present messages.  The delete
 213 message consists only of the name of the entry and the synchronization
 214 control representing state delete.  The new client copy can be
 215 determined by adding, modifying, and removing entries according to
 216 the synchronization control attached to the {{SearchResultEntry}}
 217 message.
 218
If the LDAP Sync server maintains a history store and can determine
which entries have been scoped out of the client copy since the last
synchronization time, the server can use the delete phase.
If the server does not maintain any history store, cannot determine
the scoped-out entries from the history store, or the history store
does not cover the outdated synchronization state of the client,
the server should use the present phase.  The use of the present
phase is much more efficient than a full content reload in terms
of the synchronization traffic.  To reduce the synchronization
traffic further, the LDAP Sync protocol also provides several
optimizations such as the transmission of the normalized {{EX:entryUUID}}s
and the transmission of multiple {{EX:entryUUID}}s in a single
{{syncIdSet}} message.
 232
 233 At the end of the {{refreshOnly}} synchronization, the server sends
 234 a synchronization cookie to the client as a state indicator of the
 235 client copy after the synchronization is completed.  The client
 236 will present the received cookie when it requests the next incremental
 237 synchronization to the server.
 238
 239 When {{refreshAndPersist}} synchronization is used, the server sends
 240 a synchronization cookie at the end of the refresh stage by sending
 241 a Sync Info message with TRUE refreshDone.  It also sends a
 242 synchronization cookie by attaching it to {{SearchResultEntry}}
 243 generated in the persist stage of the synchronization search. During
 244 the persist stage, the server can also send a Sync Info message
 245 containing the synchronization cookie at any time the server wants
 246 to update the client-side state indicator.  The server also updates
 247 a synchronization indicator of the client at the end of the persist
 248 stage.
 249
In the LDAP Sync protocol, entries are uniquely identified by the
{{EX:entryUUID}} attribute value. It can function as a reliable
identifier of the entry. The DN of the entry, on the other hand,
can change over time and hence cannot be considered a reliable
identifier.  The {{EX:entryUUID}} is attached to each
{{SearchResultEntry}} or {{SearchResultReference}} as a part of the
synchronization control.
 257
 258
 259 H3: Syncrepl Details
 260
 261 The syncrepl engine utilizes both the {{refreshOnly}} and the
 262 {{refreshAndPersist}} operations of the LDAP Sync protocol.  If a
 263 syncrepl specification is included in a database definition,
 264 {{slapd}}(8) launches a syncrepl engine as a {{slapd}}(8) thread
 265 and schedules its execution. If the {{refreshOnly}} operation is
 266 specified, the syncrepl engine will be rescheduled at the interval
 267 time after a synchronization operation is completed.  If the
 268 {{refreshAndPersist}} operation is specified, the engine will remain
 269 active and process the persistent synchronization messages from the
 270 provider.
 271
The syncrepl engine utilizes both the present phase and the delete
phase of the refresh synchronization. It is possible to configure
a per-scope session log in the provider server which stores the
{{EX:entryUUID}}s of a finite number of entries deleted from a
replication content.  Multiple replicas of a single provider content
share the same per-scope session log. The syncrepl engine uses the
delete phase if the session log is present and the state of the
consumer server is recent enough that no session log entries have
been truncated since the last synchronization of the client.  The
syncrepl engine uses the present phase if no session log is configured
for the replication content or if the consumer replica is too outdated
to be covered by the session log.  The current design of the session
log store is memory based, so the information contained in the
session log is not persistent over multiple provider invocations.
Accessing the session log store by using LDAP operations is not
currently supported, nor is imposing access control on the session
log.
 289
As a further optimization, even when the synchronization
search is not associated with any session log, no entries will be
transmitted to the consumer server when there has been no update
in the replication context.
 294
The syncrepl engine, which is a consumer-side replication engine,
can work with any backend. The LDAP Sync provider can be configured
as an overlay on any backend, but works best with the {{back-bdb}}
or {{back-hdb}} backend.
 299
The LDAP Sync provider maintains a {{EX:contextCSN}} for each
database as the current synchronization state indicator of the
provider content.  It is the largest {{EX:entryCSN}} in the provider
context such that no transaction for an entry having a smaller
{{EX:entryCSN}} value remains outstanding.  The {{EX:contextCSN}}
cannot simply be set to the largest issued {{EX:entryCSN}} because
{{EX:entryCSN}} is obtained before a transaction starts and
transactions are not committed in the issue order.
 308
 309 The provider stores the {{EX:contextCSN}} of a context in the
 310 {{EX:contextCSN}} attribute of the context suffix entry. The attribute
 311 is not written to the database after every update operation though;
 312 instead it is maintained primarily in memory. At database start
 313 time the provider reads the last saved {{EX:contextCSN}} into memory
 314 and uses the in-memory copy exclusively thereafter. By default,
 315 changes to the {{EX:contextCSN}} as a result of database updates
 316 will not be written to the database until the server is cleanly
 317 shut down. A checkpoint facility exists to cause the contextCSN to
 318 be written out more frequently if desired.
 319
 320 Note that at startup time, if the provider is unable to read a
 321 {{EX:contextCSN}} from the suffix entry, it will scan the entire
 322 database to determine the value, and this scan may take quite a
 323 long time on a large database. When a {{EX:contextCSN}} value is
 324 read, the database will still be scanned for any {{EX:entryCSN}}
 325 values greater than it, to make sure the {{EX:contextCSN}} value
 326 truly reflects the greatest committed {{EX:entryCSN}} in the database.
 327 On databases which support inequality indexing, setting an eq index
 328 on the {{EX:entryCSN}} attribute and configuring {{contextCSN}}
 329 checkpoints will greatly speed up this scanning step.
 330
 331 If no {{EX:contextCSN}} can be determined by reading and scanning
 332 the database, a new value will be generated. Also, if scanning the
 333 database yielded a greater {{EX:entryCSN}} than was previously
 334 recorded in the suffix entry's {{EX:contextCSN}} attribute, a
 335 checkpoint will be immediately written with the new value.
 336
 337 The consumer also stores its replica state, which is the provider's
 338 {{EX:contextCSN}} received as a synchronization cookie, in the
 339 {{EX:contextCSN}} attribute of the suffix entry.  The replica state
 340 maintained by a consumer server is used as the synchronization state
 341 indicator when it performs subsequent incremental synchronization
 342 with the provider server. It is also used as a provider-side
 343 synchronization state indicator when it functions as a secondary
 344 provider server in a cascading replication configuration.  Since
 345 the consumer and provider state information are maintained in the
 346 same location within their respective databases, any consumer can
 347 be promoted to a provider (and vice versa) without any special
 348 actions.
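
A cascading arrangement can therefore be sketched as a single database
section that is both a consumer and a provider: it carries a syncrepl
directive pointing at the upstream provider and also enables the
{{syncprov}} overlay for its own downstream consumers. The fragment
below is a minimal sketch under that assumption; the host name, bind
identity and retry values are hypothetical.

>       database bdb
>       suffix dc=example,dc=com
>       rootdn dc=example,dc=com
>       directory /var/ldap/db
>       index objectclass,entryCSN,entryUUID eq
>
>       # Consumer side: follow the upstream provider
>       syncrepl rid=001
>               provider=ldap://provider.example.com:389
>               type=refreshAndPersist
>               retry="60 +"
>               searchbase="dc=example,dc=com"
>               bindmethod=simple
>               binddn="cn=syncuser,dc=example,dc=com"
>               credentials=secret
>
>       # Provider side: serve the same content to downstream consumers
>       overlay syncprov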
 349
Because a general search filter can be used in the syncrepl
specification, some entries in the context may be omitted from the
synchronization content.  The syncrepl engine creates glue entries
to fill in the holes in the replica context if any part of the
replica content is subordinate to the holes. The glue entries will
not be returned in search results unless the {{ManageDsaIT}} control
is provided.
 357
 358 Also as a consequence of the search filter used in the syncrepl
 359 specification, it is possible for a modification to remove an entry
 360 from the replication scope even though the entry has not been deleted
 361 on the provider. Logically the entry must be deleted on the consumer
 362 but in {{refreshOnly}} mode the provider cannot detect and propagate
 363 this change without the use of the session log.
 364
 365
 366 H3: Configuring Syncrepl
 367
 368 Because syncrepl is a consumer-side replication engine, the syncrepl
 369 specification is defined in {{slapd.conf}}(5) of the consumer
 370 server, not in the provider server's configuration file.  The initial
 371 loading of the replica content can be performed either by starting
 372 the syncrepl engine with no synchronization cookie or by populating
 373 the consumer replica by adding an {{TERM:LDIF}} file dumped as a
 374 backup at the provider.
 375
 376 When loading from a backup, it is not required to perform the initial
 377 loading from the up-to-date backup of the provider content. The
 378 syncrepl engine will automatically synchronize the initial consumer
 379 replica to the current provider content. As a result, it is not
 380 required to stop the provider server in order to avoid the replica
 381 inconsistency caused by the updates to the provider content during
 382 the content backup and loading process.
 383
 384 When replicating a large scale directory, especially in a bandwidth
 385 constrained environment, it is advised to load the consumer replica
 386 from a backup instead of performing a full initial load using
 387 syncrepl.
 388
 389
 390 H4: Set up the provider slapd
 391
 392 The provider is implemented as an overlay, so the overlay itself
 393 must first be configured in {{slapd.conf}}(5) before it can be
 394 used. The provider has only two configuration directives, for setting
 395 checkpoints on the {{EX:contextCSN}} and for configuring the session
 396 log.  Because the LDAP Sync search is subject to access control,
 397 proper access control privileges should be set up for the replicated
 398 content.
 399
The {{EX:contextCSN}} checkpoint is configured by the

>       syncprov-checkpoint <ops> <minutes>

directive. Checkpoints are only tested after successful write
operations.  If {{<ops>}} operations have been performed or more
than {{<minutes>}} minutes have passed since the last checkpoint,
a new checkpoint is performed.
 408
 409 The session log is configured by the
 410
>       syncprov-sessionlog <size>
 412
 413 directive, where {{<size>}} is the maximum number of session log
 414 entries the session log can record. When a session log is configured,
 415 it is automatically used for all LDAP Sync searches within the
 416 database.
 417
 418 Note that using the session log requires searching on the {{entryUUID}}
 419 attribute. Setting an eq index on this attribute will greatly benefit
 420 the performance of the session log on the provider.
 421
 422 A more complete example of the {{slapd.conf}}(5) content is thus:
 423
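>       # Provider database holding the content to be replicated
>       database bdb
>       suffix dc=Example,dc=com
>       rootdn dc=Example,dc=com
>       directory /var/ldap/db
>       # eq indexes on entryCSN and entryUUID help the startup scan
>       # and the session log lookups
>       index objectclass,entryCSN,entryUUID eq
>
>       # Enable the LDAP Sync provider overlay on this database
>       overlay syncprov
>       # Checkpoint the contextCSN after 100 operations or 10 minutes
>       syncprov-checkpoint 100 10
>       # Record at most 100 entries in the session log
>       syncprov-sessionlog 100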
 433
 434
 435 H4: Set up the consumer slapd
 436
 437 The syncrepl replication is specified in the database section of
 438 {{slapd.conf}}(5) for the replica context.  The syncrepl engine
 439 is backend independent and the directive can be defined with any
 440 database type.
 441
>       # Consumer database that will hold the replica
>       database hdb
>       suffix dc=Example,dc=com
>       rootdn dc=Example,dc=com
>       directory /var/ldap/db
>       index objectclass,entryCSN,entryUUID eq
>
>       # Poll the provider once a day (interval is dd:hh:mm:ss)
>       syncrepl rid=123
>               provider=ldap://provider.example.com:389
>               type=refreshOnly
>               interval=01:00:00:00
>               searchbase="dc=example,dc=com"
>               filter="(objectClass=organizationalPerson)"
>               scope=sub
>               attrs="cn,sn,ou,telephoneNumber,title,l"
>               schemachecking=off
>               bindmethod=simple
>               binddn="cn=syncuser,dc=example,dc=com"
>               credentials=secret
 460
 461 In this example, the consumer will connect to the provider {{slapd}}(8)
 462 at port 389 of {{FILE:ldap://provider.example.com}} to perform a
 463 polling ({{refreshOnly}}) mode of synchronization once a day.  It
 464 will bind as {{EX:cn=syncuser,dc=example,dc=com}} using simple
 465 authentication with password "secret".  Note that the access control
 466 privilege of {{EX:cn=syncuser,dc=example,dc=com}} should be set
 467 appropriately in the provider to retrieve the desired replication
 468 content. Also the search limits must be high enough on the provider
 469 to allow the syncuser to retrieve a complete copy of the requested
 470 content.  The consumer uses the rootdn to write to its database so
 471 it always has full permissions to write all content.
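
On the provider, the access control and search limit requirements
described above might be met with a fragment like the following, placed
in the provider's database section ahead of its other access rules.
This is a sketch rather than a recommended policy: it assumes the
replication identity should be able to read the whole database, and
the {{EX:by * break}} clause is there so that all other identities fall
through to whatever rules follow.

>       # Give the replication identity read access to all content
>       access to *
>               by dn.exact="cn=syncuser,dc=example,dc=com" read
>               by * break
>
>       # Lift size and time limits for the replication identity only
>       limits dn.exact="cn=syncuser,dc=example,dc=com"
>               size=unlimited time=unlimited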
 472
The synchronization search in the above example will search for the
entries whose objectClass is organizationalPerson in the entire
subtree rooted at {{EX:dc=example,dc=com}}. The requested attributes
are {{EX:cn}}, {{EX:sn}}, {{EX:ou}}, {{EX:telephoneNumber}},
{{EX:title}}, and {{EX:l}}. The schema checking is turned off, so
that the consumer {{slapd}}(8) will not enforce entry schema
checking when it processes updates from the provider {{slapd}}(8).
 480
 481 For more detailed information on the syncrepl directive, see the
 482 {{SECT:syncrepl}} section of {{SECT:The slapd Configuration File}}
 483 chapter of this admin guide.
 484
 485
 486 H4: Start the provider and the consumer slapd
 487
The provider {{slapd}}(8) is not required to be restarted.
{{contextCSN}} is automatically generated as needed: it might be
originally contained in the {{TERM:LDIF}} file, generated by
{{slapadd}}(8), generated upon changes in the context, or generated
when the first LDAP Sync search arrives at the provider.  If an
LDIF file is being loaded which did not previously contain the
{{contextCSN}}, the {{-w}} option should be used with {{slapadd}}(8)
to cause it to be generated. This will allow the server to
start up a little quicker the first time it runs.
 497
When starting a consumer {{slapd}}(8), it is possible to provide
a synchronization cookie as the {{-c cookie}} command line option
in order to start the synchronization from a specific state.  The
cookie is a comma-separated list of name=value pairs. Currently
supported syncrepl cookie fields are {{csn=<csn>}} and {{rid=<rid>}}.
{{<csn>}} represents the current synchronization state of the
consumer replica.  {{<rid>}} identifies a consumer replica locally
within the consumer server. It is used to relate the cookie to the
syncrepl definition in {{slapd.conf}}(5) which has the matching
replica identifier.  The {{<rid>}} must have no more than 3 decimal
digits.  The command line cookie overrides the synchronization
cookie stored in the consumer replica database.
 510
 511
 512 H2: N-Way Multi-Master
 513
 514
 515 H2: MirrorMode
 516
 517