   1 # $OpenLDAP$
   2 # Copyright 1999-2007 The OpenLDAP Foundation, All Rights Reserved.
   3 # COPYING RESTRICTIONS APPLY, see COPYRIGHT.
   4
   5 H1: Replication
   6
   7 Replicated directories are a fundamental requirement for delivering a
   8 resilient enterprise deployment.
   9
  10 OpenLDAP has various configuration options for creating a replicated
  11 directory. The following sections will discuss these.
  12
  13 H2: Replication Strategies
  14
  15 H3: Pull Based
  16
  17
  18 H4: syncrepl replication
  19
  20
  21 H4: delta-syncrepl replication
  22
  23
  24 H3: Push Based
  25
  26
H4: Replacing Slurpd

Slurpd replication has been deprecated in favor of Syncrepl replication and
has been completely removed from OpenLDAP 2.4.

The slurpd daemon, inherited from UMich's LDAP, operates in
push mode: the master pushes changes to the slaves.
  34
  35 It has been replaced for many reasons, in brief:
  36
  37 It is not reliable. It is extremely sensitive to the ordering of
  38 records in the replog; it can easily go out of sync at which point
  39 manual intervention is required to resync the slave database with the
  40 master.
  41
  42 Syncrepl is self-synchronizing; you can start with a database in any
  43 state from totally empty to fully sync'd and it will automatically do
  44 the right thing to achieve and maintain synchronization.
  45
  46 Slurpd isn't very tolerant of unavailable servers. If a slave goes down
  47 for a long time, the replog may grow to a size that's too large for
  48 slurpd to process. Some of these problems are fixable, but there's
  49 really no point. Syncrepl covers all the bases slurpd did, plus more.
  50
* Replication via syncrepl, the LDAP content synchronization operation (LDAP Sync, RFC 4533). Introduced in OpenLDAP 2.2, it operates in pull mode: the consumer pulls updates from the provider. In refreshOnly mode, the provider barely knows it is acting as a master, while refreshAndPersist mode requires the provider to support persistent searches. Either mode requires the provider and the consumer to support the controls related to the Sync Operation.
  52
A common objection is that slurpd allowed a pure push method, in which
the replicas have no right to connect to the master (the master can even
be firewalled), whereas with syncrepl the replica appears to need access
to the master.
  62
  63 Syncrepl can operate in either direction. In the pure push/firewall
  64 case, just set up a proxy backend as the syncrepl consumer. test045 and
  65 test048 in the test suite both demonstrate how to configure this. Those
  66 tests are in OpenLDAP 2.4, but you can do something similar in 2.3. You
  67 just need to use a separate slapd instance for the consumer in 2.3.
  68
  69 Just because the protocol was defined a particular way (consumer
  70 initiated single master replication) doesn't mean it can't be used in
  71 other ways. OpenLDAP is far more flexible than that. We've enhanced the
  72 basic syncrepl functionality a number of different ways (delta-syncrepl,
  73 proxied syncrepl, mirrormode, and multimaster) all without altering any
  74 of the syncrepl protocol definition. All it takes is a little creativity
  75 to assemble the pieces in the proper order.
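
As a minimal sketch of the proxied arrangement described above, the
fragment below configures a separate {{slapd}}(8) instance whose only
database is a {{back-ldap}} proxy acting as the syncrepl consumer. It
pulls changes from the master and writes them through the proxy to the
replica, so the replica itself never connects to the master. The host
names, the {{EX:cn=replicator}} identity, and the password are
placeholders chosen for illustration, and a working setup may need
further directives; test045 and test048 remain the tested references.

>       database ldap
>       suffix dc=example,dc=com
>       rootdn cn=slapd-ldap
>       # Replica that receives the changes (placeholder host)
>       uri ldap://replica.example.com:389/
>
>       # Identity the proxy uses when writing to the replica (placeholder)
>       acl-bind bindmethod=simple
>               binddn="cn=replicator,dc=example,dc=com"
>               credentials=secret
>
>       # Pull changes from the master (placeholder host)
>       syncrepl rid=001
>               provider=ldap://master.example.com:389/
>               type=refreshAndPersist
>               retry="5 5 300 5"
>               searchbase="dc=example,dc=com"
>               bindmethod=simple
>               binddn="cn=replicator,dc=example,dc=com"
>               credentials=secret

Running this instance on or near the master keeps every connection
outbound from the master's side of the firewall.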
  76
  77
  78
  79
  80 What was it replaced with?
  81
  82 Why is Syncrepl better?
  83
How do I implement a push-based replication system using Syncrepl?
  85
  86 H4: Working with Firewalls
  87
  88
  89 H2: Replication Types
  90
  91
  92 H3: syncrepl replication
  93
  94
  95 H3: delta-syncrepl replication
  96
  97
  98 H3: N-Way Multi-Master
  99
 100 http://www.connexitor.com/blog/pivot/entry.php?id=105#body
 101 http://www.openldap.org/lists/openldap-software/200702/msg00006.html
 102 http://www.openldap.org/lists/openldap-software/200602/msg00064.html
 103
 104
 105 H3: MirrorMode
 106
 107
 108 H2: LDAP Sync Replication
 109
 110 The {{TERM:LDAP Sync}} Replication engine, {{TERM:syncrepl}} for
 111 short, is a consumer-side replication engine that enables the
 112 consumer {{TERM:LDAP}} server to maintain a shadow copy of a
 113 {{TERM:DIT}} fragment. A syncrepl engine resides at the consumer-side
 114 as one of the {{slapd}}(8) threads. It creates and maintains a
 115 consumer replica by connecting to the replication provider to perform
 116 the initial DIT content load followed either by periodic content
 117 polling or by timely updates upon content changes.
 118
 119 Syncrepl uses the LDAP Content Synchronization (or LDAP Sync for
 120 short) protocol as the replica synchronization protocol.  It provides
 121 a stateful replication which supports both pull-based and push-based
 122 synchronization and does not mandate the use of a history store.
 123
 124 Syncrepl keeps track of the status of the replication content by
 125 maintaining and exchanging synchronization cookies. Because the
 126 syncrepl consumer and provider maintain their content status, the
 127 consumer can poll the provider content to perform incremental
 128 synchronization by asking for the entries required to make the
 129 consumer replica up-to-date with the provider content. Syncrepl
 130 also enables convenient management of replicas by maintaining replica
 131 status.  The consumer replica can be constructed from a consumer-side
or a provider-side backup at any synchronization status. Syncrepl
can automatically bring the consumer replica up to date
with the current provider content.
 135
 136 Syncrepl supports both pull-based and push-based synchronization.
 137 In its basic refreshOnly synchronization mode, the provider uses
 138 pull-based synchronization where the consumer servers need not be
 139 tracked and no history information is maintained.  The information
 140 required for the provider to process periodic polling requests is
 141 contained in the synchronization cookie of the request itself.  To
 142 optimize the pull-based synchronization, syncrepl utilizes the
 143 present phase of the LDAP Sync protocol as well as its delete phase,
 144 instead of falling back on frequent full reloads. To further optimize
 145 the pull-based synchronization, the provider can maintain a per-scope
 146 session log as a history store. In its refreshAndPersist mode of
 147 synchronization, the provider uses a push-based synchronization.
 148 The provider keeps track of the consumer servers that have requested
 149 a persistent search and sends them necessary updates as the provider
 150 replication content gets modified.
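
On the consumer, the choice between the two modes comes down to the
{{EX:type}} parameter of the syncrepl directive. The two alternative
stanzas below are a minimal sketch; the provider address, bind identity,
and password are placeholders, and only one of the stanzas would be used
for a given replica.

>       # Pull-based: poll the provider once an hour (dd:hh:mm:ss)
>       syncrepl rid=001
>               provider=ldap://provider.example.com:389
>               type=refreshOnly
>               interval=00:01:00:00
>               searchbase="dc=example,dc=com"
>               bindmethod=simple
>               binddn="cn=syncuser,dc=example,dc=com"
>               credentials=secret

>       # Push-based: keep the search open; if the connection drops,
>       # retry every 60 seconds indefinitely
>       syncrepl rid=001
>               provider=ldap://provider.example.com:389
>               type=refreshAndPersist
>               retry="60 +"
>               searchbase="dc=example,dc=com"
>               bindmethod=simple
>               binddn="cn=syncuser,dc=example,dc=com"
>               credentials=secret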
 151
 152 With syncrepl, a consumer server can create a replica without
 153 changing the provider's configurations and without restarting the
 154 provider server, if the consumer server has appropriate access
privileges for the DIT fragment to be replicated. The consumer
server can likewise stop the replication without any provider-side
changes or restart.
 158
 159 Syncrepl supports both partial and sparse replications.  The shadow
DIT fragment is defined by general search criteria consisting of
 161 base, scope, filter, and attribute list.  The replica content is
 162 also subject to the access privileges of the bind identity of the
 163 syncrepl replication connection.
 164
 165
 166 H3: The LDAP Content Synchronization Protocol
 167
 168 The LDAP Sync protocol allows a client to maintain a synchronized
 169 copy of a DIT fragment. The LDAP Sync operation is defined as a set
 170 of controls and other protocol elements which extend the LDAP search
 171 operation. This section introduces the LDAP Content Sync protocol
 172 only briefly.  For more information, refer to {{REF:RFC4533}}.
 173
 174 The LDAP Sync protocol supports both polling and listening for
 175 changes by defining two respective synchronization operations:
 176 {{refreshOnly}} and {{refreshAndPersist}}.  Polling is implemented
 177 by the {{refreshOnly}} operation.  The client copy is synchronized
 178 to the server copy at the time of polling.  The server finishes the
 179 search operation by returning {{SearchResultDone}} at the end of
 180 the search operation as in the normal search.  The listening is
 181 implemented by the {{refreshAndPersist}} operation.  Instead of
 182 finishing the search after returning all entries currently matching
 183 the search criteria, the synchronization search remains persistent
 184 in the server. Subsequent updates to the synchronization content
 185 in the server cause additional entry updates to be sent to the
 186 client.
 187
 188 The {{refreshOnly}} operation and the refresh stage of the
 189 {{refreshAndPersist}} operation can be performed with a present
 190 phase or a delete phase.
 191
 192 In the present phase, the server sends the client the entries updated
 193 within the search scope since the last synchronization. The server
sends all requested attributes of the updated entries, whether
changed or not.  For each unchanged entry which remains in the scope, the
 196 server sends a present message consisting only of the name of the
 197 entry and the synchronization control representing state present.
 198 The present message does not contain any attributes of the entry.
 199 After the client receives all update and present entries, it can
 200 reliably determine the new client copy by adding the entries added
 201 to the server, by replacing the entries modified at the server, and
 202 by deleting entries in the client copy which have not been updated
 203 nor specified as being present at the server.
 204
 205 The transmission of the updated entries in the delete phase is the
 206 same as in the present phase. The server sends all the requested
 207 attributes of the entries updated within the search scope since the
 208 last synchronization to the client. In the delete phase, however,
 209 the server sends a delete message for each entry deleted from the
 210 search scope, instead of sending present messages.  The delete
 211 message consists only of the name of the entry and the synchronization
 212 control representing state delete.  The new client copy can be
 213 determined by adding, modifying, and removing entries according to
 214 the synchronization control attached to the {{SearchResultEntry}}
 215 message.
 216
 217 In the case that the LDAP Sync server maintains a history store and
 218 can determine which entries are scoped out of the client copy since
 219 the last synchronization time, the server can use the delete phase.
 220 If the server does not maintain any history store, cannot determine
 221 the scoped-out entries from the history store, or the history store
 222 does not cover the outdated synchronization state of the client,
 223 the server should use the present phase.  The use of the present
 224 phase is much more efficient than a full content reload in terms
 225 of the synchronization traffic.  To reduce the synchronization
 226 traffic further, the LDAP Sync protocol also provides several
 227 optimizations such as the transmission of the normalized {{EX:entryUUID}}s
and the transmission of multiple {{EX:entryUUID}}s in a single
 229 {{syncIdSet}} message.
 230
 231 At the end of the {{refreshOnly}} synchronization, the server sends
 232 a synchronization cookie to the client as a state indicator of the
 233 client copy after the synchronization is completed.  The client
 234 will present the received cookie when it requests the next incremental
 235 synchronization to the server.
 236
 237 When {{refreshAndPersist}} synchronization is used, the server sends
 238 a synchronization cookie at the end of the refresh stage by sending
 239 a Sync Info message with TRUE refreshDone.  It also sends a
 240 synchronization cookie by attaching it to {{SearchResultEntry}}
 241 generated in the persist stage of the synchronization search. During
 242 the persist stage, the server can also send a Sync Info message
 243 containing the synchronization cookie at any time the server wants
 244 to update the client-side state indicator.  The server also updates
 245 a synchronization indicator of the client at the end of the persist
 246 stage.
 247
 248 In the LDAP Sync protocol, entries are uniquely identified by the
 249 {{EX:entryUUID}} attribute value. It can function as a reliable
 250 identifier of the entry. The DN of the entry, on the other hand,
 251 can be changed over time and hence cannot be considered as the
 252 reliable identifier.  The {{EX:entryUUID}} is attached to each
 253 {{SearchResultEntry}} or {{SearchResultReference}} as a part of the
 254 synchronization control.
 255
 256
 257 H3: Syncrepl Details
 258
 259 The syncrepl engine utilizes both the {{refreshOnly}} and the
 260 {{refreshAndPersist}} operations of the LDAP Sync protocol.  If a
 261 syncrepl specification is included in a database definition,
 262 {{slapd}}(8) launches a syncrepl engine as a {{slapd}}(8) thread
 263 and schedules its execution. If the {{refreshOnly}} operation is
 264 specified, the syncrepl engine will be rescheduled at the interval
 265 time after a synchronization operation is completed.  If the
 266 {{refreshAndPersist}} operation is specified, the engine will remain
 267 active and process the persistent synchronization messages from the
 268 provider.
 269
 270 The syncrepl engine utilizes both the present phase and the delete
 271 phase of the refresh synchronization. It is possible to configure
 272 a per-scope session log in the provider server which stores the
 273 {{EX:entryUUID}}s of a finite number of entries deleted from a
 274 replication content.  Multiple replicas of single provider content
 275 share the same per-scope session log. The syncrepl engine uses the
 276 delete phase if the session log is present and the state of the
 277 consumer server is recent enough that no session log entries are
 278 truncated after the last synchronization of the client.  The syncrepl
 279 engine uses the present phase if no session log is configured for
 280 the replication content or if the consumer replica is too outdated
 281 to be covered by the session log.  The current design of the session
 282 log store is memory based, so the information contained in the
 283 session log is not persistent over multiple provider invocations.
Accessing the session log store through LDAP operations is not
currently supported, nor is imposing access control on the
session log.
 287
 288 As a further optimization, even in the case the synchronization
 289 search is not associated with any session log, no entries will be
 290 transmitted to the consumer server when there has been no update
 291 in the replication context.
 292
 293 The syncrepl engine, which is a consumer-side replication engine,
 294 can work with any backends. The LDAP Sync provider can be configured
 295 as an overlay on any backend, but works best with the {{back-bdb}}
 296 or {{back-hdb}} backend.
 297
The LDAP Sync provider maintains a {{EX:contextCSN}} for each
database as the current synchronization state indicator of the
provider content.  It is the largest {{EX:entryCSN}} in the provider
context such that no transaction for an entry having a smaller
{{EX:entryCSN}} value remains outstanding.  The {{EX:contextCSN}}
cannot simply be set to the largest issued {{EX:entryCSN}} because
an {{EX:entryCSN}} is obtained before a transaction starts and
transactions are not committed in the issue order.
 306
 307 The provider stores the {{EX:contextCSN}} of a context in the
 308 {{EX:contextCSN}} attribute of the context suffix entry. The attribute
 309 is not written to the database after every update operation though;
 310 instead it is maintained primarily in memory. At database start
 311 time the provider reads the last saved {{EX:contextCSN}} into memory
 312 and uses the in-memory copy exclusively thereafter. By default,
 313 changes to the {{EX:contextCSN}} as a result of database updates
 314 will not be written to the database until the server is cleanly
 315 shut down. A checkpoint facility exists to cause the contextCSN to
 316 be written out more frequently if desired.
 317
 318 Note that at startup time, if the provider is unable to read a
 319 {{EX:contextCSN}} from the suffix entry, it will scan the entire
 320 database to determine the value, and this scan may take quite a
 321 long time on a large database. When a {{EX:contextCSN}} value is
 322 read, the database will still be scanned for any {{EX:entryCSN}}
 323 values greater than it, to make sure the {{EX:contextCSN}} value
 324 truly reflects the greatest committed {{EX:entryCSN}} in the database.
 325 On databases which support inequality indexing, setting an eq index
on the {{EX:entryCSN}} attribute and configuring {{EX:contextCSN}}
 327 checkpoints will greatly speed up this scanning step.
 328
 329 If no {{EX:contextCSN}} can be determined by reading and scanning
 330 the database, a new value will be generated. Also, if scanning the
 331 database yielded a greater {{EX:entryCSN}} than was previously
 332 recorded in the suffix entry's {{EX:contextCSN}} attribute, a
 333 checkpoint will be immediately written with the new value.
 334
 335 The consumer also stores its replica state, which is the provider's
 336 {{EX:contextCSN}} received as a synchronization cookie, in the
 337 {{EX:contextCSN}} attribute of the suffix entry.  The replica state
 338 maintained by a consumer server is used as the synchronization state
 339 indicator when it performs subsequent incremental synchronization
 340 with the provider server. It is also used as a provider-side
 341 synchronization state indicator when it functions as a secondary
 342 provider server in a cascading replication configuration.  Since
 343 the consumer and provider state information are maintained in the
 344 same location within their respective databases, any consumer can
 345 be promoted to a provider (and vice versa) without any special
 346 actions.
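
A cascading configuration can be sketched as a single database that
carries both roles: a syncrepl directive to consume from the primary
provider, and the syncprov overlay to serve downstream consumers. The
host name, bind identity, password, and {{EX:rid}} value below are
placeholders for illustration only.

>       database hdb
>       suffix dc=example,dc=com
>       rootdn dc=example,dc=com
>       directory /var/ldap/db
>       index objectclass,entryCSN,entryUUID eq
>
>       # Consumer side: replicate from the primary provider
>       syncrepl rid=002
>               provider=ldap://provider.example.com:389
>               type=refreshAndPersist
>               retry="60 +"
>               searchbase="dc=example,dc=com"
>               bindmethod=simple
>               binddn="cn=syncuser,dc=example,dc=com"
>               credentials=secret
>
>       # Provider side: serve the same content to downstream consumers
>       overlay syncprov

Because both roles read and write the same {{EX:contextCSN}} attribute
of the suffix entry, no extra state has to be configured to link them.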
 347
 348 Because a general search filter can be used in the syncrepl
 349 specification, some entries in the context may be omitted from the
 350 synchronization content.  The syncrepl engine creates a glue entry
 351 to fill in the holes in the replica context if any part of the
 352 replica content is subordinate to the holes. The glue entries will
not be returned in the search result unless the {{ManageDsaIT}}
control is provided.
 355
 356 Also as a consequence of the search filter used in the syncrepl
 357 specification, it is possible for a modification to remove an entry
 358 from the replication scope even though the entry has not been deleted
 359 on the provider. Logically the entry must be deleted on the consumer
 360 but in {{refreshOnly}} mode the provider cannot detect and propagate
 361 this change without the use of the session log.
 362
 363
 364 H3: Configuring Syncrepl
 365
 366 Because syncrepl is a consumer-side replication engine, the syncrepl
 367 specification is defined in {{slapd.conf}}(5) of the consumer
 368 server, not in the provider server's configuration file.  The initial
 369 loading of the replica content can be performed either by starting
 370 the syncrepl engine with no synchronization cookie or by populating
 371 the consumer replica by adding an {{TERM:LDIF}} file dumped as a
 372 backup at the provider.
 373
When loading from a backup, the backup need not be an up-to-date
copy of the provider content. The
 376 syncrepl engine will automatically synchronize the initial consumer
 377 replica to the current provider content. As a result, it is not
 378 required to stop the provider server in order to avoid the replica
 379 inconsistency caused by the updates to the provider content during
 380 the content backup and loading process.
 381
 382 When replicating a large scale directory, especially in a bandwidth
 383 constrained environment, it is advised to load the consumer replica
 384 from a backup instead of performing a full initial load using
 385 syncrepl.
 386
 387
 388 H4: Set up the provider slapd
 389
 390 The provider is implemented as an overlay, so the overlay itself
 391 must first be configured in {{slapd.conf}}(5) before it can be
 392 used. The provider has only two configuration directives, for setting
 393 checkpoints on the {{EX:contextCSN}} and for configuring the session
 394 log.  Because the LDAP Sync search is subject to access control,
 395 proper access control privileges should be set up for the replicated
 396 content.
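
As a hedged illustration, the fragment below grants a dedicated
replication identity read access to the replicated subtree and lifts
the search limits for that identity only, so that it can retrieve a
complete copy of the content. The DN {{EX:cn=syncuser,dc=example,dc=com}}
is an assumed identity, and the rule is meant to be placed ahead of any
existing access rules in the provider's database section.

>       # Let the replication identity read the whole subtree;
>       # everyone else falls through to the rules that follow
>       access to dn.subtree="dc=example,dc=com"
>               by dn.exact="cn=syncuser,dc=example,dc=com" read
>               by * break
>
>       # Lift search limits for the replication identity only
>       limits dn.exact="cn=syncuser,dc=example,dc=com"
>               size=unlimited time=unlimited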
 397
 398 The {{EX:contextCSN}} checkpoint is configured by the
 399
 400 >       syncprov-checkpoint <ops> <minutes>
 401
directive. Checkpoints are only tested after successful write
operations.  If {{<ops>}} operations have been performed or more than
{{<minutes>}} minutes have passed since the last checkpoint, a new
checkpoint is performed.
 406
 407 The session log is configured by the
 408
 409 >       syncprov-sessionlog <size>
 410
 411 directive, where {{<size>}} is the maximum number of session log
 412 entries the session log can record. When a session log is configured,
 413 it is automatically used for all LDAP Sync searches within the
 414 database.
 415
 416 Note that using the session log requires searching on the {{entryUUID}}
 417 attribute. Setting an eq index on this attribute will greatly benefit
 418 the performance of the session log on the provider.
 419
 420 A more complete example of the {{slapd.conf}}(5) content is thus:
 421
 422 >       database bdb
 423 >       suffix dc=Example,dc=com
 424 >       rootdn dc=Example,dc=com
 425 >       directory /var/ldap/db
 426 >       index objectclass,entryCSN,entryUUID eq
 427 >
 428 >       overlay syncprov
 429 >       syncprov-checkpoint 100 10
 430 >       syncprov-sessionlog 100
 431
 432
 433 H4: Set up the consumer slapd
 434
 435 The syncrepl replication is specified in the database section of
 436 {{slapd.conf}}(5) for the replica context.  The syncrepl engine
 437 is backend independent and the directive can be defined with any
 438 database type.
 439
 440 >       database hdb
 441 >       suffix dc=Example,dc=com
 442 >       rootdn dc=Example,dc=com
 443 >       directory /var/ldap/db
 444 >       index objectclass,entryCSN,entryUUID eq
 445 >
 446 >       syncrepl rid=123
 447 >               provider=ldap://provider.example.com:389
 448 >               type=refreshOnly
 449 >               interval=01:00:00:00
 450 >               searchbase="dc=example,dc=com"
 451 >               filter="(objectClass=organizationalPerson)"
 452 >               scope=sub
 453 >               attrs="cn,sn,ou,telephoneNumber,title,l"
 454 >               schemachecking=off
 455 >               bindmethod=simple
 456 >               binddn="cn=syncuser,dc=example,dc=com"
 457 >               credentials=secret
 458
 459 In this example, the consumer will connect to the provider {{slapd}}(8)
 460 at port 389 of {{FILE:ldap://provider.example.com}} to perform a
 461 polling ({{refreshOnly}}) mode of synchronization once a day.  It
 462 will bind as {{EX:cn=syncuser,dc=example,dc=com}} using simple
 463 authentication with password "secret".  Note that the access control
 464 privilege of {{EX:cn=syncuser,dc=example,dc=com}} should be set
 465 appropriately in the provider to retrieve the desired replication
 466 content. Also the search limits must be high enough on the provider
 467 to allow the syncuser to retrieve a complete copy of the requested
 468 content.  The consumer uses the rootdn to write to its database so
 469 it always has full permissions to write all content.
 470
 471 The synchronization search in the above example will search for the
 472 entries whose objectClass is organizationalPerson in the entire
 473 subtree rooted at {{EX:dc=example,dc=com}}. The requested attributes
 474 are {{EX:cn}}, {{EX:sn}}, {{EX:ou}}, {{EX:telephoneNumber}},
 475 {{EX:title}}, and {{EX:l}}. The schema checking is turned off, so
 476 that the consumer {{slapd}}(8) will not enforce entry schema
checking when it processes updates from the provider {{slapd}}(8).
 478
 479 For more detailed information on the syncrepl directive, see the
 480 {{SECT:syncrepl}} section of {{SECT:The slapd Configuration File}}
 481 chapter of this admin guide.
 482
 483
 484 H4: Start the provider and the consumer slapd
 485
 486 The provider {{slapd}}(8) is not required to be restarted.
{{contextCSN}} is automatically generated as needed: it might be
originally contained in the {{TERM:LDIF}} file, generated by
{{slapadd}}(8), generated upon changes in the context, or generated
when the first LDAP Sync search arrives at the provider.  If an
LDIF file is being loaded which did not previously contain the
{{contextCSN}}, the {{-w}} option should be used with {{slapadd}}(8)
to cause it to be generated. This will allow the server to
start up a little quicker the first time it runs.
 495
 496 When starting a consumer {{slapd}}(8), it is possible to provide
 497 a synchronization cookie as the {{-c cookie}} command line option
 498 in order to start the synchronization from a specific state.  The
 499 cookie is a comma separated list of name=value pairs. Currently
 500 supported syncrepl cookie fields are {{csn=<csn>}} and {{rid=<rid>}}.
 501 {{<csn>}} represents the current synchronization state of the
 502 consumer replica.  {{<rid>}} identifies a consumer replica locally
 503 within the consumer server. It is used to relate the cookie to the
 504 syncrepl definition in {{slapd.conf}}(5) which has the matching
 505 replica identifier.  The {{<rid>}} must have no more than 3 decimal
 506 digits.  The command line cookie overrides the synchronization
 507 cookie stored in the consumer replica database.
 508
 509
 510 H2: N-Way Multi-Master
 511
 512
 513 H2: MirrorMode
 514
 515