   1 # $OpenLDAP$
   2 # Copyright 2003-2006 The OpenLDAP Foundation, All Rights Reserved.
   3 # COPYING RESTRICTIONS APPLY, see COPYRIGHT.
   4
   5 H1: LDAP Sync Replication
   6
   7 The LDAP Sync replication engine, syncrepl for short, is a consumer-side
   8 replication engine that enables the consumer LDAP server to maintain
   9 a shadow copy of a DIT fragment. A syncrepl engine resides on the
  10 consumer side as one of the {{slapd}} (8) threads. It creates and
  11 maintains a consumer replica by connecting to the replication
  12 provider to perform the initial DIT content load followed either
  13 by periodic content polling or by timely updates upon content
  14 changes.
  15
  16 Syncrepl uses the LDAP Content Synchronization (or LDAP Sync for
  17 short) protocol as the replica synchronization protocol.  It provides
  18 a stateful replication which supports both pull-based and push-based
  19 synchronization and does not mandate the use of a history store.
  20
  21 Syncrepl keeps track of the status of the replication content by
  22 maintaining and exchanging synchronization cookies. Because the
  23 syncrepl consumer and provider maintain their content status, the
  24 consumer can poll the provider content to perform incremental
  25 synchronization by asking for the entries required to make the
  26 consumer replica up-to-date with the provider content. Syncrepl
  27 also enables convenient management of replicas by maintaining replica
  28 status.  The consumer replica can be constructed from a consumer-side
  29 or a provider-side backup at any synchronization status. Syncrepl
  30 can then automatically bring the consumer replica up to date
  31 with the current provider content.
  32
  33 Syncrepl supports both pull-based and push-based synchronization.
  34 In its basic refreshOnly synchronization mode, the provider uses
  35 pull-based synchronization where the consumer servers need not be
  36 tracked and no history information is maintained.  The information
  37 required for the provider to process periodic polling requests is
  38 contained in the synchronization cookie of the request itself.  To
  39 optimize the pull-based synchronization, syncrepl utilizes the
  40 present phase of the LDAP Sync protocol as well as its delete phase,
  41 instead of falling back on frequent full reloads. To further optimize
  42 the pull-based synchronization, the provider can maintain a per-scope
  43 session log as a history store. In its refreshAndPersist mode of
  44 synchronization, the provider uses a push-based synchronization.
  45 The provider keeps track of the consumer servers that have requested
  46 a persistent search and sends them necessary updates as the provider
  47 replication content gets modified.
  48
  49 With syncrepl, a consumer server can create a replica without
  50 changing the provider's configurations and without restarting the
  51 provider server, if the consumer server has appropriate access
  52 privileges for the DIT fragment to be replicated. The consumer
  53 server can also stop the replication without the need for provider-side
  54 changes or a restart.
  55
  56 Syncrepl supports both partial and sparse replications.  The shadow
  57 DIT fragment is defined by general search criteria consisting of
  58 base, scope, filter, and attribute list.  The replica content is
  59 also subject to the access privileges of the bind identity of the
  60 syncrepl replication connection.
  61
  62
  63 H2: The LDAP Content Synchronization Protocol
  64
  65 The LDAP Sync protocol allows a client to maintain a synchronized
  66 copy of a DIT fragment. The LDAP Sync operation is defined as a set
  67 of controls and other protocol elements which extend the LDAP search
  68 operation. This section introduces the LDAP Content Sync protocol
  69 only briefly. For more information, refer to the Internet Draft
  70 {{The LDAP Content Synchronization Operation
  71 <draft-zeilenga-ldup-sync-05.txt>}}.
  72
  73 The LDAP Sync protocol supports both polling and listening for
  74 changes by defining two respective synchronization operations:
  75 {{refreshOnly}} and {{refreshAndPersist}}.  Polling is implemented
  76 by the {{refreshOnly}} operation.  The client copy is synchronized
  77 to the server copy at the time of polling.  The server finishes the
  78 search operation by returning {{SearchResultDone}}, as it would
  79 at the end of a normal search.  Listening is
  80 implemented by the {{refreshAndPersist}} operation.  Instead of
  81 finishing the search after returning all entries currently matching
  82 the search criteria, the synchronization search remains persistent
  83 in the server. Subsequent updates to the synchronization content
  84 in the server cause additional entry updates to be sent to the
  85 client.
  86
  87 The {{refreshOnly}} operation and the refresh stage of the
  88 {{refreshAndPersist}} operation can be performed with a present
  89 phase or a delete phase.
  90
  91 In the present phase, the server sends the client the entries updated
  92 within the search scope since the last synchronization. The server
  93 sends all requested attributes, whether changed or not, of the updated
  94 entries.  For each unchanged entry which remains in the scope, the
  95 server sends a present message consisting only of the name of the
  96 entry and the synchronization control representing state present.
  97 The present message does not contain any attributes of the entry.
  98 After the client receives all update and present entries, it can
  99 reliably determine the new client copy by adding the entries added
 100 to the server, by replacing the entries modified at the server, and
 101 by deleting entries in the client copy which have neither been
 102 updated nor specified as being present at the server.
 103
 104 The transmission of the updated entries in the delete phase is the
 105 same as in the present phase. The server sends all the requested
 106 attributes of the entries updated within the search scope since the
 107 last synchronization to the client. In the delete phase, however,
 108 the server sends a delete message for each entry deleted from the
 109 search scope, instead of sending present messages.  The delete
 110 message consists only of the name of the entry and the synchronization
 111 control representing state delete.  The new client copy can be
 112 determined by adding, modifying, and removing entries according to
 113 the synchronization control attached to the {{SearchResultEntry}}
 114 message.
 115
 116 In the case that the LDAP Sync server maintains a history store and
 117 can determine which entries are scoped out of the client copy since
 118 the last synchronization time, the server can use the delete phase.
 119 If the server does not maintain any history store, cannot determine
 120 the scoped-out entries from the history store, or the history store
 121 does not cover the outdated synchronization state of the client,
 122 the server should use the present phase.  The use of the present
 123 phase is much more efficient than a full content reload in terms
 124 of the synchronization traffic.  To reduce the synchronization
 125 traffic further, the LDAP Sync protocol also provides several
 126 optimizations such as the transmission of the normalized {{EX:entryUUID}}s
 127 and the transmission of multiple {{EX:entryUUID}}s in a single
 128 {{syncIdSet}} message.
 129
 130 At the end of the {{refreshOnly}} synchronization, the server sends
 131 a synchronization cookie to the client as a state indicator of the
 132 client copy after the synchronization is completed.  The client
 133 will present the received cookie when it requests the next incremental
 134 synchronization from the server.
 135
 136 When {{refreshAndPersist}} synchronization is used, the server sends
 137 a synchronization cookie at the end of the refresh stage by sending
 138 a Sync Info message with TRUE refreshDone.  It also sends a
 139 synchronization cookie by attaching it to {{SearchResultEntry}}
 140 generated in the persist stage of the synchronization search. During
 141 the persist stage, the server can also send a Sync Info message
 142 containing the synchronization cookie at any time the server wants
 143 to update the client-side state indicator.  The server also updates
 144 a synchronization indicator of the client at the end of the persist
 145 stage.
 146
 147 In the LDAP Sync protocol, entries are uniquely identified by the
 148 {{EX:entryUUID}} attribute value. It can function as a reliable
 149 identifier of the entry. The DN of the entry, on the other hand,
 150 can change over time and hence cannot be considered a
 151 reliable identifier.  The {{EX:entryUUID}} is attached to each
 152 {{SearchResultEntry}} or {{SearchResultReference}} as a part of the
 153 synchronization control.
 154
 155
 156 H2: Syncrepl Details
 157
 158 The syncrepl engine utilizes both the {{refreshOnly}} and the
 159 {{refreshAndPersist}} operations of the LDAP Sync protocol.  If a
 160 syncrepl specification is included in a database definition, {{slapd}}
 161 (8) launches a syncrepl engine as a {{slapd}} (8) thread and schedules
 162 its execution. If the {{refreshOnly}} operation is specified, the
 163 syncrepl engine will be rescheduled at the interval time after a
 164 synchronization operation is completed.  If the {{refreshAndPersist}}
 165 operation is specified, the engine will remain active and process
 166 the persistent synchronization messages from the provider.
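
     As a hedged illustration of the second case, a consumer that stays
     connected to the provider might be declared as sketched below.  The
     rid, host name, bind DN, password, and {{EX:retry}} values are
     placeholders assumed for this sketch (the {{EX:retry}} parameter is
     described in {{slapd.conf}} (5)); they are not taken from a tested
     deployment.

     >       syncrepl rid=124
     >               provider=ldap://provider.example.com:389
     >               type=refreshAndPersist
     >               retry="60 10 300 +"
     >               searchbase="dc=example,dc=com"
     >               bindmethod=simple
     >               binddn="cn=syncuser,dc=example,dc=com"
     >               credentials=secret

     With these assumed values the engine would retry a lost connection
     every 60 seconds for the first 10 attempts and every 300 seconds
     thereafter.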
 167
 168 The syncrepl engine utilizes both the present phase and the delete
 169 phase of the refresh synchronization. It is possible to configure
 170 a per-scope session log in the provider server which stores the
 171 {{EX:entryUUID}}s of a finite number of entries deleted from a
 172 replication content.  Multiple replicas of a single provider content
 173 share the same per-scope session log. The syncrepl engine uses the
 174 delete phase if the session log is present and the state of the
 175 consumer server is recent enough that no session log entries have
 176 been truncated since the last synchronization of the client.  The syncrepl
 177 engine uses the present phase if no session log is configured for
 178 the replication content or if the consumer replica is too outdated
 179 to be covered by the session log.  The current design of the session
 180 log store is memory based, so the information contained in the
 181 session log is not persistent over multiple provider invocations.
 182 Accessing the session log store through LDAP operations is not
 183 currently supported, nor is imposing access control on the
 184 session log.
 185
 186 As a further optimization, even when the synchronization
 187 search is not associated with any session log, no entries will be
 188 transmitted to the consumer server when there has been no update
 189 in the replication context.
 190
 191 The syncrepl engine, which is a consumer-side replication engine,
 192 can work with any backend. The LDAP Sync provider can be configured
 193 as an overlay on any backend, but works best with the {{back-bdb}}
 194 or {{back-hdb}} backend. The provider cannot support refreshAndPersist
 195 mode on {{back-ldbm}} due to limits in that backend's locking
 196 architecture.
 197
 198 The LDAP Sync provider maintains a {{EX:contextCSN}} for each
 199 database as the current synchronization state indicator of the
 200 provider content.  It is the largest {{EX:entryCSN}} in the provider
 201 context such that no transaction for an entry having a smaller
 202 {{EX:entryCSN}} value remains outstanding.  The {{EX:contextCSN}}
 203 cannot simply be set to the largest issued {{EX:entryCSN}} because
 204 an {{EX:entryCSN}} is obtained before a transaction starts and
 205 transactions are not necessarily committed in issue order.
 206
 207 The provider stores the {{EX:contextCSN}} of a context in the
 208 {{EX:contextCSN}} attribute of the context suffix entry. The attribute
 209 is not written to the database after every update operation though;
 210 instead it is maintained primarily in memory. At database start
 211 time the provider reads the last saved {{EX:contextCSN}} into memory
 212 and uses the in-memory copy exclusively thereafter. By default,
 213 changes to the {{EX:contextCSN}} as a result of database updates
 214 will not be written to the database until the server is cleanly
 215 shut down. A checkpoint facility exists to cause the {{EX:contextCSN}} to
 216 be written out more frequently if desired.
 217
 218 Note that at startup time, if the provider is unable to read a
 219 {{EX:contextCSN}} from the suffix entry, it will scan the entire
 220 database to determine the value, and this scan may take quite a
 221 long time on a large database. When a {{EX:contextCSN}} value is
 222 read, the database will still be scanned for any {{EX:entryCSN}}
 223 values greater than it, to make sure the {{EX:contextCSN}} value
 224 truly reflects the greatest committed {{EX:entryCSN}} in the database.
 225 On databases which support inequality indexing, setting an eq index
 226 on the {{EX:entryCSN}} attribute and configuring {{contextCSN}}
 227 checkpoints will greatly speed up this scanning step.
 228
 229 If no {{EX:contextCSN}} can be determined by reading and scanning
 230 the database, a new value will be generated. Also, if scanning the
 231 database yielded a greater {{EX:entryCSN}} than was previously
 232 recorded in the suffix entry's {{EX:contextCSN}} attribute, a
 233 checkpoint will be immediately written with the new value.
 234
 235 The consumer also stores its replica state, which is the provider's
 236 {{EX:contextCSN}} received as a synchronization cookie, in the
 237 {{EX:contextCSN}} attribute of the suffix entry.  The replica state
 238 maintained by a consumer server is used as the synchronization state
 239 indicator when it performs subsequent incremental synchronization
 240 with the provider server. It is also used as a provider-side
 241 synchronization state indicator when it functions as a secondary
 242 provider server in a cascading replication configuration.  Since
 243 the consumer and provider state information are maintained in the
 244 same location within their respective databases, any consumer can
 245 be promoted to a provider (and vice versa) without any special
 246 actions.
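
     A minimal sketch of such a cascading arrangement, assuming an
     intermediate server that consumes from a hypothetical
     {{EX:ldap://provider.example.com}} and serves further consumers
     from the same database, might combine both roles as follows.  The
     rid, interval, host name, and bind identity are illustrative only.

     >       database hdb
     >       suffix dc=Example,dc=com
     >       rootdn dc=Example,dc=com
     >       directory /var/ldap/db
     >       index objectclass,entryCSN,entryUUID eq
     >
     >       syncrepl rid=125
     >               provider=ldap://provider.example.com:389
     >               type=refreshOnly
     >               interval=00:01:00:00
     >               searchbase="dc=example,dc=com"
     >               bindmethod=simple
     >               binddn="cn=syncuser,dc=example,dc=com"
     >               credentials=secret
     >
     >       overlay syncprov

     Here the {{EX:syncrepl}} directive makes the database a consumer
     that polls hourly, while the {{EX:syncprov}} overlay lets the same
     database answer LDAP Sync searches from downstream consumers.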
 247
 248 Because a general search filter can be used in the syncrepl
 249 specification, some entries in the context may be omitted from the
 250 synchronization content.  The syncrepl engine creates a glue entry
 251 to fill in the holes in the replica context if any part of the
 252 replica content is subordinate to the holes. The glue entries will
 253 not be returned in the search result unless the {{ManageDsaIT}} control
 254 is provided.
 255
 256 Also as a consequence of the search filter used in the syncrepl
 257 specification, it is possible for a modification to remove an entry
 258 from the replication scope even though the entry has not been deleted
 259 on the provider. Logically the entry must be deleted on the consumer
 260 but in {{refreshOnly}} mode the provider cannot detect and propagate
 261 this change without the use of the session log.
 262
 263
 264 H2: Configuring Syncrepl
 265
 266 Because syncrepl is a consumer-side replication engine, the syncrepl
 267 specification is defined in {{slapd.conf}} (5) of the consumer
 268 server, not in the provider server's configuration file.  The initial
 269 loading of the replica content can be performed either by starting
 270 the syncrepl engine with no synchronization cookie or by populating
 271 the consumer replica by adding an {{TERM:LDIF}} file dumped as a
 272 backup at the provider.
 273
 274 When loading from a backup, it is not required to perform the initial
 275 loading from an up-to-date backup of the provider content. The
 276 syncrepl engine will automatically synchronize the initial consumer
 277 replica to the current provider content. As a result, it is not
 278 required to stop the provider server in order to avoid the replica
 279 inconsistency caused by the updates to the provider content during
 280 the content backup and loading process.
 281
 282 When replicating a large-scale directory, especially in a
 283 bandwidth-constrained environment, it is advised to load the consumer replica
 284 from a backup instead of performing a full initial load using
 285 syncrepl.
 286
 287
 288 H3: Set up the provider slapd
 289
 290 The provider is implemented as an overlay, so the overlay itself
 291 must first be configured in {{slapd.conf}} (5) before it can be
 292 used. The provider has only two configuration directives, for setting
 293 checkpoints on the {{EX:contextCSN}} and for configuring the session
 294 log.  Because the LDAP Sync search is subject to access control,
 295 proper access control privileges should be set up for the replicated
 296 content.
 297
 298 The {{EX:contextCSN}} checkpoint is configured by the
 299
 300 >       syncprov-checkpoint <ops> <minutes>
 301
 302 directive. Checkpoints are only tested after successful write
 303 operations.  If {{<ops>}} write operations have been performed, or
 304 more than {{<minutes>}} minutes have passed, since the last
 305 checkpoint, a new checkpoint is performed.
 306
 307 The session log is configured by the
 308
 309 >       syncprov-sessionlog <size>
 310
 311 directive, where {{<size>}} is the maximum number of session log
 312 entries the session log can record. When a session log is configured,
 313 it is automatically used for all LDAP Sync searches within the
 314 database.
 315
 316 Note that using the session log requires searching on the {{entryUUID}}
 317 attribute. Setting an eq index on this attribute will greatly benefit
 318 the performance of the session log on the provider.
 319
 320 A more complete example of the {{slapd.conf}} content is thus:
 321
 322 >       database bdb
 323 >       suffix dc=Example,dc=com
 324 >       rootdn dc=Example,dc=com
 325 >       directory /var/ldap/db
 326 >       index objectclass,entryCSN,entryUUID eq
 327 >
 328 >       overlay syncprov
 329 >       syncprov-checkpoint 100 10
 330 >       syncprov-sessionlog 100
 331
 332
 333 H3: Set up the consumer slapd
 334
 335 The syncrepl replication is specified in the database section of
 336 {{slapd.conf}} (5) for the replica context.  The syncrepl engine
 337 is backend independent and the directive can be defined with any
 338 database type.
 339
 340 >       database hdb
 341 >       suffix dc=Example,dc=com
 342 >       rootdn dc=Example,dc=com
 343 >       directory /var/ldap/db
 344 >       index objectclass,entryCSN,entryUUID eq
 345 >
 346 >       syncrepl rid=123
 347 >               provider=ldap://provider.example.com:389
 348 >               type=refreshOnly
 349 >               interval=01:00:00:00
 350 >               searchbase="dc=example,dc=com"
 351 >               filter="(objectClass=organizationalPerson)"
 352 >               scope=sub
 353 >               attrs="cn,sn,ou,telephoneNumber,title,l"
 354 >               schemachecking=off
 355 >               bindmethod=simple
 356 >               binddn="cn=syncuser,dc=example,dc=com"
 357 >               credentials=secret
 358
 359 In this example, the consumer will connect to the provider slapd
 360 at port 389 of {{FILE:ldap://provider.example.com}} to perform a
 361 polling ({{refreshOnly}}) mode of synchronization once a day.  It
 362 will bind as {{EX:cn=syncuser,dc=example,dc=com}} using simple
 363 authentication with password "secret".  Note that the access control
 364 privilege of {{EX:cn=syncuser,dc=example,dc=com}} should be set
 365 appropriately in the provider to retrieve the desired replication
 366 content. Also the search limits must be high enough on the provider
 367 to allow the syncuser to retrieve a complete copy of the requested
 368 content.  The consumer uses the rootdn to write to its database so
 369 it always has full permissions to write all content.
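
     The two provider-side requirements above might be met with
     directives along the following lines in the provider's database
     section.  This is a sketch only: the rule grants read access to the
     whole subtree and lifts all search limits for the replication
     identity, which may be broader than a given deployment wants, and
     the {{EX:by * break}} clause assumes that further {{EX:access}}
     rules for other identities follow.

     >       access to dn.subtree="dc=example,dc=com"
     >               by dn.exact="cn=syncuser,dc=example,dc=com" read
     >               by * break
     >
     >       limits dn.exact="cn=syncuser,dc=example,dc=com"
     >               size=unlimited time=unlimited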
 370
 371 The synchronization search in the above example will search for the
 372 entries whose objectClass is organizationalPerson in the entire
 373 subtree rooted at {{EX:dc=example,dc=com}}. The requested attributes
 374 are {{EX:cn}}, {{EX:sn}}, {{EX:ou}}, {{EX:telephoneNumber}},
 375 {{EX:title}}, and {{EX:l}}. The schema checking is turned off, so
 376 that the consumer {{slapd}} (8) will not enforce entry schema
 377 checking when it processes updates from the provider {{slapd}} (8).
 378
 379 For more detailed information on the syncrepl directive, see the
 380 {{SECT:syncrepl}} section of {{SECT:The slapd Configuration File}}
 381 chapter of this admin guide.
 382
 383
 384 H3: Start the provider and the consumer slapd
 385
 386 The provider {{slapd}} (8) is not required to be restarted.
 387 {{contextCSN}} is automatically generated as needed: it might be
 388 originally contained in the {{TERM:LDIF}} file, generated by
 389 {{slapadd}} (8), generated upon changes in the context, or generated
 390 when the first LDAP Sync search arrives at the provider.  If an
 391 LDIF file is being loaded which did not previously contain the
 392 {{contextCSN}}, the {{-w}} option should be used with {{slapadd}}
 393 (8) to cause it to be generated. This will allow the server to
 394 start up a little more quickly the first time it runs.
 395
 396 When starting a consumer {{slapd}} (8), it is possible to provide
 397 a synchronization cookie with the {{-c cookie}} command line option
 398 in order to start the synchronization from a specific state.  The
 399 cookie is a comma-separated list of name=value pairs. Currently
 400 supported syncrepl cookie fields are {{csn=<csn>}} and {{rid=<rid>}}.
 401 {{<csn>}} represents the current synchronization state of the
 402 consumer replica.  {{<rid>}} identifies a consumer replica locally
 403 within the consumer server. It is used to relate the cookie to the
 404 syncrepl definition in {{slapd.conf}} (5) which has the matching
 405 replica identifier.  The {{<rid>}} must have no more than 3 decimal
 406 digits.  The command line cookie overrides the synchronization
 407 cookie stored in the consumer replica database.