# $OpenLDAP$
# Copyright 2003, The OpenLDAP Foundation, All Rights Reserved.
# COPYING RESTRICTIONS APPLY, see COPYRIGHT.

H1: LDAP Sync Replication

The LDAP Sync replication engine, syncrepl for short, is a consumer-side
replication engine that enables the consumer LDAP server to maintain
a shadow copy of a DIT fragment. A syncrepl engine resides on the
consumer side as one of the {{slapd}}(8) threads. It creates and
maintains a consumer replica by connecting to the replication provider
to perform the initial DIT content load, followed either by
periodic content polling or by timely updates upon content changes.

Syncrepl uses the LDAP Content Synchronization (or LDAP Sync for short)
protocol as the replica synchronization protocol. It provides stateful
replication which supports both
pull-based and push-based synchronization and does not mandate
the use of a history store.

Syncrepl keeps track of the status of the replication content by
maintaining and exchanging synchronization cookies. Because the
syncrepl consumer and provider maintain their content status,
the consumer can poll the provider content to perform incremental
synchronization by asking for the entries required to bring the consumer
replica up to date with the provider content. Syncrepl also enables
convenient management of replicas by maintaining replica status.
The consumer replica can be constructed from a consumer-side or a
provider-side backup at any synchronization status. Syncrepl can
automatically resynchronize the consumer replica to bring it up to date
with the current provider content.

Syncrepl supports both pull-based and
push-based synchronization. In its basic refreshOnly synchronization mode,
the provider uses pull-based synchronization where the consumer servers
need not be tracked and no history information is maintained.
The information required for the provider to process periodic polling
requests is contained in the synchronization cookie of the request itself.
To optimize the pull-based synchronization, syncrepl utilizes the present
phase of the LDAP Sync protocol as well as its delete phase, instead of
falling back on frequent full reloads. To further optimize the pull-based
synchronization, the provider can maintain a per-scope session log
as a history store. In its refreshAndPersist mode of synchronization,
the provider uses push-based synchronization. The provider keeps
track of the consumer servers that have requested a persistent search
and sends them the necessary updates as the provider replication content
gets modified.

With syncrepl, a consumer server can create a replica without changing
the provider's configuration and without restarting the provider server,
if the consumer server has appropriate access privileges for the
DIT fragment to be replicated. The consumer server can also stop the
replication without any provider-side changes or restart.

Syncrepl supports both partial and sparse replications.
The shadow DIT fragment is defined by general
search criteria consisting of base, scope, filter, and attribute list.
The replica content is also subject to the access privileges
of the bind identity of the syncrepl replication connection.


H2: The LDAP Content Synchronization Protocol

The LDAP Sync protocol allows a client to maintain a synchronized copy
of a DIT fragment. The LDAP Sync operation is defined as a set of
controls and other protocol elements which extend the LDAP search
operation. This section introduces the LDAP Content Sync protocol
only briefly. For more information, refer to the Internet Draft
{{The LDAP Content Synchronization Operation
<draft-zeilenga-ldup-sync-05.txt>}}.

The LDAP Sync protocol supports both polling and listening for
changes by defining two respective synchronization operations:
{{refreshOnly}} and {{refreshAndPersist}}.
Polling is implemented by the {{refreshOnly}} operation.
The client copy is synchronized to the server copy at the time of polling.
The server finishes the search operation by returning {{SearchResultDone}}
at the end of the search operation, as in a normal search.
Listening is implemented by the {{refreshAndPersist}} operation.
Instead of finishing the search after returning all entries currently
matching the search criteria, the synchronization search remains
persistent in the server. Subsequent updates to the synchronization content
in the server cause additional entry updates to be sent to the client.

The {{refreshOnly}} operation and the refresh stage of the
{{refreshAndPersist}} operation can be performed with
a present phase or a delete phase.

In the present phase, the server sends the client the entries updated
within the search scope since the last synchronization. The server sends
all requested attributes, whether changed or not, of the updated entries.
For each unchanged entry which remains in the scope,
the server sends a present message consisting only of the name of the
entry and the synchronization control representing state present.
The present message does not contain any attributes of the entry.
After the client receives all update and present entries,
it can reliably determine the new client copy by adding the entries
added to the server, by replacing the entries modified at the server,
and by deleting entries in the client copy which have neither
been updated nor specified as being present at the server.

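The client-side bookkeeping for the present phase can be sketched in a
few lines of Python. This is only an illustration of the rule described
above, not code taken from {{slapd}}(8): the function name and the
dictionary representation of the client copy (keyed by {{EX:entryUUID}})
are assumptions made for the example.

```python
def apply_present_phase(client_copy, updated_entries, present_uuids):
    """Return the client copy after a present-phase refresh.

    client_copy     -- dict: entryUUID -> entry currently held by the client
    updated_entries -- dict: entryUUID -> full entry sent by the server
                       (added or modified since the last synchronization)
    present_uuids   -- set of entryUUIDs named in present messages
    """
    # Keep only the old entries the server reported as still present.
    new_copy = {uuid: entry for uuid, entry in client_copy.items()
                if uuid in present_uuids}
    # Add new entries and replace modified ones with the server's version.
    new_copy.update(updated_entries)
    return new_copy
```

An entry held by the client that is neither updated nor named in a
present message is simply left out of the result, which is how deletions
are inferred without any history store.
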
The transmission of the updated entries in the delete phase is
the same as in the present phase. The server sends all the requested
attributes of the entries updated within the search scope since the
last synchronization to the client. In the delete phase, however,
the server sends a delete message for each entry deleted from the
search scope, instead of sending present messages.
The delete message consists only of the name of the entry
and the synchronization control representing state delete.
The new client copy can be determined by adding, modifying, and
removing entries according to the synchronization control
attached to the {{SearchResultEntry}} message.

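A minimal Python sketch of the delete-phase bookkeeping on the client
follows. It is an illustration under assumptions, not the {{slapd}}(8)
implementation: the function name and the dictionary representation of
the client copy (keyed by {{EX:entryUUID}}) are invented for the example.

```python
def apply_delete_phase(client_copy, updated_entries, deleted_uuids):
    """Return the client copy after a delete-phase refresh.

    client_copy     -- dict: entryUUID -> entry currently held by the client
    updated_entries -- dict: entryUUID -> full entry sent by the server
    deleted_uuids   -- iterable of entryUUIDs named in delete messages
    """
    new_copy = dict(client_copy)
    # Remove exactly the entries the server reported as deleted.
    for uuid in deleted_uuids:
        new_copy.pop(uuid, None)
    # Add new entries and replace modified ones.
    new_copy.update(updated_entries)
    return new_copy
```

Entries that are mentioned in no message at all are kept as they are,
so the traffic is proportional to the number of changes rather than to
the size of the replicated content.
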
In the case that the LDAP Sync server maintains a history store
and can determine which entries are scoped out of the client
copy since the last synchronization time, the server can use
the delete phase. If the server does not maintain any history store,
cannot determine the scoped-out entries from the history store,
or the history store does not cover the outdated synchronization
state of the client, the server should use the present phase.
The use of the present phase is much more efficient than a full
content reload in terms of the synchronization traffic.
To reduce the synchronization traffic further,
the LDAP Sync protocol also provides several optimizations
such as the transmission of the normalized {{EX:entryUUID}}s and the
transmission of multiple {{EX:entryUUID}}s in a single
{{syncIdSet}} message.

At the end of the {{refreshOnly}} synchronization,
the server sends a synchronization cookie to the client as a state
indicator of the client copy after the synchronization is completed.
The client will present the received cookie when it requests
the next incremental synchronization from the server.

When {{refreshAndPersist}} synchronization is used,
the server sends a synchronization cookie at the end of the
refresh stage by sending a Sync Info message with refreshDone set to TRUE.
It also sends a synchronization cookie by attaching it to
the {{SearchResultEntry}} messages generated in the persist stage of the
synchronization search. During the persist stage, the server
can also send a Sync Info message containing the synchronization
cookie at any time the server wants to update the client-side state
indicator.  The server also updates a synchronization indicator
of the client at the end of the persist stage.

In the LDAP Sync protocol, entries are uniquely identified by
the {{EX:entryUUID}} attribute value. It can function as a reliable
identifier of the entry. The DN of the entry, on the other hand,
can change over time and hence cannot be considered a reliable
identifier.  The {{EX:entryUUID}} is attached to each {{SearchResultEntry}}
or {{SearchResultReference}} as a part of the synchronization control.


H2: Syncrepl Details

The syncrepl engine utilizes both the {{refreshOnly}} and the
{{refreshAndPersist}} operations of the LDAP Sync protocol.
If a syncrepl specification is included in a database definition,
{{slapd}}(8) launches a syncrepl engine as a {{slapd}}(8) thread
and schedules its execution. If the {{refreshOnly}} operation is
specified, the syncrepl engine will be rescheduled to run again after
the configured interval once a synchronization operation is completed.
If the {{refreshAndPersist}} operation is specified, the engine will
remain active and process the persistent synchronization messages
from the provider.

The syncrepl engine utilizes both the present phase and the
delete phase of the refresh synchronization. It is possible to
configure a per-scope session log in the provider server
which stores the {{EX:entryUUID}}s of a finite
number of entries deleted from a replication content.
Multiple replicas of a single provider content share the same
per-scope session log. The syncrepl engine uses the delete phase
if the session log is present and the state of the consumer
server is recent enough that no session log entries have been truncated
since the last synchronization of the client.
The syncrepl engine uses the present phase if no session log
is configured for the replication content or if the
consumer replica is too outdated to be covered by the session log.
The current design of the session log store is memory based, so
the information contained in the session log is not persistent
over multiple provider invocations. Accessing the session log store
through LDAP operations is not currently supported, nor is
imposing access control on the session log.

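The choice between the two phases can be pictured with a toy model of a
bounded session log, sketched below in Python. Everything in it is an
assumption made for illustration: the class and method names are
invented, CSNs are modelled as plain integers, and the real provider's
data structures are not described in this guide.

```python
from collections import deque


class SessionLog:
    """Toy bounded log of (csn, entryUUID) records for deleted entries."""

    def __init__(self, size):
        self.size = size              # maximum number of records kept
        self.records = deque()        # oldest record first
        self.truncated_csn = None     # CSN of the newest record dropped so far

    def record_delete(self, csn, uuid):
        self.records.append((csn, uuid))
        if len(self.records) > self.size:
            # The log is full: forget the oldest deletion.
            self.truncated_csn, _ = self.records.popleft()

    def choose_phase(self, consumer_csn):
        # No cookie at all: nothing is known about the consumer state.
        if consumer_csn is None:
            return "present"
        # The consumer is older than a record that has been truncated,
        # so the log can no longer name every entry it must delete.
        if self.truncated_csn is not None and consumer_csn < self.truncated_csn:
            return "present"
        return "delete"

    def deletes_since(self, consumer_csn):
        return [uuid for csn, uuid in self.records if csn > consumer_csn]
```

With a log of size 2 that has seen three deletions, a consumer whose
state predates the dropped record falls back to the present phase, while
a more recent consumer receives only the logged deletions.
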
As a further optimization, even when the synchronization search
is not associated with any session log, no entries will be transmitted
to the consumer server when there has been no update in the replication
context.

The syncrepl engine, which is a consumer-side replication engine,
can work with any backend. The LDAP Sync provider can be configured
as an overlay on any backend, but works best with the {{back-bdb}} or
{{back-hdb}} backend. The provider cannot support refreshAndPersist
mode on {{back-ldbm}} due to limits in that backend's locking architecture.

The LDAP Sync provider maintains a {{EX:contextCSN}} for each
database as the current synchronization state indicator of the
provider content.  It is the largest {{EX:entryCSN}} in the provider
context such that no transactions for an entry having a
smaller {{EX:entryCSN}} value remain outstanding.
The {{EX:contextCSN}} cannot simply be set to the largest issued
{{EX:entryCSN}} because {{EX:entryCSN}} is obtained before
a transaction starts and transactions are not committed in the
issue order.

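The definition above can be made concrete with a small Python sketch.
It is a simplified model, not the provider's code: the function name is
invented and CSNs are represented as plain integers so that they can be
compared directly.

```python
def context_csn(committed, pending):
    """Largest committed CSN with no smaller CSN still outstanding.

    committed -- iterable of CSNs whose transactions have committed
    pending   -- iterable of CSNs issued to transactions still in flight
    Returns None when no committed CSN qualifies.
    """
    pending = list(pending)
    if not pending:
        return max(committed, default=None)
    oldest_pending = min(pending)
    # Only CSNs older than every in-flight transaction are safe to expose.
    eligible = [csn for csn in committed if csn < oldest_pending]
    return max(eligible, default=None)
```

If CSNs 1, 2, and 4 have committed while the transaction holding CSN 3
is still running, the sketch yields 2 rather than 4. A consumer that
recorded 4 as its state would otherwise never ask for the entry that
later commits with CSN 3.
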
The provider stores the {{EX:contextCSN}} of a context in the
{{EX:contextCSN}} attribute of the context suffix entry. The attribute
is not written to the database after every update operation though;
instead it is maintained primarily in memory. At database start time
the provider reads the last saved {{EX:contextCSN}} into memory and
uses the in-memory copy exclusively thereafter. By default, changes
to the {{EX:contextCSN}} as a result of database updates will not be
written to the database until the server is cleanly shut down. A
checkpoint facility exists to cause the {{EX:contextCSN}} to be written
out more frequently if desired.

Note that at startup time, if the
provider is unable to read a {{EX:contextCSN}} from the suffix entry,
it will scan the entire database to determine the value, and this
scan may take quite a long time on a large database. When a {{EX:contextCSN}}
value is read, the database will still be scanned for any {{EX:entryCSN}}
values greater than it, to make sure the {{EX:contextCSN}} value truly
reflects the greatest committed {{EX:entryCSN}} in the database. On
databases which support inequality indexing, setting an eq index
on the {{EX:entryCSN}} attribute will greatly speed up this scanning step.

If no {{EX:contextCSN}} can be determined by reading and scanning the
database, a new value will be generated. Also, if scanning the database
yielded a greater {{EX:entryCSN}} than was previously recorded in the
suffix entry's {{EX:contextCSN}} attribute, a checkpoint will be immediately
written with the new value.

The consumer stores its replica state, which is the provider's
{{EX:contextCSN}} received as a synchronization cookie,
in the {{EX:syncreplCookie}} attribute of the immediate child
of the context suffix whose DN is {{cn=syncrepl<rid>,<suffix>}}
and object class is {{EX:syncConsumerSubentry}}.
The replica state maintained by a consumer server is used as the
synchronization state indicator when it performs subsequent incremental
synchronization with the provider server. It is also used as a
provider-side synchronization state indicator when it functions as
a secondary provider server in a cascading replication configuration.
<rid> is the replica ID uniquely identifying the replica locally in the
syncrepl consumer server. <rid> is an integer which has no more than
three decimal digits.

It is possible to retrieve the
{{EX:syncConsumerSubentry}} by performing an LDAP search with
the respective entry as the base object and with the base scope.

Because a general search filter can be used in the syncrepl specification,
some entries in the context may be omitted from the synchronization content.
The syncrepl engine creates glue entries to fill in the holes
in the replica context if any part of the replica content is
subordinate to the holes. The glue entries will not be returned
in search results unless the {{ManageDsaIT}} control is provided.

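Which DNs end up as holes can be worked out mechanically, as the Python
sketch below suggests. It is a rough illustration only: the function
name is invented, DNs are split naively on commas (escaped commas are
not handled), and it assumes that the suffix entry itself also counts as
a hole when it is not part of the replicated content.

```python
def glue_dns(suffix, replicated_dns):
    """List the ancestor DNs that are missing from the replicated content."""
    def normalize(dn):
        # Naive normalization: lowercase, no spaces around RDN separators.
        return ",".join(part.strip().lower() for part in dn.split(","))

    suffix_n = normalize(suffix)
    have = {normalize(dn) for dn in replicated_dns}
    glue = set()
    for dn in have:
        # Walk up from the entry towards the suffix, one RDN at a time.
        while dn != suffix_n and "," in dn:
            dn = dn.split(",", 1)[1]
            if not dn.endswith(suffix_n):
                break                      # entry lies outside this context
            if dn not in have:
                glue.add(dn)               # a hole above replicated content
    return sorted(glue)
```

For a filter that selects only person entries under
{{EX:ou=People,dc=example,dc=com}}, both the {{EX:ou=People}} container
and, under the stated assumption, the suffix entry would be reported as
holes.
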
Also as a consequence of the search filter used in the syncrepl
specification, it is possible for a modification to remove an
entry from the replication scope even though the entry has not
been deleted on the provider. Logically, the entry must be deleted on the
consumer, but in {{refreshOnly}} mode the provider cannot detect
and propagate this change without the use of the session log.

H2: Configuring Syncrepl

Because syncrepl is a consumer-side replication engine, the syncrepl
specification is defined in {{slapd.conf}}(5) of the consumer server,
not in the provider server's configuration file.
The initial loading of the replica content can be performed either
by starting the syncrepl engine with no synchronization cookie
or by populating the consumer replica by adding and demoting an
{{TERM:LDIF}} file dumped as a backup at the provider.
{{slapadd}}(8) supports replica promotion and demotion.

When loading from a backup, it is not required to perform the initial
loading from an up-to-date backup of the provider content. The syncrepl
engine will automatically synchronize the initial consumer replica to
the current provider content. As a result, it is not required
to stop the provider server in order to avoid the replica inconsistency
caused by the updates to the provider content during the
content backup and loading process.

When replicating a large-scale directory, especially in a
bandwidth-constrained environment, it is advised to load the consumer replica
from a backup instead of performing a full initial load using syncrepl.

H3: Set up the provider slapd

The provider is implemented as an overlay, so the overlay itself must
first be configured in {{slapd.conf}}(5) before it can be used. The
provider has only two configuration directives, for setting checkpoints
on the {{EX:contextCSN}} and for configuring the session log.
Because the
LDAP Sync search is subject to access control, proper access control
privileges should be set up for the replicated content.

The {{EX:contextCSN}} checkpoint is configured by the

>       syncprov-checkpoint <ops> <minutes>

directive. Checkpoints are tested after successful write operations.
If {{<ops>}} operations have been performed or more than {{<minutes>}}
minutes have passed since the last checkpoint, a new checkpoint is performed.

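The test applied after each successful write can be pictured as the
small Python predicate below. It is a sketch of the rule as worded
above, not the overlay's code: the function and parameter names are
invented, and the exact boundary behaviour (at least {{<ops>}}
operations, strictly more than {{<minutes>}} minutes) is an assumption
read from that wording.

```python
def checkpoint_due(ops_since_last, minutes_since_last, ops, minutes):
    """Decide whether the contextCSN should be written out now.

    ops_since_last     -- write operations since the last checkpoint
    minutes_since_last -- minutes elapsed since the last checkpoint
    ops, minutes       -- the two arguments of syncprov-checkpoint
    """
    return ops_since_last >= ops or minutes_since_last > minutes
```

With {{EX:syncprov-checkpoint 100 10}}, a quiet database is checkpointed
on the first write that arrives more than ten minutes after the previous
checkpoint, while a busy one is checkpointed every 100 writes.
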
The session log is configured by the

>       syncprov-sessionlog <sid> <size>

directive, where {{<sid>}} is the ID of the per-scope session log
in the provider server and {{<size>}} is the maximum number of
session log entries the session log can record. {{<sid>}}
is an integer of no more than 3 decimal digits. If the consumer
server sends a synchronization cookie containing {{sid=<sid>}}
where {{<sid>}} matches the session log ID specified in the directive,
the LDAP Sync search will utilize the session log.

Note that using the session log requires searching on the {{EX:entryUUID}}
attribute. Setting an eq index on this attribute will greatly
benefit the performance of the session log on the provider.

A more complete example of the {{slapd.conf}} content is thus:

>       database bdb
>       suffix dc=Example,dc=com
>       directory /var/ldap/db
>       index objectclass,entryCSN,entryUUID eq
>
>       overlay syncprov
>       syncprov-checkpoint 100 10
>       syncprov-sessionlog 0 100


H3: Set up the consumer slapd

The syncrepl replication is specified in the database section
of {{slapd.conf}}(5) for the replica context.
The syncrepl engine is backend independent and the directive
can be defined with any database type.

>       syncrepl rid=123
>               provider=ldap://provider.example.com:389
>               type=refreshOnly
>               interval=01:00:00:00
>               searchbase="dc=example,dc=com"
>               filter="(objectClass=organizationalPerson)"
>               scope=sub
>               attrs="cn,sn,ou,telephoneNumber,title,l"
>               schemachecking=off
>               updatedn="cn=replica,dc=example,dc=com"
>               bindmethod=simple
>               binddn="cn=syncuser,dc=example,dc=com"
>               credentials=secret

In this example, the consumer will connect to the provider slapd
at port 389 of {{FILE:ldap://provider.example.com}} to perform a
polling ({{refreshOnly}}) mode of synchronization once a day.  It will
bind as {{EX:cn=syncuser,dc=example,dc=com}} using simple authentication
with password "secret".  Note that the access control privilege of
{{EX:cn=syncuser,dc=example,dc=com}} should be set appropriately
in the provider to retrieve the desired replication content.
The consumer will write to its database with the privilege of the
{{EX:cn=replica,dc=example,dc=com}} entry as specified in the
{{EX:updatedn=}} directive. The {{EX:updatedn}} entry should have
write permission to the replica content.

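The {{EX:interval}} value in the example is written as days, hours,
minutes, and seconds ({{dd:hh:mm:ss}}), which is why {{01:00:00:00}}
means once a day. The short Python helper below converts such a value
to seconds; the function name is invented for this illustration and the
helper is not part of OpenLDAP.

```python
def interval_to_seconds(interval):
    """Convert a syncrepl-style 'dd:hh:mm:ss' interval to seconds."""
    parts = interval.split(":")
    if len(parts) != 4:
        raise ValueError("expected dd:hh:mm:ss, got %r" % interval)
    days, hours, minutes, seconds = (int(part) for part in parts)
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

A polling period of ninety minutes, for example, would be written as
{{00:01:30:00}}.
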
The synchronization search in the above example will search for the
entries whose objectClass is organizationalPerson in the entire subtree
rooted at {{EX:dc=example,dc=com}}. The requested attributes are
{{EX:cn}}, {{EX:sn}}, {{EX:ou}}, {{EX:telephoneNumber}},
{{EX:title}}, and {{EX:l}}. The schema checking is turned off, so
that the consumer {{slapd}}(8) will not enforce entry schema checking
when it processes updates from the provider {{slapd}}(8).

For more detailed information on the syncrepl directive,
see the {{SECT:syncrepl}} section of {{SECT:The slapd Configuration File}}
chapter of this admin guide.

H3: Start the provider and the consumer slapd

The provider {{slapd}}(8) is not required to be restarted.
{{contextCSN}} is automatically generated as needed:
it might be originally contained in the {{TERM:LDIF}} file,
generated by {{slapadd}}(8), generated upon changes in the context,
or generated when the first LDAP Sync search arrives at the provider.

When starting a consumer {{slapd}}(8), it is possible to provide a
synchronization cookie as the {{-c cookie}} command line option
in order to start the synchronization from a specific state.
The cookie is a comma-separated list of name=value pairs. Currently
supported syncrepl cookie fields are {{csn=<csn>}}, {{sid=<sid>}}, and
{{rid=<rid>}}. {{<csn>}} represents the current synchronization state
of the consumer replica. {{<sid>}} is the identity of the per-scope
session log to which this consumer will be associated. {{<rid>}} identifies
a consumer replica locally within the consumer server. It is used to relate
the cookie to the syncrepl definition in {{slapd.conf}}(5) which has
the matching replica identifier.
Both {{<sid>}} and {{<rid>}} have no more than 3 decimal digits.
The command line cookie overrides the synchronization cookie
stored in the consumer replica database.
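
The cookie layout described above is simple enough to check before
handing it to {{slapd}}(8). The Python sketch below parses and validates
such a string. It is an illustration only, not the parser used by
{{slapd}}(8): the function name is invented, and the CSN value shown in
the usage note is a made-up sample.

```python
COOKIE_FIELDS = ("csn", "sid", "rid")


def parse_cookie(cookie):
    """Split a 'name=value,name=value' cookie into a dict of strings."""
    fields = {}
    for pair in cookie.split(","):
        name, sep, value = pair.partition("=")
        name, value = name.strip(), value.strip()
        if not sep or name not in COOKIE_FIELDS:
            raise ValueError("unsupported cookie field: %r" % pair)
        fields[name] = value
    # sid and rid are integers of at most three decimal digits.
    for name in ("sid", "rid"):
        value = fields.get(name)
        if value is not None and not (value.isascii() and value.isdigit()
                                      and len(value) <= 3):
            raise ValueError("%s must be 1-3 decimal digits: %r"
                             % (name, value))
    return fields
```

Calling {{EX:parse_cookie("csn=20050920131505Z#000001#00#000000,rid=123")}}
returns a dictionary with the two fields, while a four-digit
{{EX:rid}} is rejected with a {{EX:ValueError}}.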