From 34d6b50e45d8b2183825281176a1b47db7224bf6 Mon Sep 17 00:00:00 2001 From: Kurt Zeilenga Date: Tue, 16 Sep 2003 05:16:33 +0000 Subject: [PATCH] Initial proxy cache and syncrepl chapters --- doc/guide/admin/master.sdf | 6 + doc/guide/admin/proxycache.sdf | 133 +++++++++++++++++ doc/guide/admin/slapdconfig.sdf | 73 +++++++++ doc/guide/admin/syncrepl.sdf | 253 ++++++++++++++++++++++++++++++++ 4 files changed, 465 insertions(+) create mode 100644 doc/guide/admin/proxycache.sdf create mode 100644 doc/guide/admin/syncrepl.sdf diff --git a/doc/guide/admin/master.sdf b/doc/guide/admin/master.sdf index 6025f74672..1fd2b6ae6c 100644 --- a/doc/guide/admin/master.sdf +++ b/doc/guide/admin/master.sdf @@ -69,6 +69,12 @@ PB: !include "replication.sdf"; chapter PB: +!include "syncrepl.sdf"; chapter +PB: + +!include "proxycache.sdf"; chapter +PB: + # Appendices !include "../release/autoconf.sdf"; appendix PB: diff --git a/doc/guide/admin/proxycache.sdf b/doc/guide/admin/proxycache.sdf new file mode 100644 index 0000000000..5f0798f34f --- /dev/null +++ b/doc/guide/admin/proxycache.sdf @@ -0,0 +1,133 @@ +# $OpenLDAP$ +# Copyright 2003, The OpenLDAP Foundation, All Rights Reserved. +# COPYING RESTRICTIONS APPLY, see COPYRIGHT. + +H1: The Proxy Cache Engine + +LDAP servers typically hold one or more subtrees of a DIT. Replica +(or shadow) servers hold shadow copies of entries held by one or +more master servers. Changes are propagated from the master server +to replica (slave) servers using LDAP Sync or {{slurpd}}(8). An +LDAP cache is a special type of replica which holds entries +corresponding to search filters instead of subtrees. + +H2: Overview + +The proxy cache extension of slapd handles a search request (query) +by first determining whether it is contained in any cached search +filter. Contained requests are answered from the proxy cache's local +database. + +E.g. 
{{EX:(shoesize>=9)}} is contained in {{EX:(shoesize>=8)}} and +{{EX:(sn=Richardson)}} is contained in {{EX:(sn=Richards*)}} + +Correct matching rules and syntaxes are used while comparing +assertions for query containment. To simplify the query containment +problem, a list of cacheable "templates" (defined below) is specified +at configuration time. A query is cached or answered only if it +belongs to one of these templates. The entries corresponding to +cached queries are stored in the proxy cache local database while +its associated meta information (filter, scope, base, attributes) +is stored in main memory. Instead of sending a referral for requests +which are not contained, it acts as a proxy and obtains the result +by querying one or more target servers. The proxy cache extends the +meta backend and uses it to connect to target servers. + +A template is a prototype for generating LDAP search requests. +Templates are described by a prototype search filter and a list of +attributes which are required in queries generated from the template. +The representation for prototype filter is similar to RFC 2254, +except that the assertion values are missing. Examples of prototype +filters are: (sn=),(&(sn=)(givenname=)) which are instantiated by +search filters (sn=Doe) and (&(sn=Doe)(givenname=John)) respectively. + +The cache replacement policy removes the least recently used (LRU) +query and entries belonging to only that query. Queries are allowed +a maximum time to live (TTL) in the cache thus providing weak +consistency. A background thread periodically checks the cache for +expired queries and removes them. + +The Proxy Cache paper +({{URL:http://www.openldap.org/pub/kapurva/proxycaching.pdf}}) provides +design/implementation details. + + +H2: Proxy Cache Configuration + +The cache configuration specific directives described below must +appear after the {{EX:"database meta"}} directive and before any other +{{EX:"database"}} declaration in {{slapd.conf}}(5). 
+ +H3: Setting cache parameters + +> cacheparams + +The directive enables proxy caching and sets general cache parameters. +Cache replacement is invoked when the cache size crosses the + bytes and continues till the cache size is greater than + bytes. Total number of attribute sets (as specified +by the attrset directive) is given by . The entry +restriction for cacheable queries is specified by . +Consistency check is performed every duration (specified +in seconds). In each cycle, queries with expired TTLs are removed. + +H3: Defining attribute sets + +> attrset + +Used to associate a set of attributes with an index. Each attribute +set is associated with an index number from 0 to -1. +These indices are used by the addtemplate directive to define +cacheable templates. + +H3: Specifying cacheable templates + +> addtemplate + +Specifies a cacheable template and the "time to live" (in seconds) +for queries belonging to the template. A template is described by +its prototype filter string and set of required attributes identified +by . + +H3: Example + +An example {{slapd.conf}}(5) for a caching server which proxies for +the backend server {{EX:ldap://server.mydomain.com}} and caches +queries with base object in the {{EX:"dc=example,dc=com"}} subtree +is shown below: + +> database meta +> suffix "dc=example,dc=com" +> uri ldap://server.mydomain.com/dc=example,dc=com +> cacheparams 100000 150000 1 50 100 +> attrset 0 mail postaladdress telephonenumber +> addtemplate (sn=) 0 3600 +> addtemplate (&(sn=)(givenName=)) 0 3600 +> addtemplate (&(departmentNumber=)(secretary=*)) 0 3600 + +A different name space is associated with the local cache database. +E.g., if the local database suffix is {{EX:"dc=example,dc=com,cn=cache"}}, +then the following rewriting rules need to be defined to translate +between master and cache database naming contexts. 
+ +> rewriteEngine on +> rewriteContext cacheResult +> rewriteRule "(.*)dc=example,dc=com" "%1dc=example,dc=com,cn=cache" ":" +> rewriteContext cacheBase +> rewriteRule "(.*)dc=example,dc=com" "%1dc=example,dc=com,cn=cache" ":" +> rewriteContext cacheReturn +> rewriteRule "(.*)dc=example,dc=com,cn=cache" "%1dc=example,dc=com" ":" + +Finally, the local database for storing cached entries can be declared +as follows: + +> database ldbm +> suffix "dc=example,dc=com,cn=cache" +> #other database specific directives + +The proxy cache database instance could be either {{TERM:BDB}} or +{{TERM:LDBM}}. A script for demonstrating the proxy cache +({{FILE:test019-proxycaching}}) functionality is provided in the +tests/scripts directory of the distribution. + + diff --git a/doc/guide/admin/slapdconfig.sdf b/doc/guide/admin/slapdconfig.sdf index 7d8c124b84..112d9d96ae 100644 --- a/doc/guide/admin/slapdconfig.sdf +++ b/doc/guide/admin/slapdconfig.sdf @@ -405,6 +405,79 @@ looks at the suffix line(s) in each database definition in the order they appear in the file. Thus, if one database suffix is a prefix of another, it must appear after it in the config file. +H4: syncrepl + +> syncrepl id= +> provider=ldap[s]://[:port] +> [updatedn=] +> [binddn=] +> [bindmethod=simple|sasl] +> [binddn=] +> [credentials=] +> [saslmech=] +> [secprops=] +> [realm=] +> [authcId=] +> [authzId=] +> [searchbase=] +> [filter=] +> [attrs=] +> [scope=sub|one|base] +> [schemachecking=on|off] +> [type=refreshOnly|refreshAndPersist] +> [interval=dd:hh:mm] + +This directive specifies an LDAP Sync replication between this +database and the specified replication provider site. The id= +parameter identifies the LDAP Sync specification in the database. +The {{EX:provider=}} parameter specifies a replication provider site as +an LDAP URI. + +The LDAP Sync replication specification is based on the search +specification which defines the content of the replica. 
The replica +consists of the entries matching the search specification. As with +normal searches, the search specification consists of the +{{EX:searchbase}}, {{EX:scope}}, {{EX:filter}}, and {{EX:attrs}} +parameters. + +LDAP Sync replication has two operating modes. In the +{{EX:refreshOnly}} mode, the next synchronization session is +rescheduled to run one interval after the current session finishes. +The default interval is set to one day. In the {{EX:refreshAndPersist}} +mode, the LDAP Sync search remains persistent in the provider LDAP +server. Further updates to the provider replica will generate +searchResultEntry messages to the consumer. + +Schema checking can be enforced at the LDAP Sync consumer site +by turning on the {{EX:schemachecking}} parameter. The default is off. + +The {{EX:binddn=}} parameter gives the DN that the LDAP Sync search +binds as to the provider slapd. The content of the replica will +be subject to the access control privileges of the DN. + +The {{EX:bindmethod}} is {{EX:simple}} or {{EX:sasl}}, depending +on whether simple password-based authentication or SASL authentication +is to be used when connecting to the provider slapd. + +Simple authentication should not be used unless adequate integrity +and data confidentiality protections are in place (e.g. TLS or IPsec). +Simple authentication requires specification of the {{EX:binddn}} and +{{EX:credentials}} parameters. + +SASL authentication is generally recommended. SASL authentication +requires specification of a mechanism using the {{EX:saslmech}} parameter. +Depending on the mechanism, an authentication identity and/or +credentials can be specified using {{EX:authcid}} and {{EX:credentials}} +respectively. The {{EX:authzid}} parameter may be used to specify +a proxy authorization identity. + +LDAP Sync replication is supported in three native backends: +back-bdb, back-hdb, and back-ldbm. 
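+
+As a rough illustration only, a {{EX:refreshAndPersist}} specification
+using SASL authentication might look like the sketch below. It uses
+only parameters from the syntax summary above; the host name, the
+DNs, the authentication identity, the password, and the choice of
+the DIGEST-MD5 mechanism are placeholder assumptions, not
+recommendations.
+
+> syncrepl id=1
+>         provider=ldap://provider.example.com
+>         updatedn="cn=replica,dc=example,dc=com"
+>         bindmethod=sasl
+>         saslmech=DIGEST-MD5
+>         authcId=syncuser
+>         credentials=secret
+>         searchbase="dc=example,dc=com"
+>         scope=sub
+>         type=refreshAndPersist
+
+No {{EX:interval}} is given in this sketch because the interval
+applies to rescheduling in the {{EX:refreshOnly}} mode.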
+ +See the {{SECT:LDAP Sync Replication}} chapter for more information +on how to use this directive. + + H4: updatedn This directive is only applicable in a slave slapd. It specifies diff --git a/doc/guide/admin/syncrepl.sdf b/doc/guide/admin/syncrepl.sdf new file mode 100644 index 0000000000..f98da94b6a --- /dev/null +++ b/doc/guide/admin/syncrepl.sdf @@ -0,0 +1,253 @@ +# $OpenLDAP$ +# Copyright 2003, The OpenLDAP Foundation, All Rights Reserved. +# COPYING RESTRICTIONS APPLY, see COPYRIGHT. + +H1: LDAP Sync Replication + +The LDAP Sync replication engine is designed to function as an +improved alternative to {{slurpd}}(8). While replication with +{{slurpd}}(8) provides the replication capability for improved capacity, +availability, and reliability, it has some drawbacks: + +^ It is not stateful, hence lacks the resynchronization capability. +Because there is no representation of replica state in the replication +with {{slurpd}}(8), it is not possible to provide an efficient mechanism +to make the slave replica consistent with the master replica once +they become out of sync. For instance, if the slave database content +is damaged, the slave replica must be re-primed from the master +replica again. With a state-based replication, it would be possible +to recover the slave replica from a local backup. The slave replica, +then, will be synchronized by calculating and transmitting the diffs +between the slave replica and the master replica based on their +states. The LDAP Sync replication is stateful. + ++ It is history-based, not state-based. The replication with +{{slurpd}}(8) relies on the history information in the replication log +file generated by {{slapd}}(8). If a portion of the log file that +contains updates yet to be synchronized to the slave is truncated +or damaged, a full reload is required. A state-based replication, +on the other hand, would not rely on a separate history store. 
+In the LDAP Sync replication, every directory entry has its state +information in the entryCSN operational attribute. The replica +contents are calculated based on the consumer cookie and the entryCSN +of the directory entries. + ++ It is push-based, not pull-based. In the replication with +{{slurpd}}(8), it is the master that decides when to synchronize the +replica. Pull-based polling replication is not possible with +{{slurpd}}(8). For example, in order to make a daily directory backup +that is an exact image of the directory at a point in time, the slave +replica must be made read-only by stopping {{slurpd}}(8) during backup. After backup, +{{slurpd}}(8) can be run in a one-shot mode to resynchronize the slave +replica with the updates made during the backup. In a pull-based, polling +replication, the replica is guaranteed to be read-only between the two polling +points. The LDAP Sync replication supports both the push-based +replication and the pull-based replication. + ++ It only supports the fractional replication and does not support +the sparse replication. The LDAP Sync replication supports both the +fractional and sparse replication. It is possible to use a general +search specification to initiate a synchronization session only for +the interesting subset of the context. + +H2: LDAP Content Sync Protocol Description + +The LDAP Sync replication uses the LDAP Content Sync protocol (refer +to the Internet Draft entitled "The LDAP Content Synchronization +Operation") for replica synchronization. The LDAP Content Sync +protocol operation is based on the replica state, which is transmitted +between replicas as synchronization cookies. There are two +operating modes: refreshOnly and refreshAndPersist. In both modes, +a consumer {{slapd}}(8) connects to a provider {{slapd}}(8) with a cookie +value representing the state of the consumer replica. The non-persistent +part of the synchronization consists of two phases. + +The first is the state-base phase. 
The entries updated after the +point in time the consumer cookie represents will be transmitted +to the consumer. Because the unit of synchronization is the entry, all +the requested attributes will be transmitted even if only some +of them have changed. For the rest of the entries, present +messages consisting only of the name and the synchronization control +will be sent to the consumer. After the consumer receives all the +updated and present entries, it can reliably make its replica +consistent with the provider replica. The consumer will add all the +newly added entries, replace the existing entries for which updated +entries were received, and delete entries in the local replica if they are neither +updated nor specified as present. + +The second is the log-base phase. This phase is incorporated to +optimize the protocol with respect to the volume of the present +traffic. If the provider maintains a history store from which the +content to be synchronized can be reliably calculated, this log-base +phase follows the state-base phase. In this phase, the actual directory +update operations such as delete, modify, and add are transmitted. +There is no need to send present messages in this log-base phase. + +If the protocol operates in the refreshOnly mode, the synchronization +will terminate. The provider will send a synchronization cookie +which reflects the new state to the consumer. The consumer will +present the new cookie the next time it requests a synchronization. +If the protocol operates in the refreshAndPersist mode, the +synchronization operation remains persistent in the provider. Every +update made to the provider replica will be transmitted to the +consumer. Cookies can be sent to the consumer at any time by using +the SyncInfo intermediate response and at the end of the synchronization +by using the SyncDone control attached to the SearchResultDone +message. + +Entries are uniquely identified by the entryUUID attribute value +in the LDAP Content Sync protocol. 
It can serve as a reliable entry +identifier, whereas the DN of an entry can be changed by modrdn operations. +The entryUUID is attached to each SearchResultEntry or +SearchResultReference as a part of the Sync State control. + +H2: LDAP Sync Replication Details + +The LDAP Sync replication uses both the refreshOnly and the +refreshAndPersist modes of synchronization. If an LDAP Sync replication +is specified in a database definition, {{slapd}}(8) schedules an +execution of the LDAP Sync replication engine. In the refreshOnly +mode, the engine will be rescheduled to run one interval after a +replication session ends. In the refreshAndPersist mode, the engine +will remain active to process the SearchResultEntry messages from +the provider. + +The LDAP Sync replication uses only the state-base synchronization +phase. Because {{slapd}}(8) does not currently implement a history store +such as a changelog or tombstones, it depends only on the state-base +phase. A null log-base phase follows the state-base phase. + +As an optimization, no entries will be transmitted to a consumer +if there has been no update in the master replica since the last +synchronization with the consumer. Even present messages for the +unchanged entries are not transmitted. The consumer retains its +replica contents. + +H3: entryCSN + +The LDAP Sync replication implemented in OpenLDAP stores state +information for every entry in the entryCSN attribute. The entryCSN of +an entry is the CSN (change sequence number), a refined +timestamp, at which the entry was most recently updated. The CSN +consists of three parts: the time, a replica ID, and a change count +within a single second. + +H3: contextCSN + +contextCSN represents the current state of the provider replica. +It is the largest entryCSN of all entries in the context such that +no transaction having a smaller entryCSN value remains outstanding. 
+Because the entryCSN value is obtained before the transaction starts and +transactions are not committed in the entryCSN order, special care +must be taken to manage the proper contextCSN value in the +transactional environment. Also, the state of the search result set +is required to correspond to the contextCSN value returned to the +consumer as a sync cookie. + +contextCSN, the provider replica state, is stored in the +syncProviderSubentry. The value of the contextCSN is transmitted +to the consumer replica as a Sync Cookie. The cookie is stored in +the syncreplCookie attribute of the syncConsumerSubentry subentry. The +consumer will use the stored cookie value to represent its replica +state when it connects to the provider in the future. + +H3: Glue Entry + +Because a general search filter can be used in the LDAP Sync replication, +an entry might be created without a parent, if the parent entry was +filtered out. The LDAP Sync replication engine creates glue +entries for such holes in the replica. The glue entries will not +be returned in response to a search of the consumer {{slapd}}(8) if +manageDSAit is not set. They will be returned if it is set. + +H2: Configuring slapd for LDAP Sync Replication + +It is relatively simple to start service in a replicated OpenLDAP +environment with the LDAP Sync replication, compared to the replication +with {{slurpd}}(8). First, configure both the provider and +the consumer {{slapd}}(8) servers appropriately. Then, start the provider +slapd instance first, and the consumer slapd instance next. +Administrative tasks such as database copy and temporary shutdown +(or read-only demotion) of the provider are not required. + +H3: Set up the provider slapd + +There is no special slapd.conf(5) directive for the provider {{slapd}}(8). +Because the LDAP Sync searches are subject to access control, proper +access control privileges should be set up for the replicated +content. 
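+
+For instance, a minimal access control sketch for the provider could
+grant read access on the replicated subtree to the identity the
+consumer binds as. The suffix and the DN below are taken from the
+consumer example in this chapter and are assumptions about the
+deployment, not required names.
+
+> access to dn.subtree="dc=example,dc=com"
+>         by dn.exact="cn=syncuser,dc=example,dc=com" read
+
+Identities that match no {{EX:by}} clause are denied access, so a
+real configuration would normally add further {{EX:by}} clauses as
+required by the local policy.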
+ +When creating a provider database from an LDIF file using slapadd(8), +you must create the state indicator of the database context and bring +it up to date. slapadd(8) will store the contextCSN in the +syncProviderSubentry if it is given the -w flag. It is also possible +to create the syncProviderSubentry with an appropriate contextCSN +value by directly including it in the LDIF file. If slapadd(8) runs +without the -w flag, the provided contextCSN will be stored. With +the -w flag, a new value based on the current time will be stored +as contextCSN. slapcat(8) can be used to retrieve the directory +with the contextCSN when it is run with the -m flag. + +Only the back-bdb and the back-hdb backends can perform as the LDAP +Sync replication provider. Back-ldbm currently does not have the +LDAP Content Sync protocol functionality. + +H3: Set up the consumer slapd + +The consumer slapd is configured by the slapd.conf(5) configuration +file. For the configuration directives, see the syncrepl section of the +slapd Configuration File chapter. In the configuration file, make +sure the DN given in the updatedn= directive of the syncrepl +specification has permission to write to the database. Below is an +example syncrepl specification at the consumer replica: + +> syncrepl id=1 +> provider=ldap://provider.example.com:389 +> updatedn="cn=replica,dc=example,dc=com" +> binddn="cn=syncuser,dc=example,dc=com" +> bindmethod=simple +> credentials=secret +> searchbase="dc=example,dc=com" +> filter="(objectClass=organizationalPerson)" +> attrs="cn,sn,ou,telephoneNumber,title,l" +> schemachecking=on +> scope=sub +> type=refreshOnly +> interval=01:00:00 + +In this example, the consumer will connect to the provider slapd +at port 389 of ldap://provider.example.com to perform a polling +(refreshOnly) mode of synchronization once a day. It will bind as +"cn=syncuser,dc=example,dc=com" using simple authentication with +password "secret". 
Note that the DN specified by the binddn= directive +must exist in the slave slapd's database or be the rootdn. +Also note that the access control privileges of the DN should be set +properly to synchronize the desired replica content. The consumer will write +to the consumer database as "cn=replica,dc=example,dc=com". This DN +should have write permission to the database. + +The synchronization search in the example will search for entries +whose objectClass is organizationalPerson in the entire subtree +under the "dc=example,dc=com" search base, inclusive. The requested +attributes are cn, sn, ou, telephoneNumber, title, and l. Schema +checking is turned on, so that the consumer {{slapd}}(8) will enforce +entry schema checking when it processes updates from the provider +{{slapd}}(8). + +The LDAP Sync replication engine is backend independent. All three +native backends can perform as the LDAP Sync replication consumer. + +H3: Start the provider and the consumer slapd + +If the currently running provider {{slapd}}(8) already has the +syncProviderSubentry in its database, it is not required to restart +the provider slapd. You don't need to restart the provider {{slapd}}(8) +when you start a replicated LDAP service. When you run a consumer +{{slapd}}(8), it will immediately perform either the initial full reload +if the cookie is NULL or too out of date, or an incremental synchronization +if an effective cookie is provided. In the refreshOnly mode, the next +synchronization session is scheduled to run one interval after the +completion of the current session. In the refreshAndPersist mode, +the synchronization session remains open between the consumer and provider. +The provider will send update messages whenever there are updates +in the provider replica. -- 2.39.5