From 34d6b50e45d8b2183825281176a1b47db7224bf6 Mon Sep 17 00:00:00 2001
From: Kurt Zeilenga <kurt@openldap.org>
Date: Tue, 16 Sep 2003 05:16:33 +0000
Subject: [PATCH] Initial proxy cache and syncrepl chapters

---
 doc/guide/admin/master.sdf      |   6 +
 doc/guide/admin/proxycache.sdf  | 133 +++++++++++++++++
 doc/guide/admin/slapdconfig.sdf |  73 +++++++++
 doc/guide/admin/syncrepl.sdf    | 253 ++++++++++++++++++++++++++++++++
 4 files changed, 465 insertions(+)
 create mode 100644 doc/guide/admin/proxycache.sdf
 create mode 100644 doc/guide/admin/syncrepl.sdf
diff --git a/doc/guide/admin/master.sdf b/doc/guide/admin/master.sdf
index 6025f74672..1fd2b6ae6c 100644
--- a/doc/guide/admin/master.sdf
+++ b/doc/guide/admin/master.sdf
@@ -69,6 +69,12 @@ PB:
 !include "replication.sdf"; chapter
 PB:
 
+!include "syncrepl.sdf"; chapter
+PB:
+
+!include "proxycache.sdf"; chapter
+PB:
+
 # Appendices 
 !include "../release/autoconf.sdf"; appendix
 PB:
diff --git a/doc/guide/admin/proxycache.sdf b/doc/guide/admin/proxycache.sdf
new file mode 100644
index 0000000000..5f0798f34f
--- /dev/null
+++ b/doc/guide/admin/proxycache.sdf
@@ -0,0 +1,133 @@
+# $OpenLDAP$
+# Copyright 2003, The OpenLDAP Foundation, All Rights Reserved.
+# COPYING RESTRICTIONS APPLY, see COPYRIGHT.
+
+H1: The Proxy Cache Engine
+
+LDAP servers typically hold one or more subtrees of a DIT. Replica
+(or shadow) servers hold shadow copies of entries held by one or
+more master servers.  Changes are propagated from the master server
+to replica (slave) servers using LDAP Sync or {{slurpd}}(8). An
+LDAP cache is a special type of replica which holds entries
+corresponding to search filters instead of subtrees.
+
+H2: Overview
+
+The proxy cache extension of slapd handles a search request (query)
+by first determining whether it is contained in any cached search
+filter. Contained requests are answered from the proxy cache's local
+database.
+
+For example, {{EX:(shoesize>=9)}} is contained in {{EX:(shoesize>=8)}}
+and {{EX:(sn=Richardson)}} is contained in {{EX:(sn=Richards*)}}.
+
+Correct matching rules and syntaxes are used while comparing
+assertions for query containment. To simplify the query containment
+problem, a list of cacheable "templates" (defined below) is specified
+at configuration time. A query is cached or answered only if it
+belongs to one of these templates. The entries corresponding to
+cached queries are stored in the proxy cache local database while
+their associated meta information (filter, scope, base, attributes)
+is stored in main memory. Instead of sending a referral for requests
+which are not contained, the proxy cache acts as a proxy and obtains
+the result by querying one or more target servers. The cache extends the
+meta backend and uses it to connect to target servers.
+
+A template is a prototype for generating LDAP search requests.
+Templates are described by a prototype search filter and a list of
+attributes which are required in queries generated from the template.
+The representation of a prototype filter is similar to that of RFC 2254,
+except that the assertion values are missing. Examples of prototype
+filters are (sn=) and (&(sn=)(givenname=)), which are instantiated by
+search filters (sn=Doe) and (&(sn=Doe)(givenname=John)) respectively.
+
+The cache replacement policy removes the least recently used (LRU)
+query and entries belonging to only that query. Queries are allowed
+a maximum time to live (TTL) in the cache thus providing weak
+consistency. A background thread periodically checks the cache for
+expired queries and removes them.
+
+The Proxy Cache paper
+({{URL:http://www.openldap.org/pub/kapurva/proxycaching.pdf}}) provides
+design/implementation details.
+
+
+H2: Proxy Cache Configuration
+
+The cache configuration specific directives described below must
+appear after the {{EX:"database meta"}} directive and before any other
+{{EX:"database"}} declaration in {{slapd.conf}}(5).
+
+H3: Setting cache parameters
+
+>	cacheparams <lo_thresh> <hi_thresh> <numattrsets> <max_entries> <cc_period>
+
+This directive enables proxy caching and sets general cache parameters.
+Cache replacement is invoked when the cache size exceeds
+<hi_thresh> bytes and continues until the cache size drops below
+<lo_thresh> bytes. The total number of attribute sets (as specified
+by the attrset directive) is given by <numattrsets>. The entry
+limit for cacheable queries is specified by <max_entries>.
+A consistency check is performed every <cc_period> seconds.
+In each cycle, queries with expired TTLs are removed.
+
+H3: Defining attribute sets
+
+> attrset <index> <attrs...>
+
+Associates a set of attributes with an index. Each attribute
+set is associated with an index number from 0 to <numattrsets>-1.
+These indices are used by the addtemplate directive to define
+cacheable templates.
+
+H3: Specifying cacheable templates 
+
+> addtemplate <prototype_string> <attrset_index> <TTL>
+
+Specifies a cacheable template and the "time to live" (in seconds) <TTL>
+for queries belonging to the template. A template is described by
+its prototype filter string and set of required attributes identified
+by <attrset_index>.
+
+H3: Example
+
+An example {{slapd.conf}}(5) for a caching server which proxies for
+the backend server {{EX:ldap://server.mydomain.com}} and caches
+queries with base object in the {{EX:"dc=example,dc=com"}} subtree
+is shown below:
+ 
+>	database 	meta
+>	suffix 		"dc=example,dc=com" 
+>	uri    		ldap://server.mydomain.com/dc=example,dc=com
+>	cacheparams 	100000 150000 1 50 100
+>	attrset 0 mail postaladdress telephonenumber 
+>	addtemplate (sn=) 0 3600
+>	addtemplate (&(sn=)(givenName=)) 0 3600
+>	addtemplate (&(departmentNumber=)(secretary=*)) 0 3600
+    
+A different name space is associated with the local cache database.
+For example, if the local database suffix is {{EX:"dc=example,dc=com,cn=cache"}},
+then the following rewriting rules need to be defined to translate
+between master and cache database naming contexts.
+
+>	rewriteEngine 	on
+>	rewriteContext  cacheResult 
+>	rewriteRule	"(.*)dc=example,dc=com" "%1dc=example,dc=com,cn=cache" ":"
+>	rewriteContext  cacheBase
+>	rewriteRule     "(.*)dc=example,dc=com" "%1dc=example,dc=com,cn=cache" ":"
+>	rewriteContext  cacheReturn 
+>	rewriteRule     "(.*)dc=example,dc=com,cn=cache" "%1dc=example,dc=com" ":"
+    
+Finally, the local database for storing cached entries can be declared
+as follows:
+ 
+>	database 	ldbm
+>	suffix 		"dc=example,dc=com,cn=cache" 
+>	#other database specific directives
+
+The proxy cache database instance can be either {{TERM:BDB}} or
+{{TERM:LDBM}}. A script demonstrating proxy cache functionality
+({{FILE:test019-proxycaching}}) is provided in the
+tests/scripts directory of the distribution.
+
+
diff --git a/doc/guide/admin/slapdconfig.sdf b/doc/guide/admin/slapdconfig.sdf
index 7d8c124b84..112d9d96ae 100644
--- a/doc/guide/admin/slapdconfig.sdf
+++ b/doc/guide/admin/slapdconfig.sdf
@@ -405,6 +405,79 @@ looks at the suffix line(s) in each database definition in the
 order they appear in the file. Thus, if one database suffix is a
 prefix of another, it must appear after it in the config file.
 
+H4: syncrepl
+
+>	syncrepl id=<replica ID>
+>		provider=ldap[s]://<hostname>[:port]
+>		[updatedn=<dn>]
+>		[binddn=<dn>]
+>		[bindmethod=simple|sasl]
+>		[binddn=<simple DN>]
+>		[credentials=<simple passwd>]
+>		[saslmech=<SASL mech>]
+>		[secprops=<properties>]
+>		[realm=<realm>]
+>		[authcId=<authentication ID>]
+>		[authzId=<authorization ID>]
+>		[searchbase=<base DN>]
+>		[filter=<filter str>]
+>		[attrs=<attr list>]
+>		[scope=sub|one|base]
+>		[schemachecking=on|off]
+>		[type=refreshOnly|refreshAndPersist]
+>		[interval=dd:hh:mm]
+
+This directive specifies an LDAP Sync replication between this
+database and the specified replication provider site. The {{EX:id=}}
+parameter identifies the LDAP Sync specification in the database.
+The {{EX:provider=}} parameter specifies a replication provider site as
+an LDAP URI.
+
+The LDAP Sync replication specification is based on the search
+specification which defines the content of the replica. The replica
+consists of the entries matching the search specification. As with
+normal searches, the search specification consists of the
+{{EX:searchbase}}, {{EX:scope}}, {{EX:filter}}, and {{EX:attrs}}
+parameters.
+
+LDAP Sync replication has two operating modes. In the
+{{EX:refreshOnly}} mode, the next synchronization session is
+rescheduled at the interval time after the current session finishes.
+The default interval is one day. In the {{EX:refreshAndPersist}}
+mode, the LDAP Sync search remains persistent in the provider LDAP
+server. Further updates to the provider replica generate
+searchResultEntry messages to the consumer.
+
+Schema checking can be enforced at the LDAP Sync consumer site
+by turning on the {{EX:schemachecking}} parameter. The default is off.
+
+The {{EX:binddn=}} parameter gives the DN for the LDAP Sync search
+to bind as to the provider slapd. The content of the replica will
+be subject to the access control privileges of the DN.
+
+The {{EX:bindmethod}} is {{EX:simple}} or {{EX:sasl}}, depending
+on whether simple password-based authentication or SASL authentication
+is to be used when connecting to the provider slapd.
+
+Simple authentication should not be used unless adequate integrity
+and data confidentiality protections are in place (e.g. TLS or IPsec).
+Simple authentication requires specification of {{EX:binddn}} and
+{{EX:credentials}} parameters.
+
+SASL authentication is generally recommended. SASL authentication
+requires specification of a mechanism using the {{EX:saslmech}} parameter.
+Depending on the mechanism, an authentication identity and/or
+credentials can be specified using {{EX:authcId}} and {{EX:credentials}}
+respectively.  The {{EX:authzId}} parameter may be used to specify
+a proxy authorization identity.
+
+The LDAP Sync replication is supported in three native backends:
+back-bdb, back-hdb, and back-ldbm.
+
+See the {{SECT:LDAP Sync Replication}} chapter for more information
+on how to use this directive.
+
+
 H4: updatedn <dn>
 
 This directive is only applicable in a slave slapd. It specifies
diff --git a/doc/guide/admin/syncrepl.sdf b/doc/guide/admin/syncrepl.sdf
new file mode 100644
index 0000000000..f98da94b6a
--- /dev/null
+++ b/doc/guide/admin/syncrepl.sdf
@@ -0,0 +1,253 @@
+# $OpenLDAP$
+# Copyright 2003, The OpenLDAP Foundation, All Rights Reserved.
+# COPYING RESTRICTIONS APPLY, see COPYRIGHT.
+
+H1: LDAP Sync Replication
+
+The LDAP Sync replication engine is designed to function as an
+improved alternative to {{slurpd}}(8).  While replication with
+{{slurpd}}(8) provides improved capacity,
+availability, and reliability, it has some drawbacks:
+
+^ It is not stateful, hence lacks the resynchronization capability.
+Because there is no representation of replica state in the replication
+with {{slurpd}}(8), it is not possible to provide an efficient mechanism
+to make the slave replica consistent with the master replica once
+they become out of sync. For instance, if the slave database content
+is damaged, the slave replica must be re-primed from the master
+replica again. With a state-based replication, it would be possible
+to recover the slave replica from a local backup. The slave replica
+would then be synchronized by calculating and transmitting the diffs
+between the slave replica and the master replica based on their
+states. The LDAP Sync replication is stateful.
+
++ It is history-based, not state-based. The replication with
+{{slurpd}}(8) relies on the history information in the replication log
+file generated by {{slapd}}(8). If a portion of the log file that
+contains updates yet to be synchronized to the slave is truncated
+or damaged, a full reload is required. The state-based replication,
+on the other hand, would not rely on the separate history store.
+In the LDAP Sync replication, every directory entry has its state
+information in the entryCSN operational attribute. The replica
+contents are calculated based on the consumer cookie and the entryCSN
+of the directory entries.
+
++ It is push-based, not pull-based. In the replication with
+{{slurpd}}(8), it is the master who decides when to synchronize the
+replica. Pull-based polling replication is not possible with
+{{slurpd}}(8). For example, in order to make a daily directory backup
+which is an exact image at a point in time, the slave replica must be
+made read-only by stopping {{slurpd}}(8) during backup. After backup,
+{{slurpd}}(8) can be run in one-shot mode to resynchronize the slave
+replica with the updates made during the backup. In a pull-based, polling
+replication, the replica is guaranteed to be read-only between two polling
+points. The LDAP Sync replication supports both the push-based
+replication and the pull-based replication.
+
++ It only supports fractional replication and does not support
+sparse replication. The LDAP Sync replication supports both
+fractional and sparse replication. It is possible to use a general
+search specification to initiate a synchronization session only for
+the interesting subset of the context.
+
+H2: LDAP Content Sync Protocol Description
+
+The LDAP Sync replication uses the LDAP Content Sync protocol (refer
+to the Internet Draft entitled "The LDAP Content Synchronization
+Operation") for replica synchronization. The LDAP Content Sync
+protocol operation is based on the replica state which is transmitted
+between replicas as synchronization cookies. There are two
+operating modes: refreshOnly and refreshAndPersist. In both modes,
+a consumer {{slapd}}(8) connects to a provider {{slapd}}(8) with a cookie
+value representing the state of the consumer replica. The non-persistent
+part of the synchronization consists of two phases.
+
+The first is the state-base phase. The entries updated after the
+point in time the consumer cookie represents will be transmitted
+to the consumer. Because the unit of synchronization is the entry, all
+the requested attributes will be transmitted even though only some
+of them are changed. For the rest of the entries, present
+messages consisting only of the name and the synchronization control
+will be sent to the consumer. After the consumer receives all the
+updated and present entries, it can reliably make its replica
+consistent with the provider replica. The consumer will add all the
+newly added entries, replace existing entries with their updated
+versions, and delete entries in the local replica if they are neither
+updated nor specified as present.
+
+The second is the log-base phase. This phase is incorporated to
+optimize the protocol with respect to the volume of present message
+traffic. If the provider maintains a history store from which the
+content to be synchronized can be reliably calculated, this log-base
+phase follows the state-base phase. In this phase, the actual directory
+update operations such as delete, modify, and add are transmitted.
+There is no need to send present messages in this log-base phase.
+
+If the protocol operates in the refreshOnly mode, the synchronization
+will then terminate. The provider will send a synchronization cookie
+which reflects the new state to the consumer. The consumer will
+present the new cookie the next time it requests a synchronization.
+If the protocol operates in the refreshAndPersist mode, the
+synchronization operation remains persistent in the provider. Every
+update made to the provider replica will be transmitted to the
+consumer. Cookies can be sent to the consumer at any time by using
+the SyncInfo intermediate response and at the end of the synchronization
+by using the SyncDone control attached to the SearchResultDone
+message.
+
+Entries are uniquely identified by the entryUUID attribute value
+in the LDAP Content Sync protocol. It serves as a reliable entry
+identifier, whereas the DN of an entry can be changed by modrdn operations.
+The entryUUID is attached to each SearchResultEntry or
+SearchResultReference as a part of the Sync State control.
+
+H2: LDAP Sync Replication Details
+
+The LDAP Sync replication uses both the refreshOnly and the
+refreshAndPersist modes of synchronization. If an LDAP Sync replication
+is specified in a database definition, {{slapd}}(8) schedules an
+execution of the LDAP Sync replication engine. In the refreshOnly
+mode, the engine will be rescheduled at the interval time after a
+replication session ends. In the refreshAndPersist mode, the engine
+will remain active to process the SearchResultEntry messages from
+the provider.
+
+The LDAP Sync replication uses only the state-base synchronization
+phase.  Because {{slapd}}(8) does not currently implement a history store
+such as a changelog or tombstones, it depends only on the state-base
+phase. A null log-base phase follows the state-base phase.
+
+As an optimization, no entries will be transmitted to a consumer
+if there has been no update in the master replica after the last
+synchronization with the consumer. Even present messages for the
+unchanged entries are not transmitted. The consumer retains its
+replica contents.
+
+H3: entryCSN
+
+The LDAP Sync replication implemented in OpenLDAP stores state
+information for every entry in the entryCSN attribute. The entryCSN of
+an entry is the CSN (change sequence number), a refined
+timestamp, at which the entry was most recently updated. The CSN
+consists of three parts: the time, a replica ID, and a change count
+within a single second.
+
+H3: contextCSN
+
+contextCSN represents the current state of the provider replica.
+It is the largest entryCSN of all entries in the context such that
+no transaction having smaller entryCSN value remains outstanding.
+Because the entryCSN value is obtained before the transaction starts and
+transactions are not committed in the entryCSN order, special care
+must be taken to manage the proper contextCSN value in the
+transactional environment. Also, the state of the search result set
+is required to correspond to the contextCSN value returned to the
+consumer as a sync cookie.
+
+contextCSN, the provider replica state, is stored in the
+syncProviderSubentry. The value of the contextCSN is transmitted
+to the consumer replica as a Sync Cookie. The cookie is stored in
+the syncreplCookie attribute of the syncConsumerSubentry subentry. The
+consumer will use the stored cookie value to represent its replica
+state when it connects to the provider in the future.
+
+H3: Glue Entry
+
+Because a general search filter can be used in the LDAP Sync replication,
+an entry might be created without a parent, if the parent entry was
+filtered out. The LDAP Sync replication engine creates glue
+entries for such holes in the replica. The glue entries will not
+be returned in response to a search to the consumer {{slapd}}(8) if
+manageDSAit is not set. They will be returned if it is set.
+
+H2: Configuring slapd for LDAP Sync Replication
+
+It is relatively simple to start service in a replicated OpenLDAP
+environment with the LDAP Sync replication, compared to the replication
+with {{slurpd}}(8). First, configure both the provider and
+the consumer {{slapd}}(8) servers appropriately. Then, start the provider
+slapd instance first, and the consumer slapd instance next.
+Administrative tasks such as database copy and temporary shutdown
+(or read-only demotion) of the provider are not required.
+
+H3: Set up the provider slapd
+
+There is no special slapd.conf(5) directive for the provider {{slapd}}(8).
+Because the LDAP Sync searches are subject to access control, proper
+access control privileges should be set up for the replicated
+content.
+
+When creating a provider database from an ldif file using slapadd(8),
+you must create a state indicator for the database context and keep
+it up to date. slapadd(8) will store the contextCSN in the
+syncProviderSubentry if it is given the -w flag. It is also possible
+to create the syncProviderSubentry with an appropriate contextCSN
+value by directly including it in the ldif file. If slapadd(8) runs
+without the -w flag, the provided contextCSN will be stored. With
+the -w flag, a new value based on the current time will be stored
+as contextCSN. slapcat(8) can be used to retrieve the directory
+with the contextCSN when it is run with the -m flag.
+
+Only the back-bdb and the back-hdb backends can perform as the LDAP
+Sync replication provider. Back-ldbm currently does not have the
+LDAP Content Sync protocol functionality.
+
+H3: Set up the consumer slapd
+
+The consumer slapd is configured by the slapd.conf(5) configuration
+file. For the configuration directives, see the syncrepl section of the
+slapd Configuration File chapter. In the configuration file, make
+sure the DN given in the updatedn= parameter of the syncrepl
+specification has permission to write to the database. Below is an
+example syncrepl specification at the consumer replica:
+
+>	syncrepl id=1
+>		provider=ldap://provider.example.com:389
+>		updatedn="cn=replica,dc=example,dc=com"
+>		binddn="cn=syncuser,dc=example,dc=com"
+>		bindmethod=simple
+>		credentials=secret
+>		searchbase="dc=example,dc=com"
+>		filter="(objectClass=organizationalPerson)"
+>		attrs="cn,sn,ou,telephoneNumber,title,l"
+>		schemachecking=on
+>		scope=sub
+>		type=refreshOnly
+>		interval=01:00:00
+
+In this example, the consumer will connect to the provider slapd
+at port 389 of ldap://provider.example.com to perform a polling
+(refreshOnly) mode of synchronization once a day. It will bind as
+"cn=syncuser,dc=example,dc=com" using simple authentication with
+password "secret". Note that the DN specified by the binddn= parameter
+must exist in the slave slapd's database or be the rootdn.
+Also note that the access control privilege of the DN should be set
+properly to synchronize the desired replica content. The consumer will
+write to its database as "cn=replica,dc=example,dc=com". This DN
+should have write permission to the database.
+
+The synchronization search in the example will search for entries
+whose objectClass is organizationalPerson in the entire subtree
+under the "dc=example,dc=com" search base, inclusive. The requested
+attributes are cn, sn, ou, telephoneNumber, title, and l. Schema
+checking is turned on, so that the consumer {{slapd}}(8) will enforce
+entry schema checking when it processes updates from the provider
+{{slapd}}(8).
+
+The LDAP Sync replication engine is backend independent. All three
+native backends can perform as the LDAP Sync replication consumer.
+                     
+H3: Start the provider and the consumer slapd
+
+If the currently running provider {{slapd}}(8) already has the
+syncProviderSubentry in its database, it is not required to restart
+the provider {{slapd}}(8) when you start a replicated LDAP service.
+When you run a consumer {{slapd}}(8), it will immediately perform
+either the initial full reload if the cookie is NULL or too far out
+of date, or an incremental synchronization if a valid cookie is
+provided. In the refreshOnly mode, the next synchronization session
+is scheduled to run at the interval time after the
+completion of the current session. In the refreshAndPersist mode,
+the synchronization session remains open between the consumer and provider.
+The provider will send update messages whenever there are updates
+in the provider replica.
-- 
2.39.5