Merge remote branch 'origin/mdb.master'

[openldap] / doc / guide / admin / intro.sdf
diff --git a/doc/guide/admin/intro.sdf b/doc/guide/admin/intro.sdf

index 1c8f7d281f9bb3aa3910878e1d0f41a2201bf5dc..5655032a8aaa72cadcc18401a56d48f85c59cab4 100644 (file)
--- a/doc/guide/admin/intro.sdf
+++ b/doc/guide/admin/intro.sdf
@@ -1,33 +1,40 @@
  # $OpenLDAP$
-# Copyright 1999-2003, The OpenLDAP Foundation, All Rights Reserved.
+# Copyright 1999-2011 The OpenLDAP Foundation, All Rights Reserved.
  # COPYING RESTRICTIONS APPLY, see COPYRIGHT.
  H1: Introduction to OpenLDAP Directory Services
  
-This document describes how to build, configure, and operate OpenLDAP
-software to provide directory services.  This includes details on
-how to configure and run the stand-alone {{TERM:LDAP}} daemon,
-{{slapd}}(8) and the stand-alone LDAP update replication daemon,
-{{slurpd}}(8). It is intended for newcomers and experienced
-administrators alike.  This section provides a basic introduction
-to directory services and, in particular, the directory services
-provided by {{slapd}}(8).
+This document describes how to build, configure, and operate
+{{PRD:OpenLDAP}} Software to provide directory services.  This
+includes details on how to configure and run the Standalone
+{{TERM:LDAP}} Daemon, {{slapd}}(8).  It is intended for new and
+experienced administrators alike.  This section provides a basic
+introduction to directory services and, in particular, the directory
+services provided by {{slapd}}(8).  This introduction is only
+intended to provide enough information so one might get started
+learning about {{TERM:LDAP}}, {{TERM:X.500}}, and directory services.
  
  
  H2: What is a directory service?
  
-A directory is a specialized database optimized for reading, browsing
-and searching.  Directories tend to contain descriptive, attribute-based
-information and support sophisticated filtering capabilities.
-Directories generally do not support complicated transaction or
-roll-back schemes found in database management systems designed
-for handling high-volume complex updates.  Directory updates are
-typically simple all-or-nothing changes, if they are allowed at
-all.  Directories are tuned to give quick response to high-volume
-lookup or search operations. They may have the ability to replicate
-information widely in order to increase availability and reliability,
-while reducing response time.  When directory information is
-replicated, temporary inconsistencies between the replicas may be
-okay, as long as they get in sync eventually.
+A directory is a specialized database specifically designed for
+searching and browsing, in additional to supporting basic lookup
+and update functions.
+
+Note: A directory is defined by some as merely a database optimized
+for read access.  This definition, at best, is overly simplistic.
+
+Directories tend to contain descriptive, attribute-based information
+and support sophisticated filtering capabilities.  Directories
+generally do not support complicated transaction or roll-back schemes
+found in database management systems designed for handling high-volume
+complex updates.  Directory updates are typically simple all-or-nothing
+changes, if they are allowed at all.  Directories are generally
+tuned to give quick response to high-volume lookup or search
+operations. They may have the ability to replicate information
+widely in order to increase availability and reliability, while
+reducing response time.  When directory information is replicated,
+temporary inconsistencies between the replicas may be okay, as long
+as inconsistencies are resolved in a timely manner.
  
  There are many different ways to provide a directory service.
  Different methods allow different kinds of information to be stored
@@ -41,9 +48,17 @@ services are usually {{distributed}}, meaning that the data they
  contain is spread across many machines, all of which cooperate to
  provide the directory service. Typically a global service defines
  a uniform {{namespace}} which gives the same view of the data no
-matter where you are in relation to the data itself.  The Internet
-{{TERM[expand]DNS}} (DNS) is an example of a globally distributed
-directory service.
+matter where you are in relation to the data itself.
+
+A web directory, such as provided by the {{Open Directory Project}}
+<{{URL:http://dmoz.org}}>, is a good example of a directory service.
+These services catalog web pages and are specifically designed to
+support browsing and searching.
+
+While some consider the Internet {{TERM[expand]DNS}} (DNS) is an
+example of a globally distributed directory service, DNS is not
+browseable nor searchable.  It is more properly described as a
+globally distributed {{lookup}} service.
  
  
  H2: What is LDAP?
@@ -52,11 +67,11 @@ H2: What is LDAP?
  it is a lightweight protocol for accessing directory services,
  specifically {{TERM:X.500}}-based directory services.  LDAP runs
  over {{TERM:TCP}}/{{TERM:IP}} or other connection oriented transfer
-services.  The nitty-gritty details of LDAP are defined in
-{{REF:RFC2251}} "The Lightweight Directory Access Protocol (v3)"
-and other documents comprising the technical specification
-{{REF:RFC3377}}.  This section gives an overview of LDAP from a
-user's perspective.
+services.  LDAP is an {{ORG:IETF}} Standard Track protocol and is
+specified in "Lightweight Directory Access Protocol (LDAP) Technical
+Specification Road Map" {{REF:RFC4510}}.
+
+This section gives an overview of LDAP from a user's perspective.
  
  {{What kind of information can be stored in the directory?}} The
  LDAP information model is based on {{entries}}. An entry is a
@@ -68,8 +83,8 @@ common name, or "{{EX:mail}}" for email address. The syntax of
  values depend on the attribute type.  For example, a {{EX:cn}}
  attribute might contain the value {{EX:Babs Jensen}}.  A {{EX:mail}}
  attribute might contain the value "{{EX:babs@example.com}}". A
-{{EX:jpegPhoto}} attribute would contain a photograph in the JPEG
-(binary) format.
+{{EX:jpegPhoto}} attribute would contain a photograph in the
+{{TERM:JPEG}} (binary) format.
  
  {{How is the information arranged?}} In LDAP, directory entries
  are arranged in a hierarchical tree-like structure.  Traditionally,
@@ -81,7 +96,7 @@ units, people, printers, documents, or just about anything else
  you can think of.  Figure 1.1 shows an example LDAP directory tree
  using traditional naming.
  
-!import "intro_tree.gif"; align="center"; \
+!import "intro_tree.png"; align="center"; \
         title="LDAP directory tree (traditional naming)"
  FT[align="Center"] Figure 1.1: LDAP directory tree (traditional naming)
  
@@ -91,7 +106,7 @@ for directory services to be located using the {{DNS}}.
  Figure 1.2 shows an example LDAP directory tree using domain-based
  naming.
  
-!import "intro_dctree.gif"; align="center"; \
+!import "intro_dctree.png"; align="center"; \
         title="LDAP directory tree (Internet naming)"
  FT[align="Center"] Figure 1.2: LDAP directory tree (Internet naming)
  
@@ -106,9 +121,9 @@ the entry itself (called the {{TERM[expand]RDN}} or RDN) and
  concatenating the names of its ancestor entries. For example, the
  entry for Barbara Jensen in the Internet naming example above has
  an RDN of {{EX:uid=babs}} and a DN of
-{{EX:uid=babs,ou=People,dc=example,dc=com}}. The full DN format
-is described in {{REF:RFC2253}}, "Lightweight Directory Access
-Protocol (v3):  UTF-8 String Representation of Distinguished Names."
+{{EX:uid=babs,ou=People,dc=example,dc=com}}. The full DN format is
+described in {{REF:RFC4514}}, "LDAP: String Representation of
+Distinguished Names."
  
  {{How is the information accessed?}} LDAP defines operations for
  interrogating and updating the directory.  Operations are provided
@@ -132,25 +147,65 @@ be useful to you.
  
  {{How is the information protected from unauthorized access?}} Some
  directory services provide no protection, allowing anyone to see
-the information. LDAP provides a mechanism for a client to
-authenticate, or prove its identity to a directory server, paving
-the way for rich access control to protect the information the
-server contains.  LDAP also supports privacy and integrity security
+the information. LDAP provides a mechanism for a client to authenticate,
+or prove its identity to a directory server, paving the way for
+rich access control to protect the information the server contains.
+LDAP also supports data security (integrity and confidentiality)
  services.
  
  
+H2: When should I use LDAP?
+
+This is a very good question. In general, you should use a Directory
+server when you require data to be centrally managed, stored and accessible via
+standards based methods. 
+
+Some common examples found throughout the industry are, but not limited to:
+
+* Machine Authentication
+* User Authentication
+* User/System Groups
+* Address book
+* Organization Representation
+* Asset Tracking
+* Telephony Information Store
+* User resource management
+* E-mail address lookups
+* Application Configuration store
+* PBX Configuration store
+* etc.....
+
+There are various {{SECT:Distributed Schema Files}} that are standards based, but
+you can always create your own {{SECT:Schema Specification}}.
+
+There are always new ways to use a Directory and apply LDAP principles to address
+certain problems, therefore there is no simple answer to this question.
+
+If in doubt, join the general LDAP forum for non-commercial discussions and 
+information relating to LDAP at: 
+{{URL:http://www.umich.edu/~dirsvcs/ldap/mailinglist.html}} and ask
+
+H2: When should I not use LDAP?
+
+When you start finding yourself bending the directory to do what you require,
+maybe a redesign is needed. Or if you only require one application to use and 
+manipulate your data (for discussion of LDAP vs RDBMS, please read the 
+{{SECT:LDAP vs RDBMS}} section).
+
+It will become obvious when LDAP is the right tool for the job.
+
+
  H2: How does LDAP work?
  
-LDAP directory service is based on a {{client-server}} model. One
-or more LDAP servers contain the data making up the directory
-information tree (DIT).  The client connects to servers and
-asks it a question.  The server responds with an answer and/or 
-with a pointer to where the client can get additional information
-(typically, another LDAP server).  No matter which LDAP server a
-client connects to, it sees the same view of the directory; a name
-presented to one LDAP server references the same entry it would at
-another LDAP server. This is an important feature of a global
-directory service, like LDAP.
+LDAP utilizes a {{client-server model}}. One or more LDAP servers
+contain the data making up the directory information tree ({{TERM:DIT}}).
+The client connects to servers and asks it a question.  The server
+responds with an answer and/or with a pointer to where the client
+can get additional information (typically, another LDAP server).
+No matter which LDAP server a client connects to, it sees the same
+view of the directory; a name presented to one LDAP server references
+the same entry it would at another LDAP server.  This is an important
+feature of a global directory service.
  
  
  H2: What about X.500?
@@ -170,10 +225,10 @@ While LDAP is still used to access X.500 directory service via
  gateways, LDAP is now more commonly directly implemented in X.500
  servers. 
  
-The stand-alone LDAP daemon, or {{slapd}}(8), can be viewed as a
+The Standalone LDAP Daemon, or {{slapd}}(8), can be viewed as a
  {{lightweight}} X.500 directory server.  That is, it does not
-implement the X.500's DAP.  As a {{lightweight directory}} server,
-{{slapd}}(8) implements only a subset of the X.500 models.
+implement the X.500's DAP nor does it support the complete X.500
+models.
  
  If you are already running a X.500 DAP service and you want to
  continue to do so, you can probably stop reading this guide.  This
@@ -183,10 +238,7 @@ X.500 DAP, or have no immediate plans to run X.500 DAP, read on.
  
  It is possible to replicate data from an LDAP directory server to
  a X.500 DAP {{TERM:DSA}}.  This requires an LDAP/DAP gateway.
-OpenLDAP does not provide such a gateway, but our replication daemon
-can be used to replicate to such a gateway.  See the {{SECT:Replication
-with slurpd}} chapter of this document for information regarding
-replication.
+OpenLDAP Software does not include such a gateway.
  
  
  H2: What is the difference between LDAPv2 and LDAPv3?
@@ -194,22 +246,122 @@ H2: What is the difference between LDAPv2 and LDAPv3?
  LDAPv3 was developed in the late 1990's to replace LDAPv2.
  LDAPv3 adds the following features to LDAP:
  
- - Strong Authentication via {{TERM:SASL}}
- - Integrity and Confidentiality Protection via {{TERM:TLS}} (SSL)
- - Internationalization through the use of Unicode
- - Referrals and Continuations
- - Schema Discovery
- - Extensibility (controls, extended operations, and more)
-
-LDAPv2 is historic ({{REF:RFC3494}}).  As most implementations
-(including {{slapd}}(8)) of LDAPv2 do not conform to the LDAPv2
-technical specification, interoperatibility amongst implementations
-claiming LDAPv2 support will be limited.  As LDAPv2 differs
-significantly from LDAPv3, deploying both LDAPv2 and LDAPv3
-simultaneously can be quite problematic.  LDAPv2 should be avoided.
+ * Strong authentication and data security services via {{TERM:SASL}}
+ * Certificate authentication and data security services via {{TERM:TLS}} (SSL)
+ * Internationalization through the use of Unicode
+ * Referrals and Continuations
+ * Schema Discovery
+ * Extensibility (controls, extended operations, and more)
+
+LDAPv2 is historic ({{REF:RFC3494}}).  As most {{so-called}} LDAPv2
+implementations (including {{slapd}}(8)) do not conform to the
+LDAPv2 technical specification, interoperability amongst
+implementations claiming LDAPv2 support is limited.  As LDAPv2
+differs significantly from LDAPv3, deploying both LDAPv2 and LDAPv3
+simultaneously is quite problematic.  LDAPv2 should be avoided.
  LDAPv2 is disabled by default.
  
  
+H2: LDAP vs RDBMS
+
+This question is raised many times, in different forms. The most common, 
+however, is: {{Why doesn't OpenLDAP drop Berkeley DB and use a relational 
+database management system (RDBMS) instead?}} In general, expecting that the 
+sophisticated algorithms implemented by commercial-grade RDBMS would make 
+{{OpenLDAP}} be faster or somehow better and, at the same time, permitting 
+sharing of data with other applications.
+
+The short answer is that use of an embedded database and custom indexing system 
+allows OpenLDAP to provide greater performance and scalability without loss of 
+reliability. OpenLDAP uses Berkeley DB concurrent / transactional 
+database software. This is the same software used by leading commercial 
+directory software.
+
+Now for the long answer. We are all confronted all the time with the choice 
+RDBMSes vs. directories. It is a hard choice and no simple answer exists.
+
+It is tempting to think that having a RDBMS backend to the directory solves all 
+problems. However, it is a pig. This is because the data models are very 
+different. Representing directory data with a relational database is going to 
+require splitting data into multiple tables.
+
+Think for a moment about the person objectclass. Its definition requires 
+attribute types objectclass, sn and cn and allows attribute types userPassword, 
+telephoneNumber, seeAlso and description. All of these attributes are multivalued, 
+so a normalization requires putting each attribute type in a separate table.
+
+Now you have to decide on appropriate keys for those tables. The primary key 
+might be a combination of the DN, but this becomes rather inefficient on most 
+database implementations.
+
+The big problem now is that accessing data from one entry requires seeking on 
+different disk areas. On some applications this may be OK but in many 
+applications performance suffers.
+
+The only attribute types that can be put in the main table entry are those that 
+are mandatory and single-value. You may add also the optional single-valued 
+attributes and set them to NULL or something if not present.
+
+But wait, the entry can have multiple objectclasses and they are organized in 
+an inheritance hierarchy. An entry of objectclass organizationalPerson now has 
+the attributes from person plus a few others and some formerly optional attribute 
+types are now mandatory.
+
+What to do? Should we have different tables for the different objectclasses? 
+This way the person would have an entry on the person table, another on 
+organizationalPerson, etc. Or should we get rid of person and put everything on 
+the second table?
+
+But what do we do with a filter like (cn=*) where cn is an attribute type that 
+appears in many, many objectclasses. Should we search all possible tables for 
+matching entries? Not very attractive.
+
+Once this point is reached, three approaches come to mind. One is to do full 
+normalization so that each attribute type, no matter what, has its own separate 
+table. The simplistic approach where the DN is part of the primary key is 
+extremely wasteful, and calls for an approach where the entry has a unique 
+numeric id that is used instead for the keys and a main table that maps DNs to 
+ids. The approach, anyway, is very inefficient when several attribute types from 
+one or more entries are requested. Such a database, though cumbersomely, 
+can be managed from SQL applications.
+
+The second approach is to put the whole entry as a blob in a table shared by all 
+entries regardless of the objectclass and have additional tables that act as 
+indices for the first table. Index tables are not database indices, but are 
+fully managed by the LDAP server-side implementation. However, the database 
+becomes unusable from SQL. And, thus, a fully fledged database system provides 
+little or no advantage. The full generality of the database is unneeded. 
+Much better to use something light and fast, like Berkeley DB. 
+
+A completely different way to see this is to give up any hopes of implementing 
+the directory data model. In this case, LDAP is used as an access protocol to 
+data that provides only superficially the directory data model. For instance, 
+it may be read only or, where updates are allowed, restrictions are applied, 
+such as making single-value attribute types that would allow for multiple values. 
+Or the impossibility to add new objectclasses to an existing entry or remove 
+one of those present. The restrictions span the range from allowed restrictions 
+(that might be elsewhere the result of access control) to outright violations of 
+the data model. It can be, however, a method to provide LDAP access to preexisting 
+data that is used by other applications. But in the understanding that we don't
+really have a "directory".
+
+Existing commercial LDAP server implementations that use a relational database 
+are either from the first kind or the third. I don't know of any implementation 
+that uses a relational database to do inefficiently what BDB does efficiently.
+For those who are interested in "third way" (exposing EXISTING data from RDBMS 
+as LDAP tree, having some limitations compared to classic LDAP model, but making 
+it possible to interoperate between LDAP and SQL applications):
+
+OpenLDAP includes back-sql - the backend that makes it possible. It uses ODBC + 
+additional metainformation about translating LDAP queries to SQL queries in your 
+RDBMS schema, providing different levels of access - from read-only to full 
+access depending on RDBMS you use, and your schema.
+
+For more information on concept and limitations, see {{slapd-sql}}(5) man page, 
+or the {{SECT: Backends}} section. There are also several examples for several 
+RDBMSes in {{F:back-sql/rdbms_depend/*}} subdirectories. 
+
+
  H2: What is slapd and what can it do?
  
  {{slapd}}(8) is an LDAP directory server that runs on many different
@@ -220,27 +372,31 @@ service, or run a service all by yourself. Some of slapd's more
  interesting features and capabilities include:
  
  {{B:LDAPv3}}: {{slapd}} implements version 3 of {{TERM[expand]LDAP}}.
-{{slapd}} supports LDAP over both IPv4 and IPv6.
+{{slapd}} supports LDAP over both {{TERM:IPv4}} and {{TERM:IPv6}}
+and Unix {{TERM:IPC}}.
  
  {{B:{{TERM[expand]SASL}}}}: {{slapd}} supports strong authentication
-services through the use of SASL.  {{slapd}}'s SASL implementation
-utilizes {{PRD:Cyrus}} {{PRD:SASL}} software which supports a number
-of mechanisms including DIGEST-MD5, EXTERNAL, and GSSAPI.
+and data security (integrity and confidentiality) services through
+the use of SASL.  {{slapd}}'s SASL implementation utilizes {{PRD:Cyrus
+SASL}} software which supports a number of mechanisms including
+{{TERM:DIGEST-MD5}}, {{TERM:EXTERNAL}}, and {{TERM:GSSAPI}}.
  
-{{B:{{TERM[expand]TLS}}}}: {{slapd}} provides privacy and integrity
-protections through the use of TLS (or SSL).  {{slapd}}'s TLS
-implementation utilizes {{PRD:OpenSSL}} software.
+{{B:{{TERM[expand]TLS}}}}: {{slapd}} supports certificate-based
+authentication and data security (integrity and confidentiality)
+services through the use of TLS (or SSL).  {{slapd}}'s TLS
+implementation can utilize {{PRD:OpenSSL}}, {{PRD:GnuTLS}},
+or {{PRD:MozNSS}} software.
  
-{{B:Topology control}}: {{slapd}} allows one to restrict access to
-the server based upon network topology.  This feature utilizes
-{{TCP wrappers}}.
+{{B:Topology control}}: {{slapd}} can be configured to restrict
+access at the socket layer based upon network topology information.
+This feature utilizes {{TCP wrappers}}.
  
  {{B:Access control}}: {{slapd}} provides a rich and powerful access
  control facility, allowing you to control access to the information
  in your database(s). You can control access to entries based on
  LDAP authorization information, {{TERM:IP}} address, domain name
-and other criteria.  {{slapd}} supports both {{static}} and
-{{dynamic}} access control information.
+and other criteria.  {{slapd}} supports both {{static}} and {{dynamic}}
+access control information.
  
  {{B:Internationalization}}: {{slapd}} supports Unicode and language
  tags.
@@ -248,11 +404,11 @@ tags.
  {{B:Choice of database backends}}: {{slapd}} comes with a variety
  of different database backends you can choose from. They include
  {{TERM:BDB}}, a high-performance transactional database backend;
-{{TERM:LDBM}}, a lightweight DBM based backend; {{SHELL}}, a backend
-interface to arbitrary shell scripts; and PASSWD, a simple backend
-interface to the {{passwd}}(5) file.  BDB utilizes {{ORG:Sleepycat}}
-{{PRD:Berkeley DB}}.  LDBM utilizes either {{PRD:Berkeley DB}} or
-{{PRD:GDBM}}.
+{{TERM:HDB}}, a hierarchical high-performance transactional
+backend; {{SHELL}}, a backend interface to arbitrary shell scripts;
+and PASSWD, a simple backend interface to the {{passwd}}(5) file.
+The BDB and HDB backends utilize {{ORG:Oracle}} {{PRD:Berkeley
+DB}}.
  
  {{B:Multiple database instances}}: {{slapd}} can be configured to
  serve multiple databases at the same time. This means that a single
@@ -260,50 +416,38 @@ serve multiple databases at the same time. This means that a single
  portions of the LDAP tree, using the same or different database
  backends.
  
-{{B:Generic modules API}}: If you require even more customization,
+{{B:Generic modules API}}:  If you require even more customization,
  {{slapd}} lets you write your own modules easily. {{slapd}} consists
  of two distinct parts: a front end that handles protocol communication
  with LDAP clients; and modules which handle specific tasks such as
-database operations. Because these two pieces communicate via a
+database operations.  Because these two pieces communicate via a
  well-defined {{TERM:C}} {{TERM:API}}, you can write your own
  customized modules which extend {{slapd}} in numerous ways.  Also,
  a number of {{programmable database}} modules are provided.  These
  allow you to expose external data sources to {{slapd}} using popular
-programming languages ({{PRD:Perl}}, {{shell}}, {{PRD:SQL}}, and
-{{PRD:TCL}}).
+programming languages ({{PRD:Perl}}, {{shell}}, and {{TERM:SQL}}).
  
  {{B:Threads}}: {{slapd}} is threaded for high performance.  A single
-multi-threaded {{slapd}} process handles all incoming requests
-using a pool of threads.  This reduces the amount of system overhead
+multi-threaded {{slapd}} process handles all incoming requests using
+a pool of threads.  This reduces the amount of system overhead
  required while providing high performance.
  
-{{B:Replication}}: {{slapd}} can be configured to maintain replica
-copies of its database. This {{single-master/multiple-slave}}
+{{B:Replication}}: {{slapd}} can be configured to maintain shadow
+copies of directory information.  This {{single-master/multiple-slave}}
  replication scheme is vital in high-volume environments where a
-single {{slapd}} just doesn't provide the necessary availability
-or reliability.  {{slapd}} also includes experimental support for
-{{multi-master}} replication.
+single {{slapd}} installation just doesn't provide the necessary availability
+or reliability.  For extremely demanding environments where a
+single point of failure is not acceptable, {{multi-master}} replication
+is also available.  {{slapd}} includes support for {{LDAP Sync}}-based
+replication.
+
+{{B:Proxy Cache}}: {{slapd}} can be configured as a caching
+LDAP proxy service.
  
  {{B:Configuration}}: {{slapd}} is highly configurable through a
  single configuration file which allows you to change just about
  everything you'd ever want to change.  Configuration options have
-reasonable defaults, making your job much easier.
-
-{{slapd}} also has its limitations, of course.  The main BDB
-backend does not handle range queries or negation queries
-very well.
-
-
-H2: What is slurpd and what can it do?
-
-{{slurpd}}(8) is a daemon that helps {{slapd}} provide replicated
-service. It is responsible for distributing changes made to the
-master {{slapd}} database out to the various {{slapd}} replicas.
-It frees {{slapd}} from having to worry that some replicas might
-be down or unreachable when a change comes through; {{slurpd}}
-handles retrying failed requests automatically.  {{slapd}} and
-{{slurpd}} communicate through a simple text file that is used to
-log changes.
+reasonable defaults, making your job much easier. Configuration can
+also be performed dynamically using LDAP itself, which greatly
+improves manageability.
  
-See the {{SECT:Replication with slurpd}} chapter for information
-about how to configure and run {{slurpd}}(8).