# $OpenLDAP$
# Copyright 1999-2007 The OpenLDAP Foundation, All Rights Reserved.
# COPYING RESTRICTIONS APPLY, see COPYRIGHT.

H1: Tuning

This is perhaps one of the most important chapters in the guide, because if
you have not tuned {{slapd}}(8) correctly or grasped how to design your
directory and environment, you can expect very poor performance.

Reading, understanding, and experimenting with the instructions and information
in the following sections will enable you to tailor your directory server to
your specific requirements.

The following information has been collected over time from our
community-based FAQ, so it reflects real-world experience and advice.


H2: Performance Factors

Various factors can play a part in how your directory performs on your chosen
hardware and environment. We will attempt to discuss these here.


H3: Memory

Scale your cache to use available memory and increase system memory if you can.

More info here.


H3: Disks

Use fast disk subsystems. Put each database and its logs on separate disks.

Example showing config settings


H3: Network Topology

http://www.openldap.org/faq/data/cache/363.html

Drawing here.


H3: Directory Layout Design

Reference to other sections and good/bad drawing here.


H3: Expected Usage

Discussion.


H2: Indexes

H3: Understanding how a search works

If you're searching on a filter that has been indexed, then the search reads
the index and pulls exactly the entries that are referenced by the index.
If the filter term has not been indexed, then the search must read every single
entry in the target scope and test to see if each entry matches the filter.
Indexing can save a lot of work when it's used correctly.
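To make the difference concrete, here is a toy Python model of the two search paths. It is only a sketch: the dictionaries, function names, and sample entries are made up for illustration and say nothing about how back-bdb actually stores entries or indexes.

```python
# Toy model only: entries keyed by ID, plus a hand-built equality index.
# None of these names correspond to real slapd internals.
ENTRIES = {
    1: {"cn": "Alice Smith", "mail": "alice@example.com"},
    2: {"cn": "Bob Jones", "mail": "bob@example.com"},
    3: {"cn": "Carol White", "mail": "carol@example.com"},
}


def build_eq_index(entries, attr):
    """Map each value of attr to the IDs of the entries that hold it."""
    index = {}
    for entry_id, entry in entries.items():
        if attr in entry:
            index.setdefault(entry[attr], []).append(entry_id)
    return index


def search_indexed(entries, index, value):
    """Read only the entries the index points at."""
    ids = index.get(value, [])
    return [entries[i] for i in ids], len(ids)


def search_unindexed(entries, attr, value):
    """Read every entry in scope and test each one against the filter."""
    matches, reads = [], 0
    for entry in entries.values():
        reads += 1
        if entry.get(attr) == value:
            matches.append(entry)
    return matches, reads


mail_index = build_eq_index(ENTRIES, "mail")
hits, reads_indexed = search_indexed(ENTRIES, mail_index, "bob@example.com")
same_hits, reads_scan = search_unindexed(ENTRIES, "mail", "bob@example.com")
print(reads_indexed, reads_scan)
```

Both calls return the same entry, but the indexed path reads one entry while the scan reads all three. The gap grows with the number of entries in scope.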

H3: What to index

You should create indexes to match the actual filter terms used in
search queries.

>        index cn,sn,givenname,mail eq

Each attribute index can be tuned further by selecting the set of index types
to generate. For example, substring and approximate search for organizations
(o) may make little sense (and isn't likely to be done very often). And
searching for {{userPassword}} likely makes no sense whatsoever.

General rule: don't go overboard with indexes. Unused indexes must still be
maintained on every update, and hence can only slow things down.

See {{slapd.conf}}(5) and {{slapindex}}(8) for more information.
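One way to find "the actual filter terms" is to collect the search filters your applications send and tally which attribute and match type each one uses. The Python sketch below does that for a hand-written list of filters. The regular expression, the function names, and the sample filters are all assumptions made for illustration; it handles only simple filter items (no escaped parentheses, no extensible match) and ignores ordering filters.

```python
import re

# Matches simple filter items such as (cn=smith*) or (mail~=bob).
# Assumption: no escaped parentheses and no extensible-match filters.
ITEM = re.compile(r"\(([A-Za-z][A-Za-z0-9;-]*)(~=|>=|<=|=)([^()]*)\)")
TYPE_ORDER = ["pres", "eq", "approx", "sub"]


def index_type(op, value):
    """Classify one filter item; ordering filters are ignored here."""
    if op == "~=":
        return "approx"
    if op != "=":
        return None
    if value == "*":
        return "pres"
    return "sub" if "*" in value else "eq"


def suggest_indexes(filters):
    """Return slapd.conf-style index lines for the filter terms seen."""
    wanted = {}
    for text in filters:
        for attr, op, value in ITEM.findall(text):
            kind = index_type(op, value)
            if kind:
                wanted.setdefault(attr, set()).add(kind)
    lines = []
    for attr in sorted(wanted):
        kinds = [k for k in TYPE_ORDER if k in wanted[attr]]
        lines.append("index %s %s" % (attr, ",".join(kinds)))
    return lines


sample = [
    "(&(objectClass=person)(cn=smith*))",
    "(mail=alice@example.com)",
    "(cn=Alice Smith)",
]
for line in suggest_indexes(sample):
    print(line)
```

Treat the output as a list of candidates to review, not as a configuration to paste in: a term that shows up once in a while may not be worth the cost of maintaining its index.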


H3: Presence indexing

If your client application uses presence filters and the target attribute
exists on the majority of entries in your target scope, then all of those
entries are going to be read anyway, because they are valid members of the
result set. In a subtree where 100% of the entries contain the same
attributes, the presence index does absolutely NOTHING to benefit the search,
because 100% of the entries match that presence filter.

So the resource cost of generating the index is a
complete waste of CPU time, disk, and memory. Don't do it unless you know
that it will be used, and that the attribute in question occurs very
infrequently in the target data.

Almost no applications use presence filters in their search queries. Presence
indexing is pointless when the target attribute exists on the majority of
entries in the database. In most LDAP deployments, presence indexing should
not be done; it's just wasted overhead.
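A quick way to judge whether a presence index could help is to measure how much of the target scope carries the attribute at all. The Python sketch below does this on a made-up list of entries; the function name and sample data are hypothetical.

```python
def presence_fraction(entries, attr):
    """Fraction of entries in scope that carry attr at all."""
    if not entries:
        return 0.0
    return sum(1 for entry in entries if attr in entry) / len(entries)


# Made-up sample scope: every entry has mail, only one has pager.
scope = [
    {"cn": "a", "mail": "a@example.com"},
    {"cn": "b", "mail": "b@example.com", "pager": "555-0100"},
    {"cn": "c", "mail": "c@example.com"},
    {"cn": "d", "mail": "d@example.com"},
]
print(presence_fraction(scope, "mail"))   # every entry matches (mail=*)
print(presence_fraction(scope, "pager"))  # only a small part matches
```

When the fraction is close to 1, a presence filter returns nearly every entry and the index saves nothing. Only a rarely present attribute that really is searched by presence is a candidate.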

See the {{Logging}} section below on what to watch out for if you have a
frequently searched attribute that is unindexed.


H2: Logging

H3: What log level to use

The default of {{loglevel 256}} is really the best bet. There's a corollary to
this: when problems *do* arise, don't try to trace them using syslog.
Use the debug flag instead, and capture slapd's stderr output. syslog is too
slow for debug tracing, and it's inherently lossy: it will throw away messages
when it can't keep up.

Contrary to popular belief, {{loglevel 0}} is not ideal for production, as you
won't be able to track when problems first arise.

H3: What to watch out for

The most common message you'll see that you should pay attention to is:

>  "<= bdb_equality_candidates: (foo) index_param failed (18)"

That means that some application tried to use an equality filter ({{foo=<somevalue>}})
and attribute {{foo}} does not have an equality index. If you see a lot of these
messages, you should add the index. If you see one every month or so, it may
be acceptable to ignore it.
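To tell "a lot" from "one every month or so", you can count these messages per attribute. The Python sketch below builds its pattern from the message text quoted above. The function name is hypothetical and the sample lines are illustrative, not captured from a real server; in practice you would feed it the lines of your own log file.

```python
import re
from collections import Counter

# Pattern built from the message text quoted above; the attribute name
# is whatever appears inside the first pair of parentheses.
MISSING_EQ_INDEX = re.compile(
    r"bdb_equality_candidates: \(([^)]+)\) index_param failed \(18\)"
)


def count_missing_eq_indexes(lines):
    """Count equality-index misses per attribute name."""
    counts = Counter()
    for line in lines:
        match = MISSING_EQ_INDEX.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative lines only, not captured from a real server.
sample_log = [
    "<= bdb_equality_candidates: (uid) index_param failed (18)",
    "<= bdb_equality_candidates: (foo) index_param failed (18)",
    "<= bdb_equality_candidates: (uid) index_param failed (18)",
    "some unrelated log line",
]
for attr, seen in count_missing_eq_indexes(sample_log).most_common():
    print(attr, seen)
```

Attributes at the top of the list are the ones worth an equality index.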

The default syslog level is 256, which logs the basic parameters of each
request; it usually produces 1-3 lines of output. On Solaris and systems that
only provide synchronous syslog, you may want to turn it off completely, but
usually you want to leave it enabled so that you'll be able to see index
messages whenever they arise. On Linux you can configure syslogd to run
asynchronously, in which case the performance hit for moderate syslog traffic
pretty much disappears.

H3: Improving throughput

You can improve logging performance on some systems by configuring syslog not
to sync the file system with every write ({{man syslogd/syslog.conf}}). In Linux,
you can prefix the log file name with a "-" in {{syslog.conf}}. For example,
if you are using the default LOCAL4 logging you could try:

>   # LDAP logs
>   LOCAL4.*         -/var/log/ldap

For syslog-ng, add or modify the following line in {{syslog-ng.conf}}:

>   options { sync(n); };

where n is the number of lines which will be buffered before a write.


H2: BDB/HDB Database Caching

We all know what caching is, don't we?

In brief, "A cache is a block of memory for temporary storage of data likely
to be used again" - {{URL:http://en.wikipedia.org/wiki/Cache}}

There are three types of caches: BerkeleyDB's own cache, the {{slapd}}(8)
entry cache, and the {{TERM:IDL}} (IDL) cache.


H3: Berkeley DB Cache

BerkeleyDB's own data cache operates on page-sized blocks of raw data.

Note that while the {{TERM:BDB}} cache is just raw chunks of memory and
configured as a memory size, the {{slapd}}(8) entry cache holds parsed entries,
and the size of each entry is variable.

There is also an IDL cache which is used for Index Data Lookups.
If you can fit all of your database into slapd's entry cache, and all of your
index lookups fit in the IDL cache, that will provide the maximum throughput.

If not, but you can fit the entire database into the BDB cache, then you
should do that and shrink the slapd entry cache as appropriate.

Failing that, you should balance the BDB cache against the entry cache.

It is worth noting that it is not absolutely necessary to configure a BerkeleyDB
cache equal in size to your entire database. All that you need is a cache
that's large enough for your "working set."

That means large enough to hold all of the most frequently accessed data,
plus a few less-frequently accessed items.

ORACLE LINKS HERE

H4: Calculating Cachesize

The back-bdb database lives in two main files, {{F:dn2id.bdb}} and {{F:id2entry.bdb}}.
These are B-tree databases. We have never documented the back-bdb internal
layout before, because it didn't seem like something anyone should have to worry
about, nor was it necessarily cast in stone. But here's how it works today,
in OpenLDAP 2.4.

A B-tree is a balanced tree; it stores data in its leaf nodes and bookkeeping
data in its interior nodes. (If you don't know what tree data structures look
like in general, Google for some references, because that's getting far too
elementary for the purposes of this discussion.)

For decent performance, you need enough cache memory to contain all the nodes
along the path from the root of the tree down to the particular data item
you're accessing. That's enough cache for a single search. For the general case,
you want enough cache to contain all the internal nodes in the database.

>   db_stat -d

will tell you how many internal pages are present in a database. You should
check this number for both dn2id and id2entry.

Also note that {{id2entry}} always uses 16KB per "page", while {{dn2id}} uses whatever
the underlying filesystem uses, typically 4 or 8KB. To avoid thrashing,
your cache must be at least as large as the number of internal pages in both
the {{dn2id}} and {{id2entry}} databases, plus some extra space to accommodate the actual
leaf data pages.

For example, in my OpenLDAP 2.4 test database, I have an input LDIF file that's
about 360MB. With the back-hdb backend this creates a {{dn2id.bdb}} that's 68MB,
and an {{id2entry}} that's 800MB. db_stat tells me that {{dn2id}} uses 4KB pages, has
433 internal pages, and 6378 leaf pages. The id2entry uses 16KB pages, has 52
internal pages, and 45912 leaf pages. In order to efficiently retrieve any
single entry in this database, the cache should be at least

>   (433+1) * 4KB + (52+1) * 16KB in size: 1736KB + 848KB =~ 2.5MB.

This doesn't take into account other library overhead, so this is even lower
than the barest minimum. The default cache size, when nothing is configured,
is only 256KB.
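The B-tree arithmetic above can be redone for your own databases with a few lines of Python. This is only a sketch of the rule of thumb (internal pages plus one, times page size, summed over the B-tree files); the function name is made up, and the inputs are the page counts and page sizes you read from db_stat.

```python
def btree_min_cache_kb(databases):
    """Sum (internal pages + 1) * page size over each B-tree database.

    databases: list of (internal_pages, page_size_kb) tuples.
    """
    return sum((internal + 1) * page_kb for internal, page_kb in databases)


# Figures from the example above: dn2id (433 internal pages, 4KB pages)
# and id2entry (52 internal pages, 16KB pages).
example = [(433, 4), (52, 16)]
min_kb = btree_min_cache_kb(example)
print(min_kb, "KB =~", round(min_kb / 1024, 1), "MB")
```

For the example figures this gives 2584KB, which is the 2.5MB quoted above. Remember that it is a floor, not a recommendation.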

This 2.5MB number also doesn't take indexing into account. Each indexed attribute
uses another database file of its own, using a Hash structure.

Unlike the B-trees, where you only need to touch one data page to find an entry
of interest, doing an index lookup generally touches multiple keys, and the
point of a hash structure is that the keys are evenly distributed across the
data space. That means there's no convenient compact subset of the database that
you can keep in the cache to ensure quick operation; you can pretty much expect
references to be scattered across the whole thing. My strategy here would be to
provide enough cache for at least 50% of all of the hash data.

>   (Number of hash buckets + number of overflow pages + number of duplicate pages) * page size / 2.

The objectClass index for my example database is 5.9MB and uses 3 hash buckets
and 656 duplicate pages. So:

>   ( 3 + 656 ) * 4KB / 2 =~ 1.3MB.

With only this index enabled, I'd figure at least a 4MB cache for this backend.
(Of course you're using a single cache shared among all of the database files,
so the cache pages will most likely get used for something other than what you
accounted for, but this gives you a fighting chance.)
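Here is the same hash-index estimate as a small Python sketch, added to the B-tree minimum to arrive at the 4MB figure. The function name is made up. The overflow page count is taken as 0 only because the worked figure above adds just buckets and duplicate pages; use the real count from db_stat for your own index files.

```python
import math


def hash_index_cache_kb(buckets, overflow_pages, duplicate_pages, page_kb):
    """Half of the hash pages, per the 50% rule of thumb above."""
    return (buckets + overflow_pages + duplicate_pages) * page_kb / 2


# objectClass index from the example: 3 buckets, 656 duplicate pages,
# 4KB pages. Overflow pages assumed to be 0, as in the worked figure.
index_kb = hash_index_cache_kb(3, 0, 656, 4)

# B-tree minimum from the same example: (433+1)*4KB + (52+1)*16KB.
btree_kb = (433 + 1) * 4 + (52 + 1) * 16

total_kb = btree_kb + index_kb
print(index_kb, total_kb, math.ceil(total_kb / 1024))
```

The index needs about 1318KB and the total comes to about 3902KB, which rounds up to the 4MB suggested above. Each additional indexed attribute adds its own term to the sum.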

With this 4MB cache I can slapcat this entire database on my 1.3GHz PIII in
1 minute, 40 seconds. With the cache doubled to 8MB, it still takes the same 1:40.
Once you've got enough cache to fit the B-tree internal pages, increasing it
further won't have any effect until the cache really is large enough to hold
100% of the data pages. I don't have enough free RAM to hold all the 800MB
id2entry data, so 4MB is good enough.

With back-bdb and back-hdb you can use "db_stat -m" to check how well the
database cache is performing.


H3: {{slapd}}(8) Entry Cache

The {{slapd}}(8) entry cache operates on decoded entries. The rationale: entries
in the entry cache can be used directly, giving the fastest response. If an entry
isn't in the entry cache but can be extracted from the BDB page cache, that will
avoid an I/O but it will still require parsing, so this will be slower.

If the entry is in neither cache then BDB will have to flush some of its current
cached pages and bring in the needed pages, resulting in a couple of expensive
I/Os as well as parsing.

As far as balancing the entry cache vs the BDB cache goes, parsed entries in memory
are generally about twice as large as they are on disk.
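The "twice as large" rule of thumb gives a rough way to budget memory for the entry cache. The Python sketch below is only an estimate under that assumption; the function name, the entry count, and the average on-disk entry size are hypothetical values chosen for illustration.

```python
def entry_cache_estimate_kb(cached_entries, avg_disk_entry_kb, growth=2.0):
    """Rough memory for the entry cache: parsed entries are about
    twice their on-disk size, per the rule of thumb above."""
    return cached_entries * avg_disk_entry_kb * growth


# Hypothetical inputs: 20,000 cached entries averaging 2KB on disk.
estimate_kb = entry_cache_estimate_kb(20000, 2)
print(round(estimate_kb / 1024, 1), "MB")
```

Memory given to the entry cache is memory the BDB cache cannot use, so a figure like this helps when deciding how to split what you have between the two.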

As we have already mentioned, not having a proper database cache size will
cause performance issues. These issues are not an indication of corruption
occurring in the database. It is merely the fact that the cache is thrashing
itself that causes performance/response time to slow down.


MOVE BELOW AROUND:


If you want to set up the cache size, please read:

 (Xref) How do I configure the BDB backend?
 (Xref) What are the DB_CONFIG configuration directives?
 http://www.sleepycat.com/docs/utility/db_recover.html

A default config can be found in the answer:

 (Xref) What are the DB_CONFIG configuration directives?

Just change the set_lg_dir to point to your .log directory, or comment out that line.

Quick guide:
* Create a DB_CONFIG file in your ldap home directory (/var/lib/ldap/DB_CONFIG) with the correct "set_cachesize" value
* Stop your ldap server and run db_recover -h /var/lib/ldap
* Start your ldap server and check the new cache size with:

  db_stat -h /var/lib/ldap -m | head -n 2

* This procedure is only needed if you use OpenLDAP 2.2 with the BDB or HDB backends. In OpenLDAP 2.3, DB recovery is performed automatically whenever the DB_CONFIG file is changed or when an unclean shutdown is detected.
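For the first step of the quick guide, Berkeley DB's set_cachesize directive takes three numbers: whole gigabytes, additional bytes, and the number of cache regions. The Python sketch below formats that line from a total size in bytes. The helper name is made up, and defaulting to a single cache region is an assumption; check the Berkeley DB documentation for your version before relying on it.

```python
GIGABYTE = 1024 ** 3


def set_cachesize_line(total_bytes, ncache=1):
    """Format a DB_CONFIG set_cachesize line: <gbytes> <bytes> <ncache>."""
    gbytes, remainder = divmod(total_bytes, GIGABYTE)
    return "set_cachesize %d %d %d" % (gbytes, remainder, ncache)


print(set_cachesize_line(4 * 1024 * 1024))           # a 4MB cache
print(set_cachesize_line(GIGABYTE + GIGABYTE // 2))  # a 1.5GB cache
```

The first call prints "set_cachesize 0 4194304 1", which is the line you would put in DB_CONFIG for the 4MB cache worked out earlier.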


--On Tuesday, February 22, 2005 12:15 PM -0500 Dusty Doris <openldap@mail.doris.cc> wrote:

    Few questions, if you change the cachesize and idlcachesize entries, do
    you have to do anything special aside from restarting slapd, such as run
    slapindex or db_recover?


    Also, is there any way to tell how much memory these caches are taking up
    to make sure they are not set too large?  What happens if you set your
    cachesize too large and you don't have enough available memory to store
    these?  Will that cause an issue with openldap, or will it just not cache
    those entries that would make it exceed its available memory.  Will it
    just use some sort of FIFO on those caches?


It will consume the memory resources of your system, and likely cause issues.

    Finally, what do most people try to achieve with these values?  Would the
    goal be to make these as big as the directory?  So, if I have 400,000 dn's
    in my directory, would it be safe to set these at 400000 or would
    something like 20,000 be good enough to get a nice performance increase?


I try to cache the most actively used entries. Unless you expect all 400,000 entries of your DB to be accessed regularly, there is no need to cache that many entries. My entry cache is set to 20,000 (out of a little over 400,000 entries).

The idlcache has to do with how many unique result sets of searches you want to store in memory. Setting up this cache will allow your most frequently placed searches to get results much faster, but I doubt you want to try and cache the results of every search that hits your system. ;)

--Quanah


H3: {{TERM:IDL}} Cache


http://www.openldap.org/faq/data/cache/1076.html