   1 # $OpenLDAP$
   2 # Copyright 1999-2007 The OpenLDAP Foundation, All Rights Reserved.
   3 # COPYING RESTRICTIONS APPLY, see COPYRIGHT.
   4
   5 H1: Tuning
   6
   7 This is perhaps one of the most important chapters in the guide, because if
   8 you have not tuned {{slapd}}(8) correctly or grasped how to design your
   9 directory and environment, you can expect very poor performance.
  10
  11 Reading, understanding and experimenting with the instructions and information
  12 in the following sections will enable you to tailor your directory server to
  13 your specific requirements.
  14
  15 The following information has been collected over time from our community-based
  16 FAQ, so it reflects real-world experience and advice that should be of value
  17 to the reader.
  18
  19
  20 H2: Performance Factors
  21
  22 Various factors can play a part in how your directory performs on your chosen
  23 hardware and environment. We will attempt to discuss these here.
  24
  25
  26 H3: Memory
  27
  28 Scale your cache to use available memory and increase system memory if you can.
  29
  30 More info here.
  31
  32
  33 H3: Disks
  34
  35 Use fast subsystems. Put each database and logs on separate disks.
  36
  37 Example showing config settings
  38
  39
  40 H3: Network Topology
  41
  42 http://www.openldap.org/faq/data/cache/363.html
  43
  44 Drawing here.
  45
  46
  47 H3: Directory Layout Design
  48
  49 Reference to other sections and good/bad drawing here.
  50
  51
  52 H3: Expected Usage
  53
  54 Discussion.
  55
  56
  57 H2: Indexes
  58
  59 http://www.openldap.org/faq/data/cache/42.html
  60 http://www.connexitor.com/blog/pivot/entry.php?id=103#body
  61 http://groups.google.com/group/comp.mail.sendmail/browse_frm/thread/17c5c0b94ad1fc58/f870758659375718?lnk=gst&q=hyc&rnum=12&hl=en#f870758659375718
  62
  63
  64 H2: Logging
  65
  66 http://www.openldap.org/faq/data/cache/80.html
  67
  68
  69 H2: BDB/HDB Database Caching
  70
  71 We all know what caching is, don't we?
  72
  73 In brief, "A cache is a block of memory for temporary storage of data likely
  74 to be used again" - {{URL:http://en.wikipedia.org/wiki/Cache}}
  75
  76 There are three types of caches: BerkeleyDB's own cache, the {{slapd}}(8)
  77 entry cache and the {{TERM:IDL}} (IDL) cache.
  78
  79
  80 H3: Berkeley DB Cache
  81
  82 BerkeleyDB's own data cache operates on page-sized blocks of raw data.
  83
  84 Note that while the {{TERM:BDB}} cache is just raw chunks of memory and
  85 configured as a memory size, the {{slapd}}(8) entry cache holds parsed entries,
  86 and the size of each entry is variable.
  87
  88 There is also an IDL cache which is used for Index Data Lookups.
  89 If you can fit all of your database into slapd's entry cache, and all of your
  90 index lookups fit in the IDL cache, that will provide the maximum throughput.
  91
  92 If not, but you can fit the entire database into the BDB cache, then you
  93 should do that and shrink the slapd entry cache as appropriate.
  94
  95 Failing that, you should balance the BDB cache against the entry cache.
  96
  97 It is worth noting that it is not absolutely necessary to configure a BerkeleyDB
  98 cache equal in size to your entire database. All that you need is a cache
  99 that's large enough for your "working set."
 100
 101 That means, large enough to hold all of the most frequently accessed data,
 102 plus a few less-frequently accessed items.
 103
 104 ORACLE LINKS HERE
 105
 106 H4: Calculating Cachesize
 107
 108 The back-bdb database lives in two main files, {{F:dn2id.bdb}} and {{F:id2entry.bdb}}.
 109 These are B-tree databases. We have never documented the back-bdb internal
 110 layout before, because it didn't seem like something anyone should have to worry
 111 about, nor was it necessarily cast in stone. But here's how it works today,
 112 in OpenLDAP 2.4.
 113
 114 A B-tree is a balanced tree; it stores data in its leaf nodes and bookkeeping
 115 data in its interior nodes. (If you don't know what tree data structures look
 116 like in general, Google for some references, because that's getting far too
 117 elementary for the purposes of this discussion.)
 118
 119 For decent performance, you need enough cache memory to contain all the nodes
 120 along the path from the root of the tree down to the particular data item
 121 you're accessing. That's enough cache for a single search. For the general case,
 122 you want enough cache to contain all the internal nodes in the database.
 123
 124 >   db_stat -d
 125
 126 will tell you how many internal pages are present in a database. You should
 127 check this number for both dn2id and id2entry.
 128
 129 Also note that id2entry always uses 16KB per "page", while dn2id uses whatever
 130 the underlying filesystem uses, typically 4 or 8KB. To avoid thrashing,
 131 your cache must be large enough to hold all of the internal pages in both
 132 the dn2id and id2entry databases, plus some extra space to accommodate the actual
 133 leaf data pages.
 134
 135 For example, in my OpenLDAP 2.4 test database, I have an input LDIF file that's
 136 about 360MB. With the back-hdb backend this creates a dn2id.bdb that's 68MB,
 137 and an id2entry that's 800MB. db_stat tells me that dn2id uses 4KB pages, has
 138 433 internal pages, and 6378 leaf pages. The id2entry uses 16KB pages, has 52
 139 internal pages, and 45912 leaf pages. In order to efficiently retrieve any
 140 single entry in this database, the cache should be at least
 141
 142 >   (433+1) * 4KB + (52+1) * 16KB in size: 1736KB + 848KB =~ 2.5MB.
 143
 144 This doesn't take into account other library overhead, so this is even lower
 145 than the barest minimum. The default cache size, when nothing is configured,
 146 is only 256KB.
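
The arithmetic above can be sketched in a few lines of Python. This is only an
illustrative sketch: the helper name btree_min_cache_kb is hypothetical, and the
page counts are the ones quoted for the example database, copied by hand from
the db_stat figures given in the text.

```python
def btree_min_cache_kb(internal_pages, page_size_kb):
    """Cache needed for every internal page plus one leaf page, in KB."""
    return (internal_pages + 1) * page_size_kb

dn2id_kb = btree_min_cache_kb(433, 4)       # dn2id: 433 internal pages, 4KB pages
id2entry_kb = btree_min_cache_kb(52, 16)    # id2entry: 52 internal pages, 16KB pages
total_kb = dn2id_kb + id2entry_kb

print(dn2id_kb, id2entry_kb, total_kb)      # 1736 848 2584
print(round(total_kb / 1024, 1), "MB")      # 2.5 MB
```

Substituting the internal page counts and page sizes of your own database
gives the corresponding lower bound, before library overhead and indexes.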
 147
 148 This 2.5MB number also doesn't take indexing into account. Each indexed attribute
 149 uses another database file of its own, using a Hash structure.
 150
 151 Unlike the B-trees, where you only need to touch one data page to find an entry
 152 of interest, doing an index lookup generally touches multiple keys, and the
 153 point of a hash structure is that the keys are evenly distributed across the
 154 data space. That means there's no convenient compact subset of the database that
 155 you can keep in the cache to ensure quick operation; you can pretty much expect
 156 references to be scattered across the whole thing. My strategy here would be to
 157 provide enough cache for at least 50% of all of the hash data.
 158
 159 >   (Number of hash buckets + number of overflow pages + number of duplicate pages) * page size / 2.
 160
 161 The objectClass index for my example database is 5.9MB and uses 3 hash buckets
 162 and 656 duplicate pages. So:
 163
 164 >   ( 3 + 656 ) * 4KB / 2 =~ 1.3MB.
 165
 166 With only this index enabled, I'd figure at least a 4MB cache for this backend.
 167 (Of course you're using a single cache shared among all of the database files,
 168 so the cache pages will most likely get used for something other than what you
 169 accounted for, but this gives you a fighting chance.)
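
The hash index estimate and the resulting total can be sketched the same way.
The helper name hash_cache_kb is hypothetical. The text gives no overflow page
count for the objectClass index, so 0 is assumed here, and the 50% fraction is
the rule of thumb suggested above rather than a hard requirement.

```python
def hash_cache_kb(buckets, overflow_pages, duplicate_pages, page_size_kb, fraction=0.5):
    """Cache for a fraction (default 50%) of a hash index's pages, in KB."""
    return (buckets + overflow_pages + duplicate_pages) * page_size_kb * fraction

# objectClass index from the example: 3 buckets, 656 duplicate pages, 4KB pages.
# Overflow pages are not stated in the text; 0 is an assumption.
index_kb = hash_cache_kb(3, 0, 656, 4)
btree_kb = (433 + 1) * 4 + (52 + 1) * 16    # B-tree minimum from the earlier estimate
total_mb = (btree_kb + index_kb) / 1024

print(index_kb)             # 1318.0
print(round(total_mb, 1))   # 3.8
```

The total of roughly 3.8MB is where the "at least a 4MB cache" figure comes
from. Each additional indexed attribute adds its own hash term to the sum.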
 170
 171 With this 4MB cache I can slapcat this entire database on my 1.3GHz PIII in
 172 1 minute, 40 seconds. With the cache doubled to 8MB, it still takes the same 1:40.
 173 Once you've got enough cache to fit the B-tree internal pages, increasing it
 174 further won't have any effect until the cache really is large enough to hold
 175 100% of the data pages. I don't have enough free RAM to hold all the 800MB
 176 id2entry data, so 4MB is good enough.
 177
 178 With back-bdb and back-hdb you can use "db_stat -m" to check how well the
 179 database cache is performing.
 180
 181
 182 H3: {{slapd}}(8) Entry Cache
 183
 184 The {{slapd}}(8) entry cache operates on decoded entries. The rationale: entries
 185 in the entry cache can be used directly, giving the fastest response. If an entry
 186 isn't in the entry cache but can be extracted from the BDB page cache, that will
 187 avoid an I/O but it will still require parsing, so this will be slower.
 188
 189 If the entry is in neither cache then BDB will have to flush some of its current
 190 cached pages and bring in the needed pages, resulting in a couple of expensive
 191 I/Os as well as parsing.
 192
 193 When balancing the entry cache against the BDB cache, keep in mind that parsed
 194 entries in memory are generally about twice as large as they are on disk.
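
As a rough sketch of what that factor of two means for memory use, the
following estimates the footprint of an entry cache. The function name
entry_cache_mb and the sample numbers (20,000 cached entries averaging 1500
bytes on disk) are hypothetical and chosen only for illustration. The actual
in-memory size depends on your schema and data.

```python
def entry_cache_mb(cached_entries, avg_disk_entry_bytes, expansion=2.0):
    """Approximate memory used by parsed entries in the entry cache, in MB."""
    return cached_entries * avg_disk_entry_bytes * expansion / (1024 * 1024)

# Hypothetical workload: 20,000 cached entries, 1500 bytes each on disk.
print(round(entry_cache_mb(20000, 1500), 1))   # 57.2
```

Memory spent here is memory not available to the BDB cache, so the two
estimates should be added together when checking against available RAM.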
 195
 196 As we have already mentioned, not having a proper database cache size will
 197 cause performance issues. These issues are not an indication of corruption
 198 occurring in the database. The cache is simply thrashing, which causes
 199 performance and response time to slow down.
 200
 201
 202 MOVE BELOW AROUND:
 203
 204
 205 If you want to set up the cache size, please read:
 206
 207  (Xref) How do I configure the BDB backend?
 208  (Xref) What are the DB_CONFIG configuration directives?
 209  http://www.sleepycat.com/docs/utility/db_recover.html
 210
 211 A default config can be found in the answer:
 212
 213  (Xref) What are the DB_CONFIG configuration directives?
 214
 215 Just change the set_lg_dir to point to your .log directory or comment out that line.
 216
 217 Quick guide:
 218 - Create a DB_CONFIG file in your ldap home directory (/var/lib/ldap/DB_CONFIG) with the correct "set_cachesize" value
 219 - stop your ldap server and run db_recover -h /var/lib/ldap
 220 - start your ldap server and check the new cache size with:
 221
 222   db_stat -h /var/lib/ldap -m | head -n 2
 223
 224 - this procedure is only needed if you use OpenLDAP 2.2 with the BDB or HDB backends; in OpenLDAP 2.3, DB recovery is performed automatically whenever the DB_CONFIG file is changed or when an unclean shutdown is detected.
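
For the first step of the quick guide, the "set_cachesize" directive in
DB_CONFIG takes three values: gigabytes, additional bytes, and the number of
cache regions. The sketch below formats that line from a size in bytes. The
helper name set_cachesize_line is hypothetical, and the 4MB value is just the
figure from the example database discussed earlier, not a recommendation.

```python
def set_cachesize_line(cache_bytes, ncache=1):
    """Format a DB_CONFIG set_cachesize directive: <gbytes> <bytes> <ncache>."""
    gbytes, remainder = divmod(cache_bytes, 1024 ** 3)
    return f"set_cachesize {gbytes} {remainder} {ncache}"

print(set_cachesize_line(4 * 1024 * 1024))   # set_cachesize 0 4194304 1
```

The printed line is what would go into the DB_CONFIG file in the database
directory before running db_recover as described above.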
 225
 226
 227 --On Tuesday, February 22, 2005 12:15 PM -0500 Dusty Doris <openldap@mail.doris.cc> wrote:
 228
 229     Few questions, if you change the cachesize and idlcachesize entries, do
 230     you have to do anything special aside from restarting slapd, such as run
 231     slapindex or db_recover?
 232
 233
 234     Also, is there any way to tell how much memory these caches are taking up
 235     to make sure they are not set too large?  What happens if you set your
 236     cachesize too large and you don't have enough available memory to store
 237     these?  Will that cause an issue with openldap, or will it just not cache
 238     those entries that would make it exceed its available memory.  Will it
 239     just use some sort of FIFO on those caches?
 240
 241
 242 It will consume the memory resources of your system, and likely cause issues.
 243
 244     Finally, what do most people try to achieve with these values?  Would the
 245     goal be to make these as big as the directory?  So, if I have 400,000 dn's
 246     in my directory, would it be safe to set these at 400000 or would
 247     something like 20,000 be good enough to get a nice performance increase?
 248
 249
 250 I try to cache the most actively used entries. Unless you expect all 400,000 entries of your DB to be accessed regularly, there is no need to cache that many entries. My entry cache is set to 20,000 (out of a little over 400,000 entries).
 251
 252 The idl cache has to do with how many unique result sets of searches you want to store in memory. Setting up this cache will allow your most frequently placed searches to get results much faster, but I doubt you want to try and cache the results of every search that hits your system. ;)
 253
 254 --Quanah
 255
 256
 257 H3: {{TERM:IDL}} Cache
 258
 259
 260 http://www.openldap.org/faq/data/cache/1076.html