.. _slub:

==========================
Short users guide for SLUB
==========================

The basic philosophy of SLUB is very different from SLAB. SLAB
requires rebuilding the kernel to activate debug options for all
slab caches. SLUB always includes full debugging but it is off by default.
SLUB can enable debugging only for selected slabs in order to avoid
an impact on overall system performance which may make a bug more
difficult to find.

In order to switch debugging on one can add an option ``slub_debug``
to the kernel command line. That will enable full debugging for
all slabs.

Typically one would then use the ``slabinfo`` command to get statistical
data and perform operations on the slabs. By default ``slabinfo`` only lists
slabs that have data in them. See "slabinfo -h" for more options when
running the command. ``slabinfo`` can be compiled with
::

        gcc -o slabinfo tools/vm/slabinfo.c

Some of the modes of operation of ``slabinfo`` require that slub debugging
be enabled on the command line. For example, no tracking information will be
available without debugging on and validation can only partially
be performed if debugging was not switched on.

Some more sophisticated uses of slub_debug:
-------------------------------------------

Parameters may be given to ``slub_debug``. If none is specified then full
debugging is enabled.
Format:

slub_debug=<Debug-Options>
        Enable options for all slabs
slub_debug=<Debug-Options>,<slab name>
        Enable options only for select slabs


Possible debug options are::

        F               Sanity checks on (enables SLAB_DEBUG_CONSISTENCY_CHECKS
                        Sorry SLAB legacy issues)
        Z               Red zoning
        P               Poisoning (object and padding)
        U               User tracking (free and alloc)
        T               Trace (please only use on single slabs)
        A               Toggle failslab filter mark for the cache
        O               Switch debugging off for caches that would have
                        caused higher minimum slab orders
        -               Switch all debugging off (useful if the kernel is
                        configured with CONFIG_SLUB_DEBUG_ON)

For example, in order to boot just with sanity checks and red zoning one would
specify::

        slub_debug=FZ

Trying to find an issue in the dentry cache? Try::

        slub_debug=,dentry

to only enable debugging on the dentry cache.

Red zoning and tracking may realign the slab. We can just apply sanity checks
to the dentry cache with::

        slub_debug=F,dentry

Debugging options may require the minimum possible slab order to increase as
a result of storing the metadata (for example, caches with PAGE_SIZE object
sizes). This has a higher likelihood of resulting in slab allocation errors
in low memory situations or if there's high fragmentation of memory. To
switch off debugging for such caches by default, use::

        slub_debug=O

In case you forgot to enable debugging on the kernel command line: It is
possible to enable debugging manually when the kernel is up. Look at the
contents of::

        /sys/kernel/slab/<slab name>/

Look at the writable files. Writing 1 to them will enable the
corresponding debug option. All options can be set on a slab that does
not contain objects. If the slab already contains objects then only sanity
checks and tracing may be enabled. The other options may cause the realignment
of objects.
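
The run-time procedure above can be sketched as a small shell helper. This is
only an illustration: the ``SLAB_SYSFS`` variable and the
``slab_debug_enable`` function are hypothetical names introduced here, and
``sanity_checks`` is assumed to be one of the writable files. Which files
exist and are writable depends on the running kernel, so list the directory
first.

```shell
# Sketch only. SLAB_SYSFS and slab_debug_enable are hypothetical names used
# for this example; they are not part of the kernel or of slabinfo.
SLAB_SYSFS="${SLAB_SYSFS:-/sys/kernel/slab}"

slab_debug_enable() {
    cache="$1"   # slab cache name, for example dentry
    attr="$2"    # debug file name, assumed here to be e.g. sanity_checks
    file="$SLAB_SYSFS/$cache/$attr"

    # Only files that exist and are writable can be switched on at run time.
    if [ ! -w "$file" ]; then
        echo "$file: not writable" >&2
        return 1
    fi

    # Writing 1 enables the corresponding debug option.
    echo 1 > "$file"
}

# Typical use (as root):
#   ls -l "$SLAB_SYSFS/dentry/"
#   slab_debug_enable dentry sanity_checks
```

The base directory is kept in a variable so that the helper can be tried
against a scratch directory before it is pointed at the real sysfs tree.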

Careful with tracing: It may spew out lots of information and never stop if
used on the wrong slab.

Slab merging
============

If no debug options are specified then SLUB may merge similar slabs together
in order to reduce overhead and increase cache hotness of objects.
``slabinfo -a`` displays which slabs were merged together.

Slab validation
===============

SLUB can validate all objects if the kernel was booted with slub_debug. In
order to do so you must have the ``slabinfo`` tool. Then you can do
::

        slabinfo -v

which will test all objects. Output will be generated to the syslog.

This also works in a more limited way if boot was without slab debug.
In that case ``slabinfo -v`` simply tests all reachable objects. Usually
these are in the cpu slabs and the partial slabs. Full slabs are not
tracked by SLUB in a non debug situation.

Getting more performance
========================

To some degree SLUB's performance is limited by the need to take the
list_lock once in a while to deal with partial slabs. That overhead is
governed by the order of the allocation for each slab. The allocations
can be influenced by kernel parameters:

- ``slub_min_objects=x`` (default 4)
- ``slub_min_order=x`` (default 0)
- ``slub_max_order=x`` (default 3 (PAGE_ALLOC_COSTLY_ORDER))

``slub_min_objects``
        specifies how many objects must at least fit into one
        slab in order for the allocation order to be acceptable. In
        general slub will be able to perform this number of
        allocations on a slab without consulting centralized resources
        (list_lock) where contention may occur.

``slub_min_order``
        specifies a minimum order of slabs. This has a similar effect to
        ``slub_min_objects``.

``slub_max_order``
        specifies the order at which ``slub_min_objects`` should no
        longer be checked.
        This is useful to avoid SLUB trying to
        generate super large order pages to fit ``slub_min_objects``
        of a slab cache with large object sizes into one high order
        page. Setting the command line parameter
        ``debug_guardpage_minorder=N`` (N > 0) forces
        ``slub_max_order`` to 0, which causes slabs to be allocated at
        the minimum possible order.

SLUB Debug output
=================

Here is a sample of slub debug output::

 ====================================================================
 BUG kmalloc-8: Redzone overwritten
 --------------------------------------------------------------------

 INFO: 0xc90f6d28-0xc90f6d2b. First byte 0x00 instead of 0xcc
 INFO: Slab 0xc528c530 flags=0x400000c3 inuse=61 fp=0xc90f6d58
 INFO: Object 0xc90f6d20 @offset=3360 fp=0xc90f6d58
 INFO: Allocated in get_modalias+0x61/0xf5 age=53 cpu=1 pid=554

 Bytes b4 0xc90f6d10:  00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
   Object 0xc90f6d20:  31 30 31 39 2e 30 30 35                         1019.005
  Redzone 0xc90f6d28:  00 cc cc cc                                     .
  Padding 0xc90f6d50:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ

   [<c010523d>] dump_trace+0x63/0x1eb
   [<c01053df>] show_trace_log_lvl+0x1a/0x2f
   [<c010601d>] show_trace+0x12/0x14
   [<c0106035>] dump_stack+0x16/0x18
   [<c017e0fa>] object_err+0x143/0x14b
   [<c017e2cc>] check_object+0x66/0x234
   [<c017eb43>] __slab_free+0x239/0x384
   [<c017f446>] kfree+0xa6/0xc6
   [<c02e2335>] get_modalias+0xb9/0xf5
   [<c02e23b7>] dmi_dev_uevent+0x27/0x3c
   [<c027866a>] dev_uevent+0x1ad/0x1da
   [<c0205024>] kobject_uevent_env+0x20a/0x45b
   [<c020527f>] kobject_uevent+0xa/0xf
   [<c02779f1>] store_uevent+0x4f/0x58
   [<c027758e>] dev_attr_store+0x29/0x2f
   [<c01bec4f>] sysfs_write_file+0x16e/0x19c
   [<c0183ba7>] vfs_write+0xd1/0x15a
   [<c01841d7>] sys_write+0x3d/0x72
   [<c0104112>] sysenter_past_esp+0x5f/0x99
   [<b7f7b410>] 0xb7f7b410
   =======================

 FIX kmalloc-8: Restoring Redzone 0xc90f6d28-0xc90f6d2b=0xcc

If SLUB encounters a corrupted object (full detection requires the kernel
to be booted with slub_debug) then the following output will be dumped
into the syslog:

1. Description of the problem encountered

   This will be a message in the system log starting with::

     ===============================================
     BUG <slab cache affected>: <What went wrong>
     -----------------------------------------------

     INFO: <corruption start>-<corruption_end> <more info>
     INFO: Slab <address> <slab information>
     INFO: Object <address> <object information>
     INFO: Allocated in <kernel function> age=<jiffies since alloc> cpu=<allocated by
        cpu> pid=<pid of the process>
     INFO: Freed in <kernel function> age=<jiffies since free> cpu=<freed by cpu>
        pid=<pid of the process>

   (Object allocation / free information is only available if SLAB_STORE_USER is
   set for the slab. slub_debug sets that option)

2. The object contents if an object was involved.

   Various types of lines can follow the BUG SLUB line:

   Bytes b4 <address> : <bytes>
        Shows a few bytes before the object where the problem was detected.
        Can be useful if the corruption does not stop with the start of the
        object.

   Object <address> : <bytes>
        The bytes of the object. If the object is inactive then the bytes
        typically contain poison values. Any non-poison value shows a
        corruption by a write after free.

   Redzone <address> : <bytes>
        The Redzone following the object. The Redzone is used to detect
        writes after the object. All bytes should always have the same
        value. If there is any deviation then it is due to a write after
        the object boundary.

        (Redzone information is only available if SLAB_RED_ZONE is set.
        slub_debug sets that option)

   Padding <address> : <bytes>
        Unused data to fill up the space in order to get the next object
        properly aligned. In the debug case we make sure that there are
        at least 4 bytes of padding. This allows the detection of writes
        before the object.

3. A stackdump

   The stackdump describes the location where the error was detected. The cause
   of the corruption is more likely to be found by looking at the function that
   allocated or freed the object.

4. Report on how the problem was dealt with in order to ensure the continued
   operation of the system.

   These are messages in the system log beginning with::

        FIX <slab cache affected>: <corrective action taken>

   In the above sample SLUB found that the Redzone of an active object has
   been overwritten. Here a string of 8 characters was written into a slab that
   has the length of 8 characters. However, an 8 character string needs a
   terminating 0. That zero has overwritten the first byte of the Redzone field.
   After reporting the details of the issue encountered the FIX SLUB message
   tells us that SLUB has restored the Redzone to its proper value and then
   system operations continue.

Emergency operations
====================

Minimal debugging (sanity checks alone) can be enabled by booting with::

        slub_debug=F

This will generally be enough to enable the resiliency features of slub
which will keep the system running even if a bad kernel component
keeps corrupting objects. This may be important for production systems.
Performance will be impacted by the sanity checks and there will be a
continual stream of error messages to the syslog but no additional memory
will be used (unlike full debugging).

No guarantees. The kernel component still needs to be fixed. Performance
may be optimized further by locating the slab that experiences corruption
and enabling debugging only for that cache.

I.e.::

        slub_debug=F,dentry

If the corruption occurs by writing after the end of the object then it
may be advisable to enable a Redzone to avoid corrupting the beginning
of other objects::

        slub_debug=FZ,dentry

Extended slabinfo mode and plotting
===================================

The ``slabinfo`` tool has a special 'extended' ('-X') mode that includes:
 - Slabcache Totals
 - Slabs sorted by size (up to -N <num> slabs, default 1)
 - Slabs sorted by loss (up to -N <num> slabs, default 1)

Additionally, in this mode ``slabinfo`` does not dynamically scale
sizes (G/M/K) and reports everything in bytes (this functionality is
also available to other slabinfo modes via '-B' option) which makes
reporting more precise and accurate. Moreover, in some sense the '-X'
mode also simplifies the analysis of slabs' behaviour, because its
output can be plotted using the ``slabinfo-gnuplot.sh`` script.
So it
pushes the analysis from looking through the numbers (tons of numbers)
to something easier -- visual analysis.

To generate plots:

a) collect slabinfo extended records, for example::

        while [ 1 ]; do slabinfo -X >> FOO_STATS; sleep 1; done

b) pass stats file(-s) to ``slabinfo-gnuplot.sh`` script::

        slabinfo-gnuplot.sh FOO_STATS [FOO_STATS2 .. FOO_STATSN]

   The ``slabinfo-gnuplot.sh`` script will pre-process the collected records
   and generate 3 png files (and 3 pre-processing cache files) per STATS
   file:
   - Slabcache Totals: FOO_STATS-totals.png
   - Slabs sorted by size: FOO_STATS-slabs-by-size.png
   - Slabs sorted by loss: FOO_STATS-slabs-by-loss.png

Another use case for ``slabinfo-gnuplot.sh`` is when you
need to compare slabs' behaviour "prior to" and "after" some code
modification. To help you out there, the ``slabinfo-gnuplot.sh`` script
can 'merge' the `Slabcache Totals` sections from different
measurements. To visually compare N plots:

a) Collect as many STATS1, STATS2, .. STATSN files as you need::

        while [ 1 ]; do slabinfo -X >> STATS<X>; sleep 1; done

b) Pre-process those STATS files::

        slabinfo-gnuplot.sh STATS1 STATS2 .. STATSN

c) Execute ``slabinfo-gnuplot.sh`` in '-t' mode, passing all of the
   generated pre-processed \*-totals::

        slabinfo-gnuplot.sh -t STATS1-totals STATS2-totals .. STATSN-totals

   This will produce a single plot (png file).

   Plots, expectedly, can be large so some fluctuations or small spikes
   can go unnoticed.
   To deal with that, ``slabinfo-gnuplot.sh`` has two
   options to 'zoom-in'/'zoom-out':

   a) ``-s %d,%d`` -- overwrites the default image width and height
   b) ``-r %d,%d`` -- specifies a range of samples to use (for example,
      in ``slabinfo -X >> FOO_STATS; sleep 1;`` case, using a ``-r
      40,60`` range will plot only samples collected between 40th and
      60th seconds).

Christoph Lameter, May 30, 2007
Sergey Senozhatsky, October 23, 2015