.. _slub:

==========================
Short users guide for SLUB
==========================

The basic philosophy of SLUB is very different from SLAB. SLAB
requires rebuilding the kernel to activate debug options for all
slab caches. SLUB always includes full debugging but it is off by default.
SLUB can enable debugging only for selected slabs in order to avoid
an impact on overall system performance which may make a bug more
difficult to find.

To switch debugging on, add the option ``slub_debug`` to the kernel
command line. That enables full debugging for all slabs.

Typically one would then use the ``slabinfo`` command to get statistical
data and perform operations on the slabs. By default ``slabinfo`` only lists
slabs that have data in them. See ``slabinfo -h`` for more options when
running the command. ``slabinfo`` can be compiled with::

	gcc -o slabinfo tools/vm/slabinfo.c

Some of the modes of operation of ``slabinfo`` require that slub debugging
be enabled on the command line. For example, no tracking information will be
available without debugging on, and validation can only partially
be performed if debugging was not switched on.

Some more sophisticated uses of slub_debug:
-------------------------------------------

Parameters may be given to ``slub_debug``. If none is specified then full
debugging is enabled. Format:

slub_debug=<Debug-Options>
	Enable options for all slabs
slub_debug=<Debug-Options>,<slab name>
	Enable options only for select slabs

Possible debug options are::

	F		Sanity checks on (enables SLAB_DEBUG_CONSISTENCY_CHECKS
			Sorry SLAB legacy issues)
	Z		Red zoning
	P		Poisoning (object and padding)
	U		User tracking (free and alloc)
	T		Trace (please only use on single slabs)
	A		Toggle failslab filter mark for the cache
	O		Switch debugging off for caches that would have
			caused higher minimum slab orders
	-		Switch all debugging off (useful if the kernel is
			configured with CONFIG_SLUB_DEBUG_ON)

For example, to boot with just sanity checks and red zoning, one would specify::

	slub_debug=FZ

Trying to find an issue in the dentry cache? Try::

	slub_debug=,dentry

to only enable debugging on the dentry cache.

Red zoning and tracking may realign the slab.  We can just apply sanity checks
to the dentry cache with::

	slub_debug=F,dentry

Debugging options may require the minimum possible slab order to increase as
a result of storing the metadata (for example, caches with PAGE_SIZE object
sizes).  This has a higher likelihood of resulting in slab allocation errors
in low memory situations or if there's high fragmentation of memory.  To
switch off debugging for such caches by default, use::

	slub_debug=O
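
As a rough illustration of the format described above, the following Python
sketch splits a ``slub_debug`` value into its debug options and optional slab
name. It is not the kernel's parser: the function name ``parse_slub_debug``
and the ``SLUB_DEBUG_FLAGS`` table are made up for this example, and only the
``<Debug-Options>[,<slab name>]`` form from this guide is handled.

.. code-block:: python

   # Option letters and meanings copied from the table in this guide.
   SLUB_DEBUG_FLAGS = {
       "F": "sanity checks",
       "Z": "red zoning",
       "P": "poisoning",
       "U": "user tracking",
       "T": "trace",
       "A": "failslab filter mark",
       "O": "no debugging for caches needing a higher minimum order",
       "-": "all debugging off",
   }


   def parse_slub_debug(value):
       """Split '<Debug-Options>[,<slab name>]' into (meanings, slab name).

       An empty option string stands for full debugging. The slab name is
       None when the options apply to all slabs.
       """
       options, _, slab = value.partition(",")
       unknown = [c for c in options if c not in SLUB_DEBUG_FLAGS]
       if unknown:
           raise ValueError("unknown debug option(s): " + "".join(unknown))
       meanings = [SLUB_DEBUG_FLAGS[c] for c in options] or ["full debugging"]
       return meanings, (slab or None)

With this sketch, ``parse_slub_debug("F,dentry")`` returns
``(["sanity checks"], "dentry")``.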

If you forgot to enable debugging on the kernel command line, it is
possible to enable debugging manually when the kernel is up. Look at the
contents of::

	/sys/kernel/slab/<slab name>/

Look at the writable files. Writing 1 to them will enable the
corresponding debug option. All options can be set on a slab that does
not contain objects. If the slab already contains objects then only sanity
checks and tracing may be enabled. The other options may cause the
realignment of objects.

Careful with tracing: It may spew out lots of information and never stop if
used on the wrong slab.
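
One way to find the writable files is to check the permission bits of each
entry in the cache's directory. The helper below is only a sketch:
``writable_debug_files`` and its ``root`` parameter are hypothetical names
chosen for this example, and which files exist or are writable depends on the
running kernel.

.. code-block:: python

   import os
   import stat


   def writable_debug_files(slab_name, root="/sys/kernel/slab"):
       """Return the regular files under <root>/<slab_name>/ with a write bit set."""
       directory = os.path.join(root, slab_name)
       names = []
       for entry in sorted(os.listdir(directory)):
           mode = os.stat(os.path.join(directory, entry)).st_mode
           # Check the mode bits instead of os.access(), which always
           # reports success for root.
           if stat.S_ISREG(mode) and mode & 0o222:
               names.append(entry)
       return names

The ``root`` argument exists so that the function can be tried on a scratch
directory before pointing it at sysfs.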

Slab merging
============

If no debug options are specified then SLUB may merge similar slabs together
in order to reduce overhead and increase cache hotness of objects.
``slabinfo -a`` displays which slabs were merged together.

Slab validation
===============

SLUB can validate all objects if the kernel was booted with slub_debug. In
order to do so you must have the ``slabinfo`` tool. Then you can run::

	slabinfo -v

which will test all objects. Output will be generated to the syslog.

This also works in a more limited way if the kernel was booted without slab
debugging. In that case ``slabinfo -v`` simply tests all reachable objects.
Usually these are in the cpu slabs and the partial slabs. Full slabs are not
tracked by SLUB in a non debug situation.

Getting more performance
========================

To some degree SLUB's performance is limited by the need to take the
list_lock once in a while to deal with partial slabs. That overhead is
governed by the order of the allocation for each slab. The allocations
can be influenced by kernel parameters:

.. slub_min_objects=x		(default 4)
.. slub_min_order=x		(default 0)
.. slub_max_order=x		(default 3 (PAGE_ALLOC_COSTLY_ORDER))

``slub_min_objects``
	specifies how many objects must at least fit into one
	slab in order for the allocation order to be acceptable.  In
	general slub will be able to perform this number of
	allocations on a slab without consulting centralized resources
	(list_lock) where contention may occur.

``slub_min_order``
	specifies a minimum order of slabs. It has a similar effect to
	``slub_min_objects``.

``slub_max_order``
	specifies the order at which ``slub_min_objects`` should no
	longer be checked. This is useful to avoid SLUB trying to
	generate super large order pages to fit ``slub_min_objects``
	of a slab cache with large object sizes into one high order
	page. Setting the command line parameter
	``debug_guardpage_minorder=N`` (N > 0) forces
	``slub_max_order`` to 0, which causes slabs to be allocated at
	the minimum possible order.
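
The interaction of the three parameters can be sketched in Python as follows.
This is a deliberately simplified model, not the kernel's algorithm: the name
``pick_order`` and the 4 KiB ``PAGE_SIZE`` are assumptions, and per-object
debug metadata and wasted-space heuristics are ignored. The defaults mirror
the values listed above.

.. code-block:: python

   PAGE_SIZE = 4096  # assumption: 4 KiB pages


   def pick_order(object_size, min_objects=4, min_order=0, max_order=3):
       """Pick a slab order for objects of object_size bytes.

       Returns the smallest order in [min_order, max_order] whose slab holds
       at least min_objects objects. If no such order exists, min_objects is
       no longer checked and the smallest order (not below min_order) that
       holds one object is returned.
       """
       for order in range(min_order, max_order + 1):
           if (PAGE_SIZE << order) // object_size >= min_objects:
               return order
       order = min_order
       while (PAGE_SIZE << order) < object_size:
           order += 1
       return order

Under these assumptions a 2048 byte object gets order 1 (four objects in
8 KiB), while a 16384 byte object cannot meet ``min_objects`` within
``max_order`` and falls back to order 2, the smallest order that holds one
object.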

SLUB Debug output
=================

Here is a sample of slub debug output::

 ====================================================================
 BUG kmalloc-8: Redzone overwritten
 --------------------------------------------------------------------

 INFO: 0xc90f6d28-0xc90f6d2b. First byte 0x00 instead of 0xcc
 INFO: Slab 0xc528c530 flags=0x400000c3 inuse=61 fp=0xc90f6d58
 INFO: Object 0xc90f6d20 @offset=3360 fp=0xc90f6d58
 INFO: Allocated in get_modalias+0x61/0xf5 age=53 cpu=1 pid=554

 Bytes b4 0xc90f6d10:  00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
   Object 0xc90f6d20:  31 30 31 39 2e 30 30 35                         1019.005
  Redzone 0xc90f6d28:  00 cc cc cc                                     .
  Padding 0xc90f6d50:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ

   [<c010523d>] dump_trace+0x63/0x1eb
   [<c01053df>] show_trace_log_lvl+0x1a/0x2f
   [<c010601d>] show_trace+0x12/0x14
   [<c0106035>] dump_stack+0x16/0x18
   [<c017e0fa>] object_err+0x143/0x14b
   [<c017e2cc>] check_object+0x66/0x234
   [<c017eb43>] __slab_free+0x239/0x384
   [<c017f446>] kfree+0xa6/0xc6
   [<c02e2335>] get_modalias+0xb9/0xf5
   [<c02e23b7>] dmi_dev_uevent+0x27/0x3c
   [<c027866a>] dev_uevent+0x1ad/0x1da
   [<c0205024>] kobject_uevent_env+0x20a/0x45b
   [<c020527f>] kobject_uevent+0xa/0xf
   [<c02779f1>] store_uevent+0x4f/0x58
   [<c027758e>] dev_attr_store+0x29/0x2f
   [<c01bec4f>] sysfs_write_file+0x16e/0x19c
   [<c0183ba7>] vfs_write+0xd1/0x15a
   [<c01841d7>] sys_write+0x3d/0x72
   [<c0104112>] sysenter_past_esp+0x5f/0x99
   [<b7f7b410>] 0xb7f7b410
   =======================

 FIX kmalloc-8: Restoring Redzone 0xc90f6d28-0xc90f6d2b=0xcc
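
When scanning a log for such reports, the ``BUG`` and ``FIX`` lines carry the
cache name and a one-line summary. A minimal Python sketch that extracts them
is shown here, assuming the line layout of the sample above;
``summarize_report`` and ``REPORT_LINE`` are names made up for this example,
and other kernel versions may format these lines differently.

.. code-block:: python

   import re

   # Matches "BUG <cache>: <text>" and "FIX <cache>: <text>" anywhere in a
   # line, so a leading timestamp or log prefix does not get in the way.
   REPORT_LINE = re.compile(
       r"\b(?P<kind>BUG|FIX) (?P<cache>[^:\s]+): (?P<text>.+?)\s*$"
   )


   def summarize_report(log_text):
       """Return a (kind, cache, text) tuple for each BUG/FIX line in log_text."""
       found = []
       for line in log_text.splitlines():
           match = REPORT_LINE.search(line)
           if match:
               found.append((match["kind"], match["cache"], match["text"]))
       return found

Applied to the sample, this yields one ``BUG`` entry and one ``FIX`` entry,
both for the ``kmalloc-8`` cache.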

If SLUB encounters a corrupted object (full detection requires the kernel
to be booted with slub_debug) then the following output will be dumped
into the syslog:

1. Description of the problem encountered

   This will be a message in the system log starting with::

     ===============================================
     BUG <slab cache affected>: <What went wrong>
     -----------------------------------------------

     INFO: <corruption start>-<corruption_end> <more info>
     INFO: Slab <address> <slab information>
     INFO: Object <address> <object information>
     INFO: Allocated in <kernel function> age=<jiffies since alloc> cpu=<allocated by cpu> pid=<pid of the process>
     INFO: Freed in <kernel function> age=<jiffies since free> cpu=<freed by cpu> pid=<pid of the process>

   (Object allocation / free information is only available if SLAB_STORE_USER is
   set for the slab. slub_debug sets that option.)

2. The object contents if an object was involved.

   Various types of lines can follow the BUG SLUB line:

   Bytes b4 <address> : <bytes>
	Shows a few bytes before the object where the problem was detected.
	Can be useful if the corruption does not stop with the start of the
	object.

   Object <address> : <bytes>
	The bytes of the object. If the object is inactive then the bytes
	typically contain poison values. Any non-poison value shows a
	corruption by a write after free.

   Redzone <address> : <bytes>
	The Redzone following the object. The Redzone is used to detect
	writes after the object. All bytes should always have the same
	value. If there is any deviation then it is due to a write after
	the object boundary.

	(Redzone information is only available if SLAB_RED_ZONE is set.
	slub_debug sets that option)

   Padding <address> : <bytes>
	Unused data to fill up the space in order to get the next object
	properly aligned. In the debug case we make sure that there are
	at least 4 bytes of padding. This allows the detection of writes
	before the object.

3. A stackdump

   The stackdump describes the location where the error was detected. The cause
   of the corruption is more likely to be found by looking at the function that
   allocated or freed the object.

4. Report on how the problem was dealt with in order to ensure the continued
   operation of the system.

   These are messages in the system log beginning with::

	FIX <slab cache affected>: <corrective action taken>

   In the above sample SLUB found that the Redzone of an active object has
   been overwritten. Here a string of 8 characters was written into a slab
   object that has a length of 8 bytes. However, an 8 character string needs a
   terminating 0. That zero has overwritten the first byte of the Redzone field.
   After reporting the details of the issue encountered the FIX SLUB message
   tells us that SLUB has restored the Redzone to its proper value and then
   system operations continue.
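
The overflow in the sample can be replayed on a byte array. The following
Python sketch models one object slot as 8 object bytes followed by a 4 byte
Redzone filled with 0xcc, as in the report above. The helper names are
invented for this example and nothing here touches real slab memory.

.. code-block:: python

   REDZONE_BYTE = 0xCC  # value the sample report expects in the Redzone


   def write_string(slot, text):
       """Copy text plus a terminating zero byte into slot, like a C string copy."""
       data = text.encode("ascii") + b"\x00"
       slot[:len(data)] = data


   def check_redzone(slot, object_size):
       """Return offsets into the Redzone of bytes that are not REDZONE_BYTE."""
       return [i for i, b in enumerate(slot[object_size:]) if b != REDZONE_BYTE]


   # An 8 byte object followed by a 4 byte Redzone.
   slot = bytearray(8) + bytearray([REDZONE_BYTE] * 4)
   write_string(slot, "1019.005")   # 8 characters plus the terminating zero
   print(check_redzone(slot, 8))    # prints [0]

The single damaged offset corresponds to the ``00 cc cc cc`` Redzone line in
the sample: only the first Redzone byte was hit by the terminating zero.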

Emergency operations
====================

Minimal debugging (sanity checks alone) can be enabled by booting with::

	slub_debug=F

This will generally be enough to enable the resiliency features of slub
which will keep the system running even if a bad kernel component
keeps corrupting objects. This may be important for production systems.
Performance will be impacted by the sanity checks and there will be a
continual stream of error messages to the syslog but no additional memory
will be used (unlike full debugging).

No guarantees. The kernel component still needs to be fixed. Performance
may be optimized further by locating the slab that experiences corruption
and enabling debugging only for that cache, for example::

	slub_debug=F,dentry

If the corruption occurs by writing after the end of the object then it
may be advisable to enable a Redzone to avoid corrupting the beginning
of other objects::

	slub_debug=FZ,dentry

Extended slabinfo mode and plotting
===================================

The ``slabinfo`` tool has a special 'extended' ('-X') mode that includes:

 - Slabcache Totals
 - Slabs sorted by size (up to -N <num> slabs, default 1)
 - Slabs sorted by loss (up to -N <num> slabs, default 1)

Additionally, in this mode ``slabinfo`` does not dynamically scale
sizes (G/M/K) and reports everything in bytes (this functionality is
also available to other slabinfo modes via '-B' option) which makes
reporting more precise and accurate. Moreover, in some sense the '-X'
mode also simplifies the analysis of slabs' behaviour, because its
output can be plotted using the ``slabinfo-gnuplot.sh`` script. So it
pushes the analysis from looking through the numbers (tons of numbers)
to something easier -- visual analysis.

To generate plots:

a) collect slabinfo extended records, for example::

	while [ 1 ]; do slabinfo -X >> FOO_STATS; sleep 1; done

b) pass stats file(-s) to ``slabinfo-gnuplot.sh`` script::

	slabinfo-gnuplot.sh FOO_STATS [FOO_STATS2 .. FOO_STATSN]

   The ``slabinfo-gnuplot.sh`` script will pre-process the collected records
   and generate 3 png files (and 3 pre-processing cache files) per STATS
   file:

   - Slabcache Totals: FOO_STATS-totals.png
   - Slabs sorted by size: FOO_STATS-slabs-by-size.png
   - Slabs sorted by loss: FOO_STATS-slabs-by-loss.png

Another case where ``slabinfo-gnuplot.sh`` can be useful is when you
need to compare slabs' behaviour "prior to" and "after" some code
modification.  To help you out there, the ``slabinfo-gnuplot.sh`` script
can 'merge' the `Slabcache Totals` sections from different
measurements. To visually compare N plots:

a) Collect as many STATS1, STATS2, .. STATSN files as you need::

	while [ 1 ]; do slabinfo -X >> STATS<X>; sleep 1; done

b) Pre-process those STATS files::

	slabinfo-gnuplot.sh STATS1 STATS2 .. STATSN

c) Execute ``slabinfo-gnuplot.sh`` in '-t' mode, passing all of the
   generated pre-processed \*-totals::

	slabinfo-gnuplot.sh -t STATS1-totals STATS2-totals .. STATSN-totals

   This will produce a single plot (png file).

   Plots, expectedly, can be large so some fluctuations or small spikes
   can go unnoticed. To deal with that, ``slabinfo-gnuplot.sh`` has two
   options to 'zoom-in'/'zoom-out':

   a) ``-s %d,%d`` -- overwrites the default image width and height
   b) ``-r %d,%d`` -- specifies a range of samples to use (for example,
      in ``slabinfo -X >> FOO_STATS; sleep 1;`` case, using a
      ``-r 40,60`` range will plot only samples collected between the
      40th and 60th seconds).

Christoph Lameter, May 30, 2007
Sergey Senozhatsky, October 23, 2015