.. SPDX-License-Identifier: GPL-2.0
.. Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
.. Copyright © 2019-2020 ANSSI
.. Copyright © 2021-2022 Microsoft Corporation

=====================================
Landlock: unprivileged access control
=====================================

:Author: Mickaël Salaün
:Date: October 2022

The goal of Landlock is to restrict ambient rights (e.g. global filesystem
access) for a set of processes.  Because Landlock is a stackable LSM, it makes
it possible to create safe security sandboxes as new security layers in
addition to the existing system-wide access controls.  This kind of sandbox is
expected to help mitigate the security impact of bugs or unexpected/malicious
behaviors in user space applications.  Landlock empowers any process,
including unprivileged ones, to securely restrict themselves.

We can quickly make sure that Landlock is enabled in the running system by
looking for "landlock: Up and running" in kernel logs (as root): ``dmesg | grep
landlock || journalctl -kg landlock``.  Developers can also easily check for
Landlock support with a :ref:`related system call <landlock_abi_versions>`.  If
Landlock is not currently supported, we need to :ref:`configure the kernel
appropriately <kernel_support>`.

Landlock rules
==============

A Landlock rule describes an action on an object.  An object is currently a
file hierarchy, and the related filesystem actions are defined with `access
rights`_.  A set of rules is aggregated in a ruleset, which can then restrict
the thread enforcing it, and its future children.

Defining and enforcing a security policy
----------------------------------------

We first need to define the ruleset that will contain our rules.  For this
example, the ruleset will contain rules that only allow read actions, but write
actions will be denied.  The ruleset then needs to handle both kinds of
actions.  This is required for backward and forward compatibility (i.e. the
kernel and user space may not know each other's supported restrictions), hence
the need to be explicit about the denied-by-default access rights.

.. code-block:: c

    struct landlock_ruleset_attr ruleset_attr = {
        .handled_access_fs =
            LANDLOCK_ACCESS_FS_EXECUTE |
            LANDLOCK_ACCESS_FS_WRITE_FILE |
            LANDLOCK_ACCESS_FS_READ_FILE |
            LANDLOCK_ACCESS_FS_READ_DIR |
            LANDLOCK_ACCESS_FS_REMOVE_DIR |
            LANDLOCK_ACCESS_FS_REMOVE_FILE |
            LANDLOCK_ACCESS_FS_MAKE_CHAR |
            LANDLOCK_ACCESS_FS_MAKE_DIR |
            LANDLOCK_ACCESS_FS_MAKE_REG |
            LANDLOCK_ACCESS_FS_MAKE_SOCK |
            LANDLOCK_ACCESS_FS_MAKE_FIFO |
            LANDLOCK_ACCESS_FS_MAKE_BLOCK |
            LANDLOCK_ACCESS_FS_MAKE_SYM |
            LANDLOCK_ACCESS_FS_REFER |
            LANDLOCK_ACCESS_FS_TRUNCATE,
    };

Because we may not know on which kernel version an application will be
executed, it is safer to follow a best-effort security approach.  Indeed, we
should try to protect users as much as possible whatever the kernel they are
using.  To avoid binary enforcement (i.e. either all security features or
none), we can leverage a dedicated Landlock command to get the current version
of the Landlock ABI and adapt the handled accesses.  Let's check if we should
remove the ``LANDLOCK_ACCESS_FS_REFER`` or ``LANDLOCK_ACCESS_FS_TRUNCATE``
access rights, which are only supported starting with the second and third
version of the ABI, respectively.

.. code-block:: c

    int abi;

    abi = landlock_create_ruleset(NULL, 0, LANDLOCK_CREATE_RULESET_VERSION);
    if (abi < 0) {
        /* Degrades gracefully if Landlock is not handled. */
        perror("The running kernel does not enable to use Landlock");
        return 0;
    }
    switch (abi) {
    case 1:
        /* Removes LANDLOCK_ACCESS_FS_REFER for ABI < 2 */
        ruleset_attr.handled_access_fs &= ~LANDLOCK_ACCESS_FS_REFER;
        __attribute__((fallthrough));
    case 2:
        /* Removes LANDLOCK_ACCESS_FS_TRUNCATE for ABI < 3 */
        ruleset_attr.handled_access_fs &= ~LANDLOCK_ACCESS_FS_TRUNCATE;
    }

This makes it possible to create an inclusive ruleset that will contain our
rules.

.. code-block:: c

    int ruleset_fd;

    ruleset_fd = landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
    if (ruleset_fd < 0) {
        perror("Failed to create a ruleset");
        return 1;
    }
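The snippets in this section call ``landlock_create_ruleset()``,
``landlock_add_rule()`` and ``landlock_restrict_self()`` as plain C functions.
Assuming the C library in use does not provide such wrappers, they can be
defined on top of :manpage:`syscall(2)`.  The following is a minimal sketch of
such wrappers, not a reference implementation; it assumes that the installed
kernel headers provide ``<linux/landlock.h>`` and the ``__NR_landlock_*``
system call numbers.

.. code-block:: c

    #define _GNU_SOURCE
    #include <linux/landlock.h>
    #include <stddef.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Thin wrappers: each one forwards its arguments to the raw syscall. */
    static inline int landlock_create_ruleset(
            const struct landlock_ruleset_attr *const attr,
            const size_t size, const __u32 flags)
    {
        return syscall(__NR_landlock_create_ruleset, attr, size, flags);
    }

    static inline int landlock_add_rule(const int ruleset_fd,
            const enum landlock_rule_type rule_type,
            const void *const rule_attr, const __u32 flags)
    {
        return syscall(__NR_landlock_add_rule, ruleset_fd, rule_type,
                       rule_attr, flags);
    }

    static inline int landlock_restrict_self(const int ruleset_fd,
            const __u32 flags)
    {
        return syscall(__NR_landlock_restrict_self, ruleset_fd, flags);
    }

As with any :manpage:`syscall(2)` based wrapper, a failure is reported with a
return value of -1 and ``errno`` set accordingly.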

We can now add a new rule to this ruleset thanks to the returned file
descriptor referring to this ruleset.  The rule will only allow reading the
file hierarchy ``/usr``.  Without another rule, write actions would then be
denied by the ruleset.  To add ``/usr`` to the ruleset, we open it with the
``O_PATH`` flag and fill the &struct landlock_path_beneath_attr with this file
descriptor.

.. code-block:: c

    int err;
    struct landlock_path_beneath_attr path_beneath = {
        .allowed_access =
            LANDLOCK_ACCESS_FS_EXECUTE |
            LANDLOCK_ACCESS_FS_READ_FILE |
            LANDLOCK_ACCESS_FS_READ_DIR,
    };

    path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC);
    if (path_beneath.parent_fd < 0) {
        perror("Failed to open file");
        close(ruleset_fd);
        return 1;
    }
    err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
                            &path_beneath, 0);
    close(path_beneath.parent_fd);
    if (err) {
        perror("Failed to update ruleset");
        close(ruleset_fd);
        return 1;
    }

It may also be required to create rules following the same logic as explained
for the ruleset creation, by filtering access rights according to the Landlock
ABI version.  In this example, this is not required because all of the requested
``allowed_access`` rights are already available in ABI 1.
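When such filtering is needed, one simple approach is to mask the rights
requested for a rule with the accesses handled by the ruleset, because a rule
may only contain rights that its ruleset handles.  The sketch below assumes a
hypothetical helper name, ``filter_allowed_access()``, which is not part of the
Landlock API.

.. code-block:: c

    #include <linux/landlock.h>

    /*
     * Hypothetical helper: keeps only the rights that the ruleset handles,
     * e.g. after handled_access_fs was reduced for an older ABI.
     */
    static __u64 filter_allowed_access(const __u64 wanted,
                                       const __u64 handled_access_fs)
    {
        return wanted & handled_access_fs;
    }

It could be used as ``path_beneath.allowed_access =
filter_allowed_access(path_beneath.allowed_access,
ruleset_attr.handled_access_fs);`` before calling ``landlock_add_rule()``.  If
the result is empty, there is nothing left to allow and the rule can simply be
skipped.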

We now have a ruleset with one rule allowing read access to ``/usr`` while
denying all other handled accesses for the filesystem.  The next step is to
restrict the current thread from gaining more privileges (e.g. thanks to a SUID
binary).

.. code-block:: c

    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
        perror("Failed to restrict privileges");
        close(ruleset_fd);
        return 1;
    }

The current thread is now ready to sandbox itself with the ruleset.

.. code-block:: c

    if (landlock_restrict_self(ruleset_fd, 0)) {
        perror("Failed to enforce ruleset");
        close(ruleset_fd);
        return 1;
    }
    close(ruleset_fd);

If the ``landlock_restrict_self`` system call succeeds, the current thread is
now restricted and this policy will be enforced on all its subsequently created
children as well.  Once a thread is landlocked, there is no way to remove its
security policy; only adding more restrictions is allowed.  These threads are
now in a new Landlock domain, which is the merge of their parent's domain (if
any) with the new ruleset.

Full working code can be found in `samples/landlock/sandboxer.c`_.

Good practices
--------------

It is recommended to set access rights on file hierarchy leaves as much as
possible.  For instance, it is better to be able to have ``~/doc/`` as a
read-only hierarchy and ``~/tmp/`` as a read-write hierarchy, compared to
``~/`` as a read-only hierarchy and ``~/tmp/`` as a read-write hierarchy.
Following this good practice leads to self-sufficient hierarchies that do not
depend on their location (i.e. parent directories).  This is particularly
relevant when we want to allow linking or renaming.  Indeed, having consistent
access rights per directory makes it possible to change the location of such a
directory without relying on the destination directory access rights (except
those that are required for this operation, see ``LANDLOCK_ACCESS_FS_REFER``
documentation).
Having self-sufficient hierarchies also helps to tighten the required access
rights to the minimal set of data.  This also helps avoid sinkhole directories,
i.e. directories where data can be linked to but not linked from.  However,
this depends on data organization, which might not be controlled by developers.
In this case, granting read-write access to ``~/tmp/``, instead of write-only
access, would potentially allow moving ``~/tmp/`` to a non-readable directory
while still keeping the ability to list the content of ``~/tmp/``.

Layers of file path access rights
---------------------------------

Each time a thread enforces a ruleset on itself, it updates its Landlock domain
with a new layer of policy.  Indeed, this complementary policy is stacked on
top of any other rulesets already restricting this thread.  A sandboxed thread
can then safely add more constraints to itself with a new enforced ruleset.

One policy layer grants access to a file path if at least one of its rules
encountered on the path grants the access.  A sandboxed thread can only access
a file path if all its enforced policy layers grant the access as well as all
the other system access controls (e.g. filesystem DAC, other LSM policies,
etc.).
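The composition described above can be illustrated with a toy user space
model.  This is only a sketch to reason about the semantics, not the kernel
implementation: the ``toy_*`` types, the ``ACCESS_*`` bits and the plain
string-based path matching are all assumptions made for this example.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    #define ACCESS_READ  (1U << 0)
    #define ACCESS_WRITE (1U << 1)

    struct toy_rule {
        const char *hierarchy; /* no trailing slash, except for "/" */
        unsigned int allowed;
    };

    struct toy_layer {
        unsigned int handled;
        const struct toy_rule *rules;
        size_t num_rules;
    };

    /* True if @path is @hierarchy itself or lies beneath it. */
    static bool is_beneath(const char *path, const char *hierarchy)
    {
        const size_t len = strlen(hierarchy);

        if (strncmp(path, hierarchy, len) != 0)
            return false;
        return len == 1 || path[len] == '\0' || path[len] == '/';
    }

    /* One layer: every handled access must be granted by a rule on the path. */
    static bool layer_grants(const struct toy_layer *layer, const char *path,
                             unsigned int request)
    {
        unsigned int granted = 0;
        size_t i;

        for (i = 0; i < layer->num_rules; i++) {
            if (is_beneath(path, layer->rules[i].hierarchy))
                granted |= layer->rules[i].allowed;
        }
        /* Accesses not handled by this layer are not restricted by it. */
        return ((request & layer->handled) & ~granted) == 0;
    }

    /* A domain: all stacked layers must grant the request. */
    static bool domain_grants(const struct toy_layer *layers,
                              size_t num_layers, const char *path,
                              unsigned int request)
    {
        size_t i;

        for (i = 0; i < num_layers; i++) {
            if (!layer_grants(&layers[i], path, request))
                return false;
        }
        return true;
    }

In this model, a second layer that handles write access but only has a rule
for ``/tmp/scratch`` denies writing to ``/tmp/x`` even if the first layer
allows it, which mirrors how stacking a ruleset can only add restrictions.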

Bind mounts and OverlayFS
-------------------------

Landlock restricts access to file hierarchies, which means that these access
rights can be propagated with bind mounts (cf.
Documentation/filesystems/sharedsubtree.rst) but not with
Documentation/filesystems/overlayfs.rst.

A bind mount mirrors a source file hierarchy to a destination.  The destination
hierarchy is then composed of the exact same files, on which Landlock rules can
be tied, either via the source or the destination path.  These rules restrict
access when they are encountered on a path, which means that they can restrict
access to multiple file hierarchies at the same time, whether these hierarchies
are the result of bind mounts or not.

An OverlayFS mount point consists of upper and lower layers.  These layers are
combined in a merge directory, which is the result of the mount point.  This
merge hierarchy may include files from the upper and lower layers, but
modifications performed on the merge hierarchy are only reflected in the upper
layer.  From a Landlock policy point of view, each OverlayFS layer and merge
hierarchy is standalone and contains its own set of files and directories,
which is different from bind mounts.  A policy restricting an OverlayFS layer
will not restrict the resulting merged hierarchy, and vice versa.  Landlock
users should then only think about file hierarchies they want to allow access
to, regardless of the underlying filesystem.

Inheritance
-----------

Every new thread resulting from a :manpage:`clone(2)` inherits Landlock domain
restrictions from its parent.  This is similar to the seccomp inheritance (cf.
Documentation/userspace-api/seccomp_filter.rst) or any other LSM dealing with
task's :manpage:`credentials(7)`.  For instance, one process's thread may apply
Landlock rules to itself, but they will not be automatically applied to other
sibling threads (unlike POSIX thread credential changes, cf.
:manpage:`nptl(7)`).

When a thread sandboxes itself, we have the guarantee that the related security
policy will stay enforced on all this thread's descendants.  This allows
creating standalone and modular security policies per application, which will
automatically be composed between themselves according to their runtime parent
policies.

Ptrace restrictions
-------------------

A sandboxed process has fewer privileges than a non-sandboxed process and must
then be subject to additional restrictions when manipulating another process.
To be allowed to use :manpage:`ptrace(2)` and related syscalls on a target
process, a sandboxed process should have a subset of the target process rules,
which means the tracee must be in a sub-domain of the tracer.

Truncating files
----------------

The operations covered by ``LANDLOCK_ACCESS_FS_WRITE_FILE`` and
``LANDLOCK_ACCESS_FS_TRUNCATE`` both change the contents of a file and sometimes
overlap in non-intuitive ways.  It is recommended to always specify both of
these together.

A particularly surprising example is :manpage:`creat(2)`.  The name suggests
that this system call requires the rights to create and write files.  However,
it also requires the truncate right if an existing file under the same name is
already present.
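The underlying reason is that, according to its manual page,
:manpage:`creat(2)` is equivalent to :manpage:`open(2)` with the flags
``O_CREAT | O_WRONLY | O_TRUNC``.  The sketch below does not enforce any
Landlock policy; it only makes the truncating side effect on an existing file
visible.  The helper name ``size_after_creat()`` is hypothetical.

.. code-block:: c

    #include <fcntl.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    /*
     * Hypothetical helper: calls creat(2) on @path and returns the resulting
     * file size, or -1 on error.
     */
    static off_t size_after_creat(const char *path)
    {
        struct stat st;
        int fd;

        /* Same as open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600). */
        fd = creat(path, 0600);
        if (fd < 0)
            return -1;
        if (fstat(fd, &st)) {
            close(fd);
            return -1;
        }
        close(fd);
        return st.st_size;
    }

Calling this helper on an existing, non-empty file returns 0 because the file
content was discarded.  Under a ruleset handling
``LANDLOCK_ACCESS_FS_TRUNCATE``, this is the step for which the truncate right
is needed.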

It should also be noted that truncating files does not require the
``LANDLOCK_ACCESS_FS_WRITE_FILE`` right.  Apart from the :manpage:`truncate(2)`
system call, this can also be done through :manpage:`open(2)` with the flags
``O_RDONLY | O_TRUNC``.

When opening a file, the availability of the ``LANDLOCK_ACCESS_FS_TRUNCATE``
right is associated with the newly created file descriptor and will be used for
subsequent truncation attempts using :manpage:`ftruncate(2)`.  The behavior is
similar to opening a file for reading or writing, where permissions are checked
during :manpage:`open(2)`, but not during the subsequent :manpage:`read(2)` and
:manpage:`write(2)` calls.

As a consequence, it is possible to have multiple open file descriptors for the
same file, where one grants the right to truncate the file and the other does
not.  It is also possible to pass such file descriptors between processes,
keeping their Landlock properties, even when these processes do not have an
enforced Landlock ruleset.

Compatibility
=============

Backward and forward compatibility
----------------------------------

Landlock is designed to be compatible with past and future versions of the
kernel.  This is achieved thanks to the system call attributes and the
associated bitflags, particularly the ruleset's ``handled_access_fs``.  Making
handled access rights explicit enables the kernel and user space to have a
clear contract with each other.  This is required to make sure sandboxing will
not get stricter with a system update, which could break applications.

Developers can subscribe to the `Landlock mailing list
<https://subspace.kernel.org/lists.linux.dev.html>`_ to knowingly update and
test their applications with the latest available features.  In the interest of
users, and because they may use different kernel versions, it is strongly
encouraged to follow a best-effort security approach by checking the Landlock
ABI version at runtime and only enforcing the supported features.

.. _landlock_abi_versions:

Landlock ABI versions
---------------------

The Landlock ABI version can be read with the sys_landlock_create_ruleset()
system call:

.. code-block:: c

    int abi;

    abi = landlock_create_ruleset(NULL, 0, LANDLOCK_CREATE_RULESET_VERSION);
    if (abi < 0) {
        switch (errno) {
        case ENOSYS:
            printf("Landlock is not supported by the current kernel.\n");
            break;
        case EOPNOTSUPP:
            printf("Landlock is currently disabled.\n");
            break;
        }
        return 0;
    }
    if (abi >= 2) {
        printf("Landlock supports LANDLOCK_ACCESS_FS_REFER.\n");
    }

The following kernel interfaces are implicitly supported by the first ABI
version.  Features only supported from a specific version are explicitly marked
as such.

Kernel interface
================

Access rights
-------------

.. kernel-doc:: include/uapi/linux/landlock.h
    :identifiers: fs_access

Creating a new ruleset
----------------------

.. kernel-doc:: security/landlock/syscalls.c
    :identifiers: sys_landlock_create_ruleset

.. kernel-doc:: include/uapi/linux/landlock.h
    :identifiers: landlock_ruleset_attr

Extending a ruleset
-------------------

.. kernel-doc:: security/landlock/syscalls.c
    :identifiers: sys_landlock_add_rule

.. kernel-doc:: include/uapi/linux/landlock.h
    :identifiers: landlock_rule_type landlock_path_beneath_attr

Enforcing a ruleset
-------------------

.. kernel-doc:: security/landlock/syscalls.c
    :identifiers: sys_landlock_restrict_self

Current limitations
===================

Filesystem topology modification
--------------------------------

As for file renaming and linking, a sandboxed thread cannot modify its
filesystem topology, whether via :manpage:`mount(2)` or
:manpage:`pivot_root(2)`.  However, :manpage:`chroot(2)` calls are not denied.

Special filesystems
-------------------

Access to regular files and directories can be restricted by Landlock,
according to the handled accesses of a ruleset.  However, files that do not
come from a user-visible filesystem (e.g. pipe, socket), but can still be
accessed through ``/proc/<pid>/fd/*``, cannot currently be explicitly
restricted.  Likewise, some special kernel filesystems such as nsfs, which can
be accessed through ``/proc/<pid>/ns/*``, cannot currently be explicitly
restricted.  However, thanks to the `ptrace restrictions`_, access to such
sensitive ``/proc`` files is automatically restricted according to domain
hierarchies.  Future Landlock evolutions could still make it possible to
explicitly restrict such paths with dedicated ruleset flags.

Ruleset layers
--------------

There is a limit of 16 layers of stacked rulesets.  This can be an issue for a
task willing to enforce a new ruleset in complement to its 16 inherited
rulesets.  Once this limit is reached, sys_landlock_restrict_self() returns
E2BIG.  It is then strongly suggested to carefully build rulesets once in the
life of a thread, especially for applications able to launch other applications
that may also want to sandbox themselves (e.g. shells, container managers,
etc.).
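An application can tell this situation apart from other failures by checking
``errno`` after the call.  The following is a minimal sketch that calls the
system call through :manpage:`syscall(2)`; the helper name
``enforce_or_report_limit()`` and its return convention are hypothetical, and
whether it is acceptable to continue with only the inherited restrictions is a
policy decision left to the application.

.. code-block:: c

    #define _GNU_SOURCE
    #include <errno.h>
    #include <linux/landlock.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /*
     * Hypothetical helper: returns 0 on success, 1 if the layer limit is
     * reached (the inherited restrictions still apply), and -1 on any other
     * error.
     */
    static int enforce_or_report_limit(const int ruleset_fd)
    {
        if (!syscall(__NR_landlock_restrict_self, ruleset_fd, 0))
            return 0;
        if (errno == E2BIG) {
            fprintf(stderr, "Landlock layer limit reached\n");
            return 1;
        }
        perror("Failed to enforce ruleset");
        return -1;
    }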

Memory usage
------------

Kernel memory allocated to create rulesets is accounted and can be restricted
by the Documentation/admin-guide/cgroup-v1/memory.rst.

Previous limitations
====================

File renaming and linking (ABI < 2)
-----------------------------------

Because Landlock targets unprivileged access controls, it needs to properly
handle composition of rules.  Such a property also implies rules nesting.
Properly handling multiple layers of rulesets, each one of them able to
restrict access to files, also implies inheritance of the ruleset restrictions
from a parent to its hierarchy.  Because files are identified and restricted by
their hierarchy, moving or linking a file from one directory to another implies
propagation of the hierarchy constraints, or restriction of these actions
according to the potentially lost constraints.  To protect against privilege
escalations through renaming or linking, and for the sake of simplicity,
Landlock previously limited linking and renaming to the same directory.
Starting with the Landlock ABI version 2, it is now possible to securely
control renaming and linking thanks to the new ``LANDLOCK_ACCESS_FS_REFER``
access right.

File truncation (ABI < 3)
-------------------------

File truncation could not be denied before the third Landlock ABI, so it is
always allowed when using a kernel that only supports the first or second ABI.

Starting with the Landlock ABI version 3, it is now possible to securely control
truncation thanks to the new ``LANDLOCK_ACCESS_FS_TRUNCATE`` access right.

.. _kernel_support:

Kernel support
==============

Landlock was first introduced in Linux 5.13 but it must be configured at build
time with ``CONFIG_SECURITY_LANDLOCK=y``.  Landlock must also be enabled at boot
time like the other security modules.  The list of security modules enabled by
default is set with ``CONFIG_LSM``.  The kernel configuration should then
contain ``CONFIG_LSM=landlock,[...]`` with ``[...]`` as the list of other
potentially useful security modules for the running system (see the
``CONFIG_LSM`` help).

If the running kernel does not have ``landlock`` in ``CONFIG_LSM``, then we can
still enable it by adding ``lsm=landlock,[...]`` to
Documentation/admin-guide/kernel-parameters.rst thanks to the bootloader
configuration.
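On a running system, the list of active security modules can usually be read
from ``/sys/kernel/security/lsm`` when securityfs is mounted.  The sketch
below looks for ``landlock`` in such a comma-separated list; the helper names
are hypothetical and the check is only a convenience, the ABI version query
remaining the reliable way to detect Landlock support.

.. code-block:: c

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* True if @name is one of the comma-separated entries of @list. */
    static bool lsm_list_contains(const char *list, const char *name)
    {
        const size_t name_len = strlen(name);

        while (*list) {
            /* Length of the current entry, up to a separator or the end. */
            const size_t len = strcspn(list, ",\n");

            if (len == name_len && !strncmp(list, name, len))
                return true;
            list += len;
            if (*list)
                list++;
        }
        return false;
    }

    /* Returns 1 if Landlock is listed, 0 if not, -1 if the list is unreadable. */
    static int landlock_is_active(void)
    {
        char buf[256];
        FILE *file = fopen("/sys/kernel/security/lsm", "r");

        if (!file)
            return -1;
        if (!fgets(buf, sizeof(buf), file))
            buf[0] = '\0';
        fclose(file);
        return lsm_list_contains(buf, "landlock");
    }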

Questions and answers
=====================

What about user space sandbox managers?
---------------------------------------

Using a user space process to enforce restrictions on kernel resources can lead
to race conditions or inconsistent evaluations (i.e. `Incorrect mirroring of
the OS code and state
<https://www.ndss-symposium.org/ndss2003/traps-and-pitfalls-practical-problems-system-call-interposition-based-security-tools/>`_).

What about namespaces and containers?
-------------------------------------

Namespaces can help create sandboxes but they are not designed for
access control and therefore lack useful features for such a use case (e.g. no
fine-grained restrictions).  Moreover, their complexity can lead to security
issues, especially when untrusted processes can manipulate them (cf.
`Controlling access to user namespaces <https://lwn.net/Articles/673597/>`_).

Additional documentation
========================

* Documentation/security/landlock.rst
* https://landlock.io

.. Links
.. _samples/landlock/sandboxer.c:
   https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/samples/landlock/sandboxer.c