
Buffer Sharing and Synchronization (dma-buf)
============================================
The dma-buf subsystem provides the framework for sharing buffers for
hardware (DMA) access across multiple device drivers and subsystems, and
for synchronizing asynchronous hardware access.
This is used, for example, by drm "prime" multi-GPU support, but is of
course not limited to GPU use cases.
The three main components of this are: (1) dma-buf, representing a
sg_table and exposed to userspace as a file descriptor to allow passing
between devices, (2) fence, which provides a mechanism to signal when
one device has finished access, and (3) reservation, which manages the
shared or exclusive fence(s) associated with the buffer.
Shared DMA Buffers
------------------
This document serves as a guide to device-driver writers on what the dma-buf
buffer sharing API is, and how to use it for exporting and using shared
buffers.
Say a driver A wants to use buffers created by driver B; we then call B the
exporter, and A the buffer-user/importer.

The exporter

- implements and manages operations in :c:type:`struct dma_buf_ops
  <dma_buf_ops>` for the buffer,
- allows other users to share the buffer by using dma_buf sharing APIs,
- manages the details of buffer allocation, wrapped in a :c:type:`struct
  dma_buf <dma_buf>`,
- decides about the actual backing storage where this allocation happens,
- and takes care of any migration of scatterlist - for all (shared) users of
  this buffer.
The buffer-user

- is one of (many) sharing users of the buffer.
- doesn't need to worry about how the buffer is allocated, or where.
- and needs a mechanism to get access to the scatterlist that makes up this
  buffer in memory, mapped into its own address space, so it can access the
  same area of memory.
Any exporters or users of the dma-buf buffer sharing framework must have a
'select DMA_SHARED_BUFFER' in their respective Kconfigs.
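For example, a driver's Kconfig entry would select the framework. This is a
minimal sketch; the symbol name ``MY_DRIVER`` and the prompt text are
hypothetical placeholders, only the select line is the required part:

```kconfig
# Hypothetical driver Kconfig entry; MY_DRIVER is a placeholder name.
# Only the 'select DMA_SHARED_BUFFER' line is required by the framework.
config MY_DRIVER
	tristate "My dma-buf exporting driver"
	select DMA_SHARED_BUFFER
```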
Userspace Interface Notes
~~~~~~~~~~~~~~~~~~~~~~~~~

Mostly a DMA buffer file descriptor is simply an opaque object for userspace,
and hence the exposed generic interface is very minimal. There's a few things
to consider though:
- Since kernel 3.12 the dma-buf FD supports the llseek system call, but only
  with offset=0 and whence=SEEK_END|SEEK_SET. SEEK_SET is supported to allow
  the usual size discovery pattern size = SEEK_END(0); SEEK_SET(0). Every other
  llseek operation will report -EINVAL.

  If llseek on dma-buf FDs isn't supported the kernel will report -ESPIPE for
  all cases. Userspace can use this to detect support for discovering the
  dma-buf size using llseek.
- In order to avoid fd leaks on exec, the FD_CLOEXEC flag must be set
  on the file descriptor. Setting the flag with a separate fcntl() call,
  rather than atomically when the fd is created, is inherently racy in a
  multi-threaded app[3]. The issue is made worse when it is library code
  opening/creating the file descriptor, as the application may not even be
  aware of the fds.

  To avoid this problem, userspace must have a way to request that the
  O_CLOEXEC flag be set when the dma-buf fd is created. So any API provided by
  the exporting driver to create a dma-buf fd must provide a way to let
  userspace control setting of the O_CLOEXEC flag passed in to dma_buf_fd().
- Memory mapping the contents of the DMA buffer is also supported. See the
  discussion below on `CPU Access to DMA Buffer Objects`_ for the full details.
- The DMA buffer FD is also pollable, see `Implicit Fence Poll Support`_ below
  for details.
- The DMA buffer FD also supports a few dma-buf-specific ioctls, see
  `DMA Buffer ioctls`_ below for details.
Basic Operation and Device DMA Access
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf.c
   :doc: dma buf device access
CPU Access to DMA Buffer Objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf.c
   :doc: cpu access
Implicit Fence Poll Support
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf.c
   :doc: implicit fence polling
DMA-BUF statistics
~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf-sysfs-stats.c
   :doc: overview
DMA Buffer ioctls
~~~~~~~~~~~~~~~~~

.. kernel-doc:: include/uapi/linux/dma-buf.h
DMA-BUF locking convention
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf.c
   :doc: locking convention
Kernel Functions and Structures Reference
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf.c
   :export:

.. kernel-doc:: include/linux/dma-buf.h
   :internal:
Reservation Objects
-------------------

.. kernel-doc:: drivers/dma-buf/dma-resv.c
   :doc: Reservation Object Overview

.. kernel-doc:: drivers/dma-buf/dma-resv.c
   :export:

.. kernel-doc:: include/linux/dma-resv.h
   :internal:
DMA Fences
----------

.. kernel-doc:: drivers/dma-buf/dma-fence.c
   :doc: DMA fences overview
DMA Fence Cross-Driver Contract
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence.c
   :doc: fence cross-driver contract
DMA Fence Signalling Annotations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence.c
   :doc: fence signalling annotation
DMA Fence Deadline Hints
~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence.c
   :doc: deadline hints
DMA Fences Functions Reference
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence.c
   :export:

.. kernel-doc:: include/linux/dma-fence.h
   :internal:
DMA Fence Array
~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence-array.c
   :export:

.. kernel-doc:: include/linux/dma-fence-array.h
   :internal:
DMA Fence Chain
~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence-chain.c
   :export:

.. kernel-doc:: include/linux/dma-fence-chain.h
   :internal:
DMA Fence unwrap
~~~~~~~~~~~~~~~~

.. kernel-doc:: include/linux/dma-fence-unwrap.h
   :internal:
DMA Fence Sync File
~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/sync_file.c
   :export:

.. kernel-doc:: include/linux/sync_file.h
   :internal:
DMA Fence Sync File uABI
~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: include/uapi/linux/sync_file.h
   :internal:
Indefinite DMA Fences
~~~~~~~~~~~~~~~~~~~~~

At various times struct dma_fence with an indefinite time until
dma_fence_wait() finishes have been proposed. Examples include:

* Proxy fences, proposed to handle &drm_syncobj for which the fence has not yet
  been set. Used to asynchronously delay command submission.
* Userspace fences or gpu futexes, fine-grained locking within a command buffer
  that userspace uses for synchronization across engines or with the CPU, which
  are then imported as a DMA fence for integration into existing winsys
  protocols.
* Long-running compute command buffers, while still using traditional end of
  batch DMA fences for memory management instead of context preemption DMA
  fences which get reattached when the compute job is rescheduled.
Common to all these schemes is that userspace controls the dependencies of
these fences and controls when they fire. Mixing indefinite fences with normal
in-kernel DMA fences does not work, even when a fallback timeout is included to
protect against malicious userspace.
Furthermore the kernel has to be able to hold up userspace command submission
for memory management needs, which means we must support indefinite fences
being dependent upon DMA fences. If the kernel also supports indefinite fences
as in-kernel DMA fences, the dependency graph can contain loops, creating the
potential for deadlocks.
.. kernel-render:: DOT
   :alt: Indefinite Fencing Dependency Cycle
   :caption: Indefinite Fencing Dependency Cycle

   digraph "Fencing Cycle" {
      node [shape=box]
      kernel [label="Kernel DMA Fences"]
      userspace [label="userspace controlled fences"]
      kernel -> userspace [label="memory management"]
      userspace -> kernel [label="Future fence, fence proxy, ..."]
   }
The only solution to avoid dependency loops is by not allowing indefinite
fences in the kernel. This means:

* No future fences, proxy fences or userspace fences imported as DMA fences,
  with or without a timeout.

* No DMA fences that signal end of batchbuffer for command submission where
  userspace is allowed to use userspace fencing or long running compute
  workloads. This also means no implicit fencing for shared buffers in these
  cases.
Recoverable Hardware Page Faults Implications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Modern hardware supports recoverable page faults, which has a lot of
implications for DMA fences.
First, a pending page fault obviously holds up the work that is running on the
accelerator, and a memory allocation is usually required to resolve the fault.
But memory allocations are not allowed to gate completion of DMA fences, which
means any workload using recoverable page faults cannot use DMA fences for
synchronization. Synchronization fences controlled by userspace must be used
instead.
The exception is when page faults are only used as migration hints and never to
on-demand fill a memory request. For now this means recoverable page faults on
GPUs are limited to pure compute workloads.
Furthermore GPUs usually share resources between the 3D rendering and compute
side, like compute units or command submission engines. If both a 3D job with a
DMA fence and a compute workload using recoverable page faults are pending,
they could deadlock:

- The 3D workload might need to wait for the compute job to finish and release
  hardware resources first.

- The compute workload might be stuck in a page fault, because the memory
  allocation is waiting for the DMA fence of the 3D workload to complete.
There are a few options to prevent this problem, one of which drivers need to
ensure:

- Compute workloads can always be preempted, even when a page fault is pending
  and not yet repaired. Not all hardware supports this.
- DMA fence workloads and workloads which need page fault handling have
  independent hardware resources to guarantee forward progress. This could be
  achieved e.g. through dedicated engines and minimal compute unit
  reservations for DMA fence workloads.
- The reservation approach could be further refined by only reserving the
  hardware resources for DMA fence workloads when they are in-flight. This must
  cover the time from when the DMA fence is visible to other threads up to the
  moment when the fences are completed.
- As a last resort, if the hardware provides no useful reservation mechanics,
  all workloads must be flushed from the GPU when switching between jobs
  requiring DMA fences or jobs requiring page fault handling: This means all
  DMA fences must complete before a compute job with page fault handling can be
  inserted into the scheduler queue. And vice versa, before a DMA fence can be
  made visible anywhere in the system, all compute workloads must be preempted
  to guarantee that all pending GPU page faults are flushed.
- Only a fairly theoretical option would be to untangle these dependencies when
  allocating memory to repair hardware page faults, either through separate
  memory blocks or runtime tracking of the full dependency graph of all DMA
  fences. This results in a very wide impact on the kernel, since resolving the
  page on the CPU side can itself involve a page fault. It is much more
  feasible and robust to limit the impact of handling hardware page faults to
  the specific driver.
Note that workloads that run on independent hardware like copy engines or other
GPUs do not have any impact. This allows us to keep using DMA fences internally
in the kernel even for resolving hardware page faults, e.g. by using copy
engines to clear or copy memory needed to resolve the page fault. But such
internal jobs must themselves be guaranteed never to hit a page fault which
holds up a userspace fence - supporting page faults on the engines used for
fault repair would reintroduce the same dependency loop.