Lines Matching full:we
66 * We could extend the life of a context to beyond that of all in i915_fence_get_timeline_name()
68 * or we just give them a false name. Since in i915_fence_get_timeline_name()
105 * freed when the slab cache itself is freed, and so we would get in i915_fence_release()
157 * In the future, perhaps when we have an active time-slicing scheduler, in __notify_execute_cb()
160 * quite hairy, we have to carefully rollback the fence and do a in __notify_execute_cb()
204 * engine lock. The simple ploy we use is to take the lock then in remove_from_engine()
235 * We know the GPU must have read the request to have in i915_request_retire()
240 * Note this requires that we are always called in request in i915_request_retire()
252 * As the ->retire() may free the node, we decouple it first and in i915_request_retire()
259 * we may spend an inordinate amount of time simply handling in i915_request_retire()
263 * i915_active_request. So we try to keep this loop as in i915_request_retire()
278 * We only loosely track inflight requests across preemption, in i915_request_retire()
279 * and so we may find ourselves attempting to retire a _completed_ in i915_request_retire()
280 * request that we have removed from the HW and put back on a run in i915_request_retire()
394 * With the advent of preempt-to-busy, we frequently encounter in __i915_request_submit()
395 * requests that we have unsubmitted from HW, but left running in __i915_request_submit()
397 * resubmission of that completed request, we can skip in __i915_request_submit()
401 * We must remove the request from the caller's priority queue, in __i915_request_submit()
404 * request has *not* yet been retired and we can safely move in __i915_request_submit()
416 * Are we using semaphores when the gpu is already saturated? in __i915_request_submit()
424 * If we installed a semaphore on this request and we only submit in __i915_request_submit()
427 * increases the amount of work we are doing. If so, we disable in __i915_request_submit()
428 * further use of semaphores until we are idle again, whence we in __i915_request_submit()
442 xfer: /* We may be recursing from the signal callback of another i915 fence */ in __i915_request_submit()
490 /* We may be recursing from the signal callback of another i915 fence */ in __i915_request_unsubmit()
501 /* We've already spun, don't charge on resubmitting. */ in __i915_request_unsubmit()
508 * We don't need to wake_up any waiters on request->execute, they in __i915_request_unsubmit()
543 * We need to serialize use of the submit_request() callback in submit_notify()
545 * i915_gem_set_wedged(). We use the RCU mechanism to mark the in submit_notify()
615 /* Retire our old requests in the hope that we free some */ in request_alloc_slow()
638 * We use RCU to look up requests in flight. The lookups may in __i915_request_create()
640 * That is, the request we are writing to here may be in the process in __i915_request_create()
642 * we have to be very careful when overwriting the contents. During in __i915_request_create()
643 * the RCU lookup, we chase the request->engine pointer, in __i915_request_create()
649 * with dma_fence_init(). This increment is safe for release as we in __i915_request_create()
650 * check that the request we have a reference to and matches the active in __i915_request_create()
653 * Before we increment the refcount, we chase the request->engine in __i915_request_create()
654 * pointer. We must not call kmem_cache_zalloc() or else we set in __i915_request_create()
656 * we see the request is completed (based on the value of the in __i915_request_create()
658 * If we decide the request is not completed (new engine or seqno), in __i915_request_create()
659 * then we grab a reference and double check that it is still the in __i915_request_create()
692 /* We bump the ref for the fence chain */ in __i915_request_create()
698 /* No zalloc, must clear what we need by hand */ in __i915_request_create()
715 * Note that due to how we add reserved_space to intel_ring_begin() in __i915_request_create()
716 * we need to double our request to ensure that if we need to wrap in __i915_request_create()
725 * should we detect the updated seqno part-way through the in __i915_request_create()
726 * GPU processing the request, we never over-estimate the in __i915_request_create()
743 /* Make sure we didn't add ourselves to external state before freeing */ in __i915_request_create()
776 /* Check that we do not interrupt ourselves with a new request */ in i915_request_create()
806 * both the GPU and CPU. We want to limit the impact on others, in already_busywaiting()
808 * latency. Therefore we restrict ourselves to not using more in already_busywaiting()
810 * if we have detected the engine is saturated (i.e. would not be in already_busywaiting()
814 * See the are-we-too-late? check in __i915_request_submit(). in already_busywaiting()
831 /* Just emit the first semaphore we see as request space is limited. */ in emit_semaphore_wait()
846 /* We need to pin the signaler's HWSP until we are finished reading. */ in emit_semaphore_wait()
856 * Using greater-than-or-equal here means we have to worry in emit_semaphore_wait()
857 * about seqno wraparound. To sidestep that issue, we swap in emit_semaphore_wait()
929 * we should *not* decompose it into its individual fences. However, in i915_request_await_dma_fence()
930 * we don't currently store which mode the fence-array is operating in i915_request_await_dma_fence()
932 * amdgpu and we should not see any incoming fence-array from in i915_request_await_dma_fence()
1004 * We don't squash repeated fence dependencies here as we in i915_request_await_execution()
1026 * @to: request we are wishing to use
1031 * Conceptually we serialise writes between engines inside the GPU.
1032 * We only allow one engine to write into a buffer at any time, but
1033 * multiple readers. To ensure each has a coherent view of memory, we must:
1039 * - If we are a write request (pending_write_domain is set), the new
1100 * breadcrumb at the end (so we get the fence notifications). in i915_request_skip()
1119 * is special cased so that we can eliminate redundant ordering in __i915_request_add_to_timeline()
1120 * operations while building the request (we know that the timeline in __i915_request_add_to_timeline()
1121 * itself is ordered, and here we guarantee it). in __i915_request_add_to_timeline()
1123 * As we know we will need to emit tracking along the timeline, in __i915_request_add_to_timeline()
1124 * we embed the hooks into our request struct -- at the cost of in __i915_request_add_to_timeline()
1129 * that we can apply a slight variant of the rules specialised in __i915_request_add_to_timeline()
1131 * If we consider the case of virtual engine, we must emit a dma-fence in __i915_request_add_to_timeline()
1193 * should we detect the updated seqno part-way through the in __i915_request_commit()
1194 * GPU processing the request, we never over-estimate the in __i915_request_commit()
1210 * request - i.e. we may want to preempt the current request in order in __i915_request_queue()
1211 * to run a high priority dependency chain *before* we can execute this in __i915_request_queue()
1214 * This is called before the request is ready to run so that we can in __i915_request_queue()
1240 * With semaphores we spin on one engine waiting for another, in i915_request_add()
1243 * work that we could be doing on this engine instead, that in i915_request_add()
1246 * far in the distance past over useful work, we keep a history in i915_request_add()
1266 * In typical scenarios, we do not expect the previous request on in i915_request_add()
1270 * suggesting that we haven't been retiring frequently enough from in i915_request_add()
1274 * up to this client. Since we have now moved the heaviest operations in i915_request_add()
1277 * (and cache misses), and so we should not be overly penalizing this in i915_request_add()
1278 * client by performing excess work, though we may still be performing in i915_request_add()
1279 * work on behalf of others -- but instead we should benefit from in i915_request_add()
1299 * the comparisons are no longer valid if we switch CPUs. Instead of in local_clock_us()
1300 * blocking preemption for the entire busywait, we can detect the CPU in local_clock_us()
1327 * Only wait for the request if we know it is likely to complete. in __i915_spin_request()
1329 * We don't track the timestamps around requests, nor the average in __i915_spin_request()
1330 * request length, so we do not have a good indicator that this in __i915_spin_request()
1331 * request will complete within the timeout. What we do know is the in __i915_spin_request()
1332 * order in which requests are executed by the context and so we can in __i915_spin_request()
1344 * rate. By busywaiting on the request completion for a short while we in __i915_spin_request()
1346 * if it is a slow request, we want to sleep as quickly as possible. in __i915_spin_request()
1415 * We must never wait on the GPU while holding a lock as we in i915_request_wait()
1416 * may need to perform a GPU reset. So while we don't need to in i915_request_wait()
1417 * serialise wait/reset with an explicit lock, we do want in i915_request_wait()
1425 * We may use a rather large value here to offset the penalty of in i915_request_wait()
1432 * short wait, we first spin to see if the request would have completed in i915_request_wait()
1435 * We need up to 5us to enable the irq, and up to 20us to hide the in i915_request_wait()
1443 * duration, which we currently lack. in i915_request_wait()
1457 * We can circumvent that by promoting the GPU frequency to maximum in i915_request_wait()
1458 * before we sleep. This makes the GPU throttle up much more quickly in i915_request_wait()