Latency Measurements
####################

This benchmark measures the average latency of selected kernel capabilities,
including:

* Context switch time between preemptive threads using k_yield
* Context switch time between cooperative threads using k_yield
* Time to switch from an ISR back to the interrupted thread
* Time from an ISR to executing a different thread (rescheduled)
* Time to signal a semaphore then test that semaphore
* Time to signal a semaphore then test that semaphore with a context switch
* Time to lock a mutex then unlock that mutex
* Time it takes to create a new thread (without starting it)
* Time it takes to start a newly created thread
* Time it takes to suspend a thread
* Time it takes to resume a suspended thread
* Time it takes to abort a thread
* Time it takes to add data to a FIFO/LIFO
* Time it takes to retrieve data from a FIFO/LIFO
* Time it takes to wait on a FIFO/LIFO (and context switch)
* Time it takes to wake and switch to a thread waiting on a FIFO/LIFO
* Time it takes to send and receive events
* Time it takes to wait for events (and context switch)
* Time it takes to wake and switch to a thread waiting for events
* Time it takes to push and pop to/from a k_stack
* Average time to allocate memory from the heap then free that memory

When userspace is enabled, this benchmark will, where possible, also test the
above capabilities using various configurations involving user threads:

* Kernel thread to kernel thread
* Kernel thread to user thread
* User thread to kernel thread
* User thread to user thread

The default configuration builds only for the kernel. However, additional
configurations can be enabled via the use of EXTRA_CONF_FILE.

For example, the following will build this project with userspace support::

    EXTRA_CONF_FILE="prj.userspace.conf" west build -p -b <board> <path to project>

The following table summarizes the purposes of the different extra
configuration files that are available for use with this benchmark.
A tester may mix and match them, allowing different scenarios to be easily
compared against the default. Benchmark output can be saved and subsequently
exported to third-party tools to compare and chart performance differences
both between configurations and across Zephyr versions.
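
For instance, to mix configurations, EXTRA_CONF_FILE can be given multiple
Kconfig fragments (separated by semicolons or spaces); the ``<board>``
placeholder and project path are whatever you would normally pass to
``west build``:

```shell
# Build with both userspace support and stack canaries enabled.
EXTRA_CONF_FILE="prj.userspace.conf;prj.canaries.conf" west build -p -b <board> <path to project>
```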

+-----------------------------+------------------------------------+
| prj.canaries.conf           | Enable stack canaries              |
+-----------------------------+------------------------------------+
| prj.objcore.conf            | Enable object cores and statistics |
+-----------------------------+------------------------------------+
| prj.userspace.conf          | Enable userspace support           |
+-----------------------------+------------------------------------+

Sample output of the benchmark using the defaults::

        thread.yield.preemptive.ctx.k_to_k       - Context switch via k_yield                         :     315 cycles ,     2625 ns :
        thread.yield.cooperative.ctx.k_to_k      - Context switch via k_yield                         :     315 cycles ,     2625 ns :
        isr.resume.interrupted.thread.kernel     - Return from ISR to interrupted thread              :     289 cycles ,     2416 ns :
        isr.resume.different.thread.kernel       - Return from ISR to another thread                  :     374 cycles ,     3124 ns :
        thread.create.kernel.from.kernel         - Create thread                                      :     382 cycles ,     3191 ns :
        thread.start.kernel.from.kernel          - Start thread                                       :     394 cycles ,     3291 ns :
        thread.suspend.kernel.from.kernel        - Suspend thread                                     :     289 cycles ,     2416 ns :
        thread.resume.kernel.from.kernel         - Resume thread                                      :     339 cycles ,     2833 ns :
        thread.abort.kernel.from.kernel          - Abort thread                                       :     339 cycles ,     2833 ns :
        fifo.put.immediate.kernel                - Add data to FIFO (no ctx switch)                   :     214 cycles ,     1791 ns :
        fifo.get.immediate.kernel                - Get data from FIFO (no ctx switch)                 :     134 cycles ,     1124 ns :
        fifo.put.alloc.immediate.kernel          - Allocate to add data to FIFO (no ctx switch)       :     834 cycles ,     6950 ns :
        fifo.get.free.immediate.kernel           - Free when getting data from FIFO (no ctx switch)   :     560 cycles ,     4666 ns :
        fifo.get.blocking.k_to_k                 - Get data from FIFO (w/ ctx switch)                 :     510 cycles ,     4257 ns :
        fifo.put.wake+ctx.k_to_k                 - Add data to FIFO (w/ ctx switch)                   :     590 cycles ,     4923 ns :
        fifo.get.free.blocking.k_to_k            - Free when getting data from FIFO (w/ ctx switch)   :     510 cycles ,     4250 ns :
        fifo.put.alloc.wake+ctx.k_to_k           - Allocate to add data to FIFO (w/ ctx switch)       :     585 cycles ,     4875 ns :
        lifo.put.immediate.kernel                - Add data to LIFO (no ctx switch)                   :     214 cycles ,     1791 ns :
        lifo.get.immediate.kernel                - Get data from LIFO (no ctx switch)                 :     120 cycles ,     1008 ns :
        lifo.put.alloc.immediate.kernel          - Allocate to add data to LIFO (no ctx switch)       :     831 cycles ,     6925 ns :
        lifo.get.free.immediate.kernel           - Free when getting data from LIFO (no ctx switch)   :     555 cycles ,     4625 ns :
        lifo.get.blocking.k_to_k                 - Get data from LIFO (w/ ctx switch)                 :     502 cycles ,     4191 ns :
        lifo.put.wake+ctx.k_to_k                 - Add data to LIFO (w/ ctx switch)                   :     585 cycles ,     4875 ns :
        lifo.get.free.blocking.k_to_k            - Free when getting data from LIFO (w/ ctx switch)   :     513 cycles ,     4275 ns :
        lifo.put.alloc.wake+ctx.k_to_k           - Allocate to add data to LIFO (w/ ctx switch)       :     585 cycles ,     4881 ns :
        events.post.immediate.kernel             - Post events (nothing wakes)                        :     225 cycles ,     1875 ns :
        events.set.immediate.kernel              - Set events (nothing wakes)                         :     230 cycles ,     1923 ns :
        events.wait.immediate.kernel             - Wait for any events (no ctx switch)                :     120 cycles ,     1000 ns :
        events.wait_all.immediate.kernel         - Wait for all events (no ctx switch)                :     110 cycles ,      917 ns :
        events.wait.blocking.k_to_k              - Wait for any events (w/ ctx switch)                :     514 cycles ,     4291 ns :
        events.set.wake+ctx.k_to_k               - Set events (w/ ctx switch)                         :     754 cycles ,     6291 ns :
        events.wait_all.blocking.k_to_k          - Wait for all events (w/ ctx switch)                :     528 cycles ,     4400 ns :
        events.post.wake+ctx.k_to_k              - Post events (w/ ctx switch)                        :     765 cycles ,     6375 ns :
        semaphore.give.immediate.kernel          - Give a semaphore (no waiters)                      :      59 cycles ,      492 ns :
        semaphore.take.immediate.kernel          - Take a semaphore (no blocking)                     :      69 cycles ,      575 ns :
        semaphore.take.blocking.k_to_k           - Take a semaphore (context switch)                  :     450 cycles ,     3756 ns :
        semaphore.give.wake+ctx.k_to_k           - Give a semaphore (context switch)                  :     509 cycles ,     4249 ns :
        condvar.wait.blocking.k_to_k             - Wait for a condvar (context switch)                :     578 cycles ,     4817 ns :
        condvar.signal.wake+ctx.k_to_k           - Signal a condvar (context switch)                  :     630 cycles ,     5250 ns :
        stack.push.immediate.kernel              - Add data to k_stack (no ctx switch)                :     107 cycles ,      899 ns :
        stack.pop.immediate.kernel               - Get data from k_stack (no ctx switch)              :      80 cycles ,      674 ns :
        stack.pop.blocking.k_to_k                - Get data from k_stack (w/ ctx switch)              :     467 cycles ,     3899 ns :
        stack.push.wake+ctx.k_to_k               - Add data to k_stack (w/ ctx switch)                :     550 cycles ,     4583 ns :
        mutex.lock.immediate.recursive.kernel    - Lock a mutex                                       :      83 cycles ,      692 ns :
        mutex.unlock.immediate.recursive.kernel  - Unlock a mutex                                     :      44 cycles ,      367 ns :
        heap.malloc.immediate                    - Average time for heap malloc                       :     610 cycles ,     5083 ns :
        heap.free.immediate                      - Average time for heap free                         :     425 cycles ,     3541 ns :
        ===================================================================
        PROJECT EXECUTION SUCCESSFUL


Sample output of the benchmark with stack canaries enabled::

        thread.yield.preemptive.ctx.k_to_k       - Context switch via k_yield                         :     485 cycles ,     4042 ns :
        thread.yield.cooperative.ctx.k_to_k      - Context switch via k_yield                         :     485 cycles ,     4042 ns :
        isr.resume.interrupted.thread.kernel     - Return from ISR to interrupted thread              :     545 cycles ,     4549 ns :
        isr.resume.different.thread.kernel       - Return from ISR to another thread                  :     590 cycles ,     4924 ns :
        thread.create.kernel.from.kernel         - Create thread                                      :     585 cycles ,     4883 ns :
        thread.start.kernel.from.kernel          - Start thread                                       :     685 cycles ,     5716 ns :
        thread.suspend.kernel.from.kernel        - Suspend thread                                     :     490 cycles ,     4091 ns :
        thread.resume.kernel.from.kernel         - Resume thread                                      :     569 cycles ,     4749 ns :
        thread.abort.kernel.from.kernel          - Abort thread                                       :     629 cycles ,     5249 ns :
        fifo.put.immediate.kernel                - Add data to FIFO (no ctx switch)                   :     439 cycles ,     3666 ns :
        fifo.get.immediate.kernel                - Get data from FIFO (no ctx switch)                 :     320 cycles ,     2674 ns :
        fifo.put.alloc.immediate.kernel          - Allocate to add data to FIFO (no ctx switch)       :    1499 cycles ,    12491 ns :
        fifo.get.free.immediate.kernel           - Free when getting data from FIFO (no ctx switch)   :    1230 cycles ,    10250 ns :
        fifo.get.blocking.k_to_k                 - Get data from FIFO (w/ ctx switch)                 :     868 cycles ,     7241 ns :
        fifo.put.wake+ctx.k_to_k                 - Add data to FIFO (w/ ctx switch)                   :     991 cycles ,     8259 ns :
        fifo.get.free.blocking.k_to_k            - Free when getting data from FIFO (w/ ctx switch)   :     879 cycles ,     7325 ns :
        fifo.put.alloc.wake+ctx.k_to_k           - Allocate to add data to FIFO (w/ ctx switch)       :     990 cycles ,     8250 ns :
        lifo.put.immediate.kernel                - Add data to LIFO (no ctx switch)                   :     429 cycles ,     3582 ns :
        lifo.get.immediate.kernel                - Get data from LIFO (no ctx switch)                 :     320 cycles ,     2674 ns :
        lifo.put.alloc.immediate.kernel          - Allocate to add data to LIFO (no ctx switch)       :    1499 cycles ,    12491 ns :
        lifo.get.free.immediate.kernel           - Free when getting data from LIFO (no ctx switch)   :    1220 cycles ,    10166 ns :
        lifo.get.blocking.k_to_k                 - Get data from LIFO (w/ ctx switch)                 :     863 cycles ,     7199 ns :
        lifo.put.wake+ctx.k_to_k                 - Add data to LIFO (w/ ctx switch)                   :     985 cycles ,     8208 ns :
        lifo.get.free.blocking.k_to_k            - Free when getting data from LIFO (w/ ctx switch)   :     879 cycles ,     7325 ns :
        lifo.put.alloc.wake+ctx.k_to_k           - Allocate to add data to LIFO (w/ ctx switch)       :     985 cycles ,     8208 ns :
        events.post.immediate.kernel             - Post events (nothing wakes)                        :     420 cycles ,     3501 ns :
        events.set.immediate.kernel              - Set events (nothing wakes)                         :     420 cycles ,     3501 ns :
        events.wait.immediate.kernel             - Wait for any events (no ctx switch)                :     280 cycles ,     2334 ns :
        events.wait_all.immediate.kernel         - Wait for all events (no ctx switch)                :     270 cycles ,     2251 ns :
        events.wait.blocking.k_to_k              - Wait for any events (w/ ctx switch)                :     919 cycles ,     7665 ns :
        events.set.wake+ctx.k_to_k               - Set events (w/ ctx switch)                         :    1310 cycles ,    10924 ns :
        events.wait_all.blocking.k_to_k          - Wait for all events (w/ ctx switch)                :     954 cycles ,     7950 ns :
        events.post.wake+ctx.k_to_k              - Post events (w/ ctx switch)                        :    1340 cycles ,    11166 ns :
        semaphore.give.immediate.kernel          - Give a semaphore (no waiters)                      :     110 cycles ,      917 ns :
        semaphore.take.immediate.kernel          - Take a semaphore (no blocking)                     :     180 cycles ,     1500 ns :
        semaphore.take.blocking.k_to_k           - Take a semaphore (context switch)                  :     755 cycles ,     6292 ns :
        semaphore.give.wake+ctx.k_to_k           - Give a semaphore (context switch)                  :     812 cycles ,     6767 ns :
        condvar.wait.blocking.k_to_k             - Wait for a condvar (context switch)                :    1027 cycles ,     8558 ns :
        condvar.signal.wake+ctx.k_to_k           - Signal a condvar (context switch)                  :    1040 cycles ,     8666 ns :
        stack.push.immediate.kernel              - Add data to k_stack (no ctx switch)                :     220 cycles ,     1841 ns :
        stack.pop.immediate.kernel               - Get data from k_stack (no ctx switch)              :     205 cycles ,     1716 ns :
        stack.pop.blocking.k_to_k                - Get data from k_stack (w/ ctx switch)              :     791 cycles ,     6599 ns :
        stack.push.wake+ctx.k_to_k               - Add data to k_stack (w/ ctx switch)                :     870 cycles ,     7250 ns :
        mutex.lock.immediate.recursive.kernel    - Lock a mutex                                       :     175 cycles ,     1459 ns :
        mutex.unlock.immediate.recursive.kernel  - Unlock a mutex                                     :      61 cycles ,      510 ns :
        heap.malloc.immediate                    - Average time for heap malloc                       :    1060 cycles ,     8833 ns :
        heap.free.immediate                      - Average time for heap free                         :     899 cycles ,     7491 ns :
        ===================================================================
        PROJECT EXECUTION SUCCESSFUL

The sample output above (stack canaries enabled) shows longer times than
for the default build. Not only does each stack frame in the call tree have
its own stack canary check, but enabling this feature also affects how the
compiler chooses to inline routines.

Sample output of the benchmark with object cores enabled::

        thread.yield.preemptive.ctx.k_to_k       - Context switch via k_yield                         :     740 cycles ,     6167 ns :
        thread.yield.cooperative.ctx.k_to_k      - Context switch via k_yield                         :     740 cycles ,     6167 ns :
        isr.resume.interrupted.thread.kernel     - Return from ISR to interrupted thread              :     284 cycles ,     2374 ns :
        isr.resume.different.thread.kernel       - Return from ISR to another thread                  :     784 cycles ,     6541 ns :
        thread.create.kernel.from.kernel         - Create thread                                      :     714 cycles ,     5958 ns :
        thread.start.kernel.from.kernel          - Start thread                                       :     819 cycles ,     6833 ns :
        thread.suspend.kernel.from.kernel        - Suspend thread                                     :     704 cycles ,     5874 ns :
        thread.resume.kernel.from.kernel         - Resume thread                                      :     761 cycles ,     6349 ns :
        thread.abort.kernel.from.kernel          - Abort thread                                       :     544 cycles ,     4541 ns :
        fifo.put.immediate.kernel                - Add data to FIFO (no ctx switch)                   :     211 cycles ,     1766 ns :
        fifo.get.immediate.kernel                - Get data from FIFO (no ctx switch)                 :     132 cycles ,     1108 ns :
        fifo.put.alloc.immediate.kernel          - Allocate to add data to FIFO (no ctx switch)       :     850 cycles ,     7091 ns :
        fifo.get.free.immediate.kernel           - Free when getting data from FIFO (no ctx switch)   :     565 cycles ,     4708 ns :
        fifo.get.blocking.k_to_k                 - Get data from FIFO (w/ ctx switch)                 :     947 cycles ,     7899 ns :
        fifo.put.wake+ctx.k_to_k                 - Add data to FIFO (w/ ctx switch)                   :    1015 cycles ,     8458 ns :
        fifo.get.free.blocking.k_to_k            - Free when getting data from FIFO (w/ ctx switch)   :     950 cycles ,     7923 ns :
        fifo.put.alloc.wake+ctx.k_to_k           - Allocate to add data to FIFO (w/ ctx switch)       :    1010 cycles ,     8416 ns :
        lifo.put.immediate.kernel                - Add data to LIFO (no ctx switch)                   :     226 cycles ,     1891 ns :
        lifo.get.immediate.kernel                - Get data from LIFO (no ctx switch)                 :     123 cycles ,     1033 ns :
        lifo.put.alloc.immediate.kernel          - Allocate to add data to LIFO (no ctx switch)       :     848 cycles ,     7066 ns :
        lifo.get.free.immediate.kernel           - Free when getting data from LIFO (no ctx switch)   :     565 cycles ,     4708 ns :
        lifo.get.blocking.k_to_k                 - Get data from LIFO (w/ ctx switch)                 :     951 cycles ,     7932 ns :
        lifo.put.wake+ctx.k_to_k                 - Add data to LIFO (w/ ctx switch)                   :    1010 cycles ,     8416 ns :
        lifo.get.free.blocking.k_to_k            - Free when getting data from LIFO (w/ ctx switch)   :     959 cycles ,     7991 ns :
        lifo.put.alloc.wake+ctx.k_to_k           - Allocate to add data to LIFO (w/ ctx switch)       :    1010 cycles ,     8422 ns :
        events.post.immediate.kernel             - Post events (nothing wakes)                        :     210 cycles ,     1750 ns :
        events.set.immediate.kernel              - Set events (nothing wakes)                         :     230 cycles ,     1917 ns :
        events.wait.immediate.kernel             - Wait for any events (no ctx switch)                :     120 cycles ,     1000 ns :
        events.wait_all.immediate.kernel         - Wait for all events (no ctx switch)                :     150 cycles ,     1250 ns :
        events.wait.blocking.k_to_k              - Wait for any events (w/ ctx switch)                :     951 cycles ,     7932 ns :
        events.set.wake+ctx.k_to_k               - Set events (w/ ctx switch)                         :    1179 cycles ,     9833 ns :
        events.wait_all.blocking.k_to_k          - Wait for all events (w/ ctx switch)                :     976 cycles ,     8133 ns :
        events.post.wake+ctx.k_to_k              - Post events (w/ ctx switch)                        :    1190 cycles ,     9922 ns :
        semaphore.give.immediate.kernel          - Give a semaphore (no waiters)                      :      59 cycles ,      492 ns :
        semaphore.take.immediate.kernel          - Take a semaphore (no blocking)                     :      69 cycles ,      575 ns :
        semaphore.take.blocking.k_to_k           - Take a semaphore (context switch)                  :     870 cycles ,     7250 ns :
        semaphore.give.wake+ctx.k_to_k           - Give a semaphore (context switch)                  :     929 cycles ,     7749 ns :
        condvar.wait.blocking.k_to_k             - Wait for a condvar (context switch)                :    1010 cycles ,     8417 ns :
        condvar.signal.wake+ctx.k_to_k           - Signal a condvar (context switch)                  :    1060 cycles ,     8833 ns :
        stack.push.immediate.kernel              - Add data to k_stack (no ctx switch)                :      90 cycles ,      758 ns :
        stack.pop.immediate.kernel               - Get data from k_stack (no ctx switch)              :      86 cycles ,      724 ns :
        stack.pop.blocking.k_to_k                - Get data from k_stack (w/ ctx switch)              :     910 cycles ,     7589 ns :
        stack.push.wake+ctx.k_to_k               - Add data to k_stack (w/ ctx switch)                :     975 cycles ,     8125 ns :
        mutex.lock.immediate.recursive.kernel    - Lock a mutex                                       :     105 cycles ,      875 ns :
        mutex.unlock.immediate.recursive.kernel  - Unlock a mutex                                     :      44 cycles ,      367 ns :
        heap.malloc.immediate                    - Average time for heap malloc                       :     621 cycles ,     5183 ns :
        heap.free.immediate                      - Average time for heap free                         :     422 cycles ,     3516 ns :
        ===================================================================
        PROJECT EXECUTION SUCCESSFUL

The sample output above (object cores and statistics enabled) shows longer
times than for the default build when context switching is involved. Blanket
enabling of the object cores, as was done here, results in the additional
gathering of thread statistics whenever a thread is switched in or out. The
gathering of these statistics can be controlled both at project configuration
time and at runtime.

Sample output of the benchmark with userspace enabled::

        thread.yield.preemptive.ctx.k_to_k       - Context switch via k_yield                         :     975 cycles ,     8125 ns :
        thread.yield.preemptive.ctx.u_to_u       - Context switch via k_yield                         :    1303 cycles ,    10860 ns :
        thread.yield.preemptive.ctx.k_to_u       - Context switch via k_yield                         :    1180 cycles ,     9834 ns :
        thread.yield.preemptive.ctx.u_to_k       - Context switch via k_yield                         :    1097 cycles ,     9144 ns :
        thread.yield.cooperative.ctx.k_to_k      - Context switch via k_yield                         :     975 cycles ,     8125 ns :
        thread.yield.cooperative.ctx.u_to_u      - Context switch via k_yield                         :    1302 cycles ,    10854 ns :
        thread.yield.cooperative.ctx.k_to_u      - Context switch via k_yield                         :    1180 cycles ,     9834 ns :
        thread.yield.cooperative.ctx.u_to_k      - Context switch via k_yield                         :    1097 cycles ,     9144 ns :
        isr.resume.interrupted.thread.kernel     - Return from ISR to interrupted thread              :     329 cycles ,     2749 ns :
        isr.resume.different.thread.kernel       - Return from ISR to another thread                  :    1014 cycles ,     8457 ns :
        isr.resume.different.thread.user         - Return from ISR to another thread                  :    1223 cycles ,    10192 ns :
        thread.create.kernel.from.kernel         - Create thread                                      :     970 cycles ,     8089 ns :
        thread.start.kernel.from.kernel          - Start thread                                       :    1074 cycles ,     8957 ns :
        thread.suspend.kernel.from.kernel        - Suspend thread                                     :     949 cycles ,     7916 ns :
        thread.resume.kernel.from.kernel         - Resume thread                                      :    1004 cycles ,     8374 ns :
        thread.abort.kernel.from.kernel          - Abort thread                                       :    2734 cycles ,    22791 ns :
        thread.create.user.from.kernel           - Create thread                                      :     832 cycles ,     6935 ns :
        thread.start.user.from.kernel            - Start thread                                       :    9023 cycles ,    75192 ns :
        thread.suspend.user.from.kernel          - Suspend thread                                     :    1312 cycles ,    10935 ns :
        thread.resume.user.from.kernel           - Resume thread                                      :    1187 cycles ,     9894 ns :
        thread.abort.user.from.kernel            - Abort thread                                       :    2597 cycles ,    21644 ns :
        thread.create.user.from.user             - Create thread                                      :    2144 cycles ,    17872 ns :
        thread.start.user.from.user              - Start thread                                       :    9399 cycles ,    78330 ns :
        thread.suspend.user.from.user            - Suspend thread                                     :    1504 cycles ,    12539 ns :
        thread.resume.user.from.user             - Resume thread                                      :    1574 cycles ,    13122 ns :
        thread.abort.user.from.user              - Abort thread                                       :    3237 cycles ,    26981 ns :
        thread.start.kernel.from.user            - Start thread                                       :    1452 cycles ,    12102 ns :
        thread.suspend.kernel.from.user          - Suspend thread                                     :    1143 cycles ,     9525 ns :
        thread.resume.kernel.from.user           - Resume thread                                      :    1392 cycles ,    11602 ns :
        thread.abort.kernel.from.user            - Abort thread                                       :    3372 cycles ,    28102 ns :
        fifo.put.immediate.kernel                - Add data to FIFO (no ctx switch)                   :     239 cycles ,     1999 ns :
        fifo.get.immediate.kernel                - Get data from FIFO (no ctx switch)                 :     184 cycles ,     1541 ns :
        fifo.put.alloc.immediate.kernel          - Allocate to add data to FIFO (no ctx switch)       :     920 cycles ,     7666 ns :
        fifo.get.free.immediate.kernel           - Free when getting data from FIFO (no ctx switch)   :     650 cycles ,     5416 ns :
        fifo.put.alloc.immediate.user            - Allocate to add data to FIFO (no ctx switch)       :    1710 cycles ,    14256 ns :
        fifo.get.free.immediate.user             - Free when getting data from FIFO (no ctx switch)   :    1440 cycles ,    12000 ns :
        fifo.get.blocking.k_to_k                 - Get data from FIFO (w/ ctx switch)                 :    1209 cycles ,    10082 ns :
        fifo.put.wake+ctx.k_to_k                 - Add data to FIFO (w/ ctx switch)                   :    1230 cycles ,    10250 ns :
        fifo.get.free.blocking.k_to_k            - Free when getting data from FIFO (w/ ctx switch)   :    1210 cycles ,    10083 ns :
        fifo.put.alloc.wake+ctx.k_to_k           - Allocate to add data to FIFO (w/ ctx switch)       :    1260 cycles ,    10500 ns :
        fifo.get.free.blocking.u_to_k            - Free when getting data from FIFO (w/ ctx switch)   :    1745 cycles ,    14547 ns :
        fifo.put.alloc.wake+ctx.k_to_u           - Allocate to add data to FIFO (w/ ctx switch)       :    1600 cycles ,    13333 ns :
        fifo.get.free.blocking.k_to_u            - Free when getting data from FIFO (w/ ctx switch)   :    1550 cycles ,    12922 ns :
        fifo.put.alloc.wake+ctx.u_to_k           - Allocate to add data to FIFO (w/ ctx switch)       :    1795 cycles ,    14958 ns :
        fifo.get.free.blocking.u_to_u            - Free when getting data from FIFO (w/ ctx switch)   :    2084 cycles ,    17374 ns :
        fifo.put.alloc.wake+ctx.u_to_u           - Allocate to add data to FIFO (w/ ctx switch)       :    2135 cycles ,    17791 ns :
        lifo.put.immediate.kernel                - Add data to LIFO (no ctx switch)                   :     234 cycles ,     1957 ns :
        lifo.get.immediate.kernel                - Get data from LIFO (no ctx switch)                 :     189 cycles ,     1582 ns :
        lifo.put.alloc.immediate.kernel          - Allocate to add data to LIFO (no ctx switch)       :     935 cycles ,     7791 ns :
        lifo.get.free.immediate.kernel           - Free when getting data from LIFO (no ctx switch)   :     650 cycles ,     5416 ns :
        lifo.put.alloc.immediate.user            - Allocate to add data to LIFO (no ctx switch)       :    1715 cycles ,    14291 ns :
        lifo.get.free.immediate.user             - Free when getting data from LIFO (no ctx switch)   :    1445 cycles ,    12041 ns :
        lifo.get.blocking.k_to_k                 - Get data from LIFO (w/ ctx switch)                 :    1219 cycles ,    10166 ns :
        lifo.put.wake+ctx.k_to_k                 - Add data to LIFO (w/ ctx switch)                   :    1230 cycles ,    10250 ns :
        lifo.get.free.blocking.k_to_k            - Free when getting data from LIFO (w/ ctx switch)   :    1210 cycles ,    10083 ns :
        lifo.put.alloc.wake+ctx.k_to_k           - Allocate to add data to LIFO (w/ ctx switch)       :    1260 cycles ,    10500 ns :
        lifo.get.free.blocking.u_to_k            - Free when getting data from LIFO (w/ ctx switch)   :    1744 cycles ,    14541 ns :
        lifo.put.alloc.wake+ctx.k_to_u           - Allocate to add data to LIFO (w/ ctx switch)       :    1595 cycles ,    13291 ns :
        lifo.get.free.blocking.k_to_u            - Free when getting data from LIFO (w/ ctx switch)   :    1544 cycles ,    12874 ns :
        lifo.put.alloc.wake+ctx.u_to_k           - Allocate to add data to LIFO (w/ ctx switch)       :    1795 cycles ,    14958 ns :
        lifo.get.free.blocking.u_to_u            - Free when getting data from LIFO (w/ ctx switch)   :    2080 cycles ,    17339 ns :
        lifo.put.alloc.wake+ctx.u_to_u           - Allocate to add data to LIFO (w/ ctx switch)       :    2130 cycles ,    17750 ns :
        events.post.immediate.kernel             - Post events (nothing wakes)                        :     285 cycles ,     2375 ns :
        events.set.immediate.kernel              - Set events (nothing wakes)                         :     290 cycles ,     2417 ns :
        events.wait.immediate.kernel             - Wait for any events (no ctx switch)                :     235 cycles ,     1958 ns :
        events.wait_all.immediate.kernel         - Wait for all events (no ctx switch)                :     245 cycles ,     2042 ns :
        events.post.immediate.user               - Post events (nothing wakes)                        :     800 cycles ,     6669 ns :
        events.set.immediate.user                - Set events (nothing wakes)                         :     811 cycles ,     6759 ns :
        events.wait.immediate.user               - Wait for any events (no ctx switch)                :     780 cycles ,     6502 ns :
        events.wait_all.immediate.user           - Wait for all events (no ctx switch)                :     770 cycles ,     6419 ns :
        events.wait.blocking.k_to_k              - Wait for any events (w/ ctx switch)                :    1210 cycles ,    10089 ns :
        events.set.wake+ctx.k_to_k               - Set events (w/ ctx switch)                         :    1449 cycles ,    12082 ns :
        events.wait_all.blocking.k_to_k          - Wait for all events (w/ ctx switch)                :    1250 cycles ,    10416 ns :
        events.post.wake+ctx.k_to_k              - Post events (w/ ctx switch)                        :    1475 cycles ,    12291 ns :
        events.wait.blocking.u_to_k              - Wait for any events (w/ ctx switch)                :    1612 cycles ,    13435 ns :
        events.set.wake+ctx.k_to_u               - Set events (w/ ctx switch)                         :    1627 cycles ,    13560 ns :
306        events.wait_all.blocking.u_to_k          - Wait for all events (w/ ctx switch)                :    1785 cycles ,    14875 ns :
307        events.post.wake+ctx.k_to_u              - Post events (w/ ctx switch)                        :    1790 cycles ,    14923 ns :
308        events.wait.blocking.k_to_u              - Wait for any events (w/ ctx switch)                :    1407 cycles ,    11727 ns :
309        events.set.wake+ctx.u_to_k               - Set events (w/ ctx switch)                         :    1828 cycles ,    15234 ns :
310        events.wait_all.blocking.k_to_u          - Wait for all events (w/ ctx switch)                :    1585 cycles ,    13208 ns :
311        events.post.wake+ctx.u_to_k              - Post events (w/ ctx switch)                        :    2000 cycles ,    16666 ns :
312        events.wait.blocking.u_to_u              - Wait for any events (w/ ctx switch)                :    1810 cycles ,    15087 ns :
313        events.set.wake+ctx.u_to_u               - Set events (w/ ctx switch)                         :    2004 cycles ,    16705 ns :
314        events.wait_all.blocking.u_to_u          - Wait for all events (w/ ctx switch)                :    2120 cycles ,    17666 ns :
315        events.post.wake+ctx.u_to_u              - Post events (w/ ctx switch)                        :    2315 cycles ,    19291 ns :
316        semaphore.give.immediate.kernel          - Give a semaphore (no waiters)                      :     125 cycles ,     1042 ns :
317        semaphore.take.immediate.kernel          - Take a semaphore (no blocking)                     :     125 cycles ,     1042 ns :
318        semaphore.give.immediate.user            - Give a semaphore (no waiters)                      :     645 cycles ,     5377 ns :
319        semaphore.take.immediate.user            - Take a semaphore (no blocking)                     :     680 cycles ,     5669 ns :
320        semaphore.take.blocking.k_to_k           - Take a semaphore (context switch)                  :    1140 cycles ,     9500 ns :
321        semaphore.give.wake+ctx.k_to_k           - Give a semaphore (context switch)                  :    1174 cycles ,     9791 ns :
322        semaphore.take.blocking.k_to_u           - Take a semaphore (context switch)                  :    1350 cycles ,    11251 ns :
323        semaphore.give.wake+ctx.u_to_k           - Give a semaphore (context switch)                  :    1542 cycles ,    12852 ns :
324        semaphore.take.blocking.u_to_k           - Take a semaphore (context switch)                  :    1512 cycles ,    12603 ns :
325        semaphore.give.wake+ctx.k_to_u           - Give a semaphore (context switch)                  :    1382 cycles ,    11519 ns :
326        semaphore.take.blocking.u_to_u           - Take a semaphore (context switch)                  :    1723 cycles ,    14360 ns :
327        semaphore.give.wake+ctx.u_to_u           - Give a semaphore (context switch)                  :    1749 cycles ,    14580 ns :
328        condvar.wait.blocking.k_to_k             - Wait for a condvar (context switch)                :    1285 cycles ,    10708 ns :
329        condvar.signal.wake+ctx.k_to_k           - Signal a condvar (context switch)                  :    1315 cycles ,    10964 ns :
330        condvar.wait.blocking.k_to_u             - Wait for a condvar (context switch)                :    1547 cycles ,    12898 ns :
331        condvar.signal.wake+ctx.u_to_k           - Signal a condvar (context switch)                  :    1855 cycles ,    15458 ns :
332        condvar.wait.blocking.u_to_k             - Wait for a condvar (context switch)                :    1990 cycles ,    16583 ns :
333        condvar.signal.wake+ctx.k_to_u           - Signal a condvar (context switch)                  :    1640 cycles ,    13666 ns :
334        condvar.wait.blocking.u_to_u             - Wait for a condvar (context switch)                :    2313 cycles ,    19280 ns :
335        condvar.signal.wake+ctx.u_to_u           - Signal a condvar (context switch)                  :    2170 cycles ,    18083 ns :
336        stack.push.immediate.kernel              - Add data to k_stack (no ctx switch)                :     189 cycles ,     1582 ns :
337        stack.pop.immediate.kernel               - Get data from k_stack (no ctx switch)              :     194 cycles ,     1624 ns :
338        stack.push.immediate.user                - Add data to k_stack (no ctx switch)                :     679 cycles ,     5664 ns :
339        stack.pop.immediate.user                 - Get data from k_stack (no ctx switch)              :    1014 cycles ,     8455 ns :
340        stack.pop.blocking.k_to_k                - Get data from k_stack (w/ ctx switch)              :    1209 cycles ,    10083 ns :
341        stack.push.wake+ctx.k_to_k               - Add data to k_stack (w/ ctx switch)                :    1235 cycles ,    10291 ns :
342        stack.pop.blocking.u_to_k                - Get data from k_stack (w/ ctx switch)              :    2050 cycles ,    17089 ns :
343        stack.push.wake+ctx.k_to_u               - Add data to k_stack (w/ ctx switch)                :    1575 cycles ,    13125 ns :
344        stack.pop.blocking.k_to_u                - Get data from k_stack (w/ ctx switch)              :    1549 cycles ,    12916 ns :
345        stack.push.wake+ctx.u_to_k               - Add data to k_stack (w/ ctx switch)                :    1755 cycles ,    14625 ns :
346        stack.pop.blocking.u_to_u                - Get data from k_stack (w/ ctx switch)              :    2389 cycles ,    19916 ns :
347        stack.push.wake+ctx.u_to_u               - Add data to k_stack (w/ ctx switch)                :    2095 cycles ,    17458 ns :
348        mutex.lock.immediate.recursive.kernel    - Lock a mutex                                       :     165 cycles ,     1375 ns :
349        mutex.unlock.immediate.recursive.kernel  - Unlock a mutex                                     :      80 cycles ,      668 ns :
350        mutex.lock.immediate.recursive.user      - Lock a mutex                                       :     685 cycles ,     5711 ns :
351        mutex.unlock.immediate.recursive.user    - Unlock a mutex                                     :     615 cycles ,     5128 ns :
352        heap.malloc.immediate                    - Average time for heap malloc                       :     626 cycles ,     5224 ns :
353        heap.free.immediate                      - Average time for heap free                         :     427 cycles ,     3558 ns :
354        ===================================================================
355        PROJECT EXECUTION SUCCESSFUL

The sample output above (userspace enabled) shows longer times than the
default build scenario. Enabling userspace adds runtime overhead to each
call involving a kernel object: the kernel must first determine whether
the caller is executing in user or kernel space, and consequently whether
a system call is required.