Latency Measurements
####################

This benchmark measures the average latency of selected kernel capabilities,
including:

* Context switch time between preemptive threads using k_yield
* Context switch time between cooperative threads using k_yield
* Time to switch from an ISR back to the interrupted thread
* Time from an ISR to executing a different thread (rescheduled)
* Time to signal a semaphore then test that semaphore
* Time to signal a semaphore then test that semaphore with a context switch
* Time to lock a mutex then unlock that mutex
* Time it takes to create a new thread (without starting it)
* Time it takes to start a newly created thread
* Time it takes to suspend a thread
* Time it takes to resume a suspended thread
* Time it takes to abort a thread
* Time it takes to add data to a FIFO/LIFO
* Time it takes to retrieve data from a FIFO/LIFO
* Time it takes to wait on a FIFO/LIFO (and context switch)
* Time it takes to wake and switch to a thread waiting on a FIFO/LIFO
* Time it takes to send and receive events
* Time it takes to wait for events (and context switch)
* Time it takes to wake and switch to a thread waiting for events
* Time it takes to push and pop to/from a k_stack
* Average time to allocate memory from the heap then free that memory

When userspace is enabled, this benchmark will, where possible, also test the
above capabilities using various configurations involving user threads:

* Kernel thread to kernel thread
* Kernel thread to user thread
* User thread to kernel thread
* User thread to user thread

The default configuration builds only for the kernel. However, additional
configurations can be enabled via the use of EXTRA_CONF_FILE.

For example, the following will build this project with userspace support::

    EXTRA_CONF_FILE="prj.userspace.conf" west build -p -b <board> <path to project>

The following table summarizes the purposes of the different extra
configuration files that are available for use with this benchmark.
A tester may mix and match them, allowing different scenarios to be easily
compared against the default. Benchmark output can be saved and subsequently
exported to third-party tools to compare and chart performance differences
both between configurations and across Zephyr versions.
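
For instance, to mix configurations, EXTRA_CONF_FILE can be given multiple
Kconfig fragments (separated by semicolons or spaces); the ``<board>``
placeholder and project path are whatever you would normally pass to
``west build``:

```shell
# Build with both userspace support and stack canaries enabled.
EXTRA_CONF_FILE="prj.userspace.conf;prj.canaries.conf" west build -p -b <board> <path to project>
```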

+-----------------------------+------------------------------------+
| prj.canaries.conf           | Enable stack canaries              |
+-----------------------------+------------------------------------+
| prj.objcore.conf            | Enable object cores and statistics |
+-----------------------------+------------------------------------+
| prj.userspace.conf          | Enable userspace support           |
+-----------------------------+------------------------------------+

Sample output of the benchmark using the defaults::

        thread.yield.preemptive.ctx.k_to_k       - Context switch via k_yield                         :     315 cycles ,     2625 ns :
        thread.yield.cooperative.ctx.k_to_k      - Context switch via k_yield                         :     315 cycles ,     2625 ns :
        isr.resume.interrupted.thread.kernel     - Return from ISR to interrupted thread              :     289 cycles ,     2416 ns :
        isr.resume.different.thread.kernel       - Return from ISR to another thread                  :     374 cycles ,     3124 ns :
        thread.create.kernel.from.kernel         - Create thread                                      :     382 cycles ,     3191 ns :
        thread.start.kernel.from.kernel          - Start thread                                       :     394 cycles ,     3291 ns :
        thread.suspend.kernel.from.kernel        - Suspend thread                                     :     289 cycles ,     2416 ns :
        thread.resume.kernel.from.kernel         - Resume thread                                      :     339 cycles ,     2833 ns :
        thread.abort.kernel.from.kernel          - Abort thread                                       :     339 cycles ,     2833 ns :
        fifo.put.immediate.kernel                - Add data to FIFO (no ctx switch)                   :     214 cycles ,     1791 ns :
        fifo.get.immediate.kernel                - Get data from FIFO (no ctx switch)                 :     134 cycles ,     1124 ns :
        fifo.put.alloc.immediate.kernel          - Allocate to add data to FIFO (no ctx switch)       :     834 cycles ,     6950 ns :
        fifo.get.free.immediate.kernel           - Free when getting data from FIFO (no ctx switch)   :     560 cycles ,     4666 ns :
        fifo.get.blocking.k_to_k                 - Get data from FIFO (w/ ctx switch)                 :     510 cycles ,     4257 ns :
        fifo.put.wake+ctx.k_to_k                 - Add data to FIFO (w/ ctx switch)                   :     590 cycles ,     4923 ns :
        fifo.get.free.blocking.k_to_k            - Free when getting data from FIFO (w/ ctx switch)   :     510 cycles ,     4250 ns :
        fifo.put.alloc.wake+ctx.k_to_k           - Allocate to add data to FIFO (w/ ctx switch)       :     585 cycles ,     4875 ns :
        lifo.put.immediate.kernel                - Add data to LIFO (no ctx switch)                   :     214 cycles ,     1791 ns :
        lifo.get.immediate.kernel                - Get data from LIFO (no ctx switch)                 :     120 cycles ,     1008 ns :
        lifo.put.alloc.immediate.kernel          - Allocate to add data to LIFO (no ctx switch)       :     831 cycles ,     6925 ns :
        lifo.get.free.immediate.kernel           - Free when getting data from LIFO (no ctx switch)   :     555 cycles ,     4625 ns :
        lifo.get.blocking.k_to_k                 - Get data from LIFO (w/ ctx switch)                 :     502 cycles ,     4191 ns :
        lifo.put.wake+ctx.k_to_k                 - Add data to LIFO (w/ ctx switch)                   :     585 cycles ,     4875 ns :
        lifo.get.free.blocking.k_to_k            - Free when getting data from LIFO (w/ ctx switch)   :     513 cycles ,     4275 ns :
        lifo.put.alloc.wake+ctx.k_to_k           - Allocate to add data to LIFO (w/ ctx switch)       :     585 cycles ,     4881 ns :
        events.post.immediate.kernel             - Post events (nothing wakes)                        :     225 cycles ,     1875 ns :
        events.set.immediate.kernel              - Set events (nothing wakes)                         :     230 cycles ,     1923 ns :
        events.wait.immediate.kernel             - Wait for any events (no ctx switch)                :     120 cycles ,     1000 ns :
        events.wait_all.immediate.kernel         - Wait for all events (no ctx switch)                :     110 cycles ,      917 ns :
        events.wait.blocking.k_to_k              - Wait for any events (w/ ctx switch)                :     514 cycles ,     4291 ns :
        events.set.wake+ctx.k_to_k               - Set events (w/ ctx switch)                         :     754 cycles ,     6291 ns :
        events.wait_all.blocking.k_to_k          - Wait for all events (w/ ctx switch)                :     528 cycles ,     4400 ns :
        events.post.wake+ctx.k_to_k              - Post events (w/ ctx switch)                        :     765 cycles ,     6375 ns :
        semaphore.give.immediate.kernel          - Give a semaphore (no waiters)                      :      59 cycles ,      492 ns :
        semaphore.take.immediate.kernel          - Take a semaphore (no blocking)                     :      69 cycles ,      575 ns :
        semaphore.take.blocking.k_to_k           - Take a semaphore (context switch)                  :     450 cycles ,     3756 ns :
        semaphore.give.wake+ctx.k_to_k           - Give a semaphore (context switch)                  :     509 cycles ,     4249 ns :
        condvar.wait.blocking.k_to_k             - Wait for a condvar (context switch)                :     578 cycles ,     4817 ns :
        condvar.signal.wake+ctx.k_to_k           - Signal a condvar (context switch)                  :     630 cycles ,     5250 ns :
        stack.push.immediate.kernel              - Add data to k_stack (no ctx switch)                :     107 cycles ,      899 ns :
        stack.pop.immediate.kernel               - Get data from k_stack (no ctx switch)              :      80 cycles ,      674 ns :
        stack.pop.blocking.k_to_k                - Get data from k_stack (w/ ctx switch)              :     467 cycles ,     3899 ns :
        stack.push.wake+ctx.k_to_k               - Add data to k_stack (w/ ctx switch)                :     550 cycles ,     4583 ns :
        mutex.lock.immediate.recursive.kernel    - Lock a mutex                                       :      83 cycles ,      692 ns :
        mutex.unlock.immediate.recursive.kernel  - Unlock a mutex                                     :      44 cycles ,      367 ns :
        heap.malloc.immediate                    - Average time for heap malloc                       :     610 cycles ,     5083 ns :
        heap.free.immediate                      - Average time for heap free                         :     425 cycles ,     3541 ns :
        ===================================================================
        PROJECT EXECUTION SUCCESSFUL


Sample output of the benchmark with stack canaries enabled::

        thread.yield.preemptive.ctx.k_to_k       - Context switch via k_yield                         :     485 cycles ,     4042 ns :
        thread.yield.cooperative.ctx.k_to_k      - Context switch via k_yield                         :     485 cycles ,     4042 ns :
        isr.resume.interrupted.thread.kernel     - Return from ISR to interrupted thread              :     545 cycles ,     4549 ns :
        isr.resume.different.thread.kernel       - Return from ISR to another thread                  :     590 cycles ,     4924 ns :
        thread.create.kernel.from.kernel         - Create thread                                      :     585 cycles ,     4883 ns :
        thread.start.kernel.from.kernel          - Start thread                                       :     685 cycles ,     5716 ns :
        thread.suspend.kernel.from.kernel        - Suspend thread                                     :     490 cycles ,     4091 ns :
        thread.resume.kernel.from.kernel         - Resume thread                                      :     569 cycles ,     4749 ns :
        thread.abort.kernel.from.kernel          - Abort thread                                       :     629 cycles ,     5249 ns :
        fifo.put.immediate.kernel                - Add data to FIFO (no ctx switch)                   :     439 cycles ,     3666 ns :
        fifo.get.immediate.kernel                - Get data from FIFO (no ctx switch)                 :     320 cycles ,     2674 ns :
        fifo.put.alloc.immediate.kernel          - Allocate to add data to FIFO (no ctx switch)       :    1499 cycles ,    12491 ns :
        fifo.get.free.immediate.kernel           - Free when getting data from FIFO (no ctx switch)   :    1230 cycles ,    10250 ns :
        fifo.get.blocking.k_to_k                 - Get data from FIFO (w/ ctx switch)                 :     868 cycles ,     7241 ns :
        fifo.put.wake+ctx.k_to_k                 - Add data to FIFO (w/ ctx switch)                   :     991 cycles ,     8259 ns :
        fifo.get.free.blocking.k_to_k            - Free when getting data from FIFO (w/ ctx switch)   :     879 cycles ,     7325 ns :
        fifo.put.alloc.wake+ctx.k_to_k           - Allocate to add data to FIFO (w/ ctx switch)       :     990 cycles ,     8250 ns :
        lifo.put.immediate.kernel                - Add data to LIFO (no ctx switch)                   :     429 cycles ,     3582 ns :
        lifo.get.immediate.kernel                - Get data from LIFO (no ctx switch)                 :     320 cycles ,     2674 ns :
        lifo.put.alloc.immediate.kernel          - Allocate to add data to LIFO (no ctx switch)       :    1499 cycles ,    12491 ns :
        lifo.get.free.immediate.kernel           - Free when getting data from LIFO (no ctx switch)   :    1220 cycles ,    10166 ns :
        lifo.get.blocking.k_to_k                 - Get data from LIFO (w/ ctx switch)                 :     863 cycles ,     7199 ns :
        lifo.put.wake+ctx.k_to_k                 - Add data to LIFO (w/ ctx switch)                   :     985 cycles ,     8208 ns :
        lifo.get.free.blocking.k_to_k            - Free when getting data from LIFO (w/ ctx switch)   :     879 cycles ,     7325 ns :
        lifo.put.alloc.wake+ctx.k_to_k           - Allocate to add data to LIFO (w/ ctx switch)       :     985 cycles ,     8208 ns :
        events.post.immediate.kernel             - Post events (nothing wakes)                        :     420 cycles ,     3501 ns :
        events.set.immediate.kernel              - Set events (nothing wakes)                         :     420 cycles ,     3501 ns :
        events.wait.immediate.kernel             - Wait for any events (no ctx switch)                :     280 cycles ,     2334 ns :
        events.wait_all.immediate.kernel         - Wait for all events (no ctx switch)                :     270 cycles ,     2251 ns :
        events.wait.blocking.k_to_k              - Wait for any events (w/ ctx switch)                :     919 cycles ,     7665 ns :
        events.set.wake+ctx.k_to_k               - Set events (w/ ctx switch)                         :    1310 cycles ,    10924 ns :
        events.wait_all.blocking.k_to_k          - Wait for all events (w/ ctx switch)                :     954 cycles ,     7950 ns :
        events.post.wake+ctx.k_to_k              - Post events (w/ ctx switch)                        :    1340 cycles ,    11166 ns :
        semaphore.give.immediate.kernel          - Give a semaphore (no waiters)                      :     110 cycles ,      917 ns :
        semaphore.take.immediate.kernel          - Take a semaphore (no blocking)                     :     180 cycles ,     1500 ns :
        semaphore.take.blocking.k_to_k           - Take a semaphore (context switch)                  :     755 cycles ,     6292 ns :
        semaphore.give.wake+ctx.k_to_k           - Give a semaphore (context switch)                  :     812 cycles ,     6767 ns :
        condvar.wait.blocking.k_to_k             - Wait for a condvar (context switch)                :    1027 cycles ,     8558 ns :
        condvar.signal.wake+ctx.k_to_k           - Signal a condvar (context switch)                  :    1040 cycles ,     8666 ns :
        stack.push.immediate.kernel              - Add data to k_stack (no ctx switch)                :     220 cycles ,     1841 ns :
        stack.pop.immediate.kernel               - Get data from k_stack (no ctx switch)              :     205 cycles ,     1716 ns :
        stack.pop.blocking.k_to_k                - Get data from k_stack (w/ ctx switch)              :     791 cycles ,     6599 ns :
        stack.push.wake+ctx.k_to_k               - Add data to k_stack (w/ ctx switch)                :     870 cycles ,     7250 ns :
        mutex.lock.immediate.recursive.kernel    - Lock a mutex                                       :     175 cycles ,     1459 ns :
        mutex.unlock.immediate.recursive.kernel  - Unlock a mutex                                     :      61 cycles ,      510 ns :
        heap.malloc.immediate                    - Average time for heap malloc                       :    1060 cycles ,     8833 ns :
        heap.free.immediate                      - Average time for heap free                         :     899 cycles ,     7491 ns :
        ===================================================================
        PROJECT EXECUTION SUCCESSFUL

The sample output above (stack canaries enabled) shows longer times than
for the default build. Not only does each stack frame in the call tree have
its own stack canary check, but enabling this feature also affects how the
compiler chooses to inline routines.

Sample output of the benchmark with object cores enabled::

        thread.yield.preemptive.ctx.k_to_k       - Context switch via k_yield                         :     740 cycles ,     6167 ns :
        thread.yield.cooperative.ctx.k_to_k      - Context switch via k_yield                         :     740 cycles ,     6167 ns :
        isr.resume.interrupted.thread.kernel     - Return from ISR to interrupted thread              :     284 cycles ,     2374 ns :
        isr.resume.different.thread.kernel       - Return from ISR to another thread                  :     784 cycles ,     6541 ns :
        thread.create.kernel.from.kernel         - Create thread                                      :     714 cycles ,     5958 ns :
        thread.start.kernel.from.kernel          - Start thread                                       :     819 cycles ,     6833 ns :
        thread.suspend.kernel.from.kernel        - Suspend thread                                     :     704 cycles ,     5874 ns :
        thread.resume.kernel.from.kernel         - Resume thread                                      :     761 cycles ,     6349 ns :
        thread.abort.kernel.from.kernel          - Abort thread                                       :     544 cycles ,     4541 ns :
        fifo.put.immediate.kernel                - Add data to FIFO (no ctx switch)                   :     211 cycles ,     1766 ns :
        fifo.get.immediate.kernel                - Get data from FIFO (no ctx switch)                 :     132 cycles ,     1108 ns :
        fifo.put.alloc.immediate.kernel          - Allocate to add data to FIFO (no ctx switch)       :     850 cycles ,     7091 ns :
        fifo.get.free.immediate.kernel           - Free when getting data from FIFO (no ctx switch)   :     565 cycles ,     4708 ns :
        fifo.get.blocking.k_to_k                 - Get data from FIFO (w/ ctx switch)                 :     947 cycles ,     7899 ns :
        fifo.put.wake+ctx.k_to_k                 - Add data to FIFO (w/ ctx switch)                   :    1015 cycles ,     8458 ns :
        fifo.get.free.blocking.k_to_k            - Free when getting data from FIFO (w/ ctx switch)   :     950 cycles ,     7923 ns :
        fifo.put.alloc.wake+ctx.k_to_k           - Allocate to add data to FIFO (w/ ctx switch)       :    1010 cycles ,     8416 ns :
        lifo.put.immediate.kernel                - Add data to LIFO (no ctx switch)                   :     226 cycles ,     1891 ns :
        lifo.get.immediate.kernel                - Get data from LIFO (no ctx switch)                 :     123 cycles ,     1033 ns :
        lifo.put.alloc.immediate.kernel          - Allocate to add data to LIFO (no ctx switch)       :     848 cycles ,     7066 ns :
        lifo.get.free.immediate.kernel           - Free when getting data from LIFO (no ctx switch)   :     565 cycles ,     4708 ns :
        lifo.get.blocking.k_to_k                 - Get data from LIFO (w/ ctx switch)                 :     951 cycles ,     7932 ns :
        lifo.put.wake+ctx.k_to_k                 - Add data to LIFO (w/ ctx switch)                   :    1010 cycles ,     8416 ns :
        lifo.get.free.blocking.k_to_k            - Free when getting data from LIFO (w/ ctx switch)   :     959 cycles ,     7991 ns :
        lifo.put.alloc.wake+ctx.k_to_k           - Allocate to add data to LIFO (w/ ctx switch)       :    1010 cycles ,     8422 ns :
        events.post.immediate.kernel             - Post events (nothing wakes)                        :     210 cycles ,     1750 ns :
        events.set.immediate.kernel              - Set events (nothing wakes)                         :     230 cycles ,     1917 ns :
        events.wait.immediate.kernel             - Wait for any events (no ctx switch)                :     120 cycles ,     1000 ns :
        events.wait_all.immediate.kernel         - Wait for all events (no ctx switch)                :     150 cycles ,     1250 ns :
        events.wait.blocking.k_to_k              - Wait for any events (w/ ctx switch)                :     951 cycles ,     7932 ns :
        events.set.wake+ctx.k_to_k               - Set events (w/ ctx switch)                         :    1179 cycles ,     9833 ns :
        events.wait_all.blocking.k_to_k          - Wait for all events (w/ ctx switch)                :     976 cycles ,     8133 ns :
        events.post.wake+ctx.k_to_k              - Post events (w/ ctx switch)                        :    1190 cycles ,     9922 ns :
        semaphore.give.immediate.kernel          - Give a semaphore (no waiters)                      :      59 cycles ,      492 ns :
        semaphore.take.immediate.kernel          - Take a semaphore (no blocking)                     :      69 cycles ,      575 ns :
        semaphore.take.blocking.k_to_k           - Take a semaphore (context switch)                  :     870 cycles ,     7250 ns :
        semaphore.give.wake+ctx.k_to_k           - Give a semaphore (context switch)                  :     929 cycles ,     7749 ns :
        condvar.wait.blocking.k_to_k             - Wait for a condvar (context switch)                :    1010 cycles ,     8417 ns :
        condvar.signal.wake+ctx.k_to_k           - Signal a condvar (context switch)                  :    1060 cycles ,     8833 ns :
        stack.push.immediate.kernel              - Add data to k_stack (no ctx switch)                :      90 cycles ,      758 ns :
        stack.pop.immediate.kernel               - Get data from k_stack (no ctx switch)              :      86 cycles ,      724 ns :
        stack.pop.blocking.k_to_k                - Get data from k_stack (w/ ctx switch)              :     910 cycles ,     7589 ns :
        stack.push.wake+ctx.k_to_k               - Add data to k_stack (w/ ctx switch)                :     975 cycles ,     8125 ns :
        mutex.lock.immediate.recursive.kernel    - Lock a mutex                                       :     105 cycles ,      875 ns :
        mutex.unlock.immediate.recursive.kernel  - Unlock a mutex                                     :      44 cycles ,      367 ns :
        heap.malloc.immediate                    - Average time for heap malloc                       :     621 cycles ,     5183 ns :
        heap.free.immediate                      - Average time for heap free                         :     422 cycles ,     3516 ns :
        ===================================================================
        PROJECT EXECUTION SUCCESSFUL

The sample output above (object cores and statistics enabled) shows longer
times than for the default build when context switching is involved. Blanket
enabling of the object cores, as was done here, results in the additional
gathering of thread statistics whenever a thread is switched in or out. The
gathering of these statistics can be controlled both at project configuration
time and at runtime.

Sample output of the benchmark with userspace enabled::

        thread.yield.preemptive.ctx.k_to_k       - Context switch via k_yield                         :     975 cycles ,     8125 ns :
        thread.yield.preemptive.ctx.u_to_u       - Context switch via k_yield                         :    1303 cycles ,    10860 ns :
        thread.yield.preemptive.ctx.k_to_u       - Context switch via k_yield                         :    1180 cycles ,     9834 ns :
        thread.yield.preemptive.ctx.u_to_k       - Context switch via k_yield                         :    1097 cycles ,     9144 ns :
        thread.yield.cooperative.ctx.k_to_k      - Context switch via k_yield                         :     975 cycles ,     8125 ns :
        thread.yield.cooperative.ctx.u_to_u      - Context switch via k_yield                         :    1302 cycles ,    10854 ns :
        thread.yield.cooperative.ctx.k_to_u      - Context switch via k_yield                         :    1180 cycles ,     9834 ns :
        thread.yield.cooperative.ctx.u_to_k      - Context switch via k_yield                         :    1097 cycles ,     9144 ns :
        isr.resume.interrupted.thread.kernel     - Return from ISR to interrupted thread              :     329 cycles ,     2749 ns :
        isr.resume.different.thread.kernel       - Return from ISR to another thread                  :    1014 cycles ,     8457 ns :
        isr.resume.different.thread.user         - Return from ISR to another thread                  :    1223 cycles ,    10192 ns :
        thread.create.kernel.from.kernel         - Create thread                                      :     970 cycles ,     8089 ns :
        thread.start.kernel.from.kernel          - Start thread                                       :    1074 cycles ,     8957 ns :
        thread.suspend.kernel.from.kernel        - Suspend thread                                     :     949 cycles ,     7916 ns :
        thread.resume.kernel.from.kernel         - Resume thread                                      :    1004 cycles ,     8374 ns :
        thread.abort.kernel.from.kernel          - Abort thread                                       :    2734 cycles ,    22791 ns :
        thread.create.user.from.kernel           - Create thread                                      :     832 cycles ,     6935 ns :
        thread.start.user.from.kernel            - Start thread                                       :    9023 cycles ,    75192 ns :
        thread.suspend.user.from.kernel          - Suspend thread                                     :    1312 cycles ,    10935 ns :
        thread.resume.user.from.kernel           - Resume thread                                      :    1187 cycles ,     9894 ns :
        thread.abort.user.from.kernel            - Abort thread                                       :    2597 cycles ,    21644 ns :
        thread.create.user.from.user             - Create thread                                      :    2144 cycles ,    17872 ns :
        thread.start.user.from.user              - Start thread                                       :    9399 cycles ,    78330 ns :
        thread.suspend.user.from.user            - Suspend thread                                     :    1504 cycles ,    12539 ns :
        thread.resume.user.from.user             - Resume thread                                      :    1574 cycles ,    13122 ns :
        thread.abort.user.from.user              - Abort thread                                       :    3237 cycles ,    26981 ns :
        thread.start.kernel.from.user            - Start thread                                       :    1452 cycles ,    12102 ns :
        thread.suspend.kernel.from.user          - Suspend thread                                     :    1143 cycles ,     9525 ns :
        thread.resume.kernel.from.user           - Resume thread                                      :    1392 cycles ,    11602 ns :
        thread.abort.kernel.from.user            - Abort thread                                       :    3372 cycles ,    28102 ns :
        fifo.put.immediate.kernel                - Add data to FIFO (no ctx switch)                   :     239 cycles ,     1999 ns :
        fifo.get.immediate.kernel                - Get data from FIFO (no ctx switch)                 :     184 cycles ,     1541 ns :
        fifo.put.alloc.immediate.kernel          - Allocate to add data to FIFO (no ctx switch)       :     920 cycles ,     7666 ns :
        fifo.get.free.immediate.kernel           - Free when getting data from FIFO (no ctx switch)   :     650 cycles ,     5416 ns :
        fifo.put.alloc.immediate.user            - Allocate to add data to FIFO (no ctx switch)       :    1710 cycles ,    14256 ns :
        fifo.get.free.immediate.user             - Free when getting data from FIFO (no ctx switch)   :    1440 cycles ,    12000 ns :
        fifo.get.blocking.k_to_k                 - Get data from FIFO (w/ ctx switch)                 :    1209 cycles ,    10082 ns :
        fifo.put.wake+ctx.k_to_k                 - Add data to FIFO (w/ ctx switch)                   :    1230 cycles ,    10250 ns :
        fifo.get.free.blocking.k_to_k            - Free when getting data from FIFO (w/ ctx switch)   :    1210 cycles ,    10083 ns :
        fifo.put.alloc.wake+ctx.k_to_k           - Allocate to add data to FIFO (w/ ctx switch)       :    1260 cycles ,    10500 ns :
        fifo.get.free.blocking.u_to_k            - Free when getting data from FIFO (w/ ctx switch)   :    1745 cycles ,    14547 ns :
        fifo.put.alloc.wake+ctx.k_to_u           - Allocate to add data to FIFO (w/ ctx switch)       :    1600 cycles ,    13333 ns :
        fifo.get.free.blocking.k_to_u            - Free when getting data from FIFO (w/ ctx switch)   :    1550 cycles ,    12922 ns :
        fifo.put.alloc.wake+ctx.u_to_k           - Allocate to add data to FIFO (w/ ctx switch)       :    1795 cycles ,    14958 ns :
        fifo.get.free.blocking.u_to_u            - Free when getting data from FIFO (w/ ctx switch)   :    2084 cycles ,    17374 ns :
        fifo.put.alloc.wake+ctx.u_to_u           - Allocate to add data to FIFO (w/ ctx switch)       :    2135 cycles ,    17791 ns :
        lifo.put.immediate.kernel                - Add data to LIFO (no ctx switch)                   :     234 cycles ,     1957 ns :
        lifo.get.immediate.kernel                - Get data from LIFO (no ctx switch)                 :     189 cycles ,     1582 ns :
        lifo.put.alloc.immediate.kernel          - Allocate to add data to LIFO (no ctx switch)       :     935 cycles ,     7791 ns :
        lifo.get.free.immediate.kernel           - Free when getting data from LIFO (no ctx switch)   :     650 cycles ,     5416 ns :
        lifo.put.alloc.immediate.user            - Allocate to add data to LIFO (no ctx switch)       :    1715 cycles ,    14291 ns :
        lifo.get.free.immediate.user             - Free when getting data from LIFO (no ctx switch)   :    1445 cycles ,    12041 ns :
        lifo.get.blocking.k_to_k                 - Get data from LIFO (w/ ctx switch)                 :    1219 cycles ,    10166 ns :
        lifo.put.wake+ctx.k_to_k                 - Add data to LIFO (w/ ctx switch)                   :    1230 cycles ,    10250 ns :
        lifo.get.free.blocking.k_to_k            - Free when getting data from LIFO (w/ ctx switch)   :    1210 cycles ,    10083 ns :
        lifo.put.alloc.wake+ctx.k_to_k           - Allocate to add data to LIFO (w/ ctx switch)       :    1260 cycles ,    10500 ns :
        lifo.get.free.blocking.u_to_k            - Free when getting data from LIFO (w/ ctx switch)   :    1744 cycles ,    14541 ns :
        lifo.put.alloc.wake+ctx.k_to_u           - Allocate to add data to LIFO (w/ ctx switch)       :    1595 cycles ,    13291 ns :
        lifo.get.free.blocking.k_to_u            - Free when getting data from LIFO (w/ ctx switch)   :    1544 cycles ,    12874 ns :
        lifo.put.alloc.wake+ctx.u_to_k           - Allocate to add data to LIFO (w/ ctx switch)       :    1795 cycles ,    14958 ns :
        lifo.get.free.blocking.u_to_u            - Free when getting data from LIFO (w/ ctx switch)   :    2080 cycles ,    17339 ns :
        lifo.put.alloc.wake+ctx.u_to_u           - Allocate to add data to LIFO (w/ ctx switch)       :    2130 cycles ,    17750 ns :
        events.post.immediate.kernel             - Post events (nothing wakes)                        :     285 cycles ,     2375 ns :
        events.set.immediate.kernel              - Set events (nothing wakes)                         :     290 cycles ,     2417 ns :
        events.wait.immediate.kernel             - Wait for any events (no ctx switch)                :     235 cycles ,     1958 ns :
        events.wait_all.immediate.kernel         - Wait for all events (no ctx switch)                :     245 cycles ,     2042 ns :
        events.post.immediate.user               - Post events (nothing wakes)                        :     800 cycles ,     6669 ns :
        events.set.immediate.user                - Set events (nothing wakes)                         :     811 cycles ,     6759 ns :
        events.wait.immediate.user               - Wait for any events (no ctx switch)                :     780 cycles ,     6502 ns :
        events.wait_all.immediate.user           - Wait for all events (no ctx switch)                :     770 cycles ,     6419 ns :
        events.wait.blocking.k_to_k              - Wait for any events (w/ ctx switch)                :    1210 cycles ,    10089 ns :
        events.set.wake+ctx.k_to_k               - Set events (w/ ctx switch)                         :    1449 cycles ,    12082 ns :
        events.wait_all.blocking.k_to_k          - Wait for all events (w/ ctx switch)                :    1250 cycles ,    10416 ns :
        events.post.wake+ctx.k_to_k              - Post events (w/ ctx switch)                        :    1475 cycles ,    12291 ns :
        events.wait.blocking.u_to_k              - Wait for any events (w/ ctx switch)                :    1612 cycles ,    13435 ns :
        events.set.wake+ctx.k_to_u               - Set events (w/ ctx switch)                         :    1627 cycles ,    13560 ns :
306        events.wait_all.blocking.u_to_k          - Wait for all events (w/ ctx switch)                :    1785 cycles ,    14875 ns :
307        events.post.wake+ctx.k_to_u              - Post events (w/ ctx switch)                        :    1790 cycles ,    14923 ns :
308        events.wait.blocking.k_to_u              - Wait for any events (w/ ctx switch)                :    1407 cycles ,    11727 ns :
309        events.set.wake+ctx.u_to_k               - Set events (w/ ctx switch)                         :    1828 cycles ,    15234 ns :
310        events.wait_all.blocking.k_to_u          - Wait for all events (w/ ctx switch)                :    1585 cycles ,    13208 ns :
311        events.post.wake+ctx.u_to_k              - Post events (w/ ctx switch)                        :    2000 cycles ,    16666 ns :
312        events.wait.blocking.u_to_u              - Wait for any events (w/ ctx switch)                :    1810 cycles ,    15087 ns :
313        events.set.wake+ctx.u_to_u               - Set events (w/ ctx switch)                         :    2004 cycles ,    16705 ns :
314        events.wait_all.blocking.u_to_u          - Wait for all events (w/ ctx switch)                :    2120 cycles ,    17666 ns :
315        events.post.wake+ctx.u_to_u              - Post events (w/ ctx switch)                        :    2315 cycles ,    19291 ns :
316        semaphore.give.immediate.kernel          - Give a semaphore (no waiters)                      :     125 cycles ,     1042 ns :
317        semaphore.take.immediate.kernel          - Take a semaphore (no blocking)                     :     125 cycles ,     1042 ns :
318        semaphore.give.immediate.user            - Give a semaphore (no waiters)                      :     645 cycles ,     5377 ns :
319        semaphore.take.immediate.user            - Take a semaphore (no blocking)                     :     680 cycles ,     5669 ns :
320        semaphore.take.blocking.k_to_k           - Take a semaphore (context switch)                  :    1140 cycles ,     9500 ns :
321        semaphore.give.wake+ctx.k_to_k           - Give a semaphore (context switch)                  :    1174 cycles ,     9791 ns :
322        semaphore.take.blocking.k_to_u           - Take a semaphore (context switch)                  :    1350 cycles ,    11251 ns :
323        semaphore.give.wake+ctx.u_to_k           - Give a semaphore (context switch)                  :    1542 cycles ,    12852 ns :
324        semaphore.take.blocking.u_to_k           - Take a semaphore (context switch)                  :    1512 cycles ,    12603 ns :
325        semaphore.give.wake+ctx.k_to_u           - Give a semaphore (context switch)                  :    1382 cycles ,    11519 ns :
326        semaphore.take.blocking.u_to_u           - Take a semaphore (context switch)                  :    1723 cycles ,    14360 ns :
327        semaphore.give.wake+ctx.u_to_u           - Give a semaphore (context switch)                  :    1749 cycles ,    14580 ns :
328        condvar.wait.blocking.k_to_k             - Wait for a condvar (context switch)                :    1285 cycles ,    10708 ns :
329        condvar.signal.wake+ctx.k_to_k           - Signal a condvar (context switch)                  :    1315 cycles ,    10964 ns :
330        condvar.wait.blocking.k_to_u             - Wait for a condvar (context switch)                :    1547 cycles ,    12898 ns :
331        condvar.signal.wake+ctx.u_to_k           - Signal a condvar (context switch)                  :    1855 cycles ,    15458 ns :
332        condvar.wait.blocking.u_to_k             - Wait for a condvar (context switch)                :    1990 cycles ,    16583 ns :
333        condvar.signal.wake+ctx.k_to_u           - Signal a condvar (context switch)                  :    1640 cycles ,    13666 ns :
334        condvar.wait.blocking.u_to_u             - Wait for a condvar (context switch)                :    2313 cycles ,    19280 ns :
335        condvar.signal.wake+ctx.u_to_u           - Signal a condvar (context switch)                  :    2170 cycles ,    18083 ns :
336        stack.push.immediate.kernel              - Add data to k_stack (no ctx switch)                :     189 cycles ,     1582 ns :
337        stack.pop.immediate.kernel               - Get data from k_stack (no ctx switch)              :     194 cycles ,     1624 ns :
338        stack.push.immediate.user                - Add data to k_stack (no ctx switch)                :     679 cycles ,     5664 ns :
339        stack.pop.immediate.user                 - Get data from k_stack (no ctx switch)              :    1014 cycles ,     8455 ns :
340        stack.pop.blocking.k_to_k                - Get data from k_stack (w/ ctx switch)              :    1209 cycles ,    10083 ns :
341        stack.push.wake+ctx.k_to_k               - Add data to k_stack (w/ ctx switch)                :    1235 cycles ,    10291 ns :
342        stack.pop.blocking.u_to_k                - Get data from k_stack (w/ ctx switch)              :    2050 cycles ,    17089 ns :
343        stack.push.wake+ctx.k_to_u               - Add data to k_stack (w/ ctx switch)                :    1575 cycles ,    13125 ns :
344        stack.pop.blocking.k_to_u                - Get data from k_stack (w/ ctx switch)              :    1549 cycles ,    12916 ns :
345        stack.push.wake+ctx.u_to_k               - Add data to k_stack (w/ ctx switch)                :    1755 cycles ,    14625 ns :
346        stack.pop.blocking.u_to_u                - Get data from k_stack (w/ ctx switch)              :    2389 cycles ,    19916 ns :
347        stack.push.wake+ctx.u_to_u               - Add data to k_stack (w/ ctx switch)                :    2095 cycles ,    17458 ns :
348        mutex.lock.immediate.recursive.kernel    - Lock a mutex                                       :     165 cycles ,     1375 ns :
349        mutex.unlock.immediate.recursive.kernel  - Unlock a mutex                                     :      80 cycles ,      668 ns :
350        mutex.lock.immediate.recursive.user      - Lock a mutex                                       :     685 cycles ,     5711 ns :
351        mutex.unlock.immediate.recursive.user    - Unlock a mutex                                     :     615 cycles ,     5128 ns :
352        heap.malloc.immediate                    - Average time for heap malloc                       :     626 cycles ,     5224 ns :
353        heap.free.immediate                      - Average time for heap free                         :     427 cycles ,     3558 ns :
354        ===================================================================
355        PROJECT EXECUTION SUCCESSFUL

The sample output above (userspace enabled) shows longer times than the
default build scenario. Enabling userspace adds runtime overhead to each
call involving a kernel object: the kernel must first determine whether
the caller is executing in user or kernel space, and consequently whether
a system call is required.