.. _safety_overview:

Zephyr Safety Overview
########################

Introduction
************

This document is the safety documentation. It gives an overview of the safety-relevant activities
and of what the Zephyr Project and the Zephyr Safety Working Group / Committee aim to achieve.

This overview is intended for people who are interested in the functional safety development part
of the Zephyr RTOS, and for project members who want to contribute to the safety aspects of the
project.

Overview
********

This section gives the reader an overview of three things:

* the general goal of the safety certification;
* the standard we aim to be certified against;
* the quality standards and processes that need to be implemented to reach such a safety
  certification.

Safety Document update
**********************

This document is a living document and may evolve over time as new requirements, guidelines, or
processes are introduced.

#. Changes will be submitted by the interested party(ies) via pull requests to the Zephyr
   documentation repository.

#. The Zephyr Safety Committee will review these changes and either provide feedback or accept
   them.

#. Once accepted, these changes will become part of the document.

General safety scope
********************

The general goal of the Safety Committee is to achieve certification against the `IEC 61508
<https://en.wikipedia.org/wiki/IEC_61508>`__ standard at Safety Integrity Level (SIL) 3 /
Systematic Capability (SC) 3 for a limited source scope (see certification scope TBD). Since the
code base is pre-existing, we use the route 3s/1s approach defined by the IEC 61508 standard.

Route 3s
   *Assessment of non-compliant development. This is essentially route 1s applied to existing
   sources.*

Route 1s
   *Compliant development. Compliance with the requirements of this standard for the avoidance and
   control of systematic faults in software.*

Summary of the IEC 61508 standard
=================================

The IEC 61508 standard is a widely recognized international standard for functional safety of
electrical, electronic, and programmable electronic safety-related systems. Here's an overview of
some of the key safety aspects of the standard:

#. **Hazard and Risk Analysis**: The IEC 61508 standard requires a thorough analysis of potential
   hazards and risks associated with a system. This analysis determines the level of safety
   measures needed to reduce those risks to acceptable levels.

#. **Safety Integrity Level (SIL)**: The standard introduces the concept of Safety Integrity Level
   (SIL) to classify the level of risk reduction required for each safety function. It defines four
   levels, SIL 1 (lowest) to SIL 4 (highest). The higher the SIL, the greater the level of risk
   reduction required.

#. **System Design**: The IEC 61508 standard requires a systematic approach to system design. This
   includes identifying the safety requirements, developing a safety plan, and using appropriate
   safety techniques and measures to ensure that the system meets the required SIL.

#. **Verification and Validation**: The standard requires rigorous testing and evaluation of the
   safety-related system to ensure that it meets the specified SIL and other safety requirements.
   This includes verification of the system design, validation of the system's functionality, and
   ongoing monitoring and maintenance of the system.

#. **Documentation and Traceability**: The IEC 61508 standard requires a comprehensive
   documentation process. All aspects of the safety-related system must be fully documented, and
   there must be full traceability from the safety requirements to the final system design and
   implementation.
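The link between a SIL and the risk reduction it demands can be made concrete. IEC 61508-1 tabulates target failure measures for a safety function operating in low demand mode, expressed as the average probability of dangerous failure on demand (PFDavg). The C sketch below maps a PFDavg value to the highest SIL whose target it meets. It is a minimal illustration, not part of Zephyr.

The function and table names are hypothetical, and the sketch covers only the low-demand table. High-demand or continuous mode uses a per-hour failure frequency instead. For software such as the Zephyr code base, the standard speaks of systematic capability instead. That is claimed by applying the required techniques and measures, not by reaching a failure-rate number.

.. code-block:: c

   #include <stdio.h>

   /* Hypothetical table: exclusive upper PFDavg bound per SIL in low demand
    * mode (IEC 61508-1). Index 0 is SIL 1, index 3 is SIL 4. */
   static const double pfd_upper_bound[4] = { 1e-1, 1e-2, 1e-3, 1e-4 };

   /* Hypothetical helper: returns the highest SIL (1..4) whose low-demand
    * PFDavg target is met by pfd, or 0 if no target is met or the input is
    * invalid. Values below 1e-5 are still capped at SIL 4. */
   static int sil_for_low_demand_pfd(double pfd)
   {
   	int sil = 0;

   	if (!(pfd >= 0.0)) {
   		/* Rejects negative values and NaN. */
   		return 0;
   	}

   	/* Bounds shrink as the SIL grows, so the last match is the highest SIL. */
   	for (int i = 0; i < 4; i++) {
   		if (pfd < pfd_upper_bound[i]) {
   			sil = i + 1;
   		}
   	}

   	return sil;
   }

   int main(void)
   {
   	const double samples[] = { 0.5, 0.05, 5e-4, 1e-4, 3e-6 };

   	for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
   		printf("PFDavg %.0e -> SIL %d\n", samples[i],
   		       sil_for_low_demand_pfd(samples[i]));
   	}

   	return 0;
   }

With these sample values:

* 0.5 meets no SIL target.
* 0.05 meets SIL 1.
* 5e-4 and 1e-4 meet SIL 3. A value exactly on a band boundary falls into the lower SIL because the upper bounds are exclusive.
* 3e-6 is capped at SIL 4.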
Overall, the IEC 61508 standard provides a framework for the design, development, and
implementation of safety-related systems that aims to reduce the risk of accidents and improve
overall safety. By following the standard, organizations can ensure that their safety-related
systems are designed and implemented to the level of safety integrity that their risks require.

Why IEC 61508?
==============

The IEC 61508 standard was selected because it is a foundational functional safety standard that
applies across many industry sectors. It provides a robust framework that serves as a base for
sector-specific standards. This makes IEC 61508 particularly relevant for Zephyr: the operating
system is versatile enough to be used effectively across a wide range of industry sectors.

The following diagram illustrates the relationship between the IEC 61508 standard and other related
standards:

.. figure:: images/IEC-61508-basis.svg
   :align: center
   :alt: IEC 61508 relation to other standards
   :figclass: align-center

   IEC 61508 relation to other standards

Quality
*******

Quality is a mandatory expectation for software across the industry. The code base of the project
must achieve various software quality goals to be considered auditable from a safety perspective
and to be usable for certification. However, software quality is not an additional requirement
introduced by functional safety standards. Functional safety treats quality as an existing
precondition. The "quality managed" status should therefore be pursued for any project, regardless
of its functional safety goals.

The following list describes the quality goals that need to be reached to achieve an auditable
code base:

1. Basic software quality standards

   a. :ref:`coding_guidelines` (including: static code analysis, coding style, etc.)
   b. :ref:`safety_requirements` and requirements tracing
   c. Test coverage

2. Software architecture design principles

   a. Layered architecture model
   b. Encapsulated components
   c. Encapsulated single functionality (if not reasonable and manageable in safety)

Basic software quality standards - Safety view
==============================================

In this chapter the Safety Committee explains why the quality goals listed above are needed as a
precondition. It also describes what needs to be done to achieve an auditable code base from the
safety perspective. In general, all of these quality measures serve to minimize the error rate
during code development.

Coding Guidelines
-----------------

The coding guidelines are the basis for a common understanding and for a unified rule set and
development style for industrial software products. For safety, the coding guidelines are essential
and serve a purpose beyond a unified rule set. They are also needed to prove that the developers
follow a unified development style. This style prevents **systematic errors** in the process of
developing software and thus minimizes the overall **error rate** of the complete software system.

The **IEC 61508 standard** also sets a precondition and a recommendation to use coding standards /
guidelines to reduce the likelihood of errors.

Requirements and requirements tracing
-------------------------------------

Requirements and requirements management are important not only for software development, but
also for safety. On the one hand, requirements specify in detail, at a technical level, what the
software should do. On the other hand, they are a necessary tool to verify whether the described
functionality is implemented as expected.
For this purpose, the requirements are traced down to the code level. With requirements management
and tracing in place, it can be verified whether the functionality has been implemented and tested
correctly, thus minimizing the systematic error rate.

The IEC 61508 standard also highly recommends requirements and requirements tracing, which in
practice makes them mandatory for certification.

Test coverage
-------------

High test coverage provides safety evidence that the code does exactly what it was developed for
and does not execute any unforeseen instructions. If the entire code base is tested with high
(ideally 100%) test coverage, faulty changes are also detected quickly, which further minimizes the
error rate.

Note, however, that safety imposes its own requirements on test coverage. The IEC 61508 standard
prescribes various metrics that must be considered for the SIL 3 / SC 3 target. Among other
things, the following must be fulfilled:

* Structural test coverage (entry points) 100%
* Structural test coverage (statements) 100%
* Structural test coverage (branches) 100%

If 100% cannot be reached (e.g. statement coverage of defensive code), the uncovered part must be
described and justified in the documentation.

Software architecture design principles
=======================================

To create and maintain a structured software product, individual software architecture designs must
also be considered and implemented in accordance with safety standards. This is necessary because
some designs and implementations are not reasonable in safety. Only then can the overall software
and code base be used as an auditable code base.
However, most of these software architecture designs have already been implemented in the Zephyr
project. They need to be verified by the Safety Committee / Safety Working Group and the safety
architect.

Layered architecture model
--------------------------

The **IEC 61508 standard** strongly recommends a modular approach to software architecture. The
Zephyr project has followed this approach from the beginning with its layered architecture. The
idea behind this architecture is to organize modules or components with similar functionality into
layers, so that each layer can be assigned a specific role in the system.

For safety, this model has an advantage: the interfaces between different components and layers
can be shown at a very high level. This makes it possible to determine which functionalities are
safety-relevant and how the safety scope can be limited. Furthermore, various analyses and
documents can be built on top of this architecture. These are important for certification and for
the responsible certification body.

Encapsulated components
-----------------------

Encapsulated components are an essential part of the architecture design for safety. The most
important aspect is the separation of safety-relevant components from non-safety-relevant
components, including their associated interfaces. This ensures that components have no
**repercussions** on other components.

Encapsulated single functionality (if not reasonable and manageable in safety)
------------------------------------------------------------------------------

Another requirement for the overall system and software environment is that individual
functionalities can be disabled within components. If a function is unacceptable for safety (e.g.
complete dynamic memory management), it should be possible to turn that individual functionality
off.
The Zephyr Project already offers this possibility through Kconfig and its flexible
configurability.

Processes and workflow
**********************

.. figure:: images/zephyr-safety-process.svg
   :align: center
   :alt: Safety process and workflow overview
   :figclass: align-center

   Safety process and workflow overview

The diagram describes the rough process that the Safety Committee defined to ensure safety in the
development of the Zephyr project. A few points need to be highlighted and explained, in
particular the roles of the safety architect and of the Safety Committee in the overall process.
The diagram only shows the paths that are possible when a change is related to safety.

#. On the main branch, the safety scope of the project should be identified, which typically
   represents a small subset of the entire code base. This subset should then be made auditable
   during normal development on “main”, which means that special attention is paid to quality goals
   (`Quality`_) and safety processes within this scope. The Safety Architect works alongside the
   Technical Steering Committee (TSC) in this area, monitoring the development process to ensure
   that the architecture meets the safety requirements.

#. At this point, the safety architect plays an increasingly important role. For PRs/issues that
   fall within the safety scope, the safety architect should ideally take part in the discussions
   and decisions on minor changes. This allows them to react to safety-relevant changes that are
   not conformant.
   Some pull requests or issues introduce a significant and influential change or improvement
   that requires extended discussion or decision-making. The safety architect should bring these
   to the attention of the Safety Committee or the Technical Steering Committee (TSC), as
   appropriate, so that they can decide on the best course of action.

#. This describes the certification side. At this point, the code base has to be in an
   "auditable" state, and ideally no further changes should be necessary or made to the code base.

   There is still a path from the main branch to this area. It is needed when a serious bug is
   found, or an important change is implemented, on the main branch in the safety scope after the
   LTS and the auditable branch were created. In this case, the Safety Committee, together with
   the safety architect, must decide whether this bug fix or change should be integrated into the
   LTS so that it can also be integrated into the auditable branch.

   This integration can take one of three forms: a code change only, an update to the safety
   documentation only, or both.

#. This describes the safety process required for certification itself. Here, the final analyses,
   tests, and documents are created and carried out, as prescribed by the certifying authority and
   by the standard being certified against. If the certification body approves everything at this
   stage and the safety process is completed, a safety release can be created and published.

#. This transition from the auditable branch to the main branch should only occur in exceptional
   circumstances. Specifically, it applies when something identified during the certification
   process must be quickly adapted on the “auditable” branch in order to obtain certification.
   To prevent this issue from arising again during the next certification, there must be a path
   to merge these changes back into the main branch. This keeps them from being lost and ready for
   the next certification if necessary.

.. important::
   Safety should neither block the project nor limit its room to grow in any way.

.. important::
   **TODO:** Find and define ways, guidelines, and processes that minimally impact the daily work
   of the maintainers, reviewers, and contributors, and of the safety architect, but that are
   also suitable for safety.