.. _security-overview:

Zephyr Security Overview
########################

Introduction
************

This document outlines the steps of the Zephyr Security Subcommittee
towards a defined security process that helps developers build more
secure software while addressing security compliance requirements. It
presents the key ideas of the security process and outlines which
documents need to be created. Once the process is implemented and all
supporting documents are created, this document will serve as the
top-level overview and entry point.

Overview and Scope
==================

We begin with an overview of the Zephyr development process, focusing
mainly on its security functionality.

In subsequent sections, the individual parts of the process are treated
in detail. As depicted in Figure 1, the main steps are:

1. **Secure Development:** Defines the system architecture and a
   development process that ensures adherence to relevant coding
   principles and quality assurance procedures.

2. **Secure Design:** Defines security procedures and implements
   measures to enforce them. A security architecture of the system and
   relevant sub-modules is created, threats are identified, and
   countermeasures are designed. Their correct implementation and the
   validity of the threat models are checked by code reviews. Finally,
   a process shall be defined for reporting, classifying, and
   mitigating security issues.

3. **Security Certification:** Defines the certifiable part of the
   Zephyr RTOS. This includes an evaluation target, its assets, and
   how these assets are protected. Certification claims shall be
   determined and backed with appropriate evidence.

.. figure:: media/security-process-steps.png

   Figure 1.
   Security Process Steps

Intended Audience
=================

This document is a guideline for the development of a security process
by the Zephyr Security Subcommittee and the Zephyr Technical Steering
Committee. It provides an overview of the Zephyr security process for
(security) engineers and architects.

Nomenclature
============

In this document, the keywords "MUST", "MUST NOT", "REQUIRED", "SHALL",
"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" are to be interpreted as described in [RFC2119]_.

These words are used to define absolute requirements (or prohibitions),
highly recommended requirements, and truly optional requirements. As
noted in RFC 2119, "These terms are frequently used to specify behavior
with security implications. The effects on security of not implementing
a MUST or SHOULD, or doing something the specification says MUST NOT or
SHOULD NOT be done may be very subtle. Document authors should take the
time to elaborate the security implications of not following
recommendations or requirements as most implementors will not have had
the benefit of the experience and discussion that produced the
specification."

Security Document Update
========================

This document is a living document. As new requirements, features, and
changes are identified, they will be added to this document through the
following process:

1. Changes will be submitted by the interested parties via pull
   requests to the Zephyr documentation repository.

2. The Zephyr Security Subcommittee will review these changes and
   either provide feedback or accept them.

3. Once accepted, these changes will become part of the document.

Current Security Definition
***************************

This section summarizes the current status of secure development
within the Zephyr RTOS.
The current focus is on functional security
and code quality assurance, although additional security features are
planned.

The three major security measures currently implemented are:

- **Security Functionality** with a focus on cryptographic algorithms
  and protocols. Support for cryptographic hardware is planned for
  future releases. The Zephyr runtime architecture is a monolithic
  binary, which removes the need for dynamic loaders and thereby
  reduces the exposed attack surface.

- **Quality Assurance** is driven by a development process that
  requires all code to be reviewed before it is committed to the
  common repository. Furthermore, the reuse of proven building blocks
  such as network stacks increases the overall quality level and
  promotes stable APIs. Static code analysis provides additional
  quality checks.

- **Execution Protection**, including thread separation as well as
  stack and memory protection, is available in the upstream Zephyr
  RTOS. Stack protection was introduced in version 1.9.0. Memory
  protection and thread separation were added in version 1.10.0 for
  x86 and in version 1.11.0 for ARM and ARC.

These topics are discussed in more detail in the following subsections.

Security Functionality
======================

The security functionality in Zephyr hinges mainly on the inclusion of
cryptographic algorithms and on its monolithic system design.

The cryptographic features are provided through PSA Crypto, with
mbedTLS as the underlying implementation. Applications use the PSA
Crypto APIs, which provide a standardized interface for cryptographic
operations. As the implementation of PSA Crypto, mbedTLS supports a
wide range of cryptographic algorithms, making it suitable for various
application requirements.

APIs for vendor-specific cryptographic IPs in both hardware and software
are planned, including secure key storage in the form of secure access
modules (SAMs), Trusted Platform Modules (TPMs), and Trusted Execution
Environments (TEEs).

The security architecture is based on a monolithic design in which the
Zephyr kernel and all applications are compiled into a single static
binary. System calls are implemented as function calls without requiring
context switches. Static linking eliminates the possibility of
dynamically loading malicious code.

Additional protection features are available in later releases. Stack
protection mechanisms protect against stack overruns. In addition,
applications can use thread separation features to split the system
into privileged and unprivileged execution environments. Memory
protection features make it possible to partition system resources
(memory, peripheral address space, etc.) and assign them to individual
threads or groups of threads. Stack, thread execution level, and memory
protection constraints are enforced at context switch time.

Quality Assurance
=================

The Zephyr project uses an automated quality assurance process. The goal
is a process that includes mandatory code reviews, feature and issue
management and tracking, and static code analysis.

Code reviews are documented and enforced using a voting system; a change
is checked into the repository by the responsible subsystem's maintainer
only after it has been approved.
The main goals of the code review are:

- Verifying correct functionality of the implementation

- Increasing the readability and maintainability of the contributed
  source code

- Ensuring appropriate usage of string and memory functions

- Validating user input

- Reviewing security-relevant code for potential issues

The current coding principles focus mostly on coding styles and
conventions. Functional correctness relies on the build system and on
the experience of the reviewers. Especially for security-relevant code,
concrete and detailed guidelines need to be developed and aligned with
the developers (see :ref:`secure code`).

Static code analysis is run on the Zephyr code tree on a regular basis;
see :ref:`static_analysis`.

Bug and issue tracking and management is performed using GitHub. The
term "survivability" was coined to cover proactive security tasks such
as security issue categorization and management. A problem identified
as a vulnerability is managed within GitHub security advisories.

Issues reported by static analysis should receive a more stringent
review before they are closed as non-issues: at least one other person
trained in security processes needs to agree that the finding is a
non-issue before it is closed.

A security subcommittee has been formed to develop a security process in
more detail; this document is part of that process.

Execution Protection
====================

Execution protection is supported and can be categorized into the
following tasks:

- **Memory separation:** Memory is partitioned into regions that are
  assigned attributes based on the owner of each region. Threads only
  have access to the regions they control.

- **Stack protection:** Stack guards provide mechanisms for detecting
  and trapping stack overruns. Individual threads should only have
  access to their own stacks.

- **Thread separation:** Individual threads should only have access to
  their own memory resources. When a thread is scheduled, only the
  memory resources owned by that thread are accessible.

Topics such as program flow protection and other measures for tamper
resistance are currently not in scope.

System Level Security (Ecosystem, ...)
======================================

System level security encompasses a wide variety of categories.
Examples include:

- Secure/trusted boot
- Over-the-air (OTA) updates
- External communication
- Device authentication
- Access control of onboard resources

  - Flash updating
  - Secure storage
  - Peripherals

- Root of trust
- Reduction of attack surface

Some of these categories are interconnected and rely on multiple pieces
being in place to produce a complete solution for the application.

Secure Development Process
**************************

The development of secure code shall adhere to certain criteria. These
include coding guidelines and development processes that can be roughly
separated into two categories: those related to software quality and
those related to software security. Furthermore, a system architecture
document shall be created and kept up to date with future development.

System Architecture
===================

.. figure:: media/security-zephyr-system-architecture.png

   Figure 2: Zephyr System Architecture

A high-level schematic of the Zephyr system architecture is given in
Figure 2. It separates the architecture into an OS part (*kernel + OS
Services*) and a user-specific part (*Application Services*). The OS
part itself contains low-level, platform-specific drivers and the
generic implementation of I/O APIs, file systems, kernel-specific
functions, and the cryptographic library.

A document describing the system architecture and design choices shall
be created and kept up to date with future development. This document
shall include the base architecture of the Zephyr OS and an overview of
important submodules. For each of the modules, a dedicated architecture
document shall be created and evaluated against the implementation.
These documents shall serve as an entry point for new developers and as
a basis for the security architecture. Please refer to the
:ref:`Zephyr subsystem documentation <os_services>` for detailed
information.

Secure Coding
=============

Designing an open software system such as Zephyr to be secure requires
adhering to a defined set of design standards. These standards are
included in the Zephyr Project documentation, specifically in its
:ref:`secure code` section. In [SALT75]_, the following widely accepted
principles for protection mechanisms are defined to prevent security
violations and limit their impact:

- **Open design** as a design principle incorporates the maxim that
  protection mechanisms cannot be kept secret on any system in
  widespread use. Instead of relying on secret, custom-tailored
  security measures, publicly accepted cryptographic algorithms and
  well-established cryptographic libraries shall be used.

- **Economy of mechanism** specifies that the underlying design of a
  system shall be kept as simple and small as possible. In the context
  of the Zephyr project, this can be realized, e.g., by modular code
  [PAUL09]_ and abstracted APIs.

- **Complete mediation** requires that every access to every object
  and process is first checked for authority. Mechanisms that store
  (cache) the results of earlier access checks shall be avoided if
  possible.

- **Fail-safe defaults** means that access is denied by default and
  permitted only under specific conditions defined by the system
  protection scheme, e.g., after successful authentication.
  Furthermore, default settings for services shall be chosen to
  provide maximum security. This corresponds to the "Secure by
  Default" paradigm [MS12]_.

- **Separation of privilege** is the principle that two or more
  conditions need to be satisfied before access is granted. In the
  context of the Zephyr project, this could encompass split keys
  [PAUL09]_.

- **Least privilege** describes an access model in which each user,
  program, and thread shall have the smallest possible subset of
  permissions in the system required to perform its task. This
  positive security model aims to minimize the attack surface of the
  system.

- **Least common mechanism** specifies that mechanisms common to more
  than one user or process shall not be shared if not strictly
  required. The example given in [SALT75]_ is a function that should
  be implemented as a shared library executed by each user and not as
  a supervisor procedure shared by all users.

- **Psychological acceptability** requires that security features are
  easy for developers to use, in order to ensure that they are used
  and applied correctly.

In addition to these general principles, the following points are
specific to the development of a secure RTOS:

- **Complementary Security/Defense in Depth:** do not rely on a single
  threat mitigation approach. In the complementary security approach,
  parts of the threat mitigation are performed by the underlying
  platform. If such mechanisms are not provided by the platform, or
  are not trusted, a defense in depth [MS12]_ paradigm shall be used.

- **Less commonly used services off by default:** to reduce the
  exposure of the system to potential attacks, features or services
  shall not be enabled by default if they are only rarely used (a
  threshold of 80% is given in [MS12]_). For the Zephyr project, this
  can be realized using the Kconfig configuration system. Each
  functionality and module shall be represented as a configuration
  option and needs to be explicitly enabled. Then, all features,
  protocols, and drivers not required for a particular use case can be
  disabled. The user shall be notified if low-level options and APIs
  are enabled but not used by the application.

- **Change management:** to guarantee traceability of changes to the
  system, each change shall follow a specified process including
  change request, impact analysis, ratification, implementation, and
  validation phases. In each phase, appropriate documentation shall be
  provided. All commits shall be related to a bug report or change
  request in the issue tracker. Commits without a valid reference
  shall be denied.

Based on these design principles and commonly accepted best practices, a
secure development guide shall be developed, published, and integrated
into the Zephyr development process. Further details are given in the
`Secure Design`_ section.

Quality Assurance
=================

The quality assurance part encompasses the following criteria:

- **Adherence to the Coding Conventions** with respect to coding style,
  naming schemes of modules, functions, variables, and so forth. This
  increases the readability of the Zephyr code base and eases code
  review. These coding conventions are enforced by automated scripts
  prior to check-in.

- **Adherence to Deployment Guidelines** is required to ensure
  consistent releases with a well-documented feature set and a
  trackable list of security issues.

- **Code Reviews** help ensure the functional correctness of the code
  base and shall be performed on each proposed code change prior to
  check-in. Code reviews shall be performed by at least one independent
  reviewer other than the author(s) of the code change. These reviews
  shall be performed by the subsystem maintainers and developers on a
  functional level and are to be distinguished from security reviews
  as laid out in the `Secure Design`_ section. Refer to the
  :ref:`development_model` documentation for more information.

- **Static Code Analysis** tools efficiently detect common coding
  mistakes in large code bases. All code merged into the main
  repository shall be analyzed using an appropriate tool. This analysis
  is not run per individual commit but at regular intervals on specific
  branches, at a minimum on the main release branch and on the security
  branch. All findings shall be fixed, or waived if they are false
  positives, before each release. Waivers shall be documented both
  centrally and as a comment inside the source code itself. The
  documentation shall include the employed tool and its version, the
  date of the analysis, the branch and parent revision number, the
  reason for the waiver, the author of the respective code, and the
  approver(s) of the waiver. Each release shall have zero open static
  code analysis issues, with documented waivers counting as resolved.
  Refer to the :ref:`development_model` documentation for more
  information.

- **Complexity Analyses** shall be performed as part of the development
  process, and metrics such as cyclomatic complexity shall be
  evaluated. The main goal is to keep the code as simple as possible.

- **Automation:** the review process and checks for coding rule
  adherence are a mandatory part of the precommit checks.
  To ensure consistent application, they shall be automated as part of
  the precommit procedure. Before large pieces of code from subsystems
  are merged, all static code analysis must have been run and its
  findings resolved, in addition to the review process and coding rule
  checks.

Release and Lifecycle Management
================================

Lifecycle management contains several aspects:

- **Device management** encompasses the ability to update the
  operating system and/or security-related subsystems of
  Zephyr-enabled devices in the field.

- **Lifecycle management:** system stages shall be defined and
  documented along with the transitions between the stages in a system
  state diagram. For security reasons, this shall include locking the
  device if an attack has been detected, and terminating it when the
  end of life is reached.

- **Release management** describes the process of defining the release
  cycle, documenting releases, and maintaining a record of known
  vulnerabilities and mitigations. Especially for certification
  purposes, the integrity of each release needs to be protected so that
  later manipulation (e.g., the insertion of backdoors) can be easily
  detected.

- **Rights management and NDAs:** if required by the chosen
  certification, the confidentiality and integrity of the system need
  to be ensured by appropriate rights management (e.g., a separate
  source code repository) and non-disclosure agreements between the
  relevant parties. In the case of a repository shared between several
  parties, measures shall be taken to ensure that no malicious code is
  checked in.

These points shall be evaluated with respect to their impact on the
development process employed for the Zephyr project.
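
The lifecycle state diagram described above can be made concrete as a
transition table. The following is a minimal sketch under stated
assumptions, not a Zephyr API: the stage names, the events, and the
function ``lc_transition()`` are hypothetical. It also assumes that a
locked device may return to operation after an authorized unlock, which
is a policy decision the lifecycle documentation would have to make.
Any transition not listed in the table is rejected, and no transition
leaves the terminated stage.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>

   /* Hypothetical lifecycle stages (illustrative names, not a Zephyr API). */
   enum lc_state {
       LC_PROVISIONING, /* in manufacturing, before provisioning completes */
       LC_OPERATIONAL,  /* deployed and running in the field */
       LC_LOCKED,       /* locked because an attack was detected */
       LC_TERMINATED,   /* end of life reached; final stage */
   };

   /* Hypothetical events that may trigger a stage change. */
   enum lc_event {
       LC_EV_PROVISIONED,
       LC_EV_ATTACK_DETECTED,
       LC_EV_AUTHORIZED_UNLOCK,
       LC_EV_END_OF_LIFE,
   };

   struct lc_rule {
       enum lc_state from;
       enum lc_event event;
       enum lc_state to;
   };

   /* Allowed transitions, one row per edge of the state diagram. */
   static const struct lc_rule lc_rules[] = {
       { LC_PROVISIONING, LC_EV_PROVISIONED,       LC_OPERATIONAL },
       { LC_PROVISIONING, LC_EV_ATTACK_DETECTED,   LC_LOCKED },
       { LC_OPERATIONAL,  LC_EV_ATTACK_DETECTED,   LC_LOCKED },
       { LC_LOCKED,       LC_EV_AUTHORIZED_UNLOCK, LC_OPERATIONAL },
       { LC_PROVISIONING, LC_EV_END_OF_LIFE,       LC_TERMINATED },
       { LC_OPERATIONAL,  LC_EV_END_OF_LIFE,       LC_TERMINATED },
       { LC_LOCKED,       LC_EV_END_OF_LIFE,       LC_TERMINATED },
       /* No row starts at LC_TERMINATED, so end of life is irreversible. */
   };

   /*
    * Apply an event to *state. Returns true and updates *state if the
    * transition is listed in lc_rules; otherwise returns false and leaves
    * *state unchanged (deny by default).
    */
   bool lc_transition(enum lc_state *state, enum lc_event event)
   {
       for (size_t i = 0; i < sizeof(lc_rules) / sizeof(lc_rules[0]); i++) {
           if (lc_rules[i].from == *state && lc_rules[i].event == event) {
               *state = lc_rules[i].to;
               return true;
           }
       }
       return false;
   }

Keeping the allowed transitions in one table lets reviewers compare the
code line by line with the documented state diagram. How an "attack
detected" event is raised, and how the current stage is stored
persistently, are outside the scope of this sketch.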

Secure Design
*************

In order to obtain a certifiable system or product, the security process
needs to be clearly defined, and its application needs to be monitored
and driven. This process covers security-related modules in all stages
of their development as well as the management of reported security
issues. Furthermore, threat models need to be created for currently
known and future attack vectors, and their impact on the system needs to
be investigated and mitigated. Please refer to the :ref:`secure code`
guidelines in the Zephyr project documentation for detailed information.

The software security process includes:

- **Adherence to the Secure Coding Guidelines** is mandatory to prevent
  individual components from breaching system security and to minimize
  the vulnerability of individual modules. While this can be partially
  verified by automated tests, the correct implementation of security
  features such as countermeasures in security-critical modules must
  also be examined manually.

- **Security Reviews** shall be performed by a security architect in
  preparation for each security-targeted release and each time a
  security-related module of the Zephyr project is changed. This
  process includes validating the effectiveness of implemented security
  measures, checking adherence to the global security strategy and
  architecture, and preparing audits towards a security certification
  if required.

- **Security Issue Management** encompasses the evaluation of potential
  system vulnerabilities and their mitigation as described in
  :ref:`Security Issue Management <reporting>`.

These criteria and tasks need to be integrated into the development
process for secure software and shall be automated wherever possible.
At the system level, and for each security-related module of the
secure branch of Zephyr, a directly responsible security architect
shall be assigned to guide the secure development process.

Security Architecture
=====================

The general guidelines above shall be accompanied by an architectural
security design at the system and module levels. The high-level
considerations include:

- The identification of **security and compliance requirements**

- **Functional security** such as the use of cryptographic functions
  whenever applicable

- Design of **countermeasures** against known attack vectors

- Recording of security-relevant **auditable events**

- Support for **Trusted Platform Modules (TPM)** and
  **Trusted Execution Environments (TEE)**

- Mechanisms to allow for **in-the-field updates** of devices using
  Zephyr

- Task scheduling and separation

The security architecture is developed based on assets derived from
the structural overview of the overall system architecture. Based on
this, the individual steps include:

1. **Identification of assets** such as user data, authentication and
   encryption keys, key generation data (obtained from the RNG), and
   security-relevant status information.

2. **Identification of threats** against the assets, such as breaches
   of confidentiality, manipulation of user data, etc.

3. **Definition of requirements** regarding security and protection of
   the assets, e.g., countermeasures or memory protection schemes.

The security architecture shall be harmonized with the existing system
architecture and implementation to determine potential deviations and
mitigate existing weaknesses. Newly developed sub-modules that are
integrated into the secure branch of the Zephyr project shall provide
individual documents describing their security architecture.
Additionally, their impact on system-level security shall be
considered and documented.

Security Vulnerability Reporting
================================

Please see :ref:`reporting` for information on reporting security
vulnerabilities.

Threat Modeling and Mitigation
==============================

Modeling security threats against the Zephyr RTOS is required for the
development of an accurate security architecture and for most
certification schemes. The first step of this process is the definition
of the assets to be protected by the system. The next step models how
these assets are protected by the system and which threats against them
are present. After a threat has been identified, a corresponding threat
model is created. This model contains the asset and system
vulnerabilities, as well as a description of the potential exploits of
these vulnerabilities. Additionally, the impact on the asset, on the
module it resides in, and on the overall system is to be estimated.
This threat model is then considered in the module and system security
architecture, and appropriate countermeasures are defined to mitigate
the threat or limit the impact of exploits.

In short, the threat modeling process can be separated into these steps
(adapted from [OWASP]_):

1. Definition of assets

2. Application decomposition and creation of appropriate data flow
   diagrams (DFDs)

3. Threat identification and categorization using the [STRIDE09]_
   approach, and severity rating using [CVSS]_

4. Determination of countermeasures and other mitigation approaches

This procedure shall be carried out during the design phase of modules
and before major changes to the module or system architecture.
Additionally, new models shall be created, or existing ones updated,
whenever new vulnerabilities or exploits are discovered.
During security reviews, the threat models and the mitigation
techniques shall be evaluated by the responsible security architect.

Tests that demonstrate the effectiveness of the countermeasures shall be
derived from these threat models and mitigation techniques. These tests
shall be integrated into the continuous integration workflow to ensure
that security is not impaired by regressions.

Vulnerability Analyses
======================

In order to find weak spots in the software implementation,
vulnerability analyses (VA) shall be performed. Of special interest are
investigations of cryptographic algorithms, critical OS tasks, and
connectivity protocols.

On a pure software level, this encompasses:

- **Penetration testing** of the RTOS on a particular hardware
  platform, which involves testing the respective Zephyr OS
  configuration and hardware as one system.

- **Side channel attacks** (e.g., timing or power analysis) should be
  considered. For instance, ensuring **timing invariance** of the
  cryptographic algorithms and modules is required to reduce the
  attack surface. This applies both to software implementations and
  to the use of cryptographic hardware.

- **Fuzzing tests** shall be performed on both exposed APIs and
  protocols.

The list given above serves primarily for illustration. For each
module and for the complete Zephyr system (in general on a particular
hardware platform), a suitable VA plan shall be created and executed.
The findings of these analyses shall be considered in the security issue
management process, and lessons learned shall be formulated as
guidelines and incorporated into the secure coding guide.

If possible (as in the case of fuzzing), these tests shall be
integrated into the continuous integration process.
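
The timing-invariance and fuzzing requirements above can be illustrated
with a short example. The following portable C sketch is not taken from
Zephyr or mbedTLS: ``ct_equal()`` is a hypothetical helper that compares
two buffers in time that depends only on their length, not on the
position of the first differing byte, as is needed, for example, when
checking a MAC. It is paired with a fuzz target that uses the libFuzzer
entry point ``LLVMFuzzerTestOneInput()`` to cross-check ``ct_equal()``
against ``memcmp()`` (differential fuzzing). Compilers may in principle
transform such source-level idioms, so production code should prefer a
vetted constant-time primitive from the cryptographic library and
verify the behavior on the target.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>
   #include <stdlib.h>
   #include <string.h>

   /*
    * Hypothetical helper: compare two buffers of length len without
    * data-dependent branches or early exit. Returns 1 if they are equal,
    * 0 otherwise.
    */
   int ct_equal(const uint8_t *a, const uint8_t *b, size_t len)
   {
       uint8_t diff = 0;

       for (size_t i = 0; i < len; i++) {
           /* Accumulate all differences; never stop at the first mismatch. */
           diff |= (uint8_t)(a[i] ^ b[i]);
       }

       /*
        * Branch-free mapping: if diff == 0, (0u - 1u) wraps to UINT_MAX, so
        * the shifted value is odd and the result is 1; if diff is 1..255,
        * diff - 1u is below 256, so shifting right by 8 leaves 0.
        */
       return (int)((((unsigned int)diff - 1u) >> 8) & 1u);
   }

   /*
    * Fuzz target using the libFuzzer entry point. The input is split into
    * two halves, and ct_equal() must agree with memcmp() on every input.
    * A mismatch aborts, which the fuzzer reports as a crash.
    */
   int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
   {
       if (size < 2) {
           return 0; /* nothing to compare */
       }

       size_t half = size / 2;
       const uint8_t *a = data;
       const uint8_t *b = data + half;
       int expected = (memcmp(a, b, half) == 0);

       if (ct_equal(a, b, half) != expected) {
           abort();
       }
       return 0;
   }

With Clang, such a target is typically built with ``-fsanitize=fuzzer``
and can be run in continuous integration with a fixed time budget, for
example using libFuzzer's ``-max_total_time`` option.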

Security Certification
**********************

One goal of creating a secure branch of the Zephyr RTOS is to create a
certifiable system or certifiable submodules thereof. The certification
scope and scheme are yet to be decided. However, many certifications,
such as Common Criteria [CCITSE12]_, require evidence that the
evaluation claims are indeed fulfilled, so a general certification
process is outlined below. Based on the final choices for the
certification scheme and evaluation level, this process needs to be
refined.

Generic Certification Process
=============================

In general, the steps towards a certification or precertification
(compare [MICR16]_) are:

1. **Defining the assets** to be protected within the Zephyr RTOS.
   Potential candidates are confidential information such as
   cryptographic keys, user data such as communication logs, and
   potentially intellectual property (IP) of the vendor or
   manufacturer.

2. Developing a **threat model** and **security architecture** to
   protect the assets against exploits of vulnerabilities of the
   system. As a complete threat model covers the overall product,
   including the hardware platform, this might be realized by a split
   model containing a precertified secure branch of Zephyr, which the
   vendor could use to certify their Zephyr-enabled product.

3. Formulating an **evaluation target** that includes the
   **certification claims** on the security of the assets to be
   evaluated and certified, as well as assumptions about the operating
   conditions.

4. Providing **proof** that the claims are fulfilled. This includes,
   among other things, consistent documentation of the security
   development process.

These steps are also partially covered in previous sections.
In contrast to those sections, the certification process only needs to
consider the components that are covered by the certification. The
security architecture, for example, considers assets at the system level
and might include items that are not relevant for the certification.

Certification Options
=====================

For the security certification as such, the following options can be
pursued:

1. **Abstract precertification of Zephyr as a pure software system:**
   this option requires assumptions about the underlying hardware
   platform and the final application running on top of Zephyr. If
   these assumptions are met by the hardware and the application, a
   full certification can be achieved more easily. This option is the
   most flexible approach but puts the largest burden on the product
   vendor.

2. **Certification of Zephyr on a specific hardware platform without a
   specific application in mind:** this scenario describes the
   enablement of a secure platform running the Zephyr RTOS. The
   hardware manufacturer certifies the platform under defined
   assumptions about the application. If these are met, the final
   product can be certified with little effort.

3. **Certification of an actual product:** in this case, a full product
   including specific hardware, the Zephyr RTOS, and an application is
   certified.

In all three cases, the certification scheme (e.g., FIPS 140-2 [NIST02]_
or Common Criteria [CCITSE12]_), the scope of the certification
(mainstream Zephyr, security branch, or certain modules), and the
certification/assurance level need to be determined.

In the case of partial certifications (options 1 and 2), assumptions
about hardware and/or software are required. These can include the
following [GHS10]_:

- **Appropriate physical security** of the hardware platform and its
  environment.

- **Sufficient protection of storage and timing channels** on the
  hardware platform itself and all connected devices. (Remote
  connections are not addressed.)

- Only **trusted/assured applications** run on the device.

- The device and its software stack are configured and operated by
  **properly trained and trusted individuals** with no malicious
  intent.

These assumptions shall be part of the security claim and evaluation
target documents.