The Internet of Things is at a critical juncture. Repeating past mistakes, specifically those related to security and maintenance, will slowly but surely erode confidence in IoT and in its ability to help address the many problems that both businesses and the environment face. It’s time to change the way we work on embedded systems.
The paper is organized into two parts.
Part I addresses a broader audience, such as IoT product/project managers and CTOs. It lays out a typical device-side IoT architecture and describes the traditional approach of implementation. It details the associated challenges and develops an argument for a different approach, now made possible through new hardware advancements.
Part II addresses the experienced embedded engineer and explains Twilio’s thinking with regard to how the above-mentioned challenges can be effectively addressed with a new architecture.
Part I – The Challenges of Connecting Devices
Connected devices vs. unconnected devices
Microcontrollers have been used in products for many decades, and have revolutionized product feature sets, reliability and performance over time. Moore’s law has brought 16- and 32-bit processing to even the smallest and cheapest consumer products, and the availability of this memory and CPU power has enabled the use of real-time operating systems (RTOS) where previously developers had to write “bare metal” code.
However, the transition from unconnected to connected products—in the context of IoT—has uncovered fundamental issues with how software is built for microcontrollers.
Connected device architecture
For IoT devices built around microcontrollers, a typical high level system architecture might look something like the diagram shown here. On the hardware side, there’s a microcontroller connected to both networking hardware (Cellular/Wi-Fi/Ethernet) and to the application hardware—the sensors and actuators used by the IoT application.
To manage resources and tasks, an off-the-shelf RTOS is typically used. There are many choices here, such as FreeRTOS, NuttX, and ThreadX, and at a high level they all do the same job: allocating memory and processor time to the different tasks within the system. To help decouple higher software layers from the specific hardware involved, there’s usually also a Hardware Abstraction Layer (HAL), which may be built into the RTOS or sit alongside it, taking care of the actual hardware accesses to perform I/O.
Connected devices also need a network stack, typically providing TCP/IP networking. The bottom of the stack talks to the network hardware to exchange packets, and the top of the stack provides stream and datagram APIs. On top of this is layered the security stack, to provide authentication and encryption services used by both cloud communications and FOTA (Firmware Over-The-Air update) services.
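To make the difference between the two API styles at the top of the stack concrete, here is a minimal sketch using Python’s standard socket module over the loopback interface. It is only an illustration on a desktop host, not embedded code; the function names and payloads are assumptions made for this example.

```python
import socket


def datagram_roundtrip(payload):
    """Send one UDP datagram over loopback and return what the receiver sees.

    Datagram APIs preserve message boundaries: one send is one receive.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
        rx.settimeout(2.0)
        tx.sendto(payload, rx.getsockname())
        data, _sender = rx.recvfrom(2048)
        return data


def stream_roundtrip(chunks):
    """Send several chunks over TCP and return the bytes the peer receives.

    Stream APIs deliver an ordered byte stream: chunk boundaries are not
    preserved, so the receiver simply reads until the sender closes.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        with socket.create_connection(listener.getsockname(), timeout=2.0) as client:
            server, _peer = listener.accept()
            with server:
                server.settimeout(2.0)
                for chunk in chunks:
                    client.sendall(chunk)
                client.shutdown(socket.SHUT_WR)   # signal end of stream
                received = bytearray()
                while True:
                    part = server.recv(4096)
                    if not part:                  # empty read: peer closed
                        break
                    received += part
                return bytes(received)
```

On a real device, the security stack described above would wrap the stream (or datagram) socket before the application or FOTA service ever sees it.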
At the very top, there’s the application, implementing the specific functionality of the device at hand. This talks to the application hardware and the system services and additionally takes care of cloud communication.
Usually, the stack has been integrated by the device maker.
Integration & maintenance challenges
Some pre-integration often exists. For example, Arm provide packaged releases that combine a HAL with Mbed OS, a network stack and a security stack, and Infineon/Cypress provide FreeRTOS, lwIP and Mbed TLS as part of their WICED platform (Wireless Internet Connectivity for Embedded Devices, a platform to enable Wi-Fi and Bluetooth connectivity in system design). Yet the design decisions made by these integrators do not always line up well with the application requirements, resulting in heavy developer customization. That, in turn, comes with the additional complexity of having to merge new releases from the supplier with the existing code base.
Merging changes from suppliers is a requirement to maintain system stability and security over the long term (in IoT deployments, that can mean a decade or more), especially as these packages usually include code that is directly network-facing, which is the easiest for an attacker to target. While some vendors provide long term support (“LTS”) branches, which retain API compatibility for essential security updates, the definition of “Long Term” is often not compatible with a product’s lifecycle. For example, Mbed TLS has LTS releases which offer security updates without API changes for up to 3 years. But beyond that, the developer would need to integrate a possibly radically different API to maintain a secure product, or heavily compromise the product’s security by continuing to rely on out-of-date code.
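The mismatch between an LTS window and a product lifetime can be put into numbers. The sketch below is a back-of-the-envelope calculation under simplifying assumptions that are ours, not the vendor’s: every LTS branch is supported for exactly the stated window, and the product moves to a fresh branch on the day the old one expires. The function name is hypothetical.

```python
import math


def forced_migrations(product_years, lts_years):
    """Best-case number of LTS-to-LTS migrations over a product's life.

    Assumes each LTS branch lasts exactly `lts_years` and that a new
    branch is adopted the moment the previous one reaches end of support.
    """
    if product_years <= 0 or lts_years <= 0:
        raise ValueError("durations must be positive")
    branches_needed = math.ceil(product_years / lts_years)
    return max(0, branches_needed - 1)


# A 10-year deployment on 3-year LTS branches needs four branches,
# which means at least three migrations to a possibly different API.
migrations = forced_migrations(10, 3)
```

Under these assumptions, a product deployed for a decade has to absorb at least three potentially API-breaking upgrades of its security stack, each of which must be integrated, tested, and shipped to devices in the field.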
As always, the more software you’re writing or integrating, the more maintenance you will have to perform on this code over the product’s entire lifecycle. Whereas an unconnected product might comprise 90% application code and 10% third-party code (and ongoing maintenance isn’t required as physical access would be required for any attack), connected products are often 20% application code and 80% third-party code, all of which has to be maintained to protect the user and manufacturer’s reputation.
Security design
Besides maintenance, there’s a very real problem with both design and implementation of security components. As with any specialist field, there’s a lot of expertise required to make the correct trade-offs and design decisions when building a connected product—and people with the appropriate skills are rare and hence expensive to hire.
When areas of the product are being architected from scratch, especially parts which may not be served adequately by well-supported open source software, the risks associated with a subtly flawed design decision could be significant.
Value and cost predictability
As can be seen in the architecture diagram, there’s a huge amount of software required to build a secure connected product—and most of it does not depend on the application itself. Not only is the time and money spent on integrating and maintaining external components a huge burden to a product’s lifetime costs, it is also essentially invisible to the end user, and doesn’t differentiate the product in the market.
Millions of engineer-hours have gone into reinventing the “connectivity wheel” for every single IoT product that has ever shipped. Complexity, budgets, schedules and lack of relevant domain knowledge have also meant that many of these products suffer from latent security issues just waiting to ruin someone’s day.
As noted, one of the major challenges with solving the maintenance issue in an MCU design is the close integration between the RTOS and the application. Larger systems such as desktop computers and mobile phones have always had an OS/application split, with the platform supplier, e.g. Microsoft, maintaining the operating system & network stack and providing updates over time to keep it secure.
So, could these problems be addressed with a similar OS/application split applied to embedded systems? There are three issues that crop up:
Responsibility for updates
When compared to a desktop or mobile application, embedded IoT applications are vastly different. Most desktop applications—and almost all mobile applications—are human-centric, providing a service or function to the user via processing and connectivity provided by the host device. As such, performance and consistency are appropriate for humans; the user interface might change, a screen might take a couple of seconds longer to appear, or functionality may be degraded if connectivity is not available—but humans quickly adapt.
In comparison, an embedded IoT application is I/O-centric and may have non-negotiable performance targets—whether these are for response time, functionality in the event of degraded communication, or power consumption. These targets depend on the specific use case of the device.
This different set of developer expectations, coupled with the reality that updates have to be deployed to unattended devices that may be physically entirely inaccessible for their lifetime, results in a very different burden on the shoulders of whoever maintains the devices.
Essentially, the developer needs to have confidence that no third party updates will ever break the deployed application. There are two ways the maintainer can help relieve developer concerns:
Just as hardware developers are intimately aware of DFM (Design for Manufacture, a set of practices that help products move smoothly from prototype to production with high yield and minimal field failures), software developers are aware of DFT (Design for Testability).
In the world of long-lived IoT products, consistent testing over long periods of time is essential. This means that testing must be automated rather than manual, as the people working at an organization will change over time. As such, at a minimum, OS and networking code must be defended with a full suite of automated tests: from build-time unit testing, to system testing on target hardware, to regular fuzz testing of external interfaces in order to uncover unintended behaviors. This level and duration of DFT and test automation is obviously expensive.
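As an illustration of what fuzz testing an external interface means in practice, here is a minimal sketch. The frame format (a length byte, a payload, and an XOR checksum), the parser, and all names are hypothetical and exist only for this example; a production setup would more likely use a coverage-guided fuzzer against the real network-facing code.

```python
import random


class FrameError(ValueError):
    """The only exception the parser is allowed to raise on bad input."""


def parse_frame(data):
    """Parse a hypothetical frame: 1-byte length, payload, 1-byte XOR checksum."""
    if len(data) < 2:
        raise FrameError("frame too short")
    length = data[0]
    if len(data) != length + 2:
        raise FrameError("length field does not match frame size")
    payload = data[1:1 + length]
    checksum = 0
    for byte in payload:
        checksum ^= byte
    if checksum != data[-1]:
        raise FrameError("bad checksum")
    return payload


def fuzz(parser, iterations=10000, seed=0):
    """Feed random byte strings to `parser` and return how many were tried.

    A clean rejection (FrameError) is fine. Any other exception is an
    unintended behavior, so it is left to propagate and fail the run.
    """
    rng = random.Random(seed)   # fixed seed keeps failures reproducible
    for _ in range(iterations):
        size = rng.randrange(0, 16)
        blob = bytes(rng.randrange(256) for _ in range(size))
        try:
            parser(blob)
        except FrameError:
            pass
    return iterations
```

Because the seed is fixed, a run like this can sit in the automated test suite and produce the same inputs on every build, which matters when the people maintaining the tests change over the product’s lifetime.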
Complexity of development and the impact on cost and BOM
Just as developers get comfortable with a particular Instruction Set Architecture (ISA), they also get expertise in an operating system architecture, a set of development tools, and development & debug workflows. Changing any of these components—even for tangible long-term gains—is painful in the short term and can introduce uncertainty in project schedules.
In an ideal world, a developer would be able to continue to use their preferred tools and RTOS while still having someone else provide the essential maintenance for long-term support.
One advantage of linking the operating system with the application is that it becomes easy to only pull in OS code that the application actually makes use of. This reduces the footprint of the OS and hence reduces the overall memory usage of the product.
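The footprint benefit of linking can be modeled as a reachability problem: only the OS functions that the application can actually reach end up in the image. The sketch below walks a call graph from the application’s entry point and sums the sizes of everything reachable. The symbol names and byte sizes are invented for illustration, and a real linker works on sections and relocations rather than a tidy graph like this one.

```python
def linked_footprint(call_graph, sizes, roots):
    """Return (reachable symbols, total size) starting from `roots`.

    `call_graph` maps each symbol to the symbols it references;
    `sizes` maps each symbol to its size in bytes.
    """
    reachable = set()
    pending = list(roots)
    while pending:
        symbol = pending.pop()
        if symbol in reachable:
            continue
        reachable.add(symbol)
        pending.extend(call_graph.get(symbol, ()))
    return reachable, sum(sizes[symbol] for symbol in reachable)


# Hypothetical firmware: the application uses tasks and networking,
# but never touches the OS timer or filesystem services.
CALL_GRAPH = {
    "app_main": ["os_task_create", "net_send"],
    "net_send": ["os_mutex_lock"],
    "os_timer_start": ["os_mutex_lock"],
    "fs_open": ["os_mutex_lock"],
}
SIZES = {
    "app_main": 400, "os_task_create": 300, "net_send": 500,
    "os_mutex_lock": 100, "os_timer_start": 200, "fs_open": 900,
}

kept, linked_bytes = linked_footprint(CALL_GRAPH, SIZES, ["app_main"])
full_bytes = sum(SIZES.values())
```

In this toy example the linked image needs 1300 bytes, against 2400 bytes if the whole OS were shipped as a separate, application-independent binary. That difference is the footprint cost an OS/application split has to justify.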
Adding any functionality to an embedded system—even reliable FOTA—does increase the hardware BOM cost, mainly related to flash and RAM usage. Unfortunately, there’s no real way around this, but the upsides are significant and the incremental costs are generally small, especially when compared to the cost of application development.
Maintained microvisor vs. maintained operating system
It’s clear from the sections above that attempting to provide a single maintained RTOS for a wide variety of applications is only going to be successful for a subset of developers—those who are already familiar with the chosen OS, and those who do not rely on custom modifications to that OS.
If, however, we approach the problem from a different angle and instead look at what services we are trying to provide to the embedded developer, a new solution appears: a hypervisor that runs alongside the developer’s RTOS and application code and insulates it from common attack vectors. Let’s call it a microvisor.
The areas which are security-related and hence require long-term maintenance are:
With appropriate hardware support, such a microvisor can be built—one which claims the necessary peripherals for network support at boot time, and establishes an application-independent connection to the cloud service that provides FOTA updates. But aside from that, it stays largely out of the way of the developer’s application and choice of RTOS.
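The boot-time split of peripherals can be pictured with a small model. This is a conceptual sketch only: it does not describe Twilio’s implementation or any real API, the peripheral names are made up, and on real hardware the enforcement would come from the processor’s protection features rather than a Python dictionary.

```python
class PeripheralAccessError(PermissionError):
    """Raised when code touches a peripheral it does not own."""


class PeripheralOwnership:
    """Toy model of boot-time peripheral assignment (hypothetical API)."""

    MICROVISOR = "microvisor"
    APPLICATION = "application"

    def __init__(self, peripherals):
        # Every peripheral starts out unowned.
        self._owner = {name: None for name in peripherals}
        self._locked = False

    def claim_for_microvisor(self, name):
        """Reserve a peripheral for the microvisor; only allowed during boot."""
        if self._locked:
            raise PeripheralAccessError("ownership is fixed once boot completes")
        if name not in self._owner:
            raise KeyError(name)
        self._owner[name] = self.MICROVISOR

    def lock(self):
        """Finish boot: everything left unclaimed belongs to the application."""
        for name, owner in self._owner.items():
            if owner is None:
                self._owner[name] = self.APPLICATION
        self._locked = True

    def check_access(self, name, requester):
        """Return True if `requester` owns `name`, otherwise raise."""
        if not self._locked:
            raise PeripheralAccessError("boot has not completed")
        if self._owner[name] != requester:
            raise PeripheralAccessError(f"{requester} may not access {name}")
        return True


def boot():
    """Claim the networking hardware, then hand the rest to the application."""
    table = PeripheralOwnership(["cellular_modem", "uart1", "i2c0"])
    table.claim_for_microvisor("cellular_modem")
    table.lock()
    return table
```

The point of the model is the shape of the contract: the microvisor takes only what it needs for its own cloud connection, the assignment cannot change after boot, and the application keeps free use of everything else.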
The microvisor can then protect itself from attack, whether it be via hardware tampering or a network interface. It can also protect the developer’s application from a large variety of attacks.
In the second part of our whitepaper, we take a deep dive into what a microvisor architecture looks like and, in particular, how Twilio is approaching the design with Twilio Microvisor. We'll cover topics such as peripheral usage, memory, networking, interrupt handling, exception handling, FOTA upgrades, power management, and more.