RTOS for small embedded systems – part 2

Let’s think about what features a small-footprint, non-preemptive RTOS for something as resource-limited as an AVR might require.

Tasks

Any real-time system requires its software to be partitioned into modules that execute when required, each performing (ideally) just one task in a potentially very complex system. It rapidly becomes unmanageable to support a large blob of code that is trying to achieve several time-critical objectives of different kinds simultaneously. Tasks are our friend: they allow the RTOS to absorb the headache of making a single-threaded CPU appear multi-threaded, which in turn allows the developer to focus on one objective at a time to the exclusion of the others.

A task will typically have a state, which determines what the scheduler should do with it. A minimum set of task states would be Idle, Ready and Running.

A task will also have a broader, application-specific set of state variables. These matter because, in a cooperative system, each time it enters the Running state the task has to pick up from where it was previously executing. Because it has no dedicated stack or CPU context, the task must store this information in a set of state variables attached to the task itself, and the RTOS needs to pass them into each iteration of the task, since there may be more than one instance of the task in a system.
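
To make that concrete, here’s a minimal sketch in C++ of what a task might look like on an AVR-class part. The names (TaskState, Task, the run function pointer and BlinkContext) are all hypothetical, just one way of attaching the state variables to the task itself so the kernel can pass them back in on every pass.

#include <stdint.h>

// Scheduling state owned by the kernel (a hypothetical minimal set).
enum class TaskState : uint8_t {
    Idle,     // nothing to do, not runnable
    Ready,    // triggered, waiting for the scheduler to run it
    Running   // currently executing its run function
};

struct Task;                     // forward declaration
typedef void (*TaskRun)(Task&);  // one pass of the task's work

// A task control block. There is no private stack: everything the task
// needs to remember between passes lives in 'context', so several
// instances of the same run function can coexist in one system.
struct Task {
    TaskRun   run;       // function executed each time the task runs
    void*     context;   // application-specific state variables
    uint8_t   priority;  // static priority; lower value = more urgent
    TaskState state;     // what the scheduler should do with it
};

// Example: a blinking-LED task whose only persistent state is the LED level.
struct BlinkContext { bool ledOn; };

void blinkRun(Task& t)
{
    BlinkContext& ctx = *static_cast<BlinkContext*>(t.context);
    ctx.ledOn = !ctx.ledOn;      // pick up where we left off last pass
    // ...drive the LED pin here...
    t.state = TaskState::Idle;   // wait to be triggered again
}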

Kernel / Scheduler / Executive

Whatever you choose to call it, any RTOS generally has a central core of code whose responsibility is to maintain a list of ready tasks and to determine which task is next to run in order to best achieve the real time objectives of the system.

To achieve this it needs insight into the relative importance of each task to the system’s real-time performance, and into the state of each task: whether it has been made “ready” to run by one of the thread synchronisation mechanisms supported by the system, or whether it is “blocked” because, for example, it has no input to process or is waiting on a timer that is yet to expire.

Prioritization is normally achieved at the basic level by assigning a static numerical priority level to each task. More advanced implementations might allow this priority level to be raised or lowered in response to dynamic conditions.
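
A scheduler built along these lines can be tiny. The sketch below, building on the Task structure above (the function name and the fixed task table are assumptions, not a prescription), simply scans the task list and picks the highest priority task that is Ready.

// Pick the next task to run: the Ready task with the most urgent
// (numerically lowest) static priority. Returns nullptr if nothing is ready.
Task* nextReadyTask(Task* table[], uint8_t count)
{
    Task* best = nullptr;
    for (uint8_t i = 0; i < count; ++i) {
        Task* t = table[i];
        if (t->state != TaskState::Ready)
            continue;
        if (best == nullptr || t->priority < best->priority)
            best = t;
    }
    return best;
}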

Another very important core function on a small system, which may be battery powered, is to identify when the whole system has fallen idle (with no tasks “ready” to run) and to respond by entering a power-saving state that is appropriate to the state of the system at the time. Platform drivers need to be provided that respect the state of peripherals, such as ADCs and timers, that may still be busy, so that the kernel can choose a sleep state that shuts down as much of the system as possible to save power.
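
Tying those two responsibilities together, the kernel’s main loop might look something like the sketch below. The sleep handling is deliberately vague: deepestAllowedSleep() is a hypothetical hook standing in for the platform drivers’ decision about which peripherals are still busy, while the sleep calls are the standard avr-libc ones.

#include <avr/sleep.h>
#include <avr/interrupt.h>

// Hypothetical hook supplied by the platform drivers: returns the deepest
// sleep mode that won't disturb a busy ADC, UART, timer and so on.
extern uint8_t deepestAllowedSleep();

void kernelLoop(Task* table[], uint8_t count)
{
    for (;;) {
        Task* t = nextReadyTask(table, count);
        if (t != nullptr) {
            t->state = TaskState::Running;
            t->run(*t);                          // one cooperative pass
            if (t->state == TaskState::Running)  // task didn't re-trigger itself
                t->state = TaskState::Idle;
        } else {
            // The whole system is idle: sleep until an interrupt makes
            // something Ready again.
            set_sleep_mode(deepestAllowedSleep());
            cli();
            sleep_enable();
            sei();
            sleep_cpu();                         // wakes here on an interrupt
            sleep_disable();
        }
    }
}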

Messages and Events

Multi-tasking systems, especially those implementing communications protocols, soon develop a flow of information that cascades through several tasks on its way through the system, so our first requirement is for a task, or an interrupt service routine, to be able to trigger the next task, and possibly to pass some information along.

There may be no requirement to transfer a payload at all, just a signal that indicates that an event has taken place, potentially waking up a task to handle it. In fact, transferring a payload implies copying, or allocating and deallocating, the blocks of memory in which the information is passed. On a small system the overhead associated with this may be overkill, so it is probably best regarded as an optional feature of a small RTOS. Often it’s enough to simply trigger the kernel to run a task, a mechanism which is extremely fast and requires no resources beyond the kernel and the tasks themselves.

Where the only requirement is to signal that something has occurred, such as the expiry of a timeout, we’ll refer to this as an event. Often there’s a requirement to preserve the order and number of events rather than simply trigger the task, in which case we’ll need to queue the events and assign them IDs.
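
Where events do need to be counted and kept in order, a small ring buffer of event IDs attached to the receiving task is usually enough. Something along these lines (the names and the fixed queue depth are illustrative; a real version would guard the queue with the interrupt lock discussed at the end of this article, since it’s shared with ISRs):

// A small FIFO of event IDs for one task. Posting also makes the task Ready.
struct EventQueue {
    enum { Depth = 8 };
    uint8_t ids[Depth];
    uint8_t head = 0;    // next slot to write
    uint8_t tail = 0;    // next slot to read
    uint8_t count = 0;
};

// Called from a task or an ISR to signal 'eventId' to 'task'.
// Returns false if the queue is full and the event was dropped.
bool postEvent(EventQueue& q, Task& task, uint8_t eventId)
{
    if (q.count == EventQueue::Depth)
        return false;                       // overflow: caller decides what to do
    q.ids[q.head] = eventId;
    q.head = (q.head + 1) % EventQueue::Depth;
    ++q.count;
    task.state = TaskState::Ready;          // wake the recipient
    return true;
}

// Called by the receiving task, each pass, to drain its events in order.
bool fetchEvent(EventQueue& q, uint8_t& eventId)
{
    if (q.count == 0)
        return false;
    eventId = q.ids[q.tail];
    q.tail = (q.tail + 1) % EventQueue::Depth;
    --q.count;
    return true;
}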

Ownership of memory is always a contentious issue in multi-tasking systems, so often we pass on a message payload, along with the responsibility of freeing the memory in which it resides, from the source task to the recipient. Alternatively, the message can be copied into a FIFO queue that the recipient task “owns”, avoiding the memory ownership issue but adding the overhead of a copy into and out of the FIFO. In either case, we’ll refer to this notification incorporating an information payload as a message.
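
A sketch of the first approach, handing over a payload pointer together with the obligation to free it. The names are illustrative, and the one-deep mailbox is the simplest thing that could work; a real system might queue several messages per task.

// A message: an ID plus a payload block that the recipient now owns and
// must release (back to a storage pool, see below) when it has finished.
struct Message {
    uint8_t  id;        // what kind of message this is
    void*    payload;   // block allocated by the sender
    uint16_t length;    // payload size in bytes
};

// One-deep mailbox attached to a task.
struct Mailbox {
    Message msg;
    bool    full = false;
};

// Sender: pass the payload, and the duty to free it, to the recipient.
bool sendMessage(Mailbox& box, Task& recipient, const Message& msg)
{
    if (box.full)
        return false;           // recipient hasn't consumed the last one yet
    box.msg  = msg;
    box.full = true;
    recipient.state = TaskState::Ready;
    return true;                // sender must no longer touch msg.payload
}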

Timers

Timing is always important in a small real-time embedded system. We soon come across the requirement to delay for precise periods of time, to implement a timeout that guards the system against an expected event not happening in time, to flash a front-panel LED and so on. Frequently, a small microcontroller of the type we’ve been discussing has only one hardware timer, or a very limited number of them, so some means to share a timer of this type across multiple tasks is useful.

Often, a system can just maintain a system tick, and tasks can poll this to determine when the required intervals have elapsed, but this requires the tasks to run round-robin even when the system is idle. Both this and the maintenance of the system tick itself can be wasteful of CPU cycles and battery on a power-critical system.

A better solution to the timer problem is to have a system timing module that can manage a queue of any number of timers, allow tasks to start, stop and reset them, and to deliver events to tasks when they expire. Such a module can hang off a single CPU timer module and can even wake the system from sleep when a timer expires.

A system with a central timer module can often run in “tickless mode”, where there is no requirement to maintain a system tick, so the system can fall completely idle with no CPU overhead when it’s awaiting a timer expiry. Each time the system sleeps, the timer module knows when the next timer, if any, is due to expire, so it can simply wake itself on the hardware timer, however long that is set to run for.
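
The bookkeeping such a timer module might do is sketched below: software timers are kept in a list ordered by expiry, expired timers make their owning task Ready (or post it an event), and the module can report how far away the next expiry is so the single hardware timer can be programmed before the system sleeps. All names here are illustrative.

// A software timer hanging off the single hardware timer.
struct SoftTimer {
    uint32_t   expiry;    // absolute expiry time, in hardware timer ticks
    Task*      owner;     // task to make Ready when the timer fires
    bool       running;
    SoftTimer* next;      // singly linked list, earliest expiry first
};

struct TimerModule {
    SoftTimer* head = nullptr;

    // Start a timer, keeping the list sorted by expiry time.
    void start(SoftTimer& t, uint32_t now, uint32_t ticks)
    {
        t.expiry  = now + ticks;
        t.running = true;
        SoftTimer** p = &head;
        while (*p != nullptr && (*p)->expiry <= t.expiry)
            p = &(*p)->next;
        t.next = *p;
        *p = &t;
    }

    // Called when the hardware timer fires, or after waking from sleep:
    // deliver to every timer that has now expired. The signed comparison on
    // the difference keeps this correct across tick-counter wrap-around.
    void service(uint32_t now)
    {
        while (head != nullptr && (int32_t)(now - head->expiry) >= 0) {
            SoftTimer* expired = head;
            head = expired->next;
            expired->running = false;
            expired->owner->state = TaskState::Ready;  // or post an event
        }
    }

    // For tickless operation: how long until the next timer fires?
    // Returns false if no timer is running, so the system can sleep
    // indefinitely with the hardware timer switched off.
    bool nextExpiry(uint32_t now, uint32_t& ticksAway) const
    {
        if (head == nullptr)
            return false;
        ticksAway = head->expiry - now;
        return true;
    }
};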

Central timer support is, again, likely to be regarded as a luxury in a really compact system, so should also be regarded as an optional component of an RTOS.

Storage Pools

Most embedded systems have relatively constrained RAM and very simple heap management. This means that it’s generally a spectacularly bad idea to use the standard heap functions – C’s malloc() and free(), or C++’s new and delete – for dynamic memory allocation at runtime. If small chunks of memory are constantly being allocated and freed, each for a variable period of time, the small heap on an embedded system will soon become fragmented: the free areas of memory become small and numerous, and it becomes impossible to allocate larger “chunks” of memory despite there being sufficient heap space available overall.

More complex systems with virtual memory, or those running managed code, can compact or remap memory to recover fragmented space, but we have no such luxury in most embedded systems. In addition, heap allocation and garbage collection functions that manage the heap on the fly often don’t offer deterministic timing, making them unsuitable for hard real-time systems.

Passing information in message blocks is useful, however, so we need to find a more effective way to manage allocation and freeing of memory blocks.

One neat solution is a storage pool, where we decide on a number of block sizes that coincide with the messages we use in the system, then decide how many instances of each block size the system is likely to need during normal operation. We allocate (statically or from the heap) one large block at system startup, subdivide it into the blocks we need, and store the blocks of each size in a pool. Blocks can then be allocated and deallocated easily, quickly and with deterministic real-time performance using a few pointer operations, without fear of any heap fragmentation.
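
A pool like this can be implemented as nothing more than a free list threaded through the blocks themselves, so allocating and freeing are each a couple of pointer operations. A minimal sketch, with illustrative sizes and names (on parts where alignment matters the slab would need aligning; on an AVR it doesn’t):

#include <stddef.h>
#include <stdint.h>

// A pool of fixed-size blocks carved out of one slab. Free blocks are
// chained through their own first bytes, so there is no per-block overhead.
class StoragePool {
public:
    // 'slab' must hold blockSize * blockCount bytes;
    // blockSize must be at least sizeof(void*).
    StoragePool(void* slab, size_t blockSize, size_t blockCount)
        : freeList(nullptr)
    {
        uint8_t* p = static_cast<uint8_t*>(slab);
        for (size_t i = 0; i < blockCount; ++i) {
            release(p);             // thread every block onto the free list
            p += blockSize;
        }
    }

    // O(1) and deterministic: pop the head of the free list.
    void* allocate()
    {
        void* block = freeList;
        if (block != nullptr)
            freeList = *static_cast<void**>(block);
        return block;               // nullptr if the pool is exhausted
    }

    // O(1) and deterministic: push the block back onto the free list.
    void release(void* block)
    {
        *static_cast<void**>(block) = freeList;
        freeList = block;
    }

private:
    void* freeList;   // head of the chain of free blocks
};

// Usage: a statically allocated pool of sixteen 32-byte message buffers.
static uint8_t slab[16 * 32];
static StoragePool messagePool(slab, 32, 16);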

Profiling

Most embedded systems have real-time constraints of some kind. Often they are hard real-time systems where, if certain timing constraints are not met, the system becomes unusable. The most obvious example is a clock: if the firmware in a clock can’t keep accurate time, it’s useless. Automotive engine management systems are another example, where, if the ignition drivers aren’t fired at intervals accurate down to the microsecond, the engine will run poorly or even be damaged.

Meeting defined real-time performance objectives requires detailed knowledge of the timing performance of all parts of the system while running on its target platform. This is especially true of cooperative real-time systems, where the kernel can’t just immediately context switch to a high priority task when it’s triggered. The system latency is defined by all tasks and ISRs in the system, even the really low priority ones!

Achieving defined real-time performance therefore requires timing metrics to be gathered from a running system, to verify that critical parts of it will meet their timing constraints in the worst-case scenario. Often such metrics highlight parts of the system where performance needs to be improved, or where time-consuming operations need to be split across several passes of a task to reduce the latency with which a high priority task can run.

Profiling, as we’ll refer to it, can be achieved using various ad-hoc methods, such as toggling an IO line either side of the code to be profiled and measuring the timing with an oscilloscope, but often it’s more convenient to have a software module to do the job. If the system has a tick count or timer with reasonable resolution, a profiling module that can record the minimum and maximum latency of each task, ISR and, sometimes, other software routines is invaluable. This is, of course, an intrusive operation in itself. You’d normally want to disable it in production code – in fact, the system may not meet its real-time performance requirements with the profiler enabled – but it’s a great tool for proving that it will in a production release!
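
A profiling module of this kind can be as simple as a stopwatch wrapped around each task pass or ISR body, recording the minimum and maximum elapsed time. The sketch below assumes a free-running tick counter, readTicks(), with reasonable resolution; everything else (the names and the compile-time switch) is illustrative.

#include <stdint.h>

// Platform-supplied, hypothetical: a free-running counter with enough
// resolution to be useful, e.g. a hardware timer register read directly.
extern uint16_t readTicks();

// Min/max execution-time record for one task, ISR or routine.
struct ProfileRecord {
    uint16_t minTicks = 0xFFFF;
    uint16_t maxTicks = 0;
};

#ifdef ENABLE_PROFILING
// Scoped stopwatch: construct at the top of the code being measured;
// the destructor updates the record when the scope is left.
class ProfileScope {
public:
    explicit ProfileScope(ProfileRecord& r) : rec(r), start(readTicks()) {}
    ~ProfileScope()
    {
        uint16_t elapsed = readTicks() - start;  // unsigned maths handles wrap
        if (elapsed < rec.minTicks) rec.minTicks = elapsed;
        if (elapsed > rec.maxTicks) rec.maxTicks = elapsed;
    }
private:
    ProfileRecord& rec;
    uint16_t       start;
};
#else
// Compiled out of production builds, where the measurement overhead itself
// might break the timing budget.
class ProfileScope {
public:
    explicit ProfileScope(ProfileRecord&) {}
};
#endif

// Usage inside a task's run function:
//   static ProfileRecord blinkProfile;
//   void blinkRun(Task& t) { ProfileScope p(blinkProfile); /* ... */ }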

Wrapping it up

That’s probably all we need for a basic RTOS, and we’ve probably done enough talking. I haven’t mentioned semaphores and mutexes yet. Often, in a cooperative system, they aren’t required, since managing shared access to resources is less onerous anyway, and in any case they are probably just a superset of the features we’ve already defined. Maybe we’ll build these on top of the basics later.

One mutual exclusion mechanism that is required even in simple cooperative systems is an interrupt lock, which ensures that critical code can’t be called in a re-entrant manner from an ISR. It’s often useful to wrap up control of global interrupts in a class that can manage nested entry to the exclusion and automatically re-enable interrupts when the top-level instance goes out of scope.
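
On an AVR this needs only a few lines: save the status register (which holds the global interrupt enable flag), disable interrupts, and restore the saved value when the instance goes out of scope. Nesting falls out naturally because an inner instance saves an already-disabled state. A sketch, using the standard avr-libc calls:

#include <avr/io.h>
#include <avr/interrupt.h>
#include <stdint.h>

// RAII interrupt lock: constructing one enters a critical section, and the
// previous interrupt state is restored when it goes out of scope, so nested
// locks behave correctly and interrupts are only re-enabled by the outermost one.
class InterruptLock {
public:
    InterruptLock() : savedSreg(SREG) { cli(); }
    ~InterruptLock() { SREG = savedSreg; }

    // Non-copyable: copying a lock would make the restore order ambiguous.
    InterruptLock(const InterruptLock&) = delete;
    InterruptLock& operator=(const InterruptLock&) = delete;

private:
    uint8_t savedSreg;   // status register, including the global enable bit
};

// Usage: protect a structure shared between tasks and ISRs.
// {
//     InterruptLock lock;
//     // ...touch the shared data; can't be interrupted here...
// }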

To be continued…