RTOS for small embedded systems

I cut my teeth on embedded systems under a number of RTOS: OSE, Precise MQX and various home-brewed implementations, running on medium-sized systems – 16-bit MCUs and DSPs used in mobile telecomms applications, for example. These systems have enough grunt to accommodate an off-the-shelf RTOS full of features without any consequent bloat ever really becoming apparent.

My focus as an electronics hobbyist has taken me to much smaller systems – 8-bit AVR MCUs, for example. Here I’m typically working with “bare metal”, although I’ll often use the Arduino environment to get me a leg-up into the “interesting” code straight away. I’ll often need RTOS-like features such as timers, message passing and mechanisms to trigger “user land” code from inside an ISR. There’s also almost always the need to generate the illusion of concurrency: responding to front panel buttons while handling RF or serial comms and driving outputs that flash lights and run stepper motors, for example.
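As an illustration of the "message passing from an ISR" idea, here is a minimal sketch of a single-producer, single-consumer event queue in portable C. The names (`evq_post`, `evq_get`, `EVQ_SIZE`) are hypothetical, made up for this example. It assumes exactly one ISR posts and only the main loop reads; because each one-byte index is written by only one side, no interrupt masking should be needed on an 8-bit MCU under that assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define EVQ_SIZE 8u /* must be a power of two; usable capacity is EVQ_SIZE - 1 */

static volatile uint8_t evq_buf[EVQ_SIZE];
static volatile uint8_t evq_head; /* written only by the producer (the ISR) */
static volatile uint8_t evq_tail; /* written only by the consumer (main loop) */

/* Call from the ISR: queue an event code for "user land". */
bool evq_post(uint8_t event)
{
    uint8_t next = (uint8_t)((evq_head + 1u) & (EVQ_SIZE - 1u));
    if (next == evq_tail) {
        return false; /* queue full: the event is dropped */
    }
    evq_buf[evq_head] = event;
    evq_head = next; /* publish only after the slot has been written */
    return true;
}

/* Call from the main loop: fetch the oldest pending event, if any. */
bool evq_get(uint8_t *event)
{
    if (evq_tail == evq_head) {
        return false; /* nothing pending */
    }
    *event = evq_buf[evq_tail];
    evq_tail = (uint8_t)((evq_tail + 1u) & (EVQ_SIZE - 1u));
    return true;
}
```

The ISR stays short (it only posts an event code), and the main loop drains the queue and dispatches to ordinary, non-interrupt code at its leisure.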

A fully featured RTOS can achieve these things so elegantly and simply, leaving you free to devote your attention to the task in hand without having to roll your own timer class for the umpteenth time, for example. Coding tasks under an RTOS also permits, promotes, even dictates, neat and tidy separation of functional blocks into their own self-contained, testable modules, leading to greater potential for code re-use, less risk of introducing errors and many other “good things”, in my opinion.

There is no shortage of fully featured RTOS to choose from, of course, and many have been ported to small MCUs like the AVR. Most go way beyond the minimum required to give us the benefits mentioned above, though. If an RTOS leaves out anything, someone will slate it, so they tend to bloat until they are all things to all men, although some are configurable enough to be stripped back to the bare essentials. On a system such as an AVR, however, they still often consume too many resources, and running one is more a curiosity than a practical choice, with most of the CPU resources taken by the RTOS itself before any application code is written.

We’ve got to paragraph 5 on the subject of RTOS without acknowledging the elephant in the room. That’s pretty good going, but let’s bite the bullet and deal with it: preemptive or cooperative?

I understand why most RTOS developers adopt, or at least support, the former. Writing a preemptive task scheduler is a fascinating programming exercise in itself. I might go as far as saying that you can’t really be considered a fully experienced embedded software engineer until you’ve tried it, so, if you haven’t, I’d really recommend it as a challenge. Only by doing so will you fully appreciate the pros and cons of each approach and the full impact of preemptive multitasking on a system. Then I’d recommend that, for most applications on small systems, you should bin it and go cooperative instead!

Don’t get me wrong: preemptive multitasking is fantastic in the correct scenario. I’m very pleased that the Linux desktop on which I’m writing this article features it, for example. For large, complex systems incorporating multiple software projects under the control of different teams, it’s vital for providing acceptable performance on any scale, let alone meeting deterministic real-time performance requirements. For a small system with only a handful of tasks, and little enough code that it’s all on the radar of one developer or close-working team, it can be real overkill, though.

Worse still, on some systems it can be a very bad idea. In the 1990s I was leading a project to develop the firmware for a GSM signalling card for a piece of test gear. We were using a number of DSP devices to implement layer 1 and went looking for a suitable RTOS upon which to build the DSP part of the system. Every commercial offering we could find touted “fully preemptive” as a selling point.

A DSP of the type we were using (Motorola 56300 series) not only has a large register set to save and restore during a context switch, but also relies heavily on internal pipelining for its performance. Having to unwind all of this and lose the context that has built up in the pipeline just to swap tasks at some arbitrary point of the kernel’s choosing makes no sense. In addition, we had shared memory that had to be arbitrated for with every access. We would have had to lock out the RTOS kernel during most of the heavy lifting code, so what would be the use of wasting memory on a stack for each task, and on all the code to make a preemptive RTOS tick, when it’s really only controlling the housekeeping code surrounding the “main event”?

Using a preemptive RTOS also requires all of your developers to be aware of the pitfalls of multi-threaded execution at all times, and can make debugging the system much more challenging.

We ended up rolling our own non-preemptive kernel that was lightweight, fast and easy to use, and it went on to be used in a number of other projects once we’d “shared the love” with other development teams.
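To give a flavour of how little a non-preemptive kernel needs, here is a minimal sketch of a cooperative, run-to-completion scheduler in portable C. It is not the kernel described above. All the names (`sched_add`, `sched_tick`, `blink_task`, `poll_task`) are hypothetical, and it assumes `sched_tick()` is called from the main loop once per system tick. Every task runs to completion on the one shared stack, so there is no context to save and no per-task stack to reserve.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TASKS 4u

typedef void (*task_fn)(void);

typedef struct {
    task_fn  run;       /* handler that runs to completion and returns */
    uint16_t period;    /* ticks between runs */
    uint16_t countdown; /* ticks remaining until the next run */
} task_t;

static task_t  tasks[MAX_TASKS];
static uint8_t task_count;

/* Register a periodic task; returns 0 on success, -1 on bad input or full table. */
int sched_add(task_fn fn, uint16_t period_ticks)
{
    if (fn == NULL || period_ticks == 0u || task_count >= MAX_TASKS) {
        return -1;
    }
    tasks[task_count].run = fn;
    tasks[task_count].period = period_ticks;
    tasks[task_count].countdown = period_ticks;
    task_count++;
    return 0;
}

/* Call once per system tick from the main loop (not from an ISR). */
void sched_tick(void)
{
    for (uint8_t i = 0; i < task_count; i++) {
        if (--tasks[i].countdown == 0u) {
            tasks[i].countdown = tasks[i].period;
            tasks[i].run(); /* cooperative: nothing else runs until this returns */
        }
    }
}

/* Example tasks: a real system would toggle an LED or poll buttons here. */
unsigned blink_runs;
unsigned poll_runs;
void blink_task(void) { blink_runs++; }
void poll_task(void)  { poll_runs++; }
```

Registering `blink_task` with a period of 2 and `poll_task` with a period of 3, then calling `sched_tick()` six times, runs them three times and twice respectively. The price of this simplicity is that a task which takes too long delays every other task, so each handler has to be written to do a small amount of work and return.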

So, what does the minimal RTOS for a small AVR device look like? It should be something that gives all the convenience of developing under an RTOS but without too much unnecessary bloat. I’m going to be working on that idea here.