Keep it simple: avoid over-engineering

A perfect design is an enemy of a good design. Designers striving for a perfect design often end up with no design at all due to schedule and cost overruns. A simple design may not provide the best solution to a given problem, but it probably has the best chance of meeting the schedule and cost constraints with acceptable quality. A simple design is also easier to implement, maintain and enhance.

It is better to start with a simple design, making reasonable and rational compromises during the design phase. This will save you from making unreasonable compromises in quality when faced with looming product delivery dates.

Here are some of the "reasonable and rational" design simplification techniques:

Use lookup tables for complex decision making
Use fixed size arrays whenever possible
Avoid dynamic memory allocation
Reduce the number of tasks in the system
Avoid multi-threaded design
Optimize the design only for the most frequently executed scenarios
Sometimes searching might be more efficient than hashing
Use state machines to simplify design
Use timestamps to avoid running timers
Use object oriented programming
Avoid design hooks for future enhancements
Avoid variable length and bit packed messages
Reduce message handshakes
Simplify the hardware architecture
Prefer general purpose computing platforms over specialized platforms
Do not use proprietary protocols/operating systems
Prefer buy over build for software
Prefer buy over build for hardware
Prefer designs that lead to more reuse
Avoid a heterogeneous hardware and software environment
Consider hardware upgrades to reduce software effort
Minimize configurable system parameters
The "0 or 1 or n" Rule

Use lookup tables

Complex decision-making code with cascades of if-then-else statements can often be replaced with a simple lookup table. A lookup table is also easier to understand and modify than complex, convoluted if statements.

Consider an example where the system has to support three terminal types, each supporting a different set of features. The following code shows the simplification brought about by lookup tables:

Checking for terminal service types

Terminal service type check using lookup
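As a minimal sketch, assuming three hypothetical terminal types (`TERMINAL_BASIC`, `TERMINAL_ADVANCED`, `TERMINAL_DATA`) and three hypothetical features, the C code below contrasts a cascaded if-then-else check with an equivalent lookup table. The type names, features and table contents are illustrative assumptions, not the article's original listings.

```c
#include <stdbool.h>

/* Hypothetical terminal types and features, for illustration only. */
typedef enum { TERMINAL_BASIC, TERMINAL_ADVANCED, TERMINAL_DATA, TERMINAL_TYPE_COUNT } TerminalType;
typedef enum { FEATURE_VOICE, FEATURE_CONFERENCE, FEATURE_DATA, FEATURE_COUNT } Feature;

/* Cascaded if-then-else version: every new type or feature adds branches. */
bool feature_supported_if(TerminalType type, Feature feature)
{
    if (type == TERMINAL_BASIC) {
        return feature == FEATURE_VOICE;
    } else if (type == TERMINAL_ADVANCED) {
        return feature == FEATURE_VOICE || feature == FEATURE_CONFERENCE;
    } else if (type == TERMINAL_DATA) {
        return feature == FEATURE_DATA;
    }
    return false;
}

/* Lookup table version: one row per terminal type, one column per feature. */
static const bool feature_table[TERMINAL_TYPE_COUNT][FEATURE_COUNT] = {
    /*                      VOICE  CONFERENCE DATA */
    [TERMINAL_BASIC]    = { true,  false,     false },
    [TERMINAL_ADVANCED] = { true,  true,      false },
    [TERMINAL_DATA]     = { false, false,     true  },
};

bool feature_supported(TerminalType type, Feature feature)
{
    /* Reject out-of-range inputs before indexing the table. */
    if ((unsigned)type >= TERMINAL_TYPE_COUNT || (unsigned)feature >= FEATURE_COUNT)
        return false;
    return feature_table[type][feature];
}
```

Adding a terminal type then becomes a matter of adding one row to `feature_table`, rather than another branch in every function that checks features.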

Use arrays

When managing multiple entities, the simplest possible data structure is an array. Consider a case where a processor has to support a maximum of 1000 user terminals. In this case, we recommend using an array definition for 1000 user terminals. This might seem like a waste of memory when the system is actually handling only 100 terminals, but do you really have any use for that memory when the system is running at 10% of its rated capacity? A design where the system allocates memory as and when user terminals are added will be far more complicated to implement. A fixed size array implementation will be simpler to implement, and it will probably be more efficient.
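A minimal sketch of this approach in C, assuming terminals are identified by a hypothetical index from 0 to 999 so that the array can be indexed directly. `Terminal`, `terminal_add` and the related names are illustrative, not taken from the original design.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TERMINALS 1000  /* rated capacity, from the example above */

typedef struct {
    bool in_service;
    int  call_count;        /* hypothetical per-terminal state */
} Terminal;

/* All terminal records exist for the lifetime of the program. */
static Terminal terminals[MAX_TERMINALS];

/* Returns the record for a terminal id, or NULL if the id is out of range. */
Terminal *terminal_get(size_t id)
{
    return (id < MAX_TERMINALS) ? &terminals[id] : NULL;
}

/* Marks a terminal as in service; fails if the id is invalid or already used. */
bool terminal_add(size_t id)
{
    Terminal *t = terminal_get(id);
    if (t == NULL || t->in_service)
        return false;
    t->in_service = true;
    t->call_count = 0;
    return true;
}

/* Removing a terminal only clears a flag; no memory is freed. */
void terminal_remove(size_t id)
{
    Terminal *t = terminal_get(id);
    if (t != NULL)
        t->in_service = false;
}
```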

Avoid dynamic memory allocation

As far as possible, dynamic memory allocation in real-time systems should be restricted to message communication related operations. Dynamic allocation of objects and structures often adds needless complexity to the design. With dynamic memory allocation, memory leaks can also plague the system for a long time. If memory is allocated statically, the design remains simple and free of memory leaks. The designer is freed from thinking about when to allocate and free memory and can focus on the core design.
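When allocation is needed for messages, one common way to keep it predictable is a fixed pool of equally sized buffers carved out of a static array. The sketch below assumes hypothetical sizes (`MSG_POOL_SIZE`, `MSG_PAYLOAD_BYTES`) and names; it is one possible technique, not necessarily the one the article has in mind. It is not thread-safe, which is consistent with the single-threaded design recommended later in this article.

```c
#include <stddef.h>

#define MSG_POOL_SIZE     64   /* hypothetical number of buffers */
#define MSG_PAYLOAD_BYTES 256  /* hypothetical buffer size */

typedef struct MsgBuffer {
    struct MsgBuffer *next;                 /* link used only while on the free list */
    unsigned char payload[MSG_PAYLOAD_BYTES];
} MsgBuffer;

static MsgBuffer msg_pool[MSG_POOL_SIZE];   /* storage reserved at build time */
static MsgBuffer *free_list = NULL;

/* Threads every buffer onto the free list; call once at startup. */
void msg_pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < MSG_POOL_SIZE; i++) {
        msg_pool[i].next = free_list;
        free_list = &msg_pool[i];
    }
}

/* Constant-time allocation; returns NULL when the pool is exhausted. */
MsgBuffer *msg_alloc(void)
{
    MsgBuffer *buf = free_list;
    if (buf != NULL)
        free_list = buf->next;
    return buf;
}

/* Returns a buffer to the pool. */
void msg_free(MsgBuffer *buf)
{
    if (buf != NULL) {
        buf->next = free_list;
        free_list = buf;
    }
}
```

Because the pool size is fixed, exhaustion shows up as a `NULL` return that can be counted and reported, rather than as gradual heap fragmentation.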

Reduce the number of tasks

Try to reduce the number of tasks in a real-time system design, as this reduces inter-task message interactions. It also eliminates the scheduling delays between those tasks. Reduced message interaction simplifies the design, as inter-task message handling is replaced with local function and method invocations. The direct result is a reduction in the complexity of the state machines.

Avoid multi-threaded design

Scheduling of actions in real-time systems is controlled by state machines. Typically, a state machine does not call blocking OS primitives while it executes. In such a design, multiple threads add little value. Multi-threading complicates the design by requiring synchronization primitives to protect resources shared among threads. Multi-threaded programs are also harder to debug than their single-threaded counterparts.

Optimize only frequent scenarios

Although a programmer has to handle every possible failure and glare scenario in a feature, these scenarios occur rarely in a running system. Designers should therefore focus on optimizing the design for the frequent success scenarios. There is no major impact on system performance if the error and glare scenarios are implemented in a simple but non-optimal way.

Consider call handling in a switching system. Here, failure of a digital trunk might be handled by simply searching through all the call objects on the switching card and initiating call clears. This could have been implemented more efficiently if the digital trunk object maintained a list of all active calls on that digital trunk. But this failure scenario occurs rarely, so the extra complexity of maintaining a list of calls per digital trunk is not worth the effort.
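The simple approach described above might look like the following C sketch. The call table size, the `Call` fields and the `call_clear` stand-in are hypothetical. The rare trunk-failure path pays for a full scan, while normal call setup and release carry no per-trunk list maintenance.

```c
#include <stddef.h>

#define MAX_CALLS 2048          /* hypothetical capacity of one switching card */

typedef enum { CALL_IDLE, CALL_ACTIVE } CallState;

typedef struct {
    CallState state;
    int       trunk_id;         /* digital trunk carrying this call */
} Call;

static Call calls[MAX_CALLS];

/* Hypothetical stand-in for the real call-clearing procedure. */
static void call_clear(Call *call)
{
    call->state = CALL_IDLE;
}

/* Rare failure path: scan every call object instead of keeping a
   per-trunk call list up to date on every call setup and release.
   Returns the number of calls cleared. */
size_t handle_trunk_failure(int failed_trunk)
{
    size_t cleared = 0;
    for (size_t i = 0; i < MAX_CALLS; i++) {
        if (calls[i].state == CALL_ACTIVE && calls[i].trunk_id == failed_trunk) {
            call_clear(&calls[i]);
            cleared++;
        }
    }
    return cleared;
}
```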

Searching vs. hashing

Real-time systems often need to maintain hash tables and maps to quickly search for an object. Maps and hash tables provide an efficient way to dispatch messages. However, when the number of entries is small, a simple search in an array might be more efficient than incurring the fixed overhead of a hashing algorithm.
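As an illustration, assuming a hypothetical set of three message types, a small dispatch table can simply be searched linearly. The names `dispatch`, `DispatchEntry` and the handlers are illustrative. Where the break-even point between linear search and hashing lies depends on the platform and is best measured rather than assumed.

```c
#include <stddef.h>

typedef void (*MsgHandler)(const void *msg);

typedef struct {
    int        msg_type;
    MsgHandler handler;
} DispatchEntry;

/* Hypothetical message types and handlers. */
enum { MSG_SETUP = 1, MSG_RELEASE = 2, MSG_STATUS = 3 };

static int last_handled = 0;   /* records which handler ran, for illustration */
static void on_setup(const void *msg)   { (void)msg; last_handled = MSG_SETUP; }
static void on_release(const void *msg) { (void)msg; last_handled = MSG_RELEASE; }
static void on_status(const void *msg)  { (void)msg; last_handled = MSG_STATUS; }

static const DispatchEntry dispatch_table[] = {
    { MSG_SETUP,   on_setup },
    { MSG_RELEASE, on_release },
    { MSG_STATUS,  on_status },
};

/* Linear search: with a handful of entries this is a few compares, with
   no hash computation and no bucket management. Returns 0 if handled. */
int dispatch(int msg_type, const void *msg)
{
    size_t n = sizeof dispatch_table / sizeof dispatch_table[0];
    for (size_t i = 0; i < n; i++) {
        if (dispatch_table[i].msg_type == msg_type) {
            dispatch_table[i].handler(msg);
            return 0;
        }
    }
    return -1;                 /* unknown message type */
}
```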

Use state machines

Proper use of finite state machines in real-time design avoids the unnecessary use of a large number of flags and cascaded if statements checking those flags. We strongly recommend using state machines, as they simplify the code and make the feature flow easier to understand. See the article on hierarchical state machines for details.
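A minimal sketch of a flat (non-hierarchical) state machine in C, using hypothetical call states and events. A single `state` variable replaces a set of flags, and each state's `case` lists only the events it responds to.

```c
/* Hypothetical call states and events; a real feature would have more. */
typedef enum { ST_IDLE, ST_DIALING, ST_CONNECTED } CallState;
typedef enum { EV_OFF_HOOK, EV_DIGITS_COMPLETE, EV_ON_HOOK } CallEvent;

/* One variable holds the state, replacing flags such as
   "off_hook", "dialing_done" and "connected". */
typedef struct {
    CallState state;
} CallFsm;

void call_fsm_handle(CallFsm *fsm, CallEvent ev)
{
    switch (fsm->state) {
    case ST_IDLE:
        if (ev == EV_OFF_HOOK)
            fsm->state = ST_DIALING;
        break;
    case ST_DIALING:
        if (ev == EV_DIGITS_COMPLETE)
            fsm->state = ST_CONNECTED;
        else if (ev == EV_ON_HOOK)
            fsm->state = ST_IDLE;
        break;
    case ST_CONNECTED:
        if (ev == EV_ON_HOOK)
            fsm->state = ST_IDLE;
        break;
    }
    /* Events not listed for the current state are ignored. */
}
```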

Use timestamps

When long durations need to be tracked, it is preferable to avoid the overhead of managing timers. This can be achieved by recording a timestamp at the instant the timer would have been started. Subtracting this timestamp from the current time then achieves the same effect as checking for timer expiry. This technique is illustrated in the code for handling leaky bucket counters.

Handling Leaky Bucket Counters
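A minimal sketch of a timestamp-based leaky bucket counter in C. It assumes a hypothetical free-running 32-bit tick counter supplied by the caller as `now`; the field names and the rule of leaking one unit per `leak_period` ticks are illustrative assumptions. No timer is started: leakage is computed only when the next error is recorded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Each error adds one to the bucket; one unit "leaks" out every
   leak_period ticks, computed lazily from timestamps. */
typedef struct {
    uint32_t count;
    uint32_t threshold;     /* alarm when count exceeds this value */
    uint32_t leak_period;   /* ticks per leaked unit; must be > 0 */
    uint32_t last_leak;     /* timestamp of the last applied leak */
} LeakyBucket;

void bucket_init(LeakyBucket *b, uint32_t threshold, uint32_t leak_period, uint32_t now)
{
    b->count = 0;
    b->threshold = threshold;
    b->leak_period = leak_period;
    b->last_leak = now;
}

/* Applies the leakage accumulated since last_leak. Unsigned subtraction
   keeps the elapsed time correct across one wraparound of the tick counter. */
static void bucket_leak(LeakyBucket *b, uint32_t now)
{
    uint32_t leaks = (now - b->last_leak) / b->leak_period;
    if (leaks == 0)
        return;
    b->count = (leaks >= b->count) ? 0 : b->count - leaks;
    b->last_leak += leaks * b->leak_period;   /* keep the partial period */
}

/* Records one error at time 'now'; returns true if the threshold is exceeded. */
bool bucket_record_error(LeakyBucket *b, uint32_t now)
{
    bucket_leak(b, now);
    b->count++;
    return b->count > b->threshold;
}
```

For example, with a threshold of 3 and a leak period of 100 ticks, four errors within the first 100 ticks raise the alarm, while the same four errors spread 150 ticks apart do not.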

Use object oriented programming

Use an object oriented design, as it is the simplest technique: it maps the problem domain directly to objects. This simplifies the design cycle, as the design mirrors the problem domain. For the same reason, object oriented designs are easier for new developers to understand and maintain.

Avoid design hooks for future enhancements

It is advisable not to introduce additional complexity in the name of future design hooks. More often than not, these hooks turn out to be more of a liability than an asset for future designers, as they might be forced to accept a design based on these hooks. So avoid adding design hooks for the future: they just add work that you don't need to do now and will be of little help to future designers.

Avoid variable length and bit packed messages

Avoid using variable length messages, as they add complexity to the message interface code. Variable length fields also force you to abandon C structure-level definitions of messages. This is additional work that you can do without.

The same reasoning applies to bit packed messages. Do not define bit packed messages unless mandated by an external protocol that your system supports. Bit packed messages are a constant source of bugs, and they lower system performance by wasting valuable CPU cycles on message packing and unpacking.
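The sketch below shows a hypothetical fixed-length message in C. Every field is a whole, naturally aligned member and the digit list is sized for the worst case, so the sender fills in a structure and the receiver reads the same structure with no packing or unpacking step. `DialRequestMsg`, `MSG_DIAL_REQUEST` and `MAX_DIGITS` are illustrative names. Byte ordering still has to be agreed upon if the two ends run on different processors.

```c
#include <stdint.h>
#include <string.h>

#define MAX_DIGITS 32                      /* worst-case digit count (hypothetical) */
enum { MSG_DIAL_REQUEST = 0x0101 };        /* hypothetical message type code */

typedef struct {
    uint16_t msg_type;
    uint16_t terminal_id;
    uint8_t  digit_count;                  /* number of valid entries in digits[] */
    uint8_t  priority;                     /* a whole byte rather than a bit field */
    uint8_t  digits[MAX_DIGITS];           /* fixed size: no variable-length tail */
} DialRequestMsg;

/* Fills in a dial request; returns -1 if the digits do not fit. */
int dial_request_init(DialRequestMsg *m, uint16_t terminal_id,
                      const uint8_t *digits, uint8_t digit_count)
{
    if (digit_count > MAX_DIGITS)
        return -1;
    memset(m, 0, sizeof *m);               /* unused digit slots are zero */
    m->msg_type = MSG_DIAL_REQUEST;
    m->terminal_id = terminal_id;
    m->digit_count = digit_count;
    memcpy(m->digits, digits, digit_count);
    return 0;
}
```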

Reduce message handshakes

When designing message interactions, avoid multiple interactions between two processors. Consider the following example:

Switching card requests a space slot from the Central card.
Central card allocates the space slot to the Switching card and replies with a message.
Switching card now requests the Central card to route the call.
Central card routes the call and replies to the Switching card.

The above interaction could be simplified as:

Switching card requests a space slot and call routing.
Central card allocates a space slot and routes the call. Central card replies back to the Switching card.

As you can see above, the four-message handshake has been reduced to just two messages. This removes one state from the Switching card's handling of a call.
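The combined exchange could be expressed with one request and one reply structure, as in this C sketch. The message layouts, `NUM_SPACE_SLOTS` and the routing placeholder are hypothetical; the point is that the Central card handles both operations in a single handler invocation and answers with a single reply.

```c
#include <stdbool.h>
#include <stdint.h>

/* One request carries everything the Central card needs. */
typedef struct {
    uint32_t call_id;
    uint32_t destination;   /* routing input that used to arrive in a second request */
} SlotAndRouteRequest;

/* One reply returns the result of both operations. */
typedef struct {
    uint32_t call_id;
    bool     success;
    int32_t  space_slot;    /* valid only when success is true */
    uint32_t route;
} SlotAndRouteReply;

#define NUM_SPACE_SLOTS 16  /* hypothetical */
static bool slot_busy[NUM_SPACE_SLOTS];

/* Central card handler: allocate a slot and route the call in one step. */
SlotAndRouteReply central_handle(const SlotAndRouteRequest *req)
{
    SlotAndRouteReply reply = { req->call_id, false, -1, 0 };
    for (int32_t s = 0; s < NUM_SPACE_SLOTS; s++) {
        if (!slot_busy[s]) {
            slot_busy[s] = true;
            reply.space_slot = s;
            reply.route = req->destination;   /* placeholder for real routing logic */
            reply.success = true;
            break;
        }
    }
    return reply;
}
```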

Simplify the hardware architecture

The first step in keeping the software design simple is keeping the hardware architecture simple. Software system architects should work closely with the hardware team to simplify the hardware architecture of the product. This might involve:

Reducing the total number of processing nodes in the system.
Simplifying the interconnections between different nodes.
Using off-the-shelf interconnection technologies like Ethernet and TCP/IP.
Using standard PC hardware to implement your system.

Simplifications in hardware directly translate into simplifications in software architecture. For example, reducing the number of nodes in the system will reduce the number of releases and teams that you need to support. Using off-the-shelf interconnection technologies like Ethernet and TCP/IP opens the door to using off-the-shelf components like routers and switches.

Prefer general purpose computing platforms over specialized platforms

Use a special purpose computing platform only after you have exhausted all possibilities of using a general purpose platform. For example, if your application requires signal processing capabilities, consider whether the performance goals can be met by a general purpose PC platform without using Digital Signal Processors (DSPs). General purpose processors might support specialized instructions that bring them on par with specialized platforms like DSPs. For example, Intel Pentium processors support the SSE and SSE2 instruction sets, which can handle demanding signal processing tasks.

There are several advantages of sticking to general purpose platforms:

Low cost software and hardware development tools.
It's easy to find people with skills in using general purpose platforms.
General purpose platforms have much higher market volume so they are often an order of magnitude cheaper than specialized platforms.

Do not use proprietary protocols/operating systems

Many embedded systems use home grown protocols and operating systems. This leads to additional cost to maintain the associated software. Using standard protocols and operating systems lowers cost and improves the stability of the product, as standard products have been rigorously tested in countless systems. Proprietary protocols and operating systems often cost a lot more due to the need to train developers.

Prefer buy over build for software

When starting a new software project, consider whether you could buy off-the-shelf software modules that can be directly integrated into your system. This saves you time, as you can focus on your core application development without having to worry about standard parts like device drivers and external protocols. When looking for external software modules, consider the following factors:

Software you are purchasing is from a reputable company.
Software is supplied along with the source code.
There are no runtime or per-product royalties.
The maintenance contract includes clauses on turnaround time for critical bugs reported to the company.

Keep in mind that most of the time you can buy software at a fraction of the cost of building it in house.

Prefer buy over build for hardware

Using off-the-shelf hardware will save you a lot of time and money. The software team gets the final target hardware on day one of the project. With proprietary hardware, the software team has to invest a lot of development effort in building a test environment on the host machine and finally porting the software when the hardware becomes available.

Prefer designs that lead to more reuse

When developing a hardware and software architecture, prefer designs that reuse already developed software and hardware modules. The future reusability of the software and hardware modules should also be a factor in choosing new architectures. Avoid the "let's start with a clean slate" approach to developing systems. New projects should build on the results of previous projects. This lowers cost by reducing the complexity of the system being developed.

Avoid a heterogeneous hardware and software environment

When developing a new system, use the same or a similar hardware platform in most of the modules. This saves you trouble, as you do not need to worry about byte alignment, byte ordering and other interoperability issues. Using the same operating system reduces development effort, as you can use common utilities on all modules in the system. It also saves money and development effort, as the same set of development tools can be used by all developers.

As a corollary, consider using the same platform for developers' workstations and the final target. This allows developers to use their workstations for testing. You would not have to worry about cross development tools and the other costs associated with having a different target platform.

Consider hardware upgrades to reduce software effort

Many times you can reduce software complexity by using higher performance hardware platforms. For example:

When faced with a performance crunch, upgrading the hardware is often the cheaper option once the costs of developing and maintaining the software optimization are factored in.
Two low performance nodes in the system can be merged into a single high performance node. This will simplify the software design.
It is often cheaper to fix interconnection bottlenecks by upgrading to the next generation interconnection technology. For example, if a particular link in the system is exceeding the capacity of a 100 Mbps LAN, the cost of using a 1 Gbps LAN will be lower than the cost of optimizing the protocol between the nodes causing the bottleneck.

Often, system designers consider software to be a "one time" cost and hardware to be a recurring cost. They fail to see that software development and maintenance costs are recurring and will most likely increase with time. Hardware recurring costs, on the other hand, go down with time.

Minimize configurable system parameters

Minimize the number of configurable parameters in the system. This simplifies system configuration and saves the coding effort involved in supporting changes to these parameters.

In our experience, most configurable parameters rarely turn out to be useful after system deployment, yet they are a big source of system integration and deployment problems. There are countless horror stories about wrongly configured parameters bringing the system down, and very few cases where a configurable parameter saved the day.

Parameters should be made configurable only if they meet any of the following criteria:

Site specific tuning is required, i.e. a parameter needs to be assigned different values on different sites.
The operator might need to change the parameter several times during the lifetime of the system, and it is not practical to build a software release to make the change.
The product needs to operate in different modes and perform different functions and configurable parameters are required to control this.

We have excluded engineering configurable parameters from the above list. Parameters that can be finalized during initial field testing should not be made configurable; doing so serves no purpose. Very rarely will you encounter a problem that can be fixed just by changing a configurable parameter. Even if you do, the chance that this particular parameter is actually configurable in your system is remote.

The "0 or 1 or n" Rule

This rule simply states that when designing a system, every feature should be passed through the following sieves:

Sieve 0: Do we really need this feature?
Sieve 1: OK, we do have to support this feature, but let's support only one instance of it.
Sieve n: Looks like multiple simultaneous instances of this feature need to be supported. Let's design to support "n" instances, even if we need to support just two simultaneous instances.

Try to drop features at Sieve 0. If a feature passes through Sieve 0, try to contain it at Sieve 1 and implement only one active instance of the feature in the system. If the feature survives both Sieve 0 and Sieve 1, implement support for "n" simultaneous instances of the feature in the system.

For example, let's consider the design of a mobile terminal. The marketing folks request that the mobile terminal support "broadcast sessions" for value added services like TV over a mobile phone. This feature request is first passed through Sieve 0 to determine whether it is really needed for this terminal. If the feature passes through Sieve 0, consider whether the mobile terminal could support only one broadcast session. If this meets the requirements, implement a system with just one broadcast session. This will be a lot simpler than supporting multiple simultaneously active broadcasts. If you can't get an agreement on one broadcast, go ahead and implement support for "n" broadcast sessions, even if the marketing folks are prepared to testify under oath that they will never need more than two simultaneous broadcasts.
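A sketch of the two outcomes in C, using a hypothetical `BroadcastSession` structure. The Sieve 1 design needs only a single variable, while the Sieve n design uses a table sized by a constant, so that supporting two sessions or ten differs only in the value of the hypothetical `MAX_BROADCAST_SESSIONS`.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool     active;
    unsigned channel;            /* hypothetical session attribute */
} BroadcastSession;

/* Sieve 1 design: exactly one session, held in a single variable.
   No session ids, no lookup, no per-session bookkeeping. */
static BroadcastSession the_session;

bool broadcast_start(unsigned channel)
{
    if (the_session.active)
        return false;            /* a second session is simply refused */
    the_session.active = true;
    the_session.channel = channel;
    return true;
}

/* Sieve n design: size the table with a constant rather than
   special-casing "two". */
#define MAX_BROADCAST_SESSIONS 4 /* hypothetical limit */
static BroadcastSession sessions[MAX_BROADCAST_SESSIONS];

/* Returns the index of the started session, or -1 if all are in use. */
int broadcast_start_n(unsigned channel)
{
    for (size_t i = 0; i < MAX_BROADCAST_SESSIONS; i++) {
        if (!sessions[i].active) {
            sessions[i].active = true;
            sessions[i].channel = channel;
            return (int)i;
        }
    }
    return -1;
}
```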