Application-Driven Design: ASIC, SoC and MPU

Beyond Drivers: The Critical Role of System Software Stack Architecture in SoC Hardware Development

New system-on-chip designs require major software efforts from internal operating system and interface issues on up to specialized on-chip device functionality. Ultimately, the software must make the hardware work. Getting there requires clear vision and close cooperation between hardware and software teams.


  • Page 1 of 1
    Bookmark and Share

Article Media

It’s no secret in the semiconductor industry that software development costs for a new system-on-chip (SoC) can exceed the hardware development costs by a significant margin. Having been directly involved in the software side of the SoC development process, I’ve experienced in detail the overall development flow, which gives me the courage to try and answer the following questions: “Why is there so much software to develop? Android and other operating systems exist, they all have an abstracted hardware interface, so isn’t it just a simple matter of a few “drivers” to link up the new silicon with the OS?” If only it were so simple.

Well, the bottom line is that it’s all about the hardware. All the software effort, from writing the lowest level driver to building the coolest multimedia Android app, is driven, and potentially exacerbated, by the underlying hardware capabilities and their impact on the software developers on the SoC team.

To get a feel for the magnitude of the software effort for a new SoC, here’s a composite picture gleaned from projects I worked on not too long ago. A typical project might have 500+ software developers overall, with most being devoted to operating system development and customer support, with at most 100 developers for kernel porting, bring-up and testing. These projects typically take 48 months from start to finish for a mainstream, complex, mobile device SoC. If the SoC is new and not a derivative of a previous SoC, it can take much longer than 48 months to complete, especially if there is a process node change. It would be considered a success if any project completes in a firm 48 months, based upon only incremental changes being made to the SoC, with more aggressive parallelization of hardware and software during development.

Clearly, these software development projects are indeed large in scale. Is there anything that can be done to change this situation? In a previous real-time clock (RTC) article, we discussed the criticality of providing software developers with a realistic and usable (good performance) platform upon which to run software before silicon, to enable parallel hardware and software development. Here we assume that all that technology is already in place. So now let’s attack the issue of what’s currently “holding up” the development of software for the new SoC, and why there are so many software engineers. Because the answer touches on many aspects of software support for hardware, it’s worth taking a closer look at what’s going on.

It all begins with the need to support operating systems such as Android, Linux and Windows 8 with the digital signal processor (DSP), imaging, graphics processing unit (GPU) and other hardware subsystems on the SoC. In short, it’s the issue of offloading software functions into hardware for performance gains or to lower power consumption. The most common form of offload is moving a particular software capability into the underlying hardware of the SoC. However, given the ubiquity of wireless communication and the Internet, there is an emerging offload architecture based upon moving the offload function up into the Cloud. In fact, some architectures can decide on the fly whether to use device-based offload or Cloud-based, optimized around the best power savings, compute time, or some other user-selectable benefit. But no matter where the offload is happening, the implications on the software are what we need to understand.

For example, a DSP subsystem on an SoC can support a wide range of audio processing functions, including audio stream coding and decoding, voice processing, equalization and many other capabilities. These capabilities are implemented as a combination of hardware (the DSP) and extensive software libraries. These capabilities are typically independent of any particular OS environment. Thus the “usual” notion is that “software drivers” will need to be developed, either by the SoC maker itself, or by the customer, in order to interface the DSP hardware and software audio subsystem to the operating system that the customer is using. This notion corresponds to the typical, but oversimplified, layering diagram of a system, where there is a hardware layer, a driver layer, an operating system layer and an application layer. In this model, all the hardware maker has to supply is an OS-compliant driver, and the hardware is then supported all the way up the software stack for apps to use. If only this were true!

The reality is that this simple model in many cases doesn’t reflect reality at all. For example, see Figure 1 for a system architecture diagram of Android. Note that although there certainly is a driver layer, there are a couple of intermediate layers with multiple components before reaching the application layer, all of which may have some dependencies on the underlying hardware. Also keep in mind that this software stack consists of many millions of lines of code, which need to be understood by software engineers who didn’t write it in the first place. This is not the environment in which to trivialize the challenges of modifying the software stack.

Figure 1
Android architecture and hardware-related intra- and cross-layer software activities. (Source: Google)

With this complexity in mind, it is important to note that a number of popular OSs have either unique and/or limited interfaces available to integrate support for DSP or other hardware, into the existing system frameworks. Imagine that the multimedia framework developers designed the framework to be largely software-based, with minimal interfaces to make limited use of hardware acceleration for various multimedia functions. So even if an SoC has a DSP on-chip, as far as the media framework is concerned, most of the hardware capabilities are unreachable; in effect, they don’t exist.  

See Figure 2 for an illustration of this situation. Note that although the decoding capability of the DSP is used, all the other audio functions are performed on the application processor, even though the DSP might well be able to perform those functions with much greater power efficiency. Note also the back and forth movement of data between the DSP and the application processor for decoding. That data movement uses power, and of course, the application processor needs to be powered on as well.

Figure 2
Android Audio Playback Baseline DSP Offload. (Source: Cadence)

In order to fully exploit the DSP to offload more of the audio function from the application processor, the hardware vendor can re-engineer the media framework to fully support their DSP, which is the optimal way for the system software to make full use of the DSP offload capability. See Figure 3 for an illustration of an advanced DSP offload architecture. In this case, almost all of the audio processing is offloaded to the DSP subsystem, allowing the application processor to be powered down with the resultant savings in power.

Figure 3
Android Audio Playback Advanced DSP Offload. (Source: Cadence)

The hardware vendors are faced with the task of re-engineering the media framework to fully support their DSP or leave that effort to the customer, but in either case it’s the only way for the system software to make full use of the hardware capability. And, of course, while implementing this offload capability, the developers have to make sure they don’t “break” any of the application interfaces to the media framework, otherwise the apps won’t work. This effort no doubt calls for multiple man-years of work, and likely has to be re-visited each time the framework is revised. In addition, it requires expertise in at least two domains, system software (OS internals) and signal processing—both hardware and software. But the benefits are clear. The system can deliver the same level of audio processing at a small fraction of the power required if the processing remained on the application processor.

The key takeaway here is that while many hardware-dependent functions are contained within a single layer, device drivers for example, others functions are not. For example, as Figure 1 shows, and as we just discussed in detail for audio processing, adding a new framework, power optimization or performance tuning is vertical. To support the effort, software needs to be written or modified at every layer.

Figure 1
Android architecture and hardware-related intra- and cross-layer software activities. (Source: Google)

We might conclude that developing interface standards is the way to solve this kind of situation, and indeed it can be. But as we’ll soon see, there can be interesting and unintended consequences with that approach.

Taming the Interfaces

For example, with GPU offload, we see a different situation than in the multimedia framework discussion. Here, the industry has been working for some time to have standard offload mechanisms in PC and mobile platforms to take advantage of the large raw compute power of GPUs, especially for things closely related to graphics. These include processing with floating point because the hardware is there, and imaging because some parts of the pipeline can be applied. These mechanisms include OpenCL, AMD’s Heterogeneous System Architecture (HSA) consortium, Google Renderscript and Filterscript, and a number of other initiatives. While some may hope that the GPU is “The Universal Offload Engine”—i.e., you only need to support GPUs and all your energy and throughput problems are solved—as usual, the reality is more complex.

As a result of the standardization effort, customers are asking SoC makers to support all of the hardware and software hooks proposed for CPU/GPU coordination, even when they may be a step in the wrong direction on efficiency. HSA, for example, requires full cache coherency between CPUs and offload engines, unified virtual memory management, and (eventually) 64-bit flat addressing throughout. That’s not necessarily optimal for low-cost, low-power offload. There is a legitimate argument that these things would ease function migration onto offload engines, but the lean, mean hardware leverage is significantly reduced, which could be a problem for ultra-small devices used for “Internet of Things” applications. Many of these programming models and offload architectures implicitly or explicitly demand heavy-duty floating point. That’s fine if the applications really need it. But it’s a shame if the applications can really be implemented in fixed point, because there’s a factor of at least three in throughput/watt if you can get a software function down from 32-bit floating point to 16-bit fixed point representation.

The bottom line is that there is no guarantee at all for an SoC maker that the proper interfaces and layers exist in Android, Linux, or Windows 8 Mobile to easily integrate hardware into those systems and allow application software and the overall system to gain full benefits from the hardware. It’s no wonder then that the major SoC suppliers have large software teams re-engineering the guts of these major OSs to support the advanced hardware capabilities they’ve placed on their SoCs.

But when looking at the overall software headcount, it’s also important to recognize that not all of the software developers are working on the core operating system. There is plenty of customer-specific development going on as well. Just as the SoC maker tries to differentiate his SoC with some snazzy hardware (leading to the situation of exploding software developer headcount discussed here), the SoC customer in turn needs to differentiate his product. That differentiation is likely to be done with software, and it’s often part of the SoC maker’s business deal that the SoC maker does a lot of that work. For example, it is not uncommon for a large SoC project to have a significant part of the hundreds of software developers devoted to helping (usually for free to the big customers) customize and optimize the OS to their device.

What can be done to improve the situation? First, maybe nothing at all. As Fred Brooks noted in his now legendary book, “The Mythical Man-Month,” sometimes what’s left for the software is the unique part of the system, the “Essential Complexity” he calls it, and there’s no way around the work required to implement it. But Brooks was no pessimist, so we’ll follow his lead and look at some suggestions to ease the burden even under the current constraints of the market and industry today.

First, there may be some process improvements that can help. For example, here’s an idealized development flow that a number of software architects I’ve worked with have either implemented or wished that they had. The first step in any all-new SoC development is to capture the high-level requirements for the SoC by a team staffed by both hardware and software architects. (It’s not clear that this is always common practice in the industry, by the way). The end result should be a functional specification composed of all of the individual hardware intellectual property (IP) blocks in the SoC. This should include the register definitions of each IP block, which are a key interface for building the software stack. The software architects now have enough data to validate that the software requirements could, at least in theory, be met by the underlying hardware definition. In turn, the architects need to validate that the design could meet the “speeds and feeds” required. This process can conclude with the decision that the SoC “looks good on paper” and the development effort is now moved on to the next phase of implementation.

What’s critical here is two-fold: One is that the magnitude of the gap between the SoC hardware and the target operating system(s) should now be identified, whether large or small. Maybe it really is “a small matter of a driver or two,” or, worst case, a complete re-write of some major subsystem, but at least there should be no illusions as to the effort required (even though, being software, the effort is still likely to be underestimated). The other activity is that the core OS team now has enough information to design and implement an abstraction layer, a generic interface to the underlying SoC acceleration and other specialized SoC hardware subsystems. The main OS team then develops in parallel the middleware pieces and applications that use those capabilities.

Another observation borne from experience, despite wishing the situation were otherwise, is that it’s important not to oversimplify. Hardware/software interactions can be very complex and even the smallest hardware interface or a change to that interface can have ripple effects all the way up the software stack, including the application layer. These ripple effects can occur in many forms:

• Porting existing software to an SoC might require a major re-write of the software to support a new hardware capability.

• Adding new hardware to an existing SoC might disrupt the software stack, making the hardware change too expensive to add; or, shipping an SoC with unused hardware can take up space and consume power.

• Designing a new software stack without regard to the possibility of utilizing hardware offload capability in the future might preclude the software from supporting the next hot SoC.

To shamelessly quote Brooks once again, “There is no silver bullet” when it comes to software development. Indeed, as long as the software is built at arm’s length from the hardware development (and vice versa, of course) and both sides are aggressively innovative, software will bear the burden of making sure the two pieces fit and work together. One could argue this is the cost of innovation and the cost of horizontally structured industry.

Cadence Design Systems
San Jose, CA
(408) 943-1234