Friday, November 17, 2017

Highly Autonomous Vehicle Validation

Here are the slides from my TechAD talk today.



Highly Autonomous Vehicle Validation: it's more than just road testing!
- Why a billion miles of testing might not be enough to ensure self-driving car safety.
- Why it's important to distinguish testing for requirements validation vs. testing for implementation validation.
- Why machine learning is the hard part of mapping autonomy validation to ISO 26262.

Monday, October 9, 2017

Top Five Embedded Software Management Misconceptions

Here are five common management-level misconceptions I run into when I do design reviews of embedded systems. How many of these have you seen recently?

(1) Getting to compiled code quickly indicates progress. (FALSE!)

Many projects are judged by "coding completed" to indicate progress.  Once the code has been written, compiles, and kind of runs for a few minutes without crashing, management figures that they are 90% there.  In reality, a variant of the 90/90 rule holds:  the first 90% of the project is in coding, and the second 90% is in debugging.

Measuring teams on code completion pressures them to skip design and peer reviews, ending up with buggy code. Take the time to do it right up front, and you'll more than make up for those "delays" with fewer problems later in the development cycle.  Rather than measure "code completed," do something more useful, like measure the fraction of modules with "peer review completed" (and defects found in peer review corrected).  There are many reasonable ways to manage, but a waterfall-ish project that treats "code completed" as its most critical milestone is not one of them.

(2) Smart developers can write production-quality code on a long weekend (FALSE!)

Alternate form: marketing sets both requirements and end date without engineering getting a chance to spend enough time on a preliminary design to figure out if it can actually be done.

The true bit is that anyone can slap together some code that doesn't work.  Some folks can slap together code in a long weekend that almost works.  But even the best of us can only crank out so many lines of code in a short amount of time without making mistakes, much less produce something anyone else can understand.  Many of us remember putting together hundreds or thousands of lines in an all-nighter when we were students. That should not be mistaken for writing production embedded code.

Good embedded code tends to cost about an hour for every 1 or 2 lines of non-comment code all-in, including testing (on a really good day, 3 lines/hr).  Some teams come from the Lake Wobegon school, where all the programmers are above average.  (Is that really true for your team?  Really?  Good for you!  But you still have to pay attention to the other four items on this list.)  And sure, you can game this metric if you try. Nonetheless, it is remarkable how often I see a claimed productivity number well above 2 SLOC/hour of deeply embedded code corresponding to a project that is in trouble.

Regardless of the precise productivity number, if you want your system to really work, you need to treat software development as a core competency.  You need an appropriately methodical and rigorous engineering process. Slapping together code quickly gives the illusion of progress, but it doesn't produce reliable products for full-scale production.

(3) A “mostly working,” undisciplined prototype can be deployed.  (FALSE!)

Quick and dirty prototypes provide value by giving stakeholders an idea of what to expect and allowing iterations to converge on the right product. They are invaluable for solidifying nebulous requirements. However, such a prototype should not be mistaken for an actual product!   If you've hacked together a prototype, in my experience it's always more expensive to clean up the mess than it is to take a step back and restart from scratch or from a stable production code base.

What the prototype gives you is a solid sense of requirements and some insight into pitfalls in design.

A well-executed incremental deployment strategy can be a reasonable compromise for iteratively adding functionality when you don't know all your requirements up front. But a well-run Agile project is not what I'm talking about when I say "undisciplined prototype." A cool proof of concept can be very valuable.  It should not be mistaken for production code.

(4) Testing improves software quality (FALSE!)

If there are code quality problems (possibly caused by trying to bring an undisciplined prototype to market), the usual hammer that is brought to bear is more testing.  Nobody ever solved code quality problems by testing. All that testing does is make buggy code a little less buggy. If you've got spaghetti code that is full of bugs, testing can't possibly fix that. And testing will generally miss most subtle timing bugs and non-obvious edge cases.

If you're seeing lots of bugs in system test, your best bet is to use testing to find bug farms. The 90/10 rule applies: many times 90% of the bugs are in bug farms -- the worst 10% of the modules. That's only an approximate ratio, but regardless of the exact number, if you're seeing a lot of system test failures then there is a good chance some modules are especially bug-prone.  Generally the problem is not simply programming errors, but rather poor design of these bug-prone modules that makes bugs inevitable. When you identify a bug farm, throw the offending module away, redesign it clean, and write the code from scratch. It's tempting to think that each bug is the last one, but after you've found more than a handful of bugs in a module, who are you kidding? Especially if it's spaghetti code, bug farms will always be one bug away from being done, and you'll never get out of system test cleanly.

(5) Peer review is too expensive (FALSE!)

Many, many projects skip peer review to get to completed code (see item #1 above). They feel that they just don't have time to do peer reviews. However, good peer reviews are going to find 50-75% of your bugs before you ever get to testing, and do so for about 10% of your development budget.  How can you not afford peer reviews?   (Answer: you don't have time to do peer reviews because you're too busy writing bugs!)

Have you run into another management misconception on a par with these? Let me know what you think!

Monday, August 28, 2017

The Spaghetti Factor -- A Software Complexity Metric Proposal


I've had to review code that has spaghetti-level complexity in its control flow (too high cyclomatic complexity).  And I've had to review code that has spaghetti-level complexity in its data flow (too many global variables mixed together into a single computation).  And I've had to review procedures that just go on for page after page with no end in sight. But the stuff that will really make your brain hurt is code that has all of these problems.

There are many complexity metrics out there. But I haven't seen one that directly tries to help balance three key sources of complexity into a single intuitive number: code complexity, data complexity, and module size. So here is a proposal that could help drive improvement in a lot of the terrible embedded control code I've seen:



The Spaghetti Factor metric (SF):

SF = SCC + (Globals*5) + (SLOC/20)

SCC = Strict Cyclomatic Complexity
Globals = # of read/write global variables
SLOC = # source lines of non-comment code (e.g., C statements)

Scoring:
5-10 - This is the sweet spot for most code except simple helper functions
15 - Don't go above this for most modules
20 - Look closely; review to see if refactoring makes sense
30 - Refactor the design
50 - Untestable; throw the module away and fix the design
75 - Unmaintainable; throw the module away; throw the design away; start over
100 - Nightmare; probably you need to throw the whole subsystem away and re-architect it



Notation:

SCC is Strict Cyclomatic Complexity (sometimes called CC2).  This is a variant of McCabe Cyclomatic complexity (MCC). In general terms, MCC is based on the number of branches in the program. SCC additionally considers complexity based on the number of conditions within each branch. SCC is an approximation of how many test cases it takes to exercise all the paths through code including all the different ways there are to trigger each branch. In other words, it is an estimate of how much work it is to do unit testing. Think of it as an approximation to the effort required for MC/DC testing. But in practice it is also a measure of how hard it is to understand the code.  The idea is to keep SCC low enough that it is feasible to understand and test paths through the code.
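
To make the difference concrete, here is a tiny hypothetical function (exact counting conventions vary somewhat by tool, so treat the numbers as illustrative):

    /* Under a common convention, MCC counts the "if" as one decision
       (complexity 1 + 1 = 2), while SCC/CC2 also counts each condition,
       so the "&&" adds one more (complexity 1 + 2 = 3).  Full-path
       testing has to cover the extra short-circuit case where
       a > 0 but b <= 0.                                              */
    static int both_positive(int a, int b)
    {
        if ((a > 0) && (b > 0)) {
            return 1;
        }
        return 0;
    }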

Globals is the number of read/write global variables accessed by the module. This does not include "const" values, nor file static variables.  In an ideal world you have zero or near-zero global variables. If you have inherent global state, you should encapsulate that state in a state object with appropriate access functions to enforce well-disciplined writes.  Referencing an unstructured pile of dozens or hundreds of global variables can make software difficult to test, and can make subsystem testing almost impossible. Partly that is due to the test scaffolding required, but partly it is simply the effort of chasing down all the globals and trying to figure out what they do, both inbound and outbound. Moreover, too many globals can make it nearly impossible to chase down bugs or understand the effects of changing one part of the code on the rest of the code. An important goal of this part of the metric is to discourage the use of many disjoint global variables to implicitly pass data from routine to routine instead of passing parameters along with function calls.
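
As a minimal sketch of that encapsulation idea (all the names and the limit are made up for illustration):

    #include <stdint.h>

    #define MAX_RPM 6000u               /* made-up limit for the example */

    typedef struct {
        uint16_t rpm;
        uint8_t  mode;
    } MotorState;

    static MotorState motor_state;      /* file static, not a bare global */

    /* All writes funnel through one function that enforces the invariant. */
    void MotorState_SetRpm(uint16_t rpm)
    {
        if (rpm <= MAX_RPM) {
            motor_state.rpm = rpm;
        }
    }

    uint16_t MotorState_GetRpm(void)
    {
        return motor_state.rpm;
    }

Testing then only has to exercise the access functions, rather than chase every direct reference to a bare global.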

SLOC is the number of non-comment "Source Lines of Code."  For C programs, this is the number of programming statements. A typical guideline is a maximum of 100-225 lines of code per module, with most modules being smaller than that.

As an example calculation, if you have 100 lines of code with an SCC of 9 and 1 global reference, your score will be  SF = 9 + (1*5) + (100/20) = 19.  A score of 19 is on the upper edge of being OK. If you have a distribution of complexity across modules, you'd want most of them to be a bit lower in complexity than this example calculation.
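
If you want to automate the scoring, the arithmetic is trivial; this sketch assumes you already extract the three raw counts from a static analysis tool:

    #include <stdio.h>

    /* Hypothetical helper: Spaghetti Factor from the three raw counts. */
    static double spaghetti_factor(int scc, int globals, int sloc)
    {
        return scc + (globals * 5) + (sloc / 20.0);
    }

    int main(void)
    {
        /* The worked example above: SCC=9, 1 global, 100 SLOC -> 19 */
        printf("SF = %.1f\n", spaghetti_factor(9, 1, 100));
        return 0;
    }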

Discussion:

The guideline values are taken primarily from MCC, which typically has a guideline of 10 for most modules, 15 as a usual bound, and 30 as a limit.  To account for globals and length, based on my experience, I've changed that to 15 for most modules, 20 as a soft limit, and 30 as a hard limit.  You might wish to adjust the thresholds and multipliers based on your system and experience. In particular, it is easy to make a case that these limits aren't strict enough for life-critical software, and a case can be made for being a little more relaxed in throw-away GUI management code.  But I think this is a good starting point for most everyday embedded software that is written by a human (as opposed to auto-generated code).

The biggest exception is usually what to do about switch statements.  If you exempt them you can end up with multiple switches in one module, or multiple switch/if/switch/if layered nesting.  (Neither is a pretty sight.) I think it is justifiable to exempt modules that have ONLY a switch and conditional logic to do sanity checking on the switch value.  But, because 30 is a pretty generous limit, you're only going to see this rarely. Generally the only legitimate reason to have a switch bigger than that is for something like processing a message type for a communication protocol.  So I think you should not blanket exempt switch statements, but rather include them in an overall case-by-case sign-off by engineering management as to which few exceptions are justifiable.
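
For reference, the kind of switch-only module I have in mind looks something like this (the message IDs and handler functions are invented for the sketch):

    #include <stddef.h>
    #include <stdint.h>

    enum { MSG_HEARTBEAT, MSG_SET_SPEED, MSG_SHUTDOWN };    /* made-up IDs */

    void handle_heartbeat(const uint8_t *payload);   /* hypothetical handlers */
    void handle_set_speed(const uint8_t *payload);
    void handle_shutdown(const uint8_t *payload);
    void log_unknown(uint8_t msg_type);

    /* Sanity checking plus one switch, and nothing else. */
    void dispatch_message(uint8_t msg_type, const uint8_t *payload)
    {
        if (payload == NULL) { return; }
        switch (msg_type) {
            case MSG_HEARTBEAT: handle_heartbeat(payload); break;
            case MSG_SET_SPEED: handle_set_speed(payload); break;
            case MSG_SHUTDOWN:  handle_shutdown(payload);  break;
            default:            log_unknown(msg_type);     break;
        }
    }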

Some might make the observation that this metric discourages extensive error checking.  That's a different topic, and certainly the intent is NOT to discourage error checking. But the simple answer is that error checking has to be tested and understood, so you can't simply ignore that part of the complexity. One way to handle that situation is to put error checking into a subroutine or wrapper function to get that complexity out of the way, then have that wrapper call the actual function that does the work.  Another way is to break your overall code down into smaller pieces so that each piece is simple enough for you to understand and test both the functionality and the error checking.
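
One possible shape for the wrapper approach (function names and the error code are invented for the example):

    #define ERR_RANGE (-1)                      /* made-up error code */

    static void set_heater_level_unchecked(int level)
    {
        /* the actual work goes here, free of error-checking branches */
        (void)level;
    }

    /* The wrapper absorbs the error-checking complexity...           */
    int set_heater_level(int level)
    {
        if ((level < 0) || (level > 100)) {
            return ERR_RANGE;
        }
        set_heater_level_unchecked(level);  /* ...the worker does the work */
        return 0;
    }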

Finally, any metric can be gamed, and that is surely true of simple metrics like this one.  A good metric score doesn't necessarily mean your code is fantastic. Additionally, this metric does not consider everything that's important, such as the total number of globals across your code base. On the other hand, if you score poorly on this metric, most likely your code is in need of improvement.

What I recommend is that you use this metric as a way to identify code that is needlessly complex.  It is the rare piece of code indeed that unavoidably needs to have a high score on this complexity metric. And if all your code has a good score, that means it should be that much easier to do peer review and unit testing to ensure that other aspects of the code are in good shape.

References:

A NIST paper on applying these metrics, including an interesting discussion of the pitfalls of handling switch statements within a complexity framework, is here: http://www.mccabe.com/pdf/mccabe-nist235r.pdf

Monday, July 24, 2017

Don't use macros for MIN and MAX



It is common to see small helper functions implemented as macros, especially in older C code. Everyone seems to do it.  But you should avoid macros, and instead use inline functions.

The motivation for using macros was originally that you needed to use a small function in many places but were worried about the overhead of doing a subroutine call. So instead, you used a macro, which expands into source code in the preprocessor phase.  That was a reasonable tradeoff 40 years ago. Not such a great idea now, because macros cause problems for no good reason.

For example, you might look on the Web and find these common macros:
    #define MAX(a,b) ((a) > (b) ? a : b)
    #define MIN(a,b) ((a) < (b) ? a : b)

And you might find that it seems to work for a while.  You might get bitten by the missing "()" guards around the second copy of a and b in the above -- which version you get depends on which cut & paste code site you visit. 
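
For reference, the more defensive copy-and-paste variant guards every parameter use:

    #define MAX(a,b) ((a) > (b) ? (a) : (b))
    #define MIN(a,b) ((a) < (b) ? (a) : (b))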

But then you'll find that there are still weird situations where you get unexpected behavior. For example, what does this do?
    c = MAX(a++, b);
If a is greater than b, executing the code will increment a twice; if a is less than or equal to b, it will increment a only once.  And if you start mixing types or putting complicated expressions into the macro, things can get weird and buggy in a hurry.
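
The problem becomes obvious once you write out what the preprocessor actually produces:

    c = ((a++) > (b) ? a++ : b);   /* a++ is evaluated twice when a > b */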

Another related problem is that the macro will expand in place, increasing the cyclomatic complexity of your code. That's because a macro is equivalent to you having typed the conditional branch into the source code. (Remember, macro expansion is done by the preprocessor, so the compiler itself acts as if you'd typed the conditional assignment expression every place you use the macro.) This complexity rating is justified, because there is no actual procedure that can be unit tested independently.

As it turns out, macros are evil. See the C++ FAQ: https://isocpp.org/wiki/faq/misc-technical-issues#macros-with-if which lists 4 different types of evil behavior.  There are fancy hacks to try to get particular macros such as MIN and MAX to be better behaved, but no matter how hard you try, you're really just making a deal with the devil. 

What's the fix?

The fix is: don't use macros. Instead use inline procedure calls.

You should already have access to standard library functions for floating point, such as fmin() and fmax() (in <math.h> since C99).  If they're available, use the ones from your compiler vendor instead of writing them yourself!

If your compiler doesn't have integer min and max, or you are worried about breaking existing macro code, convert the macros into inline functions with minimal changes to your code base:

#include <stdint.h>   /* for int32_t */

/* "static inline" is the portable form if these live in a header;
   plain "inline" alone can cause link errors under C99 rules.     */
static inline int32_t MAX(int32_t a, int32_t b) { return (a > b) ? a : b; }
static inline int32_t MIN(int32_t a, int32_t b) { return (a < b) ? a : b; }

If you have other types to deal with you might need different variants depending on the types, but often a piece of code uses predominantly one data type for its calculations, so in practice this is usually not a big deal. And don't forget: if your build environment has a built-in min or max, you can just have your inline function call that directly.
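
As a sketch of what per-type variants might look like (the names are invented; fmax() comes from <math.h>):

    #include <stdint.h>
    #include <math.h>

    /* One variant per data type, named to make the type explicit. */
    static inline uint16_t MAX_U16(uint16_t a, uint16_t b) { return (a > b) ? a : b; }
    static inline int64_t  MAX_I64(int64_t a, int64_t b)   { return (a > b) ? a : b; }

    /* Or wrap a library function where one already exists: */
    static inline double MAX_D(double a, double b) { return fmax(a, b); }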

What about performance?

The motivation for using macros back in the bad old days was efficiency. A subroutine call involved a lot of overhead. But the inline keyword tells the compiler to expand the code in-place while retaining all the advantages of a subroutine call.  Compilers are pretty good at optimization these days, so there is no overhead at run-time.  I've also seen advice to put the inline function in a header file so it is visible to any procedure needing it; the macro it replaces would have lived in a header anyway.

Strictly speaking, "inline" is a suggestion to the compiler. However, if you have a decent compiler it will follow the suggestion unless the inline function is so big the call overhead just doesn't matter. Some compilers have a warning flag that will let you know when the inline didn't happen.  For example, use -Winline for gcc.  If your compiler ignores "inline" for something as straightforward as MIN or MAX, get a different compiler.

What about multiple types?

A perceived advantage of the macro approach is that you can play fast and loose with types.  But playing fast and loose with types is a BAD IDEA because you'll get bugs.  

If you really hate having to match the function name to the data types, then what you need is to switch to a language that can handle this by automatically picking the right function based on the operand types. In other words, switch from a language that is about 45 years old (C) to one that is only about 35 years old (C++).  There's a paper from 1995 that explains this in the context of min and max implemented with templates:  http://www.aristeia.com/Papers/C++ReportColumns/jan95.pdf
As it turns out the rabbit hole goes a lot deeper than you might think for a generic solution.

But you don't have to go down the rabbit hole.  For most code the best answer is simply to use inline functions and pick the function name that matches your data types. You shouldn't lose any performance at all, and you'll be likely to save a lot of time chasing obscure bugs.