Wednesday, August 24, 2011

Bugs and Numbers: How many bugs do you have in your code?

If you follow Zero Bug Tolerance of course you’re not supposed to have any bugs to fix after the code is done. But let’s get real. Is there any way to know how many bugs you're missing and will have to fix later, and how many bugs you might already have in your code? Are there any industry measures of code quality that you can use as a starting point?

For questions like these, the first place I look is one of the books by Capers Jones: arguably the leading expert in all things to do with software development metrics. There’s Applied Software Measurement, or Estimating Software Costs, or Software Engineering Best Practices: Lessons from Successful Projects in the Top Companies. These books offer different views into a fascinating set of data that Capers Jones has collected over decades from thousands of different projects. It’s easy to spend hours getting lost in this data set, and reading through and questioning the findings that he draws from it.

So what can we learn from Capers Jones about bugs and defect potentials and defect density rates? A lot, actually.

On average 85% of bugs introduced in design and development are caught before the code is released (this is the average in the US as of 2009). His research shows that this defect removal rate has stayed roughly the same over 20 years, which is disappointing given the advances in tools and methods over that time.

We introduce 5 bugs per Function Point (Capers Jones is awfully fond of measuring everything by Function Points, which are an abstract way of measuring code size), depending on the type of system being built. Web systems are a bit lower surprisingly, at 4 bugs per Function Point; other internal business systems are 5, military systems average around 7. Using backfiring (a crude technique to convert function points back into LOC measures) you can equivalence 1 Function Point to about 50-55 lines of Java code.

For the sake of simplicity, let’s use 1 Function Point = 50 LOC, and keep in mind that all of these numbers are really rough, and that using backfiring techniques to translate Function Points to source code statements introduces a probability of error, but it’s a lot easier than trying to think in Function Points. And all I want here is a rough indicator of how much trouble a team might be in.

If 85% of bugs are hopefully found and fixed before the code is released, this leaves 0.75 bugs per Function Point unfound (and obviously unfixed) in the code when it gets to production. Which means that for a small application of 1,000 Function Points (50,000 or so lines of Java code), you could expect around 750 defects at release .

And this is only accounting for the bugs that you don’t already know about: a lot of code is released with a list of known bugs that the development team hasn’t had a chance to fix, or doesn’t think is worth fixing, or doesn’t know how to fix. And, this is just your code: it doesn’t account for bugs in the technology stack that the application depends on: the frameworks and application platform, database and messaging middleware, and any open source libraries or COTS that you take advantage of.

Of these 750+ bugs around 25% will be severity 1 show stoppers – real production problems that cause something significant to break.

Ouch – no wonder most teams will spend a lot of time on support and fixing bugs after releasing a big system. Of course, if you’re building and releasing software incrementally, you’ll find and fix more of these bugs as you go along, but you’ll still be fixing a lot of bugs in production.

Remember that these are rough averages. And remember (especially the other guys out there), we can’t all be above average, no matter how much we would like to be. For risk management purposes, it might be best to stick with averages, or even consider yourself below the bar.

Also keep in mind that defect potentials increase with the size of the system – big apps have more bugs on average. Not only is there a higher potential to write buggy code in bigger systems, but as the code base gets bigger and more complex it’s also harder to find and fix bugs. So big systems get released with even more bugs, and really big apps with a lot more bugs.

All of this gets worse in maintenance

In maintenance, the average defect potential for making changes is higher than in development, about 6 bugs per Function Point instead of 5. And the chance of finding and fixing mistakes in your changes is lower (83%). This is all because it’s harder to work with legacy code that you didn’t write and don’t understand all that well. So you should expect to release 1.08 bugs per Function Point when changing code in maintenance, instead of 0.75 bugs per Function Point.

And maintenance teams still have to deal with the latent bugs in the system, some of which may hide in the code for years, or forever. This includes heisenbugs and ghosts and weird timing issues and concurrency problems that disappear when you try to debug them. On average, 50% of residual latent defects are found each calendar year. The more people using your code, the faster that these bugs will be found.

Of course, once you find these bugs, you still have to fix them. The average maintenance programmer can be expected to fix around 10 bugs per month – and maybe implement some small enhancements too. That's not a great return on investment.

Then there’s the problem of bug re-injections, or regressions – when a programmer breaks something accidentally as a side-effect of making a fix. On average, programmers fixing a bug will introduce a new bug 7% of the time – and this can run as high as 20% for complex, poorly-structured code. Trying to fix these bad fixes is even worse – programmers trying to fix these mistakes have a 15% chance of still messing up the fix, and a 30% chance of introducing yet another bug as a side effect! It's better to roll-back the fix and start again.

Unfortunately, all of this gets worse over time. Unless you are doing a perfect job of refactoring and continuously simplifying the code, you can expect code complexity to increase an average of between 1% and 3% per year. And most systems get bigger over time, as you add more features and copy-and-paste code (of course you don't do that): the code base for a system under maintenance increases between 5-10% per year. As the code gets bigger and more complex, the chance for more bugs also increases each year.

But what if we’re not average? What if we’re best in class?

What if you are doing an almost perfect job, if you are truly best in class? Capers Jones finds that best in class teams create half as many bugs as average teams (2.5 or fewer defects per Function Point instead of 5), and they find and fix 95% or more of these bugs before the code is released. That sounds impressive - it means only 0.125 bugs per Function Point. But for a 50,000 LOC system, that’s still somewhere around 125 bugs on delivery.

And as for zero bugs? In his analysis of 13,000 projects over a period of more than 40 years, there were 2 projects with no defects reported within a year of release. So you can aspire to it. But don’t depend on it.

Monday, August 15, 2011

The C14N challenge

Failing to properly validate input data is behind at least half of all application security problems. In order to properly validate input data, you have to start by first ensuring that all data is in the same standard, simple, consistent format – a canonical form. This is because of all the wonderful flexibility in internationalization and data formatting and encoding that modern platforms and especially the Web offer. Wonderful capabilities that attackers can take advantage of to hide malicious code inside data in all sorts of sneaky ways.

Canonicalization is a conceptually simple idea: take data inputs, and convert all of it into a single, simple, consistent normalized internal format before you do anything else with it. But how exactly do you do this, and how do you know that it has been done properly? What are the steps that programmers need to take to properly canonicalize data? And how do you test for it? This is where things get fuzzy as hell.

To read my latest post on canonicalization problems (and the search for solutions), go to the SANS Application Security blog.

Site Meter