Tuesday, January 20, 2015

ThoughtWorks Takes Security Sandwiches off the Menu

Most people in software development have heard about ThoughtWorks.

ThoughtWorks' Chief Scientist, Martin Fowler, is one of the original Agile thought leaders, and the company continues to drive new ideas in Agile development and devops, including Continuous Delivery.

At least once a year the thought leaders of ThoughtWorks get together and publish a Technology Radar – a map of the techniques, tools and ideas that they are having success with and recommend to other developers, that they are trying out in their projects and think other people should know more about, or that they have seen fail and want to warn other people about.

I always look forward to reading the Radar when it comes out. It’s a good way to learn about cool tools and new ideas, especially in devops, web and mobile development, Cloudy stuff and IoT, and other things that developers should know about.

But until recently, security was conspicuously absent from the Radar, which suggests that security wasn't something ThoughtWorks developers thought was important or interesting enough to share. Over the last year this has changed, and ThoughtWorks has started to include application security and data privacy concerns in design, development and delivery, including privacy vs. big data, forward secrecy, two-factor authentication, OpenID Connect, and the OWASP Top 10.

The first Radar of 2015 recommends that organizations avoid the “Security Sandwich” approach to implementing appsec in development projects, and instead look for ways to build security into Agile development:

Traditional approaches to security have relied on up-front specification followed by validation at the end. This “Security Sandwich” approach is hard to integrate into Agile teams, since much of the design happens throughout the process, and it does not leverage the automation opportunities provided by continuous delivery. Organizations should look at how they can inject security practices throughout the agile development cycle.

This includes: evaluating the right level of Threat Modeling to do up-front; when to classify security concerns as their own stories, acceptance criteria, or cross-cutting non-functional requirements; including automatic static and dynamic security testing into your build pipeline; and how to include deeper testing, such as penetration testing, into releases in a continuous delivery model. In much the same way that DevOps has recast how historically adversarial groups can work together, the same is happening for security and development professionals.

The sandwich – policies upfront, and pen testing at the end to “catch all the security bugs” – doesn't work, especially for Agile teams and teams working in devops environments. Teams who use lightweight, iterative, incremental development practices and release working software often need tools and practices to match. Instead of scan-at-the-end-then-try-to-fix, we need simple, efficient checks and guides that can be embedded into Agile development, and faster tools that provide immediate feedback in Continuous Integration and Continuous Delivery. And we need development and security working together more closely and more often.
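
As one (hypothetical) illustration of what an embedded check can look like, here is a minimal JUnit-style sketch that treats a couple of security requirements as just another set of tests that run on every build. The AuthService here is an invented, in-memory stand-in for illustration only, not a real framework:

    import static org.junit.Assert.*;

    import java.util.HashMap;
    import java.util.Map;
    import java.util.UUID;

    import org.junit.Test;

    public class AuthServiceSecurityTest {

        // Minimal in-memory stand-in so the sketch is self-contained; a real
        // service would sit behind the same kind of always-on checks.
        static class AuthService {
            private final Map<String, Integer> failedLogins = new HashMap<>();

            boolean login(String user, String password) {
                boolean ok = !isLockedOut(user) && "correct-password".equals(password);
                if (!ok) {
                    failedLogins.merge(user, 1, Integer::sum);
                }
                return ok;
            }

            boolean isLockedOut(String user) {
                return failedLogins.getOrDefault(user, 0) >= 5;
            }

            String createSessionToken(String user) {
                return user + ":" + UUID.randomUUID();
            }
        }

        private final AuthService auth = new AuthService();

        @Test
        public void accountLocksAfterRepeatedFailedLogins() {
            // An abuse case checked on every build, not just during a pen test.
            for (int i = 0; i < 5; i++) {
                assertFalse(auth.login("alice", "wrong-password"));
            }
            assertTrue("account should lock after repeated failures", auth.isLockedOut("alice"));
        }

        @Test
        public void sessionTokensAreNotReused() {
            // Two sessions for the same user should never share a token.
            assertNotEquals(auth.createSessionToken("alice"), auth.createSessionToken("alice"));
        }
    }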

It’s good to see pragmatic application security on the ThoughtWorks Radar. I hope it’s on your radar too.

Tuesday, January 13, 2015

We can’t measure Programmer Productivity… or can we?

If you go to Google and search for "measuring software developer productivity" you will find a whole lot of nothing. Seriously -- nothing.
Nick Hodges, Measuring Developer Productivity

By now we should all know that we don’t know how to measure programmer productivity.

There is no clear cut way to measure which programmers are doing a better or faster job, or to compare productivity across teams. We “know” who the stars on a team are, who we can depend on to deliver, and who is struggling. And we know if a team is kicking ass – or dragging their asses. But how do we prove it? How can we quantify it?

All sorts of stupid and evil things can happen when you try to measure programmer productivity.

But let’s do it anyways.

We’re writing more code, so we must be more productive

Developers are paid to write code. So why not measure how much code they write – how many lines of code get delivered?

Because we've known since the 1980s that this is a lousy way to measure productivity.

Lines of code can’t be compared across languages (of course), or even between programmers using the same language working in different frameworks or following different styles. Which is why Function Points were invented – an attempt to standardize and compare the size of work in different environments. Sounds good, but Function Points haven’t made it into the mainstream, and probably never will – very few people know how Function Points work, how to calculate them and how they should be used.

The more fundamental problem is that measuring productivity by lines (or Function Points or other derivatives) typed doesn’t make any sense. A lot of important work in software development, the most important work, involves thinking and learning – not typing.

The best programmers spend a lot of time understanding and solving hard problems, or helping other people understand and solve hard problems, instead of typing. They find ways to simplify code and eliminate duplication. And a lot of the code that they do write won’t count anyways, as they iterate through experiments and build prototypes and throw all of it away in order to get to an optimal solution.

The flaws in these measures are obvious if we consider the ideal outcomes: the fewest lines of code possible in order to solve a problem, and the creation of simplified, common processes and customer interactions that reduce complexity in IT systems. Our most productive people are those that find ingenious ways to avoid writing any code at all.
Jez Humble, The Lean Enterprise

This is clearly one of those cases where size doesn’t matter.

We’re making (or saving) more money, so we must be working better

We could try to measure productivity at a high level using profitability or financial return on what each team is delivering, or some other business measure such as how many customers are using the system – if developers are making more money for the business (or saving more money), they must be doing something right.

Using financial measures seems like a good idea at the executive level, especially now that “every company is a software company”. These are organizational measures that developers should share in. But they are not effective – or fair – measures of developer productivity. Too many business factors are outside of the development team's control. Some products or services succeed even if the people delivering them are doing a lousy job, or fail even if the team did a great job. Focusing on cost savings in particular leads many managers to cut people and try “to do more with less” instead of investing in real productivity improvements.

And as Martin Fowler points out, there is a time lag, especially in large organizations – it can sometimes take months or years to see real financial results from an IT project, or from productivity improvements.

We need to look somewhere else to find meaningful productivity metrics.

We’re going faster, so we must be getting more productive

Measuring speed of development – velocity in Agile – looks like another way to measure productivity at the team level. After all, the point of software development is to deliver working software. The faster that a team delivers, the better.

But velocity (how much work, measured in story points or feature points or ideal days, that the team delivers in a period of time) is really a measure of predictability, not productivity. Velocity is intended to be used by a team to measure how much work they can take on, to calibrate their estimates and plan their work forward.

Once a team’s velocity has stabilized, you can measure changes in velocity within the team as a relative measure of productivity. If the team’s velocity is decelerating, it could be an indicator of problems in the team or the project or the system. Or you can use velocity to measure the impact of process improvements, to see if training or new tools or new practices actually make the team’s work measurably faster.

But you will have to account for changes in the team, as people join or leave. And you will have to remember that velocity is a measure that only makes sense within a team – that you can’t compare velocity between teams.

Although this doesn't stop people from trying. Some shops use the idea of a well-known reference story that all teams in a program understand and use to base their story point estimates on. As long as teams aren't given much freedom in how they come up with estimates, and as long as the teams are working in the same project or program with the same constraints and assumptions, you might be able to do a rough comparison of velocity between teams. But Mike Cohn warns that

If teams feel the slightest indication that velocities will be compared between teams there will be gradual but consistent “point inflation.”

ThoughtWorks explains that velocity is not the same as productivity in their latest Technology Radar:

We continue to see teams and organizations equating velocity with productivity. When properly used, velocity allows the incorporation of “yesterday's weather” into a team’s internal iteration planning process. The key here is that velocity is an internal measure for a team, it is just a capacity estimate for that given team at that given time. Organizations and managers who equate internal velocity with external productivity start to set targets for velocity, forgetting that what actually matters is working software in production. Treating velocity as productivity leads to unproductive team behaviors that optimize this metric at the expense of actual working software.
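
To keep velocity in its lane, here is a minimal sketch (with made-up numbers) of the one job it is meant to do: feeding “yesterday's weather” back into the team's own iteration planning:

    import java.util.List;

    public class IterationPlanning {

        // "Yesterday's weather": plan the next iteration using the average of the
        // last few completed iterations for this team only. The numbers are invented.
        static double plannedCapacity(List<Integer> recentVelocities) {
            return recentVelocities.stream()
                    .mapToInt(Integer::intValue)
                    .average()
                    .orElse(0);
        }

        public static void main(String[] args) {
            List<Integer> lastThreeIterations = List.of(21, 18, 24); // story points completed
            System.out.printf("Plan roughly %.0f points next iteration%n",
                    plannedCapacity(lastThreeIterations));
            // Comparing this number with another team's velocity tells you nothing:
            // each team estimates against its own reference stories.
        }
    }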

Just stay busy

One manager I know says that instead of trying to measure productivity

“We just stay busy. If we’re busy working away like maniacs, we can look out for problems and bottlenecks and fix them and keep going”.

In this case you would measure – and optimize for – cycle time, like in Lean manufacturing.

Cycle time – turnaround time or change lead time, from when the business asks for something to when they get it in their hands and see it working – is something that the business cares about, and something that everyone can see and measure. And once you start looking closely, waste and delays will show up as you measure waiting/idle time, value-add vs. non-value-add work, and process cycle efficiency (total value-add time / total cycle time).
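
For example, here is a minimal sketch of these measures, using invented numbers: waiting time, and process cycle efficiency as value-add time divided by total cycle time:

    import java.time.Duration;

    public class CycleTimeMetrics {

        public static void main(String[] args) {
            // Invented example: a change requested on day 0, running in production on day 20.
            Duration totalCycleTime = Duration.ofDays(20);

            // Time actually spent working on the change (analysis, coding, testing, deploy).
            Duration valueAddTime = Duration.ofDays(6);

            // Everything else is waiting: sitting in a queue, blocked on approvals, idle.
            Duration waitingTime = totalCycleTime.minus(valueAddTime);

            double processCycleEfficiency =
                    (double) valueAddTime.toHours() / totalCycleTime.toHours();

            System.out.println("Waiting/idle time: " + waitingTime.toDays() + " days");
            System.out.printf("Process cycle efficiency: %.0f%%%n", processCycleEfficiency * 100);
            // 6 value-add days out of 20 elapsed days = 30%: most of the lead time is waiting.
        }
    }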

“It’s not important to define productivity, or to measure it. It’s much more important to identify non-productive activities and drive them down to zero.”
Erik Simmons, Intel

Teams can use Kanban to monitor – and limit – work in progress and identify delays and bottlenecks. And Value Stream Mapping to understand the steps, queues, delays and information flows which need to be optimized. To be effective, you have to look at the end-to-end process from when requests are first made to when they are delivered and running, and optimize all along the path, not just the work in development. This may mean changing how the business prioritizes, how decisions are made and who makes the decisions.

In almost every case we have seen, making one process block more efficient will have a minimal effect on the overall value stream. Since rework and wait times are some of the biggest contributors to overall delivery time, adopting “agile” processes within a single function (such as development) generally has little impact on the overall value stream, and hence on customer outcomes.
Jez Humble, The Lean Enterprise

The downside of equating delivery speed with productivity? Optimizing for cycle time/speed of delivery by itself could lead to problems over the long term, because this incents people to think short term, to cut corners and to take on technical debt.

We’re writing better software, so we must be more productive

“The paradox is that when managers focus on productivity, long-term improvements are rarely made. On the other hand, when managers focus on quality, productivity improves continuously.”
John Seddon, quoted in The Lean Enterprise

We know that fixing bugs later costs more. Whether it’s 10x or 100+x, it doesn't really matter. And that projects with fewer bugs are delivered faster – at least up to a point of diminishing returns for safety-critical and life-critical systems.

And we know that the costs of bugs and mistakes in software to the business can be significant. Not just development rework costs and maintenance and support costs. But direct costs to the business. Downtime. Security breaches. Lost IP. Lost customers. Fines. Lawsuits. Business failure.

It’s easy to measure whether you are writing good – or bad – software. Defect density. Defect escape rates (especially defects – including security vulnerabilities – that escape to production). Static analysis metrics on the code base, using tools like SonarQube.
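
For example, a minimal sketch of two of these measures, defect density and defect escape rate, using invented counts:

    public class QualityMetrics {

        // Defects found per thousand lines of code changed in a release.
        static double defectDensity(int defectsFound, int linesChanged) {
            return defectsFound / (linesChanged / 1000.0);
        }

        // Share of all defects that escaped to production instead of being
        // caught in development or test.
        static double defectEscapeRate(int foundInProduction, int foundBeforeRelease) {
            int total = foundInProduction + foundBeforeRelease;
            return total == 0 ? 0 : (double) foundInProduction / total;
        }

        public static void main(String[] args) {
            // Invented numbers for one release.
            System.out.printf("Defect density: %.1f per KLOC%n", defectDensity(18, 12_000));
            System.out.printf("Escape rate: %.0f%%%n", defectEscapeRate(3, 27) * 100);
        }
    }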

And we know how to write good software - or we should know by now. But is software quality enough to define productivity?

Devops – Measuring and Improving IT Performance

Devops teams who build/maintain and operate/support systems extend productivity from dev into ops. They measure productivity across two dimensions that we have already looked at: speed of delivery, and quality.

But devops isn't limited to just building and delivering code – instead it looks at performance metrics for end-to-end IT service delivery:

  1. Delivery Throughput: deployment frequency and lead time, maximizing the flow of work into production
  2. Service Quality: change failure rate and MTTR (mean time to recover)
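
Here is a minimal sketch of how these four measures might be calculated from a simple deployment log; the record layout and all of the numbers are invented for illustration:

    import java.time.Duration;
    import java.util.List;

    public class DeliveryMetrics {

        // Invented record of one production deployment.
        record Deployment(Duration leadTime, boolean failed, Duration timeToRecover) {}

        public static void main(String[] args) {
            // A month of (made-up) deployments.
            List<Deployment> month = List.of(
                    new Deployment(Duration.ofDays(2), false, Duration.ZERO),
                    new Deployment(Duration.ofDays(5), true, Duration.ofHours(3)),
                    new Deployment(Duration.ofDays(1), false, Duration.ZERO),
                    new Deployment(Duration.ofDays(3), false, Duration.ZERO));

            double deploymentsPerWeek = month.size() / 4.0;
            double avgLeadTimeDays = month.stream()
                    .mapToLong(d -> d.leadTime().toDays()).average().orElse(0);
            long failures = month.stream().filter(Deployment::failed).count();
            double changeFailureRate = (double) failures / month.size();
            double mttrHours = month.stream().filter(Deployment::failed)
                    .mapToLong(d -> d.timeToRecover().toHours()).average().orElse(0);

            System.out.printf("Deployment frequency: %.1f per week%n", deploymentsPerWeek);
            System.out.printf("Average lead time: %.1f days%n", avgLeadTimeDays);
            System.out.printf("Change failure rate: %.0f%%%n", changeFailureRate * 100);
            System.out.printf("MTTR: %.1f hours%n", mttrHours);
        }
    }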

It’s not a matter of just delivering software faster or better. It’s dev and ops working together to deliver services better and faster, striking a balance between moving too fast or trying to do too much at a time, and excessive bureaucracy and over-caution resulting in waste and delays. Dev and ops need to share responsibility and accountability for the outcome, and for measuring and improving productivity and quality.

As I pointed out in an earlier post, this makes operational metrics more important than developer metrics. According to recent studies, success in achieving these goals leads to improvements in business success: not just productivity, but market share and profitability.

Measure Outcomes, not Output

In The Lean Enterprise (which you can tell I just finished reading), Jez Humble talks about the importance of measuring productivity by outcome – measuring things that matter to the organization – not output.

“It doesn't matter how many stories we complete if we don’t achieve the business outcomes we set out to achieve in the form of program-level target conditions”.

Stop trying to measure individual developer productivity.

It’s a waste of time.

Everyone knows who the top performers are. Point them in the right direction, and keep them happy.

Everyone knows the people who are struggling. Get them the help that they need to succeed.

Everyone knows who doesn't fit in. Move them out.

Measuring and improving productivity at the team or (better) organization level will give you much more meaningful returns.

When it comes to productivity:

  1. Measure things that matter – things that will make a difference to the team or to the organization. Measures that are clear, important, and that aren't easy to game.
  2. Use metrics for good, not for evil – to drive learning and improvement, not to compare output between teams or to rank people.

I can see why measuring productivity is so seductive. If we could do it we could assess software much more easily and objectively than we can now. But false measures only make things worse.
Martin Fowler, CannotMeasureProductivity

Wednesday, December 10, 2014

If you could only do one thing to make better software, what would it be?

Good technical practices are what we have to do to make good software – this is the engineering part of software engineering. Design. Coding. Testing and Reviews.

If you could do only one thing to make better software, what would it be? Where would you get the most bang for your buck?

Continuous Integration – Making Code Run

Continuous Integration is an obvious place to start. You need to build the software and get it running before you can do anything useful with it.

Getting developers to check in and sync up with each other more often. Building the system more often – at least once a day to start, then on every check-in. Which means simplifying and automating the steps to build the system. Making sure that the system builds successfully every time – without errors or warnings – so that people can run it and try it out whenever they want. Making sure that it runs correctly, which means adding tests and checks as part of the build and deploy steps. Building information radiators so that everyone knows the status of the build and when the build is broken.

You can’t be Agile without Continuous Integration, and you need Continuous Integration in place before you can go down the Devops path to Continuous Delivery or Continuous Deployment.

And Continuous Integration works in sequential Waterfall delivery too. Developers in these environments might check in larger changes less often, but there is still real value in knowing that you can build and run the system and see it working sooner rather than later, especially in big enterprise systems and big programs where getting dependencies worked out and all the pieces working together is a huge challenge.

Developers testing their own work – Making Code Work

Making developers responsible for testing their own work, automating this as much as possible by building on Continuous Integration, is the only way to deliver software faster and keep costs down – depending too much on manual testing and hand-offs to a test team will slow you down too much.

Almost every organization that I have talked to over the past couple of years is pushing more responsibilities for testing onto developers, and pushing more testers into development teams (or out of the organization altogether), following the lead of Google and now Microsoft, to become "more Agile".

This means relying more on developers to write good automated tests (unit tests, basic UI regression using Selenium or Watir) and static analysis checking in Continuous Integration or the developer’s IDE to find common coding mistakes and security vulnerabilities.

But there are limits to what developers will catch in their testing, even good developers. Once you get developers to write tests (before, or after they write the code, it doesn't matter, now that TDD is dead), you’ll end up with mostly simple unit tests or UI regression tests that don’t stray far from the happy path, proving that the code does what the developer thinks it is supposed to do – because that is what they need to get their work done. Their assumptions and blind spots will be reflected in the tests as well as the code. Little or no negative testing. Or usability testing. Or security testing. Or stress testing. Or system-level integration testing. All of which still has to be done by somebody - unless you expect your customers to find your bugs for you.
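
Here is a small sketch of the difference, using an invented shipping-cost calculator: the first test is the happy-path check that developers write naturally, the other two are the boundary and negative cases that tend to get skipped:

    import static org.junit.Assert.*;
    import org.junit.Test;

    public class ShippingCostTest {

        // Hypothetical code under test, inlined so the sketch is self-contained.
        static int shippingCost(int weightKg) {
            if (weightKg <= 0) {
                throw new IllegalArgumentException("weight must be positive");
            }
            return weightKg <= 10 ? 5 : 5 + (weightKg - 10) * 2;
        }

        @Test
        public void happyPath() {
            // What most developer-written tests look like: the code does what I meant.
            assertEquals(5, shippingCost(3));
        }

        @Test
        public void boundaryCase() {
            // The edge at the pricing threshold, where off-by-one mistakes hide.
            assertEquals(5, shippingCost(10));
            assertEquals(7, shippingCost(11));
        }

        @Test(expected = IllegalArgumentException.class)
        public void negativeCase() {
            // Bad input should fail loudly instead of returning a nonsense price.
            shippingCost(-1);
        }
    }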

It will take a long time before the team learns how to write good, efficient tests, and before they build up a set of tests that will catch real bugs, rather than just getting in the way. But if you do this right, you can deliver good code while still moving fast, and get better value out of testing.

Code Reviews or Pairing – Making Code Good

Another way to get better code is by getting developers to do code reviews.

Code reviews should be about finding problems in the code first – checking for correctness, defensive coding protection (error handling and API contracts and thread safety and data validation), security (using security libraries correctly for access control and output encoding, protecting confidential data, logging and auditing…). And about making the code better – more understandable, safer and easier to change.

Code reviews are expensive, so do them right: lightweight, risk-based, using static analysis first to catch low-level mistakes and bad coding practices so that reviewers can spend their time looking for more important problems.

Instead of code reviews, you could try pairing as a way to get another pair of eyes on the code.

Pairing isn’t the same as reviews – the goals and priorities are different. A good reviewer will find problems even in code developed through pair programming, because reviewers look for different things. But research proves that disciplined pair programming will give you better structured, cleaner code, with fewer bugs. And pairing is a much better way to teach programmers about the system than code reviews are.

The downsides of pair programming? The cost of having two people do the work of one person – a good pair will work faster than one person on their own, but the less experienced or less skilled team member will slow the pair down to what they can deal with. Focus fatigue. Pairing can be exhausting, which means people can’t do it for too long at a stretch, before their work becomes superficial or strained. And social problems. People who like it, like it a lot. But people who don’t like it won’t do it at all.

Refactoring – Making Code – and Design – Last

What about design? Collaborative design workshops? Design reviews? Threat modeling in design to take care of security and operational risks?

We do all of these things. But as we continue to iterate through the design and as our code base grows, refactoring – to retain, and sometimes restore, the design, and to keep the code maintainable – is becoming more and more important.

It’s easy to learn your IDE’s refactoring tools and the basic ideas behind refactoring. But it’s not easy to learn how to do refactoring right (although you can learn a lot in a short time from Woody Zuill and Llewellyn Falco in their “2 Minutes to Better Code” video). Understanding why some refactoring approaches are better than others. How to save time refactoring. How to do it safely.

Mariusz Sieraczkiewicz does a good job of explaining how and when to do "everyday refactoring" using a matrix built on Michael Feathers’ work on brutal refactoring and the biology of code:

  1. Start by reading and annotating the code, maybe do some scratch (rapid, throwaway) refactoring to understand it better
  2. Find meaningful names for variables and conditionals
  3. Extract methods to break down big chunks of code and express the algorithm
  4. Get rid of obvious duplication
  5. Move methods and extract classes to isolate responsibilities.

I agree with Sieraczkiewicz that these simple steps “would heal most code bases on this planet”. He then goes on to describe larger and more fundamental “strategic refactoring” (aka “root canal refactoring”): refactoring to patterns, introducing new architectural constructs. Work that carries much higher risks and costs. This is where refactoring ends, and re-design and re-architecture starts.
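
Here is a small before-and-after sketch of steps 2 to 4 (meaningful names, extract method, remove obvious duplication) applied to an invented piece of code:

    import java.util.List;

    public class InvoiceCalculator {

        // Before: a typical block that reads poorly and repeats itself.
        double calc(List<Double> l, boolean f) {
            double t = 0;
            for (double x : l) {
                t += x;
            }
            if (f) {
                return t - (t * 0.1);
            } else {
                return t;
            }
        }

        // After: meaningful names, the loop extracted into a method whose name
        // expresses intent, and the discount rule stated exactly once.
        private static final double LOYALTY_DISCOUNT = 0.1;

        double invoiceTotal(List<Double> lineItems, boolean loyalCustomer) {
            double subtotal = sum(lineItems);
            return loyalCustomer ? subtotal * (1 - LOYALTY_DISCOUNT) : subtotal;
        }

        private double sum(List<Double> lineItems) {
            return lineItems.stream().mapToDouble(Double::doubleValue).sum();
        }
    }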

What would you do, to make Better Software?

Continuous Integration can pay off quickly: the change in transparency and in the team’s focus is almost immediate.

Developer testing is a journey, not a goal. It will take a long time for most developers to get good at it, and a long time to build up a good set of tests that you can rely on. The sooner you start, the better.

Code reviews can also take a long time to pay off. Developers – and managers – need to make the time for reviews to be done and build the discipline, and developers need time to learn how to review code properly, and how to give and accept criticism. But code reviews – or pairing – will give you better code.

Refactoring is more of a compounding investment – you pay a little bit today to save a lot in the future.

If there is only one thing that you could do to make better software, what would it be? Where would you start?

Wednesday, November 19, 2014

Different Ways of Scaling Agile

At this year's Construx Software Executive Summit one of the problems that we explored was how to scale software development, especially Agile development, across projects, portfolios, geographies and enterprises. As part of this, we looked at 3 different popular methods for scaling Agile: LeSS (Large Scale Scrum), SAFe (Scaled Agile Framework), and DAD (Disciplined Agile Delivery).

LeSS and LeSS Huge - Large Scale Scrum

Craig Larman, the co-author of LeSS (and LeSS Huge - for really big programs), started off by criticizing the "contract game" or "commitment game" that management, developers and customers traditionally play to shift blame upfront for when things (inevitably) go wrong on a project. It was provocative and entertaining, but it had little to do with scaling Agile.

He spent the rest of his time building the case for restructuring organizations around end-to-end cross-functional feature teams who deliver working code, rather than specialist component teams and functional groups or matrices. Feature teams can move faster by sharing code and knowledge, solving problems together and minimizing handoffs and delays.

Enterprise architecture in LeSS seems easy. Every team member is a developer - and every developer is an architect. Architects work together outside of teams and projects in voluntary Communities of Practice to collaborate and shape the organization's architecture together. This sounds good - but architecture, especially in large enterprise environments, is too important to try and manage out-of-band. LeSS doesn't explain how eliminating specialization and working without upfront architecture definition and architectural standards and oversight will help build big systems that work with other big systems.

LeSS is supposed to be about scaling up, but most of what LeSS lays out looks like Scrum done by lots of people at the same time. It's not clear where Scrum ends and LeSS starts.

SAFe - Scaled Agile Framework

There's no place for management in LeSS (except for Product Owners, who are the key constraint for success - like in Scrum). Implementing LeSS involves fundamentally restructuring your organization around business-driven programs and getting rid of managers and specialists.

Managers (as well as architects and other specialists) do have a role in SAFe's Scaled Agile Framework - a more detailed and heavyweight method that borrows from Lean, Agile and sequential Waterfall development approaches. Teams following Scrum (and some XP technical practices) to build working code roll up into programs and portfolios, which need to be managed and coordinated.

In fact, there is so much for managers to do in SAFe as "Lean-Agile Leaders" that Dean Leffingwell spent most of his time enumerating and elaborating the roles and responsibilities of managers in scaling Agile programs and leading change.

Some of the points that stuck with me:

  • The easiest way to change culture is to have success. Focus on execution, not culture, and change will follow.
  • From Deming: Only managers can change the system - because managers create systems. Change needs to come from the middle.
  • Managers need to find ways to push decisions down to teams and individuals, giving them strong and clear "decision filters" so that they understand how to make their own decisions.

DAD - Disciplined Agile Delivery

Scott Ambler doesn't believe that there is one way to scale Agile development, because in an enterprise different teams and projects will deliver different kinds of software in different ways: some may be following Scrum or XP, or Kanban, or Lean Startup with Continuous Deployment, or RUP, or SAFe, or a sequential Waterfall approach (whether they have good reasons, or not so good reasons, for working the way that they do).

Disciplined Agile Delivery (DAD) is not a software development method or project management framework - it is a decision-making framework that looks at how to plan, build and run systems across the enterprise. DAD layers over Scrum/XP, Lean/Kanban or other lifecycles, helping managers make decisions about how to manage projects, how to manage risks, and how to drive change.

Projects, and people working in projects, need to be enterprise-aware - they need to work within the constraints of the organization, follow standards, satisfy compliance, integrate with legacy systems and with other projects and programs, and leverage shared resources and expertise and other assets across the organization.

Development isn't the biggest problem in scaling Agile. Changes need to be made in many different parts of the organization in order to move faster: governance (including the PMO), procurement, finance, compliance, legal, product management, data management, ops, ... and these changes can take a long time. In Disciplined Agile Delivery, this isn't easy, and it's not exciting. It just has to be done.

Scaling Agile is Hard, but it's worth it

Almost all of us agreed with Dean Leffingwell that "nothing beats Agile at the team level". But achieving the same level of success at the organizational level is a hard problem. So hard that none of the people who are supposed to be experts at it could clearly explain how to do it.

After talking to senior managers from many different industries and different countries, I learned that most organizations seem to be finding their own way, blending sequential Waterfall stage-gate development and large-scale program management practices at the enterprise-level with Agile at the team level. Using Agile approaches to explore ideas and requirements, prototyping and technical spikes to help understand viability and scope and technical needs and risks early, before chartering projects. Starting off these projects with planning and enough analysis and modeling upfront to identify key dependencies and integration points, then getting Agile teams to fill in the details and deliver working software in increments. Managing these projects like any other projects, but with more transparency into the real state of software development - because you get working software instead of status reports.

The major advantage of Agile at scale isn't the ability to react to continuous changes or even to deliver faster or cheaper. It's knowing sooner whether you should keep going, or whether you should stop and do something else instead.

Wednesday, November 5, 2014

Don’t Waste Time Writing Perfect Code

A system can last for 5 or 10 or even 20 or more years. But the life of specific lines of code, even of designs, is often much shorter: months or days or even minutes when you’re iterating through different approaches to a solution.

Some code matters more than other code

Researching how code changes over time, Michael Feathers has identified a power curve in code bases. Every system has code, often a lot of it, that is written once and is never changed. But a small amount of code, including the code that is most important and useful, is changed over and over again, refactored or rewritten from scratch several times.

As you get more experience with a system, or with a problem domain or an architectural approach, it should get easier to know and to predict what code will change all the time, and what code will never change: what code matters, and what code doesn’t.

Should we try to write Perfect Code?

We know that we should write clean code, code that is consistent, obvious and as simple as possible.

Some people take this to extremes, and push themselves to write code that is as beautiful and elegant and as close to perfect as they can get, obsessively refactoring and agonizing over each detail.

But if code is only going to be written once and never changed, or at the other extreme if it is changing all the time, isn’t writing perfect code as wasteful and unnecessary (and impossible to achieve) as trying to write perfect requirements or trying to come up with a perfect design upfront?

“You Can't Write Perfect Software. Did that hurt? It shouldn't. Accept it as an axiom of life. Embrace it. Celebrate it. Because perfect software doesn't exist. No one in the brief history of computing has ever written a piece of perfect software. It's unlikely that you'll be the first. And unless you accept this as a fact, you'll end up wasting time and energy chasing an impossible dream.”
Andrew Hunt, The Pragmatic Programmer: from Journeyman to Master

Code that is written once doesn’t need to be beautiful and elegant. It has to be correct. It has to be understandable – because code that is never changed may still be read many times over the life of the system. It doesn't have to be clean and tight – just clean enough. Copy and paste and other short cuts in this code can be allowed, at least up to a point. This is code that never needs to be polished. This is code that doesn't need to be refactored (until and unless you need to change it), even if other code around it is changing. This is code that isn't worth spending extra time on.

What about the code that you are changing all of the time? Agonizing over style and coming up with the most elegant solution is a waste of time, because this code will probably be changed again, maybe even rewritten, in a few days or weeks. And so is obsessively refactoring code each time that you make a change, or refactoring code that you aren't changing because it could be better. Code can always be better. But that’s not important.

What matters is: Does the code do what it is supposed to do – is it correct and usable and efficient? Can it handle errors and bad data without crashing – or at least fail safely? Is it easy to debug? Is it easy and safe to change? These aren't subjective aspects of beauty. These are practical measures that make the difference between success and failure.

Pragmatic Coding and Refactoring

The core idea of Lean Development is: don’t waste time on things that aren't important. This should inform how we write code, and how we refactor it, how we review it, how we test it.

Only refactor what you need to, in order to get the job done - what Martin Fowler calls opportunistic refactoring (comprehension, cleanup, Boy Scout rule stuff) and preparatory refactoring. Enough to make a change easier and safer, and no more. If you’re not changing the code, it doesn't really matter what it looks like.

In code reviews, focus only on what is important. Is the code correct? Is it defensive? Is it secure? Can you follow it? Is it safe to change?

Forget about style (unless style gets in the way of understandability). Let your IDE take care of formatting. No arguments over whether the code could be “more OO”. It doesn’t matter if it properly follows this or that pattern as long as it makes sense. It doesn't matter if you like it or not. Whether you could have done it in a nicer way isn’t important – unless you’re teaching someone who is new to the platform and the language, and you’re expected to do some mentoring as part of code review.

Write tests that matter. Tests that cover the main paths and the important exception cases. Tests that give you the most information and the most confidence with the least amount of work. Big fat tests, or small focused tests – it doesn't matter, and it doesn't matter if you write the tests before you write the code or after, as long as they do the job.

It’s not (Just) About the Code

The architectural and engineering metaphors have never been valid for software. We aren’t designing and building bridges or skyscrapers that will stay essentially the same for years or generations. We’re building something much more plastic and abstract, more ephemeral. Code is written to be changed – that is why it’s called “software”.

“After five years of use and modification, the source for a successful software program is often completely unrecognizable from its original form, while a successful building after five years is virtually untouched.”
Kevin Tate, Sustainable Software Development

We need to look at code as a temporary artefact of our work:

…we're led to fetishize code, sometimes in the face of more important things. Often we suffer under the illusion that the valuable thing produced in shipping a product is the code, when it might actually be an understanding of the problem domain, progress on design conundrums, or even customer feedback.
Dan Grover, Code and Creative Destruction

Iterative development teaches us to experiment and examine the results of our work – did we solve the problem, if we didn’t, what did we learn, how can we improve? The software that we are building is never done. Even if the design and the code are right, they may only be right for a while, until circumstances demand that they be changed again or replaced with something else that fits better.

We need to write good code: code that is understandable, correct, safe and secure. We need to refactor and review it, and write good useful tests, all the while knowing that some of this code, or maybe all of it, could be thrown out soon, or that it may never be looked at again, or that it may not get used at all. We need to recognize that some of our work will necessarily be wasted, and optimize for this. Do what needs to be done, and no more. Don’t waste time trying to write perfect code.

Tuesday, September 16, 2014

Can Static Analysis replace Code Reviews?

In my last post, I explained how to do code reviews properly. I recommended taking advantage of static analysis tools like Findbugs, PMD, Klocwork or Fortify to check for common mistakes and bad code before passing the code on to a reviewer, to make the reviewer’s job easier and reviews more effective.

Some readers asked whether static analysis tools can be used instead of manual code reviews. Manual code reviews add delays and costs to development, while static analysis tools keep getting better, faster, and more accurate. So can you automate code reviews, in the same way that many teams automate functional testing? Do you need to do manual reviews too, or can you rely on technology to do the job for you?

Let’s start by understanding what static analysis bug checking tools are good at, and what they aren’t.

What static analysis tools can do – and what they can’t do

In this article, Paul Anderson at GrammaTech does a good job of explaining how static analysis bug finding works, the trade-offs between recall (finding all of the real problems), precision (minimizing false positives) and speed, and the practical limitations of using static analysis tools for finding bugs.

Static analysis tools are very good at catching certain kinds of mistakes, including memory corruption and buffer overflows (for C/C++), memory leaks, illegal and unsafe operations, null pointers, infinite loops, incomplete code, redundant code and dead code.

A static analysis tool knows if you are calling a library incorrectly (as long as it recognizes the function), if you are using the language incorrectly (things that a compiler could find but doesn’t) or inconsistently (indicating that the programmer may have misunderstood something).

And static analysis tools can identify code with maintainability problems, code that doesn't follow good practice or standards, is complex or badly structured and a good candidate for refactoring.

But these tools can’t tell you when you have got the requirements wrong, or when you have forgotten something or missed something important – because the tool doesn't know what the code is supposed to do. A tool can find common off-by-one mistakes and some endless loops, but it won’t catch application logic mistakes like sorting in descending order instead of ascending order, or dividing when you meant to multiply, referring to buyer when it should have been seller, or lessee instead of lessor. These are mistakes that aren't going to be caught in unit testing either, since the same person who wrote the code wrote the tests, and will make the same mistakes.

Tools can’t find missing functions or unimplemented features or checks that should have been made but weren't. They can’t find mistakes or holes in workflows. Or oversights in auditing or logging. Or debugging code left in by accident.

Static analysis tools may be able to find some backdoors or trapdoors – simple ones at least. And they might find some concurrency problems – deadlocks, races and mistakes or inconsistencies in locking. But they will miss a lot of them too.

Static analysis tools like Findbugs can do security checks for you: unsafe calls and operations, use of weak encryption algorithms and weak random numbers, using hard-coded passwords, and at least some cases of XSS, CSRF, and simple SQL injection. More advanced commercial tools that do inter-procedural and data flow analysis (looking at the sources, sinks and paths between) can find other bugs including injection problems that are difficult and time-consuming to trace by hand.
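
For example, here is a sketch of the classic source-to-sink problem that data flow analysis can trace, and the fix that these tools usually point you to; the table and column names are invented:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class AccountLookup {

        // Tainted data flows from a request parameter straight into a query string:
        // the kind of source-to-sink path that static analysis security tools flag.
        ResultSet findAccountUnsafe(Connection db, String accountIdFromRequest) throws SQLException {
            Statement stmt = db.createStatement();
            return stmt.executeQuery(
                    "SELECT * FROM accounts WHERE account_id = '" + accountIdFromRequest + "'");
        }

        // The fix the tool will usually suggest: a parameterized query, so the
        // input is bound as data and never interpreted as SQL.
        ResultSet findAccountSafe(Connection db, String accountIdFromRequest) throws SQLException {
            PreparedStatement stmt =
                    db.prepareStatement("SELECT * FROM accounts WHERE account_id = ?");
            stmt.setString(1, accountIdFromRequest);
            return stmt.executeQuery();
        }
    }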

But a tool can’t tell you that you forgot to encrypt an important piece of data, or that you shouldn't be storing some data in the first place. It can’t find logic bugs in critical security features, tell you whether sensitive information could be leaked, catch an access control check that is wrong, or know whether the code could fail open instead of closed.

And using one static analysis tool on its own to check code may not be enough. Evaluations of static analysis tools, such as NIST's SAMATE project (a series of comparative studies, where many tools are run against the same code), show almost no overlap between the problems found by different tools (outside of a few common areas like buffer errors), even when the tools are supposed to be doing the same kinds of checks. Which means that to get the most out of static analysis, you will need to run two or more tools against the same code (this is what SonarQube does for you, for example: it integrates its own static analysis results with results from other tools, including popular free tools). If you’re paying for commercial tools, this could get very expensive fast.

Tools vs. Manual Reviews

Tools can find cases of bad coding or bad typing – but not bad thinking. These are problems that you will have to find through manual reviews.

A 2005 study Comparing Bug Finding Tools with Reviews and Tests used Open Source bug finding tools (including Findbugs and PMD) on 5 different code bases, comparing what the tools found to what was found through code reviews and functional testing. Static analysis tools found only a small subset of the bugs found in manual reviews, although the tools were more consistent – manual reviewers missed a few cases that the tools picked up.

Just like manual reviews, the tools found more problems with maintainability than real defects (this is partly because one of the tools evaluated – PMD – focuses on code structure and best practices). Testing (black box – including equivalence and boundary testing – and white box functional testing and unit testing) found fewer bugs than reviews. But different bugs. There was no overlap at all between bugs found in testing and the bugs found by the static analysis tools.

Finding problems that could happen - or do happen

Static analysis tools are good at finding problems that “could happen”, but not necessarily problems that “do happen”.

Researchers at Colorado State University ran static analysis tools against several releases of different Open Source projects, and compared what the tools found against the changes and fixes that developers actually made over a period of a few years – to see whether the tools could correctly predict the fixes that needed to be made and what code needed to be refactored.

The tools reported hundreds of problems in the code, but found very few of the serious problems that developers ended up fixing. One simple tool (Jlint) did not find anything that was actually fixed or cleaned up by developers. Of 112 serious bugs that were fixed in one project, only 3 were also found by static analysis tools. In another project, only 4 of 136 bugs that were actually reported and fixed were found by the tools. Many of the bugs that developers did fix were problems like null pointers and incorrect string operations – problems that static analysis tools should be good at catching, but didn’t.

The tools did a much better job of predicting what code should be refactored: developers ended up refactoring and cleaning up more than 70% of the code structure and code clarity issues that the tools reported (PMD, a free code checking tool, was especially good for this).

Ericsson evaluated different commercial static analysis tools against large, well-tested, mature applications. On one C application, a commercial tool found 40 defects – nothing that could cause a crash, but still problems that needed to be fixed. On another large C code base, 1% of the tool’s findings turned out to be bugs serious enough to fix. On the third project, they ran 2 commercial tools against an old version of a C system with known memory leaks. One tool found 32 bugs, another 16: only 3 of the bugs were found by both tools. Surprisingly, neither tool found the already known memory leaks – all of the bugs found were new ones. And on a Java system with known bugs they tried 3 different tools. None of the tools found any of the known bugs, but one of the tools found 19 new bugs that the team agreed to fix.

Ericsson’s experience is that static analysis tools find bugs that are extremely difficult to find otherwise. But it’s rare to find stop-the-world bugs – especially in production code – using static analysis.

This is backed up by another study on the use of static analysis (Findbugs) at Google and on the Sun JDK 1.6.0. Using the tool, engineers found a lot of bugs that were real, but not worth the cost of fixing: deliberate errors, masked errors, infeasible situations, code that was already doomed, errors in test code or logging code, errors in old code that was “going away soon” or other relatively unimportant cases. Only around 10% of medium and high priority correctness errors found by the tool were real bugs that absolutely needed to be fixed.

The Case for Security

So far we've mostly looked at static analysis checking for run-time correctness and general code quality, not security.

Although security builds on code quality – vulnerabilities are just bugs that hackers look for and exploit – checking code for correctness and clarity isn’t enough for a secure app. A lot of investment in static analysis technology over the past 5-10 years has been in finding security problems in code, such as common problems listed in OWASP’s Top 10 or the SANS/CWE Top 25 Most Dangerous Software Errors.

A couple of studies have looked at the effectiveness of static analysis tools compared to manual reviews in finding security vulnerabilities. The first study was on a large application that had 15 known security vulnerabilities found through a structured manual assessment done by security experts. Two different commercial static analysis tools were run across the code. The tools together found less than half of the known security bugs – only the simplest ones, the bugs that didn't require a deep understanding of the code or the design.

And of course the tools reported thousands of other issues that needed to be reviewed and qualified or thrown away as false positives. These other issues included some run-time correctness problems, null pointers and resource leaks, and code quality findings (dead code, unused variables), but no other real security vulnerabilities beyond those already found by the manual security review.

But this assumes that you have a security expert around to review the code. To find security vulnerabilities, a reviewer needs to understand the code (the language and the frameworks), and they also need to understand what kind of security problems to look for.

Another study shows how difficult this is. Thirty developers were hired to do independent security code reviews of a small web app (some security experts, others web developers). They were not allowed to use static analysis tools. The app had 6 known vulnerabilities. 20% of the reviewers did not find any of the known bugs. None of the reviewers found all of the known bugs, although several found a new XSS vulnerability that the researchers hadn’t known about. On average, 10 reviewers would have had only an 80% chance of finding all of the security bugs.

And, not Or

Static analysis tools are especially useful for developers working in unsafe languages like C/C++ (where there is a wide choice of tools to find common mistakes) or dynamically typed scripting languages like Javascript or PHP (where unfortunately the tools aren't that good), and for teams starting off learning a new language and framework. Using static analysis is (or should be) a requirement in highly regulated, safety critical environments like medical devices and avionics. And until more developers get more training and understand more about how to write secure software, we will all need to lean on static analysis (and dynamic analysis) security testing tools to catch vulnerabilities.

But static analysis isn't a substitute for code reviews. Yes, code reviews take extra time and add costs to development, even if you are smart about how you do them – and being smart includes running static analysis checks before you do reviews. If you want to move fast and write good, high-quality and secure code, you still have to do reviews. You can’t rely on static analysis alone.

Wednesday, August 20, 2014

Don’t waste time on Code Reviews

Less than half of development teams do code reviews, and the teams that do are probably not getting as much out of code reviews as they should.

Here’s how to not waste time on code reviews.

Keep it Simple

Many people still think of code reviews as expensive formal code inspection meetings, with lots of prep work required before a room full of reviewers can slowly walk through the code together around a table with the help of a moderator and a secretary. Lots of hassles and delays and paperwork.

But you don’t have to do code reviews this way – and you shouldn’t.

There are several recent studies which show that setting up and holding formal code review meetings adds to development delays and costs without adding value. While it can take weeks to schedule a code review meeting, only 4% of defects are found in the meeting itself – the rest are all found by reviewers looking through code on their own.

At shops like Microsoft and Google, developers don’t attend formal code review meetings. Instead, they take advantage of collaborative code review platforms like Gerrit, CodeFlow, Collaborator, ReviewBoard or Crucible, or use e-mail to request reviews asynchronously and to exchange information with reviewers.

These lightweight reviews (done properly) are just as effective at finding problems in code as inspections, but much less expensive and much easier to schedule and manage. Which means they are done more often.

And these reviews fit much better with iterative, incremental development, providing developers with faster feedback (within a few hours or at most a couple of days, instead of weeks for formal inspections).

Keep the number of reviewers small

Some people believe that if two heads are better than one, then three heads are even better, and four heads even more better and so on…

So why not invite everyone on the team into a code review?

Answer: because it is a tragic waste of time and money.

As with any practice, you will quickly reach a point of diminishing returns as you try to get more people to look at the same code.

On average, one reviewer will find roughly half of the defects on their own. In fact, in a study at Cisco, developers who double-checked their own work found half of the defects without the help of a reviewer at all!

A second reviewer will find ½ as many new problems as the first reviewer. Beyond this point, you are wasting time and money. One study showed no difference in the number of problems found by teams of 3, 4 or 5 individuals, while another showed that 2 reviewers actually did a better job than 4.

This is partly because of overlap and redundancy – more reviewers means more people looking for and finding the same problems (and more people coming up with false positive findings that the author has to sift through). And as Geoff Crain at Atlassian explains, there is a “social loafing” problem: complacency and a false sense of security set in as you add more reviewers. Because each reviewer knows that somebody else is looking at the same code, they are under less pressure to find problems.

This is why at shops like Google and Microsoft where reviews are done successfully, the median number of reviewers is 2 (although there are times when an author may ask for more input, especially when the reviewers don’t agree with each other).

But what’s even more important than getting the right number of reviewers is getting the right people to review your code.

Code Reviews shouldn’t be done by n00bs – but they should be done for n00bs

By reviewing other people’s code a developer will get exposed to more of the code base, and learn some new ideas and tricks. But you can’t rely on new team members to learn how the system works or to really understand the coding conventions and architecture just by reviewing other developers’ code. Asking a new team member to review other people’s code is a lousy way to train people, and a lousy way to do code reviews.

Research backs up what should be obvious: the effectiveness of code reviews depends heavily on the reviewer’s skill and familiarity with the problem domain and with the code. Like other areas in software development, the differences in review effectiveness can be huge, as much as 10x between best and worst performers. A study on code reviews at Microsoft found that reviewers from outside of the team, or who were new to the team and didn’t know the code or the problem area, could only do a superficial job of finding formatting issues or simple logic bugs.

This means that your best developers, team leads and technical architects will spend a lot of time reviewing code – and they should. You need reviewers who are good at reading code and good at debugging, and who know the language, framework and problem area well. They will do a much better job of finding problems, and can provide much more valuable feedback, including suggestions on how to solve the problem in a simpler or more efficient way, or how to make better use of the language and frameworks. And they can do all of this much faster.

If you want new developers to learn about the code and coding conventions and architecture, it will be much more effective to pair new developers up with an experienced team member in pair programming or pair debugging.

If you want new, inexperienced developers to do reviews (or if you have no choice), lower your expectations. Get them to review straightforward code changes (which don’t require in depth reviews), or recognize that you will need to depend a lot more on static analysis tools and another reviewer to find real problems.

Substance over Style

Reviewing code against coding standards is a sad way for a developer to spend their valuable time. Fight the religious style wars early, get everyone to use the same coding style templates in their IDEs and use a tool like Checkstyle to ensure that code is formatted consistently. Free up reviewers to focus on the things that matter: helping developers write better code, code that works correctly and that is easy to maintain.

“I’ve seen quite a few code reviews where someone commented on formatting while missing the fact that there were security issues or data model issues.”
Senior developer at Microsoft, from a study on code review practices

Correctness – make sure that the code works, look for bugs that might be hard to find in testing:

  • Functional correctness: does the code do what it is supposed to do – the reviewer needs to know the problem area, requirements and usually something about this part of the code to be effective at finding functional correctness issues
  • Coding errors: low-level coding mistakes like using <= instead of <, off-by-one errors, using the wrong variable (like mixing up lessee and lessor), copy and paste errors, leaving debugging code in by accident (see the sketch after these lists)
  • Design mistakes: errors of omission, incorrect assumptions, messing up architectural and design patterns like MVC, abuse of trust
  • Safety and defensiveness: data validation, threading and concurrency (time of check/time of use mistakes, deadlocks and race conditions), error handling and exception handling and other corner cases
  • Malicious code: back doors or trap doors, time bombs or logic bombs
  • Security: properly enforcing security and privacy controls (authentication, access control, auditing, encryption)

Maintainability:

  • Clarity: class and method and variable naming, comments, …
  • Consistency: using common routines or language/library features instead of rolling your own, following established conventions and patterns
  • Organization: poor structure, duplicate or unused/dead code
  • Approach: areas where the reviewer can see a simpler or cleaner or more efficient implementation
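
To make the correctness items above concrete, here is a deliberately flawed, invented fragment, with the kinds of problems marked the way a reviewer would call them out:

    import java.util.List;

    public class OrderPricing {

        // Invented example of the small, testing-resistant mistakes that reviewers catch:
        static double totalDiscount(List<Double> orderAmounts, double threshold) {
            double total = 0;
            for (int i = 0; i <= orderAmounts.size(); i++) {   // off-by-one: should be i <
                total += orderAmounts.get(i);
            }
            System.out.println("DEBUG total=" + total);        // debugging code left in by accident
            if (total >= threshold) {                          // spec says "over", not "at or over"
                return total * 0.05;
            }
            return 0;
        }
    }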

Where should reviewers spend most of their time?

Research shows that reviewers find far more maintainability issues than defects (a ratio of 75:25) and spend more time on code clarity and understandability problems than correctness issues. There are a few reasons for this.

Finding bugs in code is hard. Finding bugs in someone else’s code is even harder.

In many cases, reviewers don’t know enough to find material bugs or offer meaningful insight on how to solve problems. Or they don’t have time to do a good job. So they cherry pick easy code clarity issues like poor naming or formatting inconsistencies.

But even experienced and serious reviewers can get caught up in what at first seem to be minor issues about naming or formatting, because they need to understand the code before they can find bugs, and code that is unnecessarily hard to read gets in the way and distracts them from more important issues.

This is why programmers at Microsoft will sometimes ask for 2 different reviews: a superficial “code cleanup” review from one reviewer that looks at standards and code clarity and editing issues, followed by a more in depth review to check correctness after the code has been tidied up.

Use static analysis to make reviews more efficient

Take advantage of static analysis tools upfront to make reviews more efficient. There’s no excuse not to at least use free tools like Findbugs and PMD for Java to catch common coding bugs and inconsistencies, and sloppy or messy code and dead code before submitting the code to someone else for review.

This frees the reviewer up from having to look for micro-problems and bad practices, so they can look for higher-level mistakes instead. But remember that static analysis is only a tool to help with code reviews – not a substitute. Static analysis tools can’t find functional correctness problems or design inconsistencies or errors of omission, or help you to find a better or simpler way to solve a problem.
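
For example, an invented fragment with the kind of micro-problems that tools like Findbugs and PMD will flag on their own, before a reviewer ever sees the code:

    public class CustomerLookup {

        // Invented code: the kind of low-level findings a tool reports automatically,
        // so the reviewer can spend their time on correctness and design instead.
        boolean isVip(String customerTier) {
            if (customerTier == "GOLD") {            // string compared with == instead of equals()
                return true;
            }
            String unused = customerTier.trim();     // dead store; also throws if customerTier is null
            return false;
        }
    }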

Where’s the risk?

We try to review all code changes. But you can get most of the benefits of code reviews by following the 80:20 rule: focus reviews on high risk code, and high risk changes.

High risk code:

  • Network-facing APIs
  • Plumbing (framework code, security libraries….)
  • Critical business logic and workflows
  • Command and control and root admin functions
  • Safety-critical or performance-critical (especially real-time) sections
  • Code that handles private or sensitive data
  • Old code, code that is complex, code that has been worked on by a lot of different people, code that has had a lot of bugs in the past – error prone code

High risk changes:
  • Code written by a developer who has just joined the team
  • Big changes
  • Large-scale refactoring (redesign disguised as refactoring)

Get the most out of code reviews

Code reviews add to the cost of development, and if you don’t do them right they can destroy productivity and alienate the team. But they are also an important way to find bugs and for developers to help each other to write better code. So do them right.

Don’t waste time on meetings and moderators and paper work. Do reviews early and often. Keep the feedback loops as tight as possible.

Ask everyone to take reviews seriously – developers and reviewers. No rubber stamping, or letting each other off of the hook.

Make reviews simple, but not sloppy. Ask the reviewers to focus on what really matters: correctness issues, and things that make the code harder to understand and harder to maintain. Don’t waste time arguing about formatting or style.

Make sure that you always review high risk code and high risk changes.

Get the best people available to do the job – when it comes to reviewers, quality is much more important than quantity. Remember that code reviews are only one part of a quality program. Instead of asking more people to review code, you will get more value by putting time into design reviews or writing better testing tools or better tests. A code review is a terrible thing to waste.
