Building Real Software: January 2012

Friday, January 27, 2012

Keys to Rescuing a Project

Fascinating presentation last night at the local PMI chapter by Dr. Tom Facklam on rescuing a project at a biotech firm. The company ran into serious problems during Phase III clinical trials of a breast cancer vaccine. This was a real life-or-death situation for the patients, and it also threatened the financial viability of the company.

The key success factors for surviving and resolving this project crisis apply to major problems on other projects:

Create a sense of urgency. Make sure everyone understands that you don’t have time to do things “properly”. Break down standard ways of working and ways of thinking. Fix first, document later.

Reassure the sponsors, customers and other stakeholders that your team can take care of the problem – even if you don’t know what the problem is yet, and if you don’t know how to fix it yet. Keep people (especially executives) from panicking, and secure a mandate to act.

Insist on an open, blameless environment. No witch hunts now or later. Prove to everyone that they can bring up information without repercussions. All that matters is fixing the problem.

Put the customer first. Don't put the customer at risk, don’t hide things from them, don’t try to “protect” them.

Communicate constantly. Be transparent. No spin. Stick to the facts – what you know, what you don’t know, what you are doing about it.

Brainstorm. Encourage people to try things even if it doesn’t make sense. Run multiple concurrent experiments until the data starts to converge on the problems and on solutions.

Wednesday, January 25, 2012

Software Security Starts with Software Quality

In Software Security: Building Security In, Cigital's Gray McGraw breaks software security problems down into roughly equal halves. One half of security problems are security design flaws: missing authorization or doing encryption wrong — or not using encryption at all when you are supposed to, not handling passwords properly, not auditing the right data, relying on client-side instead of server-side data validation, not managing sessions safely, not taking care of SQL injection properly, and so on. These are problems that require training and experience to understand and solve properly.

The other half are security coding defects — basic mistakes in coding that attackers find ways to exploit. Focusing on preventing, finding and fixing these mistakes is a good place to start a software security program. It's something that developers and testers understand and something that they can take ownership of right away.

Read my latest post at the SANS Appsec Street Fighter blog on how basic software security practices can take you a long way towards building secure software.

Thursday, January 19, 2012

Agile Before there was Agile: Egoless Programming and Step-by-Step

Two key ideas underlying modern Agile development practices. First, that work can be done more effectively by Whole Teams in which people work together collaboratively to design and build systems. They share code, the review each other’s work, they share ideas and problems and solutions, they share responsibility, they work closely with each other and communicate constantly with each other and the customer.

The second is that working software is designed, built and delivered incrementally in short time boxes.

Egoless Programming

The idea of developers working together collaboratively, sharing code and reviewing each other’s work isn’t new. It goes back to Egoless Programming first described by Gerald Weinberg in the early 1970s, in his book The Psychology of Computer Programming.

In Egoless Programming teams, everyone works together to build the best possible software, in an open, respectful, democratic way, sharing ideas and techniques. People put personal feelings aside, accept criticism and look for opportunities to learn from each other. The important thing is to write the best possible code. Egoless programmers share code, review each other's work, improve code, find bugs and fix them. People work on what they can do best. Leadership of the team moves around and changes based on what problems the team is working on.

The result is people who are more motivated and code that is more understandable and more maintainable. Sounds a lot like how Agile teams are trying to work together today.

Step-by-Step

My first experience with “agile development”, or at least with iterative design and incremental time boxed development and egoless programming, came a long time before the famous meeting at Snowbird in 2001. In 1990 joined the technical support team at a small software development company on the west coast of Canada. It was quite a culture shock joining the firm. First, I was recruited while I was back-packing around the world on a shoestring for a year – coming back to Canada and back to work was a culture shock in itself. But this wasn’t your standard corporate environment. The small development team all worked from their homes, while the rest of us worked in a horse ranch in the country side, taking calls and solving problems while cooking spaghetti for lunch in the ranch house kitchen, with an attic stuffed full of expensive high-tech gear.

We built and supported tools used by thousands of other programmers around the world to build software of their own. All of our software was developed following an incremental, time boxed method called Step-by-Step which was created by Michel Kohon in the early 1980s.

In Step-by-Step, requirements are broken down into incremental pieces and developers develop and deliver working software in regular time boxes (ideally two weeks long), building and designing as they go. You expect requirements to be incomplete or wrong, and you expect them to change, especially as you deliver software to the customer and they start to use it. Sounds a lot like today’s Agile time boxed development, doesn’t it?

Even though the company was distributed (the company’s President, who still did a lot of the programming, moved to a remote island off the west coast of Canada, and later to an even more remote island in the Caribbean), we all worked closely together and were in constant communication. We relied a lot on email (we wrote our own) and an excellent issue tracking system (we wrote that too), and we spent a lot of time on the phone with each other and with customers and partners.

The programmers were careful and disciplined. All code changes were peer reviewed (I still remember going through my first code review, how much I learned about how to write good code) and developers tested all their own code. Then the support team reviewed and tested everything again. Each month we documented and packaged up a time boxed release and delivered it to beta test customers – customers who had reported a problem or asked for a new feature – and asked for their feedback. Once a year we put together a final release and distributed to everyone around the world.

We carefully managed technical debt – although of course we didn’t know that it was called technical debt back then, we just wrote good code to last. Some of that code is still being used today, more than 25 years after the products were first developed.

After I left this company and started leading and managing development teams, I didn’t appreciate how this way of working could be scaled up to bigger teams and bigger problems. It wasn’t until years later, after I had more experience with Agile development practices, that I saw how what I learned 20 years ago could be applied to make the work that we do today better and simpler.

Thursday, January 12, 2012

Essential Attack Surface Management

To attack your system, to steal something or do something else nasty, the bad guys need to find a way in, and usually a way out as well. This is what Attack Surface Analysis is all about: mapping the ways in and out of your system, looking at the system from an attacker’s perspective, understanding what parts of the system are most vulnerable, where you need to focus testing and reviews. It’s part of design and it’s also part of risk management.

Attack Surface Analysis is simple in concept. It’s like walking around your house, counting all of the doors and windows, and checking to see if they are open, or easy to force open. The fewer doors and windows you have, and the harder they are to open, the safer you are. The bigger a system’s attack surface, the bigger a security problem you have, and the more work that you have to put into your security program.

For enterprise systems and web apps, the doors and windows include web URLs (every form, input field – including hidden fields, URL parameters and scripts), cookies, files and databases shared outside the app, open ports and sockets, external system calls and application APIs, admin user ids and functions. And any support backdoors into the app, if you allow that kind of thing.

I’m not going to deal with minimizing the attack surface by turning off features or deleting code. It’s important to do this when you can of course, but most developers are paid to add new features and write more forms and other interfaces – to open up the Attack Surface. So it’s important to understand what this means in terms of security risk.

Measuring the System’s Attack Surface

Michael Howard at Microsoft and other researchers have developed a method for measuring the attack surface of an application, and to track changes to the attack surface over time, called the Relative Attack Surface Quotient (RSQ).

Using this method you calculate an overall attack surface score for the system, and measure this score as changes are made to the system and to how it is deployed. Researchers at Carnegie Mellon built on this work to develop a formal way to calculate an Attack Surface Metric for large systems like SAP. They calculate the Attack Surface as the sum of all entry and exit points, channels (the different ways that clients or external systems connect to the system, including TCP/UDP ports, RPC end points, named pipes...) and untrusted data elements. Then they apply a damage potential/effort ratio to these Attack Surface elements to identify high-risk areas.

Smaller teams building and maintaining smaller systems (which is most of us) and Agile teams trying to move fast don’t need to go this far. Managing a system’s attack surface can be done through a few straightforward steps that developers can understand and take ownership of.

Attack Surface: Where to Start?

Start with some kind of baseline if you can – at least a basic understanding of the system’s attack surface. Spend a few hours reviewing design and architecture documents from an attack surface perspective. For web apps you can use a tool like Arachni or Skipfish or w3af or one of the many commercial dynamic testing and vulnerability scanning tools or services to crawl your app and map the attack surface – at least the part of the system that is accessible over the web. Or better, get an appsec expert to review the application and pen test it so that you understand the attack surface and real vulnerabilities.

Once you have a map of the attack surface, identify the high risk areas. Focus on remote entry points – interfaces with outside systems and to the Internet – and especially where the system allows anonymous, public access. This is where you are most exposed to attack. Then understand what compensating controls you have in place, operational controls like network firewalls and application firewalls,and intrusion detection or prevention systems to help protect your app.

The attack surface model will be rough and incomplete to start, especially if you haven’t done any security work on the system before. Use what you have and fill in the holes as the team makes changes to the attack surface. But how do you know when you are changing the attack surface?

When are you Changing the Attack Surface?

According to The Official Guide to the CSSLP

“… it is important to understand that the moment a single line of code is written, the attack surface has increased.”

But this over states the risk of making code changes – there are lots of code changes (for example to behind-the-scenes reporting and analytics and changes to business logic) that don’t make the system more vulnerable to attack. Remember, the attack surface is the sum of entry and exit points and untrusted data elements in the system.

Adding a new system interface, a new channel into the system, a new connection type, a new API, a new type of client, a new mobile app, or a new file or database table shared with the outside – these changes directly affect the Attack Surface and change the risk profile of your app. The first web page that you create opens up the system’s attack surface significantly and introduces all kinds of new risks. If you add another field to that page, or another web page like it, while technically you have made the attack surface bigger, you haven’t increased the risk profile of the application in a meaningful way. Each of these incremental changes is more of the same, unless you follow a new design or use a new framework.

Changes to session management and authentication and password management also affect the attack surface. So do changes to authorization and access control logic, especially adding or changing role definitions, adding admin users or admin functions with high privileges. Changes to the code that handles encryption and secrets. Changes to how data validation is done. And major architectural changes to layering and trust relationships, or fundamental changes in technical architecture – swapping out your web server or database platform, or changing the run-time OS.

Use a Risk-Based and Opportunity-Based Approach

Attack Surface management can be done in an opportunistic way, driven by your ongoing development requirements. As they work on a piece of the system, the team reviews whether and how the changes affect the attack surface, what the risks are, and raise flags for deeper review. These red flags drive threat modeling and secure code reviews and additional testing.

This means that developers can stay focused on delivering features, while still taking responsibility for security. Attack surface reviews become a part of design and QA and risk management, burned in to how the team works, done when needed in each stage or phase or sprint.

The first time that you touch a piece of the system it may take longer to finish the change because you need to go through more risk assessment. But over time, as you work on the same parts of the system or the same problems, and as you learn more about the application and more about security risks, it will get simpler and faster.

Your understanding of the system’s attack surface will probably never be perfect or complete – but you will always be updating it and improving it. New risks and vulnerabilities will keep coming up. When this happens you add new items to your risk checklists, new red flags to watch for. As long as the system is being maintained, you’re never finished doing attack surface management.

Friday, January 6, 2012

Static Analysis isn’t Development Testing

I constantly get emails from Static Analysis vendors telling me why I need to buy their technology. Recently I’ve been receiving emails explaining how my team can use static analysis tools to do impressive things like “test millions of complex lines of codes [sic] in minutes”.

Hold on now, pardner. Running static analysis tools against your code to enforce good coding practices and check for common coding mistakes and violations of security coding rules is a darn good idea. But this isn’t testing, and it’s not a substitute for testing.

Static Analysis tools like Coverity, Klocwork Insight or Klocwork Solo for Java Developers,Findbugs,HP Fortify,IBM’s Software Analyzer, Microsoft’s FXCop and Veracode’s on-demand static analysis checking are a step up from what compilers and IDEs check for you (although IDEs like IntelliJ IDEA include a good set of built-in static analysis code checkers). These tools should be part of your development and continuous integration environment.

They are fast and efficient at finding common coding mistakes and inconsistencies in large code bases. The kinds of mistakes that are easy to miss and hard to test for, and that are usually easy to fix. This makes code reviews more efficient and effective, because it lets reviewers focus on higher-level problems rather than low-level correctness. And they can identify areas in the code that need to be refactored and simplified, and areas of the code that may need more testing or deeper review – especially if the tool finds problems in code that is already “working”.

But static analysis isn’t the same as, or as good as, testing

Static analysis tools can’t test the code to validate that it does what it is supposed to do. That the features that the customer requested are all there and that the business logic is correct (unit testing and functional testing and acceptance testing). They can’t check that the interfaces of a module and its behavior is correct (component-level testing), or that the larger pieces of the system work correctly together and with other systems (system testing and integration testing). Or that the system will hold up under load (performance and stress testing and soak testing), or whether it is vulnerable to security attacks (pen testing), or that it people can actually understand how to use it (usability testing). Static analysis tools, even the best ones, are limited to finding “a narrow band of code-related defects”.

I appreciate that static analysis tool vendors want to find new ways to sell their solutions. Maybe this is why Coverity is now calling its static analysis toolset the “Industry’s First Development Testing Platform”. Their message seems to be that developers don’t test their code and won’t fix bugs found by testers “because of organizational silos and conflicting priorities”. So static analysis has to fill this perceived gap.

By improving integration with development tools like Eclipse and Visual Studio and making the static analysis checks run faster, Coverity has made it easier for developers to find and fix more coding bugs earlier – that’s good. But better packaging and workflow and faster performance doesn’t change what the tool actually does.

Development Testing or Developer Testing

Running a static analysis tool isn’t “Development Testing”. Development Testing or Developer Testing is what developers do to test their own code to see if it works: walking through their code in a debugger as it executes and making sure that it does what they expect, profiling it to make sure that performance doesn’t suck, writing automated unit tests and component tests to make sure that the code does what it is actually supposed to do.

Confusing static analysis tools with other automated testing tools is a bad idea. It misrepresents what static analysis tools can do. It could lead people to believe that static analysis is a substitute for testing. Bill Pugh, an expert in static analysis and the father of FindBugs, makes it clear that while static analysis tools definitely work, “testing is far more valuable than static analysis” at finding bugs. And security experts like Jeremiah Grossman at White Hat Security debunk the idea that static analysis is enough to check for security bugs or other kinds of bugs in a system.

Static analysis tools are getting better and easier to use, they’re getting faster and more accurate and they throw off less false positives than they did even a couple of years ago. Everyone should use them if they can find a tool that works well for them and that they can afford. If you’re in a Java shop for example, there’s no excuse not to be using Findbugs at least – it’s free and it works. My team uses multiple different static analysis tools at different stages of development and continuous integration, because different tools find different kinds of problems, and some tools take longer to run. These tools find real bugs and coding problems. But there’s too much hype in this market. In the land of software testing, static analysis tools don’t “reign king”.