Some thoughts about testing

I well remember when this realization first came on me with full force. The EDSAC was on the top floor of the building and the tape-punching and editing equipment one floor below. [...] It was on one of my journeys between the EDSAC room and the punching equipment that, hesitating at the angles of [the] stairs, the realization came over me with full force that a good part of the remainder of my life was going to be spent in finding errors in my own programs.
- from Memoirs of a Computer Pioneer, by Maurice Wilkes

Testing is one of the great realities of software development. Ever since the first computers were built, programmers have been finding bugs in their code.

What can I say about a subject as old, and presumably well-explored, as testing? Well... kind of a lot, actually.

The Importance of Testing

Tests are the lifeblood of a big software project. Parts of the body that are denied circulation cannot survive. In the same way, parts of a software project that are cut off from testing will eventually rot away. The bigger the project is, the more testing you will need to keep it alive.

Tests need to be first-class citizens in your project. You wouldn't consider starting a project without selecting a build system or a runtime environment. So how can you consider starting a project without choosing what test frameworks you will use?

A lot of people keep their tests in a separate source control repository from the rest of their code. To me, this makes no sense. As the project changes and matures, the tests will have to change and mature. New features will be added that require new tests. In some cases, features will be altered or removed. Why would you want to inflict all the hassles of version skew on yourself?

It may be reasonable to put the test framework itself in a different repository than the project, if you believe that the framework will be useful for other projects as well. But it certainly doesn't make sense to put the tests themselves in any repository but the project repository.

Running Tests Should Be Easy

Anything that stops you from testing your code quickly is a big problem.

Does your code take hours to compile? That's a problem. When Rob Pike and the other designers announced Google's Go language, one of the things they were proud of was its rapid compilation time. Rapid compilation is a force multiplier: the faster you can rebuild, the faster you can test, and the more work you can get done in a day.

Does your test framework require an "expert" to set up? That's a problem. Bugs that could have been discovered and fixed in a day might require a week or more of back-and-forth between programmers and "test experts."

It often makes sense to have a division of labor between test writers and other developers. The former are often called SDETs, or "Software Development Engineers in Test." However, it doesn't make sense to create a system where tests can only be run by certain people. All of your developers should be able to run all of your tests. No exceptions!

For a lot of projects, it makes sense for developers to run a certain set of tests before submitting any change. These are often referred to as "smoke tests."
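As a concrete example, a smoke test runner can be as simple as a script that runs each test in order and stops at the first failure. Here is a minimal sketch in Python; the test names are hypothetical stand-ins for your project's fastest, highest-value tests.

    #!/usr/bin/env python
    # smoke_test.py: run a small set of fast tests before submitting a change.
    # The test commands below are hypothetical; substitute your project's own.
    import subprocess
    import sys

    SMOKE_TESTS = [
        ["./test_startup"],
        ["./test_config_parse"],
        ["./test_basic_io"],
    ]

    def main():
        for cmd in SMOKE_TESTS:
            print("running: %s" % " ".join(cmd))
            if subprocess.call(cmd) != 0:
                sys.exit("FAILED: %s" % " ".join(cmd))
        print("all smoke tests passed")

    if __name__ == "__main__":
        main()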


Unit Tests

Unit tests are important for two reasons: because of the test coverage they provide, and because they encourage you to write code that is modular and testable.

If you think your project doesn't need unit tests, you are almost certainly wrong.

For Java, JUnit is a pretty good unit test framework. There are also frameworks available for C and C++, but I usually roll my own. One approach that I've used in the past for C code is simply to make each unit test a separate executable that returns success or failure.
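For illustration, here is what a minimal unit test looks like in Python's unittest module, which follows the same xUnit style as JUnit. The parse_size function is a made-up example of a unit under test.

    import unittest

    def parse_size(s):
        # Parse a human-readable size like "4k" into a number of bytes.
        suffixes = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
        if s and s[-1].lower() in suffixes:
            return int(s[:-1]) * suffixes[s[-1].lower()]
        return int(s)

    class ParseSizeTest(unittest.TestCase):
        def test_plain_number(self):
            self.assertEqual(512, parse_size("512"))

        def test_suffix(self):
            self.assertEqual(4096, parse_size("4k"))
            self.assertEqual(2 * 1024 ** 2, parse_size("2M"))

    if __name__ == "__main__":
        unittest.main()

One nice property: unittest.main() exits with a nonzero status when any test fails, so each test file also works as a standalone pass-or-fail executable, much like the C approach described above.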

Long-Running Tests

Not everything in the world can be unit tested. A lot of bugs show themselves only when you are running a full stack. To catch these bugs, you are going to need long-running tests. Some people call these system tests, or end-to-end tests, or integration tests. Whatever you call them, you are probably going to need them too.

Some of the system tests I've written in the past have simply been shell scripts that ran an executable with various options. You can get a lot done in a shell script without writing a lot of code. For some projects, though, this simply isn't enough, and you are going to need something more heavyweight-- like a set of Python scripts.
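For example, here is what the lightweight end of that spectrum might look like as a Python script. The binary name and its options are hypothetical; the point is just to run the real executable several different ways and fail loudly on any nonzero exit status.

    #!/usr/bin/env python
    # A minimal system test: exercise a (hypothetical) server binary
    # with several option combinations.
    import subprocess
    import sys

    BINARY = "./myserver"
    OPTION_SETS = [
        ["--self-test"],
        ["--self-test", "--cache-size=0"],
        ["--self-test", "--threads=8"],
    ]

    for opts in OPTION_SETS:
        cmd = [BINARY] + opts
        print("running: %s" % " ".join(cmd))
        if subprocess.call(cmd) != 0:
            sys.exit("FAILED: %s" % " ".join(cmd))
    print("all system tests passed")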

You Get What You Measure

To paraphrase management consultant H. Thomas Johnson, you get what you measure. This is as true in the software development business as it is in other walks of life.

One big mistake I've seen projects make in the past is to create test frameworks that were inadequate. For example, one project was creating software that was designed to run on a huge network of computers. But the test framework that all the developers used was just a shell script that started a few processes on the developer's local machine. Needless to say, this was not a very good test of how the software behaved on a real network.

In a lot of ways, a bad test framework is worse than no test framework at all. It encourages you to think that you are doing fine-- when in fact your project has major problems. You may add lots of features, confident that your tests have you covered-- when in fact, your existing codebase is almost untested. Like a fuel gauge that always reads "full," a bad test framework can do you a lot of harm.

If your project involves multiple processes, your test framework needs to be able to create multiple processes. If your project involves multiple computers, your test framework needs to manage multiple computers. And so forth. Absolutely do not compromise on the quality of the test framework, no matter how tempting it may be.
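To make that concrete, here is a sketch of what "create multiple processes" means in practice: the test really starts several node processes rather than pretending to. The binary name and flags are made up; to span multiple computers, the same pattern works with an ssh prefix on each command.

    import subprocess

    NUM_NODES = 5
    procs = []
    for i in range(NUM_NODES):
        # Each node is a real, separate process listening on its own port.
        # For multiple machines, prepend something like ["ssh", "host%d" % i].
        procs.append(subprocess.Popen(["./mynode", "--port=%d" % (9000 + i)]))
    try:
        pass  # drive the actual test against the running nodes here
    finally:
        for p in procs:
            p.terminate()
            p.wait()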

Strategies for Testing

Testing can be a costly proposition. Isn't there some way we can cut down on the overall cost?

How about those pesky users? They're always demanding more out of developers. Perhaps they should be made to bear some of the burden of testing.

A lot of open source projects rely a lot on user testing. In an open source project, users can engage with the developers directly on mailing lists and other forums, and have a dialog about bugs. Similarly, a lot of companies like Google and Facebook have started making products available to consumers while they're still "in beta"-- meaning unfinished.

Is this a good idea? As with most interesting questions, the answer is "it depends." What's the worst-case scenario if your software fails? Does the user lose his progress through the Mushroom Kingdom, or does the nuclear reactor create a mushroom cloud? If the answer is the latter, you probably want to avoid sending users out to do your testing.

Certain types of software are notorious for requiring a higher standard of quality. Users will accept a few crashes from a game, but the first time a filesystem loses their data, they will probably decide to avoid it in the future. If your filesystem or database gets a bad reputation for reliability, it might take years to dispel. However, if you are developing something like a game or a paint program, you are probably better off shoving the software out the door as soon as you can-- before other projects grab the market share or mindshare.

The Limits of Testing

I've spent this whole essay talking about how great tests are, and how essential they are to a well-run software project. So it may seem surprising that I am including a section on the limits of testing.

Well, it's true. Testing can only do so much for your project. There are always going to be nooks and crannies where bugs can hide.

Ideally, testing should be like the safety net below the flying trapeze at the circus. It's good to know that it's there, but your act should not depend on falling into it.

A good developer will do everything he can to reduce the testing burden. By using libraries rather than re-inventing the wheel, you can use code that has already been tested for you. In languages fortunate enough to have a static type system, you can use the type system to your advantage. For C code, Valgrind is essential. For both Java and C, Coverity and other static analysis tools are great.
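For instance, Valgrind can be folded directly into the test run, so that memory errors fail a test just like assertion failures do. A sketch, using a hypothetical test binary:

    import subprocess
    import sys

    # --error-exitcode makes Valgrind return a nonzero status when it
    # detects memory errors, so the harness can treat them as failures.
    ret = subprocess.call(["valgrind", "--error-exitcode=1",
                           "--leak-check=full", "./test_foo"])
    if ret != 0:
        sys.exit("test_foo failed, or Valgrind found memory errors")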

The hardest bugs to analyze are usually race conditions. Concurrency bugs are rarely deterministic, and they are always a huge burden to debug. You really need to get all your ducks in a row with respect to concurrency. Know what threads you are using and why. Know what data those threads are allowed to touch, and what data they are not. Unfortunately, this is an area where the tooling is still relatively primitive. You are going to have to use the force here.
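One cheap discipline that helps: write the ownership rules down in the code itself, and keep shared state behind a single, clearly labeled lock. A minimal sketch (all names illustrative):

    import threading

    class Counters(object):
        # Shared between worker threads. Guarded by self.lock: no field
        # may be read or written without holding it.
        def __init__(self):
            self.lock = threading.Lock()
            self.requests = 0

        def increment(self):
            with self.lock:
                self.requests += 1

    counters = Counters()

    def worker():
        # Workers may touch: counters (via its lock). Nothing else is shared.
        for _ in range(1000):
            counters.increment()

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    assert counters.requests == 4000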

Conclusion

Anyway, to those of you who made it this far-- good luck. Keep in mind the lessons of the past, and you should be able to build the software of the future.