01 January 2012

Five Key Ingredients of Essential Test-Driven Development

There are any number of ways to think about TDD.  For example, I use one metaphor when describing the business value of TDD to an organization's leadership, and another when describing the personal value to the members of the development team.

There are also various formulations of the actual steps performed in TDD (the briefest being "Red, Green, Clean").

The list of "Key Ingredients" that follows is yet another way to think about Test-Driven Development.  I keep this list in mind when I'm training and coaching teams, to make sure they have all the fundamental tools in place to successfully adopt this elegant, but challenging, discipline.

Test-First Programming

The practice:

We write a test that expresses what we want the software to do, before coding the change to the software that implements that behavior.

This practice is one of the original dozen Extreme Programming (XP) practices. It's nothing new: For as long as we've had IC chips, chip designers have written tables of required outputs based on given inputs, before designing the circuits.  Colleagues at SQE tell me they found a view-graph (aka "slide") from the mid-'80s that says "Test, Then Code."
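As a minimal sketch of the rhythm, here is a Python example using a hypothetical price_with_tax() function. The test is written first and fails ("Red"); only then is the function written to make it pass ("Green"):

```python
import unittest

# Hypothetical example: price_with_tax() does not exist when the test
# is written. The test below expresses what we want the software to do;
# the implementation is then written to satisfy it.

def price_with_tax(price, rate):
    # Written only after the test expressed the requirement.
    return round(price * (1 + rate), 2)

class PriceWithTaxTest(unittest.TestCase):
    def test_applies_tax_rate_to_price(self):
        self.assertEqual(price_with_tax(100.00, 0.08), 108.00)

    def test_zero_rate_leaves_price_unchanged(self):
        self.assertEqual(price_with_tax(50.00, 0.0), 50.00)
```

Running this with `python -m unittest` before the function exists gives the Red step; adding the implementation gives Green.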

The benefits:

Analysis through experimentation: You think about what you want the software to do, and how you would know it's right, rather than jumping directly to how the code will solve a problem.  You write, in effect, a small aspiration for the software.

Better interface design: You are crafting code from the client's perspective, and you are designing the interface first, rather than designing the solution and then exposing a clunky interface that suits your implementation.

Communication of understanding: Now and in the future, you and your teammates agree on the statement of the problem you are trying to solve.

Most defects are found immediately: You know when you're done.  You know when you got it right. And the aggregate of all prior tests keeps you from breaking anything you or your teammates have ever coded since the inception of the product (if you are one of the fortunate few who started with TDD).

Easier to add behavior later: You are crafting a comprehensive safety-net of fine-grained regression tests. This allows you to make seemingly radical additions to functionality very rapidly, and with an extremely high degree of confidence.  This is perhaps the biggest benefit of TDD overall. We get productivity through maintainability, and we protect the investments we've made in robust software. You can think of each test you write as a little investment in quality. By writing it first, you are investing early.

Merciless Refactoring

The practice:

Once each new microtest and all of its siblings are passing, we look for localized opportunities to clean up the design towards better readability and maintainability.  Small steps are taken between test-runs, to be sure we are not breaking functionality while reshaping code.  Both tests and production code are subject to scrutiny.

Another of the original XP practices. Again, nothing new.  Programmers have always reshaped their modules to incorporate unanticipated new requirements.  Refactoring--any change to the code to improve maintainability without altering the explicit behavior of the code--is the formalization of a professional thought-process that has been with us since Alan Turing. 
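A small, hypothetical before-and-after sketch in Python: two report functions originally repeated the same formatting expression; extracting a shared helper removes the duplication, and the microtests confirm the observable behavior is unchanged.

```python
# After refactoring: the duplicated strip/title/format expression that
# once appeared in both report functions now lives in one helper.
def summary_line(name, amount):
    return f"{name.strip().title()}: ${amount:,.2f}"

def order_report(order):
    return summary_line(order["customer"], order["total"])

def refund_report(refund):
    return summary_line(refund["customer"], refund["amount"])

# The microtests that guarded the refactoring still pass afterwards:
assert order_report({"customer": "  ada lovelace ", "total": 1234.5}) == "Ada Lovelace: $1,234.50"
assert refund_report({"customer": "alan turing", "amount": 20}) == "Alan Turing: $20.00"
```

The tests never changed; only the shape of the production code did.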

The benefits:

Emergent Design: Your design is--as James Shore so eloquently says it--reflective, rather than predictive.  You don't try to anticipate the myriad variations you may need to support in the far future. Rather, you craft the code to meet the needs expressed in the tests today (and, again, since inception of the code base).

By removing "code smells" such as duplication, you allow the real variations expressed in your product's requirements to become encapsulated on demand. When you refactor mercilessly, what emerges is often a common Design Pattern, and not always the pattern you would have predicted, had you spent your time and mental energy trying to predict the optimal solution.

Emergent Designs tend to be simpler, more elegant, more understandable, and easier to change than those you would have otherwise created in UML before coding.

Nothing is wasted: There is no extraneous functionality to maintain and debug; there is no phase where you try to get it all right before receiving feedback from the system itself.

Ease of testing: You know that good object-oriented designs result in objects that are extremely easy to test. Often you refactor towards making the next test easier to write or pass. The test-refactor cycle thus supports and improves itself: It's easier to refactor with the presence of the tests, and the next test becomes much easier to write, because it's now obvious to you which class should own the behavior.

Test Automation

The practice:

We write the tests using a tool that makes them immediately executable.

This one may seem obvious, but its explicit definition offsets and clarifies the others. For example, can you do test-first without automating the tests? Yes. And is there benefit? Yes.  Is it TDD? Not quite.

Programmatic microtesting has also been with us for a very long time.  In the C programming language, we used to create (and later discard) a separate little main() program with some printf() statements. Now we capture these bits of testing, and effortlessly re-execute them for as long as necessary. Since they remain separated from production code, we need not wrap them in #ifdef TESTING statements.
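In Python, the same evolution looks like this: a scratch print() check that would once have been eyeballed and discarded, captured instead as a permanent, executable microtest (parse_version() is a hypothetical function for illustration).

```python
import unittest

# The throwaway approach: a scratch script that gets deleted once the
# output "looks right":
#     print(parse_version("1.4.2"))   # eyeball it, then discard
#
# The captured approach: the same check kept as an executable test,
# re-run effortlessly for as long as the code lives.

def parse_version(text):
    return tuple(int(part) for part in text.split("."))

class ParseVersionTest(unittest.TestCase):
    def test_splits_dotted_version_into_integers(self):
        self.assertEqual(parse_version("1.4.2"), (1, 4, 2))

    def test_single_component(self):
        self.assertEqual(parse_version("7"), (7,))
```

Because the test lives apart from production code, nothing needs to be wrapped in conditional-compilation guards, and `python -m unittest` re-runs it on demand.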

The benefits:

Repeatability:  You know that each time you run the tests, all tests are the same, and not open to human (mis-)interpretation.  Testers, even very talented and insightful testers, can make mistakes or succumb to biases.

Faster feedback: Computers are good at doing these dull, repetitive tasks much more quickly than humans.  Subjecting testers to repeatedly running pages of manual test suites is cruel. You help testers recoup time to spend on exploratory testing and other testing activities that provide higher returns.

One life-critical ("if you break it, a patient may die") medical product I worked on had 17,000 automated tests after two years in development.  They all ran in less than 15 minutes.  We could add functionality, fix a bug, or refactor a bit of awkward code right up to the hour of deployment, and still know we hadn't broken anything. And if we did see a single failing test, we would not commit our changes to the repository. To do so would have been unprofessional and unethical.

Mock Objects

The practice:

When creating microtests, we often test an object's response to external situations at integration boundaries (e.g., "What does my DAO do on Save() if the database is down?").

We do this by replacing the interface or class that represents the external subsystem (e.g., it makes the calls to JDBC or ADO.Net) with a "Mock" or "Fake" object, which has been instructed (aka "conditioned") by the microtest to operate in a way that suits the scenario being tested (e.g., the mock simply throws the appropriate "Database is down" exception).  When the microtest calls our object under test, the object under test interacts with the mock, just as it would in production.
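A sketch of the idea in Python using unittest.mock; OrderDao, DatabaseDownError, and the "queued-for-retry" behavior are hypothetical names invented for illustration.

```python
from unittest import mock

# The integration boundary: OrderDao.save() depends on a connection
# object. The test replaces the real connection with a mock conditioned
# to throw, and asserts the DAO's response to "database is down".

class DatabaseDownError(Exception):
    pass

class OrderDao:
    def __init__(self, connection):
        self.connection = connection

    def save(self, order):
        try:
            self.connection.execute("INSERT ...", order)
            return "saved"
        except DatabaseDownError:
            return "queued-for-retry"   # the behavior under test

# The microtest: no real database anywhere in sight.
connection = mock.Mock()
connection.execute.side_effect = DatabaseDownError("database is down")

dao = OrderDao(connection)
assert dao.save({"id": 42}) == "queued-for-retry"
connection.execute.assert_called_once()
```

The DAO interacts with the mock exactly as it would with a live connection; only the test decides which scenario unfolds.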

The benefits:

Predictability: You use mocks to decouple the code you are writing from databases, filesystems, networks, third-party libraries, the system clock, or anything else that could alter the clear scenario you are trying to test. The test will pass or fail depending only on what your production code does within the scenario arranged by the test.

Faster tests: By decoupling from anything slower than the CPU and memory access (e.g., hard-drives, network cards, and other physical devices), you and your teammates create a regression suite that can run thousands of tests per minute.

You may have a subset of tests that actually interact with these external systems.  Think of these as "focused integration tests" (another term I think I got from James Shore).  For example, on many systems I've worked on, we've wanted to test that we could actually read from and write to database tables (prior to Hibernate, Rails, etc.).  I recall those 17,000 tests included 2% to 5% functional and focused integration tests, which took perhaps 50% of the 15 minutes.  That's okay.

Nowadays, the database itself can often run in memory.  Or, you could develop on a solid-state laptop, such as the MacBook Air or Asus Zenbook. Different paths to the same goal.

The rule of thumb is to keep the comprehensive suite from taking much longer than a "coffee break".  A "lunch break" (a half hour or an hour) is far too long, because then the full suite would only be executed intentionally a few times per day.  On a small Agile team, we need to see the full suite run green a dozen or more times per day.

It's this fast feedback loop that enables "Agile" teams to stay agile after months of development.

To-Do List

The Practice:

We keep a list of things to test during this TDD session immediately available, so we can add test scenarios that occur to us during the session.  We keep this particular To-Do List around only until we complete our engineering task and commit our changes.

When performing the TDD steps (briefly described as Red, Green, Clean), often we see an opportunity to test a new corner case, or refactor some bit of stale design.  These to-do items go immediately onto the To-Do List, without interrupting the flow of Red-Green-Clean.

On teams I've coached, one responsibility of the pair-programming "Navigator" is to maintain this list on a 3x5 card.  We often started with the back of the task card (Note: not the story card, because our stories were subdivided into multiple engineering tasks).

You don't have to use a 3x5 card. Some IDEs allow you to enter "TO DO" comments into the code, which then appear automatically in a "Tasks" tab.  This is especially helpful if you're not pair-programming (you unfortunate soul).

Please, just never ever ever ever ever commit a "TO DO" comment into your repository. Checking in a "TO DO" comment is effectively saying, "Well, someone will do it, but we don't have time right now." The "Tasks" tab will fill with un-prioritized to-do items that "someone" (no one) is going to do "someday" (never), and the tab will become unusable as a To-Do List.

The benefits:

Idea repository: As you write tests, implement code, and refactor, you will notice other things you want to test or refactor. Some may be rather elaborate, but all you need is a brief reminder. Once that's recorded, you can allow your creative energy to return to the test or refactoring at hand.

Memory aid: You can take a break and, when you return, you have a short check-list of items you need to complete before you're done with the task.

Pair-Programming aid: The list gives the Navigator and Driver points to discuss, decide upon, or defer, and keeps them focused on real issues rather than long philosophical debates ("Opinion-Driven Development").

Jim Shore and I refined the practice one day, after the following exchange (from memory, plus some much-needed embellishment, after 10 years):
Jim (navigating): [Writes something on the To-Do List]
Rob (driving): [I stop what I'm doing] "Whatcha adding?"
Jim: "We're going to want to test what happens when that Connection parameter is null."
Rob: "Oh? How could it end up null?"
Jim: "Well, we're getting it from StupidVendorConnectionFactory, and so..."
[long discussion ensues]
Rob: [frowning] "I forgot where we were..."
Jim: [Sighing heavily] "Rob, Rob, Rob...Just re-run the tests!"
The next time Jim thought of a missing test...
Jim (navigating): [Writes something on the To-Do List]
Rob (driving): [I stop what I'm doing] "What's that?"
Jim (navigating): [Hides the To-Do List] "I won't say. As the driver, don't worry about what I'm adding to the list.  Let's focus on the test we need to get passing right now, then we'll review the list together."
Rob: "Can I add stuff to the list?"
Jim: "Of course! Just say it as you think of it, and I'll write it down."
Rob: "Okay.  Let's add 'fix, wrap, or replace StupidVendorConnectionFactory'"
Each item on the To-Do List must either get done, or be "removed."  We designated this by drawing a little checkbox before each item. We would check those we completed, and X out those we decided were not worth doing, or that had been replaced or subsumed by another to-do.
Jim: "I feel better, now that we have ConnectionPool wrapping StupidVendorConnectionFactory."

Rob: "And since ConnectionPool has been tested, and has no paths that could possibly return null..."

Jim: "...we don't need to check for null every time we pass a Connection. I'll take it off the list." [puts an X in the box]

Rob: "Are we starting to finish each other's sentences?"

Jim:  "Creepy.  Let's talk about hiring some developers locally so I don't have to pair with you every day."