21 November 2012

Startups and TDD: Building The Next Big Thing with Disciplined Agile Engineering Practices

I've coached and trained many start-ups that, now in a later round of funding or building the next release of their software, wish they had done things differently. Usually, I'm there to help them establish good Agile engineering practices (and help clean up the mess).

I understand the "just get it delivered" pressure on startups, and that they have to beat the competitors (both known and unknown) to market.  But I don't buy into the notion that excessive technical debt must be accrued in the first delivery. It's not necessary, because test-driven development (TDD), pair programming, continuous integration (CI), and other "Agile Engineering Practices" (a.k.a. "Scrum Developer Practices" a.k.a. "Extreme Programming (XP) Practices") all provide actual, tangible benefits, and much more quickly than most people expect.

In my developer courses, I often quote the Nagappan Paper:
The results of the case studies indicate that the pre-release defect density of the four products decreased between 40% and 90% relative to similar projects that did not use the TDD practice. Subjectively, the teams experienced a 15–35% increase in initial development time after adopting TDD.
-- research.microsoft.com/en-us/groups/ese/nagappan_tdd.pdf, Nagappan et al., © Springer Science + Business Media, LLC 2008
If I were seeing my fellow investors get that kind of return on investment, I'd want to get in too.  And early!

Of course, the choice of whether to take on technical debt or to "pay as you go" by adopting these practices from the start would depend on how long it takes for the practices to pay for themselves.  (And, yes, of course each has a cost.)

My friend and colleague, Arlo Belshee, and I were having a conversation about Agile transitions.  Arlo, like me, is an old XP aficionado (at least, as far as I can tell...sometimes Arlo is hard to read), and we've both had amazing successes and wonderful experiences on full-blown XP teams.  I must have asked him which practices he would suggest a team implement first, assuming they needed to pick up these practices gradually.  He chose continuous integration and pair-programming.

I was a little surprised he didn't include TDD, because TDD resolves so many root causes of trouble on Agile teams. But his explanation won me over:  These two practices immediately provide very fast feedback loops and high-bandwidth communication for the team.

By emphasizing "immediately" I'm suggesting that these practices pay for themselves right away, so avoiding them because they appear costly is a poor bet.

In my own experience, TDD also starts to pay dividends almost immediately.  Even within my 3-day TDD course, many developers report that a single microtest caught something they could not have foreseen.  Software development has become too complex an endeavor to ignore the benefits of a comprehensive safety-net of unit tests: that complexity has eroded the Amazing Predictive Powers of most programmers.

One client who put all developers through the TDD course reported, after only about 4-6 months, that their latest release was the least defective release they had delivered in many years.  One developer said that a single trivial unit test had saved him from including a defect that would have crippled one high-end (i.e., $$$$) client, and he felt that disciplined TDD had likely saved him his job.

Reflecting on just two of my own longer-term product-development efforts that used XP, I can think of three cases where TDD+pairing+CI saved or made the organization a significant amount of money (at least $500K/year). Due to the malleability and maintainability of the software, our teams were able to accept surprising, radical "mini Black Swan" user-stories, which:
  1. Opened up an entirely new market in a non-English-speaking country.
  2. Allowed doctors to more efficiently use our emergency-oriented software in their routine, day-to-day operations.
  3. Allowed a handful of highly-paid specialists to regain over 60% of their work-week that was previously spent manually transforming and re-entering patient data.
Each of those events was a surprise triple-win (for the customer, the organization, and the team), and each occurred within 6 months of adopting CI, TDD, and pairing.

If these disciplines reap benefits after such short periods of time, then the accrual of technical debt is only appropriate where the product can be written "in a garage" in a matter of days, and an upgrade will never be necessary.  Such products may exist, and they may even be quite useful and profitable (e.g., perhaps a smart-phone app).  I've never been involved in such a product's development, obviously because there would be no need for my kind of training and coaching.  But if I were called in to help develop one of these from scratch, would I still begin with good Agile engineering practices?  Yes!  Because I do not know what the future holds for that product, and I'd want to build in quality and maintainability, just in case we had built The Next Big Thing.

28 August 2012

A Recipe for an Agile Team Space

I was recently asked to help design team rooms for a client.  They may have been planning to hire an ergonomic architect/designer in addition to an Agile Coach.  Great, but before they got that far, I had two suggestions for them:
  1. Involve your existing Agile team members in discussions in order to uncover their needs.
  2. Involve someone who has actually been on a number of Agile teams in a variety of spaces.  Of course, I volunteered myself!  When I've acted as XP coach (mostly 1998-2004), I sat with teams and wrote code.
Some ingredients that I recommend for a tasty Agile team-space:
  • A space to hold the daily stand-up, near where the iteration or kanban board and graphs are prominently displayed.
  • Space to sit side-by-side at a computer, rather than having to look over someone's shoulder, cramped in their cube.  Desks that have a long, straight edge, or slight outward curve, work best. Rolling chairs are great, but should not be too bulky.  Whether or not your teams will be doing pair-programming, you must avoid restricting collaboration. Cubes are innovation-constraints.
  • Plenty of available whiteboard space for brainstorming (not already used for charts).
  • A space to hold an ad hoc team meeting, without having to reserve a conference room.  Just think: If the team can hold all of its own meetings in its own space, that will free up the conference-room schedule for the execs.  Whenever I see a dozen people get up and run out of a cube farm, only to file into a small conference room, I have to laugh...sadly. Only the manufacturers of cube walls are happy with this arrangement.
  • Sound-blocking walls or dividers to separate the team from other teams or co-workers outside the team.  A team must be able to be noisy when necessary. If you're breaking up a huge open floor, consider hanging those architecturally interesting glass dividers above a half-wall. This gives teams plenty of light from windows without having to listen to the neighboring team's release-retrospective party.
All of the above need to be incorporated into the same space, not spread out. A meeting could be held by having everyone turn their chairs. A brainstorming session between two developers should allow others to tune in, or tune out, without leaving their workstations.

You can easily prototype a team-space design:  Within a single walled-off room, start with one team, fold in some folding tables, and toss with a few rolling whiteboards. Then wait and see how the team chooses to arrange their own furniture.

Often, the teams remain happier with the folding tables than with prefab, fixed-position desks.  They like to be able to convert the space to meet any occasion, such as a theater for Thursday evening movie night.
A gourmet team-space at Menlo Innovations, Ann Arbor, MI

Now, garnish with the following:
  • Space to hold a private conversation with a colleague, or to call home, or to feed a baby.  This does not, however, need to be a personalized cube, one per person.  It can be a shared space. It needs to have a door to close and block out the rest of the team.
  • A table or shelving unit for snacks, books, and other sundry items.
Note that I left these last items out of the "complete team space" list.  These last few ingredients could be incorporated into the team-space, or not:  As long as they're within a short walk, we have provided an opportunity to get up out of our chairs.

Q: "Where do I store my personal items, certificates of achievement, pictures of family?"

Keep these to a minimum, keep them portable (in a purse or backpack), and set them up wherever you're working each day.

When I see pictures of a spouse or child on a cube wall, I often ask, "How long has it been since you've seen them?" If the answer is greater than nine hours, I say "Perhaps it's time you head home!"

Q: "My books?"

I encourage teams to have a team library (perhaps with the snacks).  Write your name on the book and add it to the library.
Remember: We're at work to work.  It should be a place we want to go, to have fun collaborating with our colleagues and creating innovative solutions.  Not a place so painful that we need to distract ourselves with photos of Maui, or Foosball.
By the way, I have nothing against quieter, mentally stimulating games or toys, such as Nerf guns.  In fact...
  • Nerf guns. A necessity.
The huge, elegant team spaces at the new Pivotal Labs building, San Francisco, CA. I arrived about 30 minutes before taking this picture, and the place was packed with enthusiastic, happy developers. A bell rings at 6:15pm each day to remind people to go home. I was asked to take this from a distance, and towards banks of logged-out workstations, to assure that client IP was protected. Rest assured, each station had at least one huge screen and comfortable seating for two.

03 August 2012

The ROI of Test-Driven Development

Leadership wants to know:  "Why should my teams be doing TDD?  What is the return on investment?"

Though TDD is, of course, not a "silver bullet," we've found that TDD solves a number of key software development challenges, including those that are exposed by the frequent releases and high visibility expected of an Agile team.

(For background on these challenges, read about Technical Debt and the Agilist's Dilemma. To grasp the root systemic problem, read my "new" twist on the old Iron Triangle.)

I'll outline some benefits, and some costs.  This is probably not a complete list, but represents the most significant benefits and costs that I've observed.  Where appropriate, I've summarized in a bold, italicized sentence, so you can absorb the main points quickly, or delve into each item in detail, depending on your current needs. 

The Biggest Benefits of Test-Driven Development


Defect Reduction

Functionality that doesn't function provides no business value.  TDD allows us to start with high quality, thus providing real value.

Often the most immediate and obvious detectable benefit of TDD is the reduction of defects (40% to 90% fewer, according to this PDF describing the IBM/Microsoft study).

The key to this benefit is the creation of faster feedback loops. If a developer can find a mistake instantly, before check-in, then this is typically weeks (or months) before the bug would otherwise be identified by traditional testing efforts or code reviews. This early detection and elimination of defects avoids the waste of rework involved in finding and fixing a bug later: assigning a developer to do the work, and having that developer re-familiarize himself with that region of code.

TDD reduces both (1) defects caused by guessing (incorrectly) that a particular implementation will behave as expected, and (2) defects caused by adjusting design to add new functionality to existing code.  A freshly-written test will catch the former before it's ever checked-in, and the existing suite of previously-written tests will prevent the latter.

When a defect is found (they will still happen), we add any missing tests that describe the defect. We do this even before we fix the defect. Once fixed, that defect can never resurface.
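A minimal sketch of that "test first, then fix" habit (the function, the bug, and the names here are hypothetical, invented only for illustration):

```python
# Hypothetical defect: a bug report says an empty cart crashes checkout.
# Step 1: write a test that describes the defect, BEFORE fixing it.
# Step 2: watch it fail against the buggy code.
# Step 3: fix the code; the test now passes and stays in the suite forever.

def cart_total(prices, discount=0.0):
    """Total the cart, applying a fractional discount.

    The (imagined) buggy version divided by len(prices) and crashed on an
    empty cart; the guard below is the fix driven out by the new test.
    """
    if not prices:  # the fix: an empty cart totals to zero
        return 0.0
    return sum(prices) * (1.0 - discount)

# The defect-describing tests, pinned into the regression suite:
def test_empty_cart_totals_zero():
    assert cart_total([]) == 0.0

def test_discount_applied():
    assert cart_total([10.0, 30.0], discount=0.25) == 30.0

test_empty_cart_totals_zero()
test_discount_applied()
```

Because the test stays in the suite, that particular defect cannot quietly resurface in some later release.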

Faster Feature Time to Market (aka Cycle Time)

TDD allows developers to add innovative features as rapidly as the founder did while hacking in her garage, without damaging the original investment.
Comprehensive testing and rapid feedback provide complete confidence that any changes to the code (for new features or to improve the design) can be made swiftly, without breaking existing functionality.

When teams don't have to worry about breaking stuff, they can add new behaviors and features much faster.  This gives us a shorter cycle time for innovative features.

I have a number of first-person stories (which I'll share in a future post) where the teams I've worked on (as a contributing XP coach) have been able to turn around a "major architectural change" or "drastic re-purposing" in less than a week.  Had we not been following our TDD engineering practices, we believe the estimates would have been measured in months, not days.

In my experience, most of this benefit is taken for granted (after all, what do we have to compare it to?). Yet even when the great delightful surprises have not yet happened, the whole team (including Product) often notices a subtle shift towards a sense of ease and pride in the team's productivity and the quality of the delivered software.

Improved Focus

TDD helps developers think clearly about the task at hand, without having to continuously hold the whole complex system in their heads. They avoid over-extrapolation and over-design.

"Thinking in tests" allows developers to craft their code by (1) stating a goal for the code (as a new test), (2) confirming that this is indeed something new, (3) implementing just enough to get it to work correctly, and (4) reshaping the design to fit well within the rest of the body of code.  This approach gives them the opportunity to reflect on needed behaviors and edge cases without over-engineering.
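One micro-cycle of those four steps might look like this (the `pad_number` helper is a made-up example, not from any real project):

```python
# One TDD micro-cycle, mapped to the four steps above.
# (pad_number is a hypothetical helper, invented for illustration.)

# Step 1: state a goal for the code, as a new test.
def test_pads_a_number_to_a_fixed_width():
    assert pad_number(7, width=3) == "007"

# Step 2: run it and watch it fail (a NameError here), confirming the
# behavior is indeed something new.

# Step 3: implement just enough to make the test pass.
def pad_number(n, width):
    return str(n).zfill(width)

# Step 4: reshape the design to fit the rest of the code base. Here
# there is nothing to clean up yet, so the cycle ends in a few minutes.

test_pads_a_number_to_a_fixed_width()
```

The whole loop takes minutes, which is what keeps the developer focused on one small, stated goal at a time.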

Additionally, developers avoid the frequent context-switching incurred by bug-hunting, and recoup a tremendous amount of time typically spent in the debugger.  During the entire year of 2002, working full-time on a single product, I recall our whole team firing up the debugger only once or twice (averaging once every 6 months).  Compare that to a developer I met who claimed he spent 80% of his time in the debugger prior to TDD.  Even if that's an exaggeration, his rough estimate suggests a large amount of wasted time spent bug-hunting.

Parallel Efforts Without Conflict

Since, with disciplined TDD, each developer (or pair of developers) is expected to run all of the tests in the regression suite and confirm that 100% pass before committing new changes, there is much less opportunity to disrupt or destroy someone else's hard work.

If your team is larger than two developers, then they will likely be working simultaneously on different, but possibly related, areas of the system.  Without a discipline of writing and running comprehensive, pinpoint-accuracy tests, we often damage a prior check-in when we merge or integrate files.  Some teams will instead create a complex tree of code-repository branches, which is really just a way of kicking the can down the road.

The Costs of Implementing a TDD Discipline

As with any investment, you will want to consider costs.  And TDD, like any worthwhile discipline, has its price.

More Test Code

"Twice as much code!" people often protest. Yes, unit-test code is code, and needs to be written by a developer.  Also, test-code tends to be more script-like and verbose. It is, after all, a form of developer-to-developer communication.

Actually, unit-testing code using TDD tends to result in a 2:1 ratio between test code and production code.

"Three times as much code?!" Not exactly...

If you are concerned that so much test code will create a bottleneck, you will be happy to know that TDD tends to reduce the amount of production code it takes to solve a business problem.  TDD reduces duplication by allowing teams to avoid copy-paste coding techniques and stove-pipe solutions.  (A "stove-pipe solution" is one where each activity, page, or feature has its own full set of code at each architectural layer.) Teams that do not do TDD may rely on copy-paste in order to avoid breaking anything that already works, or to avoid interfering with another developer's parallel efforts.
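A tiny before/after sketch of that duplication-reduction (the validators and the `is_valid_email` helper are hypothetical; the "before" shape is shown only in comments):

```python
# Hypothetical sketch: a safety-net of tests makes it safe to collapse
# copy-pasted "stove-pipe" code into one shared implementation.
import re

# Before: each feature carried its own near-duplicate validation,
# because nobody dared touch working code:
#   def validate_signup_email(e): ...
#   def validate_invoice_email(e): ...

# After: one shared helper, kept honest by the existing tests.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def is_valid_email(address):
    """Single email check shared by every feature that needs one."""
    return bool(EMAIL.match(address))

# The tests that made the consolidation safe:
assert is_valid_email("dev@example.com")
assert not is_valid_email("not-an-email")
```

With tests pinning the behavior down, merging the duplicates is a routine refactoring rather than a gamble.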

So, though the ratio between test code and production code may be 2:1, it may not really be three times as much code as would otherwise be written.

Besides, in knowledge work, typing speed is not your limiting factor.

Learning Curve

At first there will be a slowing of delivery of new functionality, as the team gets used to the TDD practice, and as they pay down small, high-risk portions of the technical debt.

The Microsoft/IBM study noted an initial slowdown.  This is true whenever we begin a new discipline.  Think about starting a new exercise regimen, flossing your teeth, or learning the piano: At first, practicing slows you down, it hurts, and it may not seem worth the effort.

After a few months, teams start to rely on their safety-net:  That comprehensive suite of unit tests.  This gives them courage to make bold moves, and to check that those moves haven't reduced the value of the software.

Developers report useful feedback, and a resulting increase in "flow," in short order: usually in less than 30 days. (Often, a majority of participants in my 3-day Essential Test-Driven Development course report that a single surprising test-failure, or a well-tested refactoring, helped them see how TDD is a superior way to craft code.)

Business results can take a little longer: TDD reduces both old and new technical debt, and allows new, even unexpected, functionality to be added rapidly. It may take some reflection on a smoother release to market, or the occurrence of a very short cycle time, before the business is aware of this ROI.

TDD may at first seem like an exotic and esoteric discipline, and possibly quite challenging.  Actually, people find TDD easier to learn than good "test-after" unit-testing. I like to say learning TDD is like learning to walk on your feet.  You've been taught, in college and/or on the job, that walking on your hands is "the way it's done in our industry."  I spent 13 years writing code before testing, and 13 years doing TDD. I find TDD to be more natural for my scientific/engineering/inquisitive mind, and far more satisfying.

I've trained and coached hundreds of teams on how to "walk on your feet" using TDD. The approaches that have been most effective:
  • My 3-day Essential Test-Driven Development course followed (at some point) by a day of coaching around a team's specific technologies, existing code, and other challenges.
  • -or- At least eight days of intensive coaching with a dedicated, enthusiastic (or desperate) team. I work with pairs of developers on their development tasks.  Sometimes we're writing characterization tests for legacy code, sometimes we're testing the "untestables," and sometimes I'm providing a half-technical, half-emotional system of support that allows people to take courageous steps that they would otherwise never take.
Some teams have done it on their own.  They'll hire experienced, "test-infected" developers to cross-pollinate the team over time.  Or they'll read the books and coach each other over time.  These have the added cost of creating extra technical debt until the whole team is doing TDD in a disciplined manner.

Test Maintenance

The test code does have to be maintained, and with the same relentless discipline as the production code.  But if it's done as ongoing refactoring towards maintainability (i.e., writing the next test), it can be managed such that it never grows into a big mess.

Teams that refactor regularly (every two or three minutes) find that both the tests and the code behave themselves; they never grow into monsters.
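One common form that test refactoring takes is a shared test-data builder, so that repeated setup lives in one place. A minimal sketch (the `make_order` builder and the order shape are hypothetical):

```python
# Hypothetical sketch: keeping tests tidy with the same refactoring
# discipline as production code -- a shared builder replaces the
# copy-pasted setup that would otherwise spread through the suite.

def make_order(items=("widget",), paid=False):
    """Test-data builder: one place to change when an order's shape changes."""
    return {"items": list(items), "paid": paid}

def test_new_order_is_unpaid():
    assert make_order()["paid"] is False

def test_order_keeps_its_items():
    assert make_order(items=("a", "b"))["items"] == ["a", "b"]

test_new_order_is_unpaid()
test_order_keeps_its_items()
```

When the production code's notion of an order changes, only the builder changes, and the tests stay small and readable.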

In fact, teams who have been following the discipline for a year or two have reported to me that they spend a majority of their time and efforts on test maintenance.  Though this sounds horrific (and expensive) on the surface, they are really saying that they need only expend minimal effort on refactoring or designing their production code.  These teams are, truly, "Test-Driven."

A Good Investment?

Each manager/exec/leader will need to do the cost-benefit analysis for her own teams.

In my 26 years of experience--half with teams building software prior to TDD and half with teams doing TDD wholeheartedly--I find this one discipline to be the most potent practice in the Agile Engineering toolbox. I am thoroughly convinced that a focus on quality leads to real productivity, and that "thinking in tests" ("test-driven" and "behavior-driven" are other ways of saying this) leads to clarity of requirements, just enough engineering, and the highest levels of quality.

Let your teams try walking on their feet again. You'll be pleased with the results.

08 June 2012

Productivity Versus Quality in the Iron Cage of Death

The Classic Iron Triangle

(Re-)Introducing the Iron Triangle

You and your team, division, tribe, or organization are given a task. You must complete the whole project, you don't have extra money to hire more people, and it has to be completed by a specified date.  This is known as the "Triple Constraint" or the "Iron Triangle":  All variables have been pinned down: They're not really variable by the time you see them.  

If the project does succeed, the next project will be constrained further.  "Good job! Now let's see you do more with less."

Despite the fact that we've known for thousands of years that this arrangement is likely doomed to fail, the Iron Triangle still survives within many corporate cultures.

Why don't we see the fallacy behind the Iron Triangle?  (1) Because the failures occur after the project is completed. And (2) because the Iron Triangle itself is a flimsy model.

Five Core Metrics

Performance According to the Five Core Metrics

I must have misread this book.  I recall reading about time, effort, and size, and then reading about how performance is a function of the other three.  Let's see, how many metrics is that? 1 + 3 = 4.

Then the 5th metric is introduced: Quality!

I recall being stunned that it wasn't included in the performance equation. Were the authors really suggesting that, as long as we meet the Triple Constraints, we were providing value even if we were producing total crap?

I must have misread this book.

Remember Quality

I used to start this client-coach conversation with "slow down to speed up."  The problem with "slow down to speed up" is that it's as valuable as "buy low, sell high": True, but absolutely useless.

The Wobbly Iron Rhomboid

A better metaphor is that of the chef who sharpens her knives before cooking. This simple practice makes chopping safer, and thus the chef can chop with more confidence that the knife is going to go where she wants it to go. All chefs learn this, and those who choose not to follow this simple advice often end up with a mangled finger, a trip to the ER, or worse. Skipping this essential practice is a false "short cut" that leads to wasted time, effort, and money.

Software development is chock full of false short cuts.  When we ask our teams to deliver everything, on-time, at cost, we have pinned down the three corners of the Iron Triangle.  Of course the team will find ways to achieve those goals, consciously or unconsciously.  And there's only one place where the Wobbly Iron Rhomboid can wiggle if those three corners are pinned:  Quality will suffer.

"Yay! We've released version 1.0!" A resounding victory, until the customers start to use the product.

Lead with Quality

I consider myself experienced enough to know why start-ups need to rush to market.  And yet, having worked both on highly successful software and at two disastrous VC-funded dot-COM start-ups, I have noted this:  Companies and products do much better over time if they deliver fast, and they deliver less than everything, with high quality.

What do you get when you add features to buggy code?  More buggy features.  If the feature doesn't work, the product hasn't provided the intended value, and the customers wander off. When we push teams to deliver more and more functionality, without giving them the routine breathing room to pay down the technical debt created by rushing, we risk product melt-down.

The team must be encouraged, very early on, to establish disciplines that create high-quality features, and preserve the quality of those features over time.

Teams who do this frequently discover that these same disciplines improve their ability to add features later on, reducing the time it takes for an innovative new feature to reach production. In other words, disciplines that establish and preserve high quality also improve the throughput of value.

So, as with every "Iron Cage Death Match," everyone wins. By building quality in, productivity--the real throughput of value--is also increased.

How does a software development team "sharpen the chef's knives"? Stay tuned!

27 May 2012

The Agilist's Dilemma

Let me tell you a dirty little secret: An Agile approach to crafting software comes with its own challenges.

I'm guessing you already knew that.  Does the following tale seem familiar?

Fields of Green

Imagine you start developing fresh new "greenfield" software today using the basic Scrum process framework (for example), and little more.  The developers design and build just enough to complete the commitments of the first iteration; the testers test it, and the demo goes smoothly.

This continues for a handful of iterations.  The team starts to "gel": it gives itself a name that has little to do with the current project, holds meaningful meetings, thrives in its collaborative team space, and even starts to handle its own personnel issues without escalation. The Product Champion (Scrum "Product Owner", XP "Onsite Customer") is engaged, enthusiastic, and encouraged by the increments of real value demonstrated in each Demo. All is sunny!

Then a subtle shift occurs. Defects are found. New functionality starts to take longer and longer to build. Developers discover that with each new story, they're having to reshape their design to accommodate new functionality without breaking old (i.e., they must "refactor"). But folks outside of engineering can't see this: Product folks may worry about the cost of this "refactoring" stuff. Testers know they have to test not only the new stories, but every old story as well. But they're understaffed (as always) and start to accrue a separate backlog of stuff to test.

Eventually, real throughput (valuable new functionality with an acceptable level of quality) in each iteration comes to a dead stop.

The results of the Agilist's Dilemma
That's what I call the Agilist's Dilemma: The very fluidity that we espouse has become an impediment. "Had we only planned and designed for this! If only we knew what all the test cases would be! We should go back to a phased approach!"

Okay, calm down, everyone.  Let's not get all crazy. Remember the good old Death March? The all-night bug-fixing parties? Remember when the classic phased approach destroyed tech companies at a break-neck pace? Ah, yes, the Good Old dot-BOMB Days.

The dot-COM fiasco actually happened after the inception of Scrum and Extreme Programming, but people viewed Agile methods as strange, chaotic, and fringe.  That those methods had conceptual roots in Deming's work from the 1950's actually didn't help our case at all:  People love to find excuses to dismiss Deming.  Plus there are always high-profile examples of organizations that are very successful without adopting any particular form of process whatsoever. There are always the lucky few.

We know that Agile is about not trying to predict the unpredictable future. It's about seeing what's happening early, so these problems can be repaired while they're small. And it's about winning (creating value) within this iteration (or release), while setting ourselves up to win in the next iteration (or release). It's all supposed to provide the ability to deliver a predictable stream of value while remaining adaptable to the changing tides of business and customer demands.

And that's where our imaginary team is now:  If they (developers, testers, product, and leadership) are sensitive enough to the signs, they'll catch this early, and try something to fix it. That's exactly why Agile methods work:  We detect trouble early enough to do something about it.

The Elusive Silver Bullet

We know there are no silver bullets; no quick fixes. Yet most teams and organizations spend a lot of time covering surface symptoms with band-aid techniques, with the hope that they can just hold it all together through the next release. After that, "we'll take the time to fix everything that went wrong." Some of these techniques: hiring more testers with less experience with the product, mandating overtime for developers, and shipping on time and hoping the customers won't notice the quality issues. All of these band-aids are counter-productive, and subtly lay the blame on the people doing the work.

I think this occurs because of two fundamental problems:  Most people are unaware of the subtleties of the system within which they find themselves, and most are driven by metrics that don't relate directly to the health of the whole system. Even at the highest levels of management, often people are partially blinded by their own "performance-based" bonus check. When people are forced to compete within the organization for perceived scarce resources, the whole organization suffers.

I'm not suggesting the fault lies with management, either.  I've heard that the Lean approach is to "blame the process."  I find that blaming is like hoping:  A total waste of time. What all the Lean/Agile/Scrum literature is really suggesting is that we realistically examine and adjust the system.  Scrum gave us perhaps the most pithy and memorable phrase: "Inspect and Adapt."

So we take the time to dig a little deeper to locate those "root causes": Those things that are generating the myriad symptoms.  "Why did that happen?  What's really going on here? What can we do (or stop doing) to help resolve this?"

So we stop blaming (or discouraging) Product folks for wanting the world, or testers for finding and reporting defects, or developers for not being able to type 500 WPM. We model what's currently happening, locate where value is really getting bottlenecked, and then take a single step towards alleviating that bottleneck. Repeat.


I do have some changes to suggest: Techniques that have worked for numerous teams, and work on collections of root causes that commonly plague software development.

I'm headed somewhere with this post, obviously, but I'm trying something new here (new for me):  Using a blog post like a short chapter or section, rather than having each blog post represent a complete, stand-alone article.  Maybe I'll get more written that way. ;-)

So, please "stay tuned."

01 January 2012

Five Key Ingredients of Essential Test-Driven Development

There are any number of ways to think about TDD.  For example, I use one metaphor when describing the business value of TDD with the organization's leadership, and another when describing the personal value to the members of the development team.

There are also formulations of the actual steps performed in TDD (the most brief being "Red, Green, Clean").

This list of "Key Ingredients" that follows is yet another way to think about Test-Driven Development.  I keep this list in mind when I'm training and coaching teams, to make sure they have all the fundamental tools in place in order to successfully adopt this elegant, but challenging, discipline.


Test-First

The practice:

We write a test that expresses what we want the software to do, before coding the change to the software that implements that behavior.

This practice is one of the original dozen Extreme Programming (XP) practices. It's nothing new: For as long as we've had IC chips, chip designers have written tables of required outputs based on given inputs, before designing the circuits.  Colleagues at SQE tell me they found a view-graph (aka "slide") from the mid-80's that says "Test, Then Code."
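To make the cycle concrete, here is a miniature sketch in Python. The function name and the tax rule are invented for the illustration; the point is the ordering: the test exists, and fails ("Red"), before any production code does.

```python
import unittest

# The test is written first. It expresses what we want the software
# to do, before the code that does it exists.
class TestPriceWithTax(unittest.TestCase):
    def test_adds_ten_percent_tax(self):
        self.assertEqual(price_with_tax(100), 110)

    def test_zero_price_stays_zero(self):
        self.assertEqual(price_with_tax(0), 0)

# Only after watching the tests fail do we write the simplest
# production code that makes them pass ("Green").
def price_with_tax(amount):
    return round(amount * 1.10)

# Run the suite programmatically; both tests now pass.
unittest.main(argv=["tdd"], exit=False, verbosity=0)
```

Notice that the test doubles as a statement of the requirement: anyone reading it knows exactly what "done" means for this small behavior.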


The benefits:

Analysis through experimentation: You think about what you want the software to do, and how you would know it's right, rather than jumping directly to how the code will solve a problem.  You write, in effect, a small specification for the software.

Better interface design: You are crafting code from the client's perspective, and you are designing the interface first, rather than designing the solution and then exposing a clunky interface that suits your implementation.

Communication of understanding: Now and in the future, you and your teammates agree on the statement of the problem you are trying to solve.

Most defects are found immediately: You know when you're done.  You know when you got it right. And the aggregate of all prior tests keeps you from breaking anything you or your teammates have ever coded since the inception of the product (if you are one of the fortunate few who started with TDD).

Easier to add behavior later: You are crafting a comprehensive safety-net of fine-grained regression tests. This allows you to make seemingly radical additions to functionality very rapidly, and with an extremely high degree of confidence.  This is perhaps the biggest benefit of TDD overall. We get productivity through maintainability, and we protect the investments we've made in robust software. You can think of each test you write as a little investment in quality. By writing it first, you are investing early.

Merciless Refactoring

The practice:

Once each new microtest and all of its siblings are passing, we look for localized opportunities to clean up the design towards better readability and maintainability.  Small steps are taken between test-runs, to be sure we are not breaking functionality while reshaping code.  Both tests and production code are subject to scrutiny.

Another of the original XP practices. Again, nothing new.  Programmers have always reshaped their modules to incorporate unanticipated new requirements.  Refactoring--any change to the code to improve maintainability without altering the explicit behavior of the code--is the formalization of a professional thought-process that has been with us since Alan Turing. 
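A miniature example of the practice, sketched in Python (the function names and the 10% rule are invented for illustration): two functions duplicate the same "round, then format as currency" logic, and a small extract-function refactoring removes the duplication. The tests run green before and after, proving the explicit behavior never changed.

```python
# Tests written first pin down the behavior we must preserve:
def test_formatting():
    assert format_net(10) == "$10.00"
    assert format_gross(10) == "$11.00"

# Before the refactoring, format_net and format_gross each contained
# their own copy of the rounding-and-formatting logic (a classic
# "duplication" code smell). One small step extracts it:
def as_currency(amount):
    return "${:.2f}".format(round(amount, 2))

def format_net(amount):
    return as_currency(amount)

def format_gross(amount):
    return as_currency(amount * 1.1)  # 10% tax: a made-up business rule

# Re-run the tests between steps: still green, behavior unchanged.
test_formatting()
```

The next time the formatting rule changes (a different currency symbol, say), there is exactly one place to change it, and the tests will confirm nothing else moved.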


The benefits:

Emergent Design: Your design is--as James Shore so eloquently says it--reflective, rather than predictive.  You don't try to anticipate the myriad variations you may need to support in the far future. Rather, you craft the code to meet the needs expressed in the tests today (and, again, since inception of the code base).

By removing "code smells" such as duplication, you allow the real variations expressed in your product's requirements to become encapsulated on demand. When you refactor mercilessly, what emerges is often a common Design Pattern, and not always the pattern you would have predicted, had you spent your time and mental energy trying to predict the optimal solution.

Emergent Designs tend to be simpler, more elegant, more understandable, and easier to change than those you would have otherwise created in UML before coding.

Nothing is wasted: There is no extraneous functionality to maintain and debug; there is no phase where you try to get it all right before receiving feedback from the system itself.

Ease of testing: You know that good object-oriented designs result in objects that are extremely easy to test. Often you refactor towards making the next test easier to write or pass. The test-refactor cycle thus supports and improves itself: It's easier to refactor with the presence of the tests, and the next test becomes much easier to write, because it's now obvious to you which class should own the behavior.


Automation

The practice:

We write the tests using a tool that makes them immediately executable.

This one may seem obvious, but its explicit definition offsets and clarifies the others. For example, can you do test-first without automating the tests? Yes. And is there benefit? Yes.  Is it TDD? Not quite.

Programmatic microtesting has also been with us for a very long time.  In the C programming language, we used to create (and later discard) a separate little main() program with some printf() statements. Now we capture these bits of testing, and effortlessly re-execute them for as long as necessary. Since they remain separated from production code, we need not wrap them in #ifdef TESTING statements.
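The same shift, sketched in Python (parse_version is a hypothetical function, invented for this illustration): the throwaway check becomes an executable assertion we keep.

```python
# The ad-hoc approach: print the result, eyeball it once, throw it away.
#
#     print(parse_version("2.1.3"))   # "looks right"... then deleted
#
# The automated approach captures that same check as an assertion that
# can be re-executed, unchanged, after every edit to the code.
def parse_version(text):
    return tuple(int(part) for part in text.split("."))

def test_parse_version():
    assert parse_version("2.1.3") == (2, 1, 3)
    assert parse_version("10.0") == (10, 0)

test_parse_version()
```

The check costs the same keystrokes either way; only the automated form keeps paying us back.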


The benefits:

Repeatability:  You know that each time you run the tests, all tests are the same, and not open to human (mis-)interpretation.  Testers, even very talented and insightful testers, can make mistakes or succumb to biases.

Faster feedback: Computers are good at doing these dull, repetitive tasks much more quickly than humans.  Subjecting testers to repeatedly running pages of manual test suites is cruel. You help testers recoup time to spend on exploratory testing and other testing activities that provide higher returns.

One life-critical ("if you break it, a patient may die") medical product I worked on had 17,000 automated tests after two years in development.  They all ran in less than 15 minutes.  We could add functionality, fix a bug, or refactor a bit of awkward code right up to the hour of deployment, and still know we hadn't broken anything. And if we did see a single failing test, we would not commit our changes to the repository. To do so would have been unprofessional and unethical.

Mock Objects

The practice:

When creating microtests, we often test an object's response to external situations at integration boundaries (e.g., "What does my DAO do on Save() if the database is down?").

We do this by replacing the interface or class that represents the external subsystem (e.g., it makes the calls to JDBC or ADO.Net) with a "Mock" or "Fake" object, which has been instructed (aka "conditioned") by the microtest to operate in a way that suits the scenario being tested (e.g., the mock simply throws the appropriate "Database is down" exception).  When the microtest calls our object under test, the object under test interacts with the mock, just as it would in production.
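Here is a sketch of that "database is down" scenario in Python, using the standard library's unittest.mock. The class and exception names (UserDao, DatabaseDownError) are invented for the illustration; the original scenario describes the same idea at a JDBC/ADO.Net boundary.

```python
from unittest import mock

class DatabaseDownError(Exception):
    """Stand-in for whatever the real driver throws on an outage."""

class UserDao:
    # The DAO depends on a database gateway handed to it, so a test
    # can substitute a mock at this integration boundary.
    def __init__(self, database):
        self.database = database

    def save(self, user):
        try:
            self.database.insert("users", user)
            return "saved"
        except DatabaseDownError:
            # Degrade gracefully instead of crashing the caller.
            return "queued for retry"

# The microtest conditions the mock to simulate the outage -- no real
# database, network, or driver is involved.
def test_save_when_database_is_down():
    database = mock.Mock()
    database.insert.side_effect = DatabaseDownError()
    dao = UserDao(database)
    assert dao.save({"name": "Arlo"}) == "queued for retry"

test_save_when_database_is_down()
```

Because the mock throws the exception on demand, the hard-to-arrange failure case becomes just another fast, repeatable microtest.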


The benefits:

Predictability: You use mocks to decouple the code you are writing from databases, filesystems, networks, third-party libraries, the system clock, or anything else that could alter the clear scenario you are trying to test. The test will pass or fail depending only on what your production code does within the scenario arranged by the test.

Faster tests: By decoupling from anything slower than the CPU and memory access (e.g., hard-drives, network cards, and other physical devices), you and your teammates create a regression suite that can run thousands of tests per minute.

You may have a subset of tests that actually interact with these external systems.  Think of these as "focused integration tests" (another term I think I got from James Shore).  For example, on many systems I've worked on, we've wanted to test that we could actually read from and write to database tables (prior to Hibernate, Rails, etc.).  I recall those 17,000 tests included 2% to 5% functional and focused integration tests, which took perhaps 50% of the 15 minutes.  That's okay.

Nowadays, the database itself can often run in memory.  Or, you could develop on a solid-state laptop, such as the MacBook Air or Asus Zenbook. Different paths to the same goal.

The rule of thumb is to keep the comprehensive suite from taking much longer than a "coffee break".  A "lunch break" (1/2 or 1 hour) is far too long, because then the full suite would only be executed intentionally a few times per day.  On a small Agile team, we need to see the full suite run green a dozen or more times per day.

It's this fast feedback loop that enables "Agile" teams to stay agile after months of development.

To-Do List

The Practice:

We keep a list of things to test during this TDD session immediately available, so we can add test scenarios that occur to us during the session.  We keep this particular To-Do List around only until we complete our engineering task and commit our changes.

When performing the TDD steps (briefly described as Red, Green, Clean), often we see an opportunity to test a new corner case, or refactor some bit of stale design.  These to-do items go immediately onto the To-Do List, without interrupting the flow of Red-Green-Clean.

On teams I've coached, one responsibility of the pair-programming "Navigator" is to maintain this list on a 3x5 card.  We often started with the back of the task card (Note: not the story card, because our stories were subdivided into multiple engineering tasks).

You don't have to use a 3x5 card. Some IDEs allow you to enter "TO DO" comments into the code, which then appear automatically in a "Tasks" tab.  This is especially helpful if you're not pair-programming (you unfortunate soul).

Please, just never ever ever ever ever commit a "TO DO" comment into your repository. Checking in a "TO DO" comment is effectively saying, "Well, someone will do it, but we don't have time right now." The "Tasks" tab will fill with un-prioritized to-do items that "someone" (no one) is going to do "someday" (never), and the tab will become unusable as a To-Do List.


The benefits:

Idea repository: As you write tests, implement code, and refactor, you will notice other things you want to test or refactor. Some may be rather elaborate, but all you need is a brief reminder. Once that's recorded, you can allow your creative energy to return to the test or refactoring at hand.

Memory aid: You can take a break and, when you return, you have a short check-list of items you need to complete before you're done with the task.

Pair-Programming aid: The list gives the Navigator and Driver points to discuss, decide upon, or defer, and keeps them focused on real issues rather than long philosophical debates ("Opinion-Driven Development").

Jim Shore and I refined the practice one day, after the following exchange (from memory, plus some much-needed embellishment, after 10 years):
Jim (navigating): [Writes something on the To-Do List]
Rob (driving): [I stop what I'm doing] "Whatcha adding?"
Jim: "We're going to want to test what happens when that Connection parameter is null."
Rob: "Oh? How could it end up null?"
Jim: "Well, we're getting it from StupidVendorConnectionFactory, and so..."
[long discussion ensues]
Rob: [frowning] "I forgot where we were..."
Jim: [Sighing heavily] "Rob, Rob, Rob...Just re-run the tests!"
The next time Jim thought of a missing test...
Jim (navigating): [Writes something on the To-Do List]
Rob (driving): [I stop what I'm doing] "What's that?"
Jim (navigating): [Hides the To-Do List] "I won't say. As the driver, don't worry about what I'm adding to the list.  Let's focus on the test we need to get passing right now, then we'll review the list together."
Rob: "Can I add stuff to the list?"
Jim: "Of course! Just say it as you think of it, and I'll write it down."
Rob: "Okay.  Let's add 'fix, wrap, or replace StupidVendorConnectionFactory'"
Each item on the To-Do List must either get done, or be "removed."  We designated this by drawing a little checkbox before each item. We would check those we completed, and X out those we decided were not worth doing, or that had been replaced or subsumed by another to-do.
Jim: "I feel better, now that we have ConnectionPool wrapping StupidVendorConnectionFactory."

Rob: "And since ConnectionPool has been tested, and has no paths that could possibly return null..."

Jim: "...we don't need to check for null every time we pass a Connection. I'll take it off the list." [puts an X in the box]

Rob: "Are we starting to finish each other's sentences?"

Jim:  "Creepy.  Let's talk about hiring some developers locally so I don't have to pair with you every day."