08 October 2013

Solving the Legacy Code Problem

What is Legacy Code?

In his book, Working Effectively with Legacy Code, Michael Feathers defines legacy code very simply:  Code without tests.  He isn't trying to be inflammatory, or suggest that you are a bad person because you have written legacy code (perhaps yesterday). He's drawing a clear boundary between two types of code.  James Shore (co-author of Art of Agile) has an equivalent definition:
"Code we're afraid to change."
Simply put, if it has tests that will prevent us from breaking behavior while adding new behavior or altering design, we can proceed with confidence.  If not, we feel appropriately uneasy.

Of course, if you plan to never release a new version of your code, you won't need tests.  In my almost 40 years of programming, I've not seen that happen even once.  Code needs to change. Ergo, code needs tests.

Build Your Team's Safety Net

Your team may want to adopt this metaphor: Think of your whole suite of automated regression tests as a safety-net that the team builds and maintains to keep themselves from damaging all prior investment in behavior.

If it takes two hours to run them all, you'll run them once per day, and if they catch something, you know that someone on the team broke something within the last 24 hours.  If it takes one hour, you'll run them twice per day (once at lunch), and you've halved the window in which the break could have occurred.  That's probably better than 80% of the teams in the world, but it can be even better.

I'll give you a real-world example: I worked on a life-critical application in 2002.  After two years of development, this product had a comprehensive suite of 17,000 tests.  They ran in less than 15 minutes.  That team often took on new developers, and we gave them this simple message:  "You break something, someone may die.  But you have this safety net.  You can make any change you believe is appropriate, but don't commit the change until you run the tests." In the time it took to walk to the cafe and buy a latte, I would know whether or not I was making a mistake that could cause someone to die.

We made changes up to a day before going live.

It can be that good.  Of course, it takes effort to "pay down" the legacy code debt (and a lot of mock/fake objects, which is another topic for another day).  But the longer you wait, the worse the debt becomes.

Characterization Tests

The product mentioned above was developed from the ground up with unit tests written by a team who embraced unit-test-level Test-Driven Development (TDD).  Nice work if you can get it.  The rest of the world faces legacy code debt. 

You don't have to pay it all down before you proceed. In fact, you mustn't.  You have to be thoughtful about selecting high-risk areas:  An area of code that breaks frequently, or is changed frequently, should first be "covered" with "characterization" tests.

"Characterization test" is not defined by any particular type of tool. We often use the unit-testing framework, but we're not limited to it.

Like unit-tests, these tests must be deterministic, independent, and automated.  Unlike unit-tests, they should cover as much system behavior as possible with the fewest tests and the least effort.  When you write these tests, you are not bug-hunting, but rather "characterizing" or locking down existing behavior, for better or worse.  It's tempting to fix production bugs as you go, but fixing a bug that's escaped into the wild could introduce another bug, or break a hidden "feature" that a customer has come to rely on.  It's fine to note or log the bug, but your characterization test should pass when the bug manifests itself.  Name the test with a bug description or ticket number, so the team can easily find it later.
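
As a minimal sketch of what that can look like, here is a characterization test written with Python's unittest module. The function shipping_cost, its pricing rules, and the ticket number 1234 are all hypothetical, invented only to show the shape of the test. Note that the second test passes while the suspected bug is present.

```python
import unittest


def shipping_cost(weight_kg):
    """Hypothetical legacy function standing in for untested production code."""
    if weight_kg <= 0:
        return 0.0
    if weight_kg < 10:
        return 5.0 + 0.5 * weight_kg
    # Suspected bug: a 10 kg parcel ends up cheaper than a 9 kg parcel.
    return 0.75 * weight_kg


class ShippingCostCharacterization(unittest.TestCase):
    def test_light_parcel_cost(self):
        # Records what the code does today, not what a spec says it should do.
        self.assertEqual(shipping_cost(4), 7.0)

    def test_bug_1234_ten_kg_is_cheaper_than_nine_kg(self):
        # Hypothetical ticket number in the name so the team can find it later.
        # This test passes while the bug exists; that is the point.
        self.assertEqual(shipping_cost(9), 9.5)
        self.assertEqual(shipping_cost(10), 7.5)
```

When the team later decides to fix the bug, this test is the one that fails first, and its name says why.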

Why not fix the production defect? Because the point of creating this safety net is to give you the freedom to refactor. You may be refactoring so you can add new behavior more easily, or even so you can fix a bug more easily later, but refactoring and adding behavior are two distinct activities. Using TDD, they are two separate steps in the cycle.  (Aside:  Fixing a bug is effectively adding new behavior, because the system wasn't actually behaving the way we expected. You can use TDD for that.)

The unit-testing framework and developer IDE usually give us the most flexibility, plus the ability to mock dependencies and use built-in refactorings for safety. But in order to lock down large swaths of behavior, teams should think creatively. I've worked with teams who compared whole HTML reports, JPEG images, or database tables; or who have rerouted standard input and output streams. The nature of the product and the size of the mess may dictate the best approach.
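
Here is one way the "reroute standard output" idea might look in Python, in the style sometimes called a golden master test. The print_invoice routine and the recorded output are hypothetical; contextlib.redirect_stdout and io.StringIO are from the standard library.

```python
import contextlib
import io
import unittest


def print_invoice(items):
    """Hypothetical legacy routine that prints directly to standard output."""
    total = 0.0
    for name, price in items:
        print(f"{name}: {price:.2f}")
        total += price
    print(f"TOTAL: {total:.2f}")


def captured_stdout(func, *args):
    """Run func(*args) and return everything it printed as one string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args)
    return buffer.getvalue()


# Recorded once from the existing code and reviewed by a person.
# In practice this would usually live in a file under version control.
GOLDEN_INVOICE = "widget: 2.50\ngadget: 10.00\nTOTAL: 12.50\n"


class InvoiceGoldenMaster(unittest.TestCase):
    def test_invoice_output_is_unchanged(self):
        actual = captured_stdout(print_invoice, [("widget", 2.5), ("gadget", 10.0)])
        self.assertEqual(actual, GOLDEN_INVOICE)
```

One coarse test like this can cover a lot of formatting and arithmetic at once, which is the trade a characterization test is after.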

And don't aim for a duration target, e.g., "15-minute test runs." Teams sometimes respond to arbitrary targets by sabotaging their own future in order to make the numbers.  For example, by deleting existing tests! Rather, aim for improvement by looking for the greatest delay in testing.  Weigh a "huge refactoring" of the persistence layer against using an in-memory database.  There is no in-memory version of your database software?  Use a solid-state drive. Developers are naturally creative problem-solvers, particularly when they collaborate.
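
To illustrate the in-memory option, here is a small sketch using Python's built-in sqlite3 module, which creates a throwaway database when given the name ":memory:". The invoices table and the overdue_invoice_ids query are hypothetical, and whether an in-memory substitute behaves closely enough to your production database is something each team has to judge.

```python
import sqlite3
import unittest


def overdue_invoice_ids(conn, today):
    """Hypothetical legacy query; it runs against whatever connection it is given."""
    rows = conn.execute(
        "SELECT id FROM invoices WHERE due_date < ? AND paid = 0 ORDER BY id",
        (today,),
    )
    return [row[0] for row in rows]


class OverdueInvoicesTest(unittest.TestCase):
    def setUp(self):
        # A fresh in-memory database per test: no disk, no shared state.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE invoices (id INTEGER PRIMARY KEY, due_date TEXT, paid INTEGER)"
        )
        self.conn.executemany(
            "INSERT INTO invoices VALUES (?, ?, ?)",
            [
                (1, "2013-09-01", 0),  # past due, unpaid
                (2, "2013-09-15", 1),  # past due, but paid
                (3, "2013-11-01", 0),  # not yet due
            ],
        )

    def tearDown(self):
        self.conn.close()

    def test_only_unpaid_past_due_invoices_are_listed(self):
        self.assertEqual(overdue_invoice_ids(self.conn, "2013-10-08"), [1])
```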

Resistance is Futile

Code written without tests often resists testing. When you write unit-tests test-first, they tend to be very tiny, compact, isolated, and simple (once you get the hang of it). It's actually easier and faster to write them with the code using TDD, even though you end up with more of them.  Interestingly, if you write your unit-tests after the code has been written, you are really writing characterization tests: They're harder to write, they're often a compromise that tests a number of behaviors, and they often give you the bad news that you made a mistake while coding. This is why most developers hate writing "unit-tests" (me included). We were doing it backwards.

That may make writing characterization tests seem unbearably painful, but it's really not.  Once you collect a handful of simple, "surgical refactorings" for creating testable entry-points into your behaviors, the legacy code problem becomes a bit of an archeological expedition: Find the important behaviors, carefully expose them, then cover them with a protective tarp.  It can be rewarding all by itself. But the big payoff comes later, when it's time to change something.
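
As one example of what a small, "surgical" refactoring might be (my illustration, not a catalog of the author's own), consider a function with the system clock hard-wired into it. Adding an optional parameter leaves existing callers untouched while giving tests an entry point. The greeting functions below are hypothetical.

```python
import datetime
import unittest


def greeting_before():
    """Before: the clock is hard-wired, so the result depends on when you run it."""
    hour = datetime.datetime.now().hour
    return "Good morning" if hour < 12 else "Good afternoon"


def greeting(now=None):
    """After: callers that pass nothing behave as before; tests can supply a time."""
    if now is None:
        now = datetime.datetime.now()
    return "Good morning" if now.hour < 12 else "Good afternoon"


class GreetingCharacterization(unittest.TestCase):
    def test_morning(self):
        nine_am = datetime.datetime(2013, 10, 8, 9, 0)
        self.assertEqual(greeting(nine_am), "Good morning")

    def test_afternoon(self):
        three_pm = datetime.datetime(2013, 10, 8, 15, 0)
        self.assertEqual(greeting(three_pm), "Good afternoon")
```

The change is small enough to make by hand with little risk, and once the tests are in place, larger refactorings become safer.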

02 January 2013

The Sportscar Metaphor: TDD, ATDD, and BDD Explained


Your Mission, Should You Accept...

You've been tasked with building a sports car.  Not just any sports car, but the Ultimate Driving Machine.

The Ultimate Driving Machine

Let's take a look at how an Agile team might handle this...

Acceptance Test Driven Development

What would a customer want from this car?  Excitement! And perhaps a degree of safety.  Let's create a few user stories or acceptance criteria for this (the line between those two will remain blurred for this post):
  • When I punch the accelerator, I'm pushed back into my comfortable seat with satisfactory acceleration.
  • When I slam on the brakes, the car stops quickly, without skidding, spinning, or flipping, and drivers behind me are warned of the hard braking.
  • When I turn a sharp corner, the car turns without rocking like a boat, throwing me against the door, skidding, spinning, or making a lot of silly tire-squealing noises.
These are good sample acceptance criteria for the BMW driving experience.  We can write these independently of having a functioning car to test. That's what makes this "Test Driven" from an Agile perspective:  The clear, repeatable, and small-grained tests, or specifications, come before we would expect them to pass.  This is fairly natural, if you consider each small collection of new tests to be Just-In-Time analysis of a user story. That's "Acceptance Test Driven Development," or ATDD, in a nutshell.

In order for us to write truly clear, repeatable "acceptance tests" for a BMW, we would need to get much more specific about what we mean by "punch", "satisfactory", "slam", and "sharp". In the software world, this would involve the whole team: particularly QA/Test and Product/BA/UX, but with representation from Development to be sure no one expects Warp Drive. The team discusses each acceptance criterion to determine realistic measurements for each vague, subjective word or phrase.
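
To make that concrete, here is a sketch of how "satisfactory acceleration" might read once the team has agreed on a number. Everything in it is an assumption for illustration: the 5-second limit, the SimulatedCar stand-in, and its constant-acceleration model. A real team would more likely express this in a business-readable tool than in raw unittest.

```python
import unittest


class SimulatedCar:
    """Hypothetical stand-in for the system under test (constant acceleration)."""

    def __init__(self, acceleration_mps2):
        self.acceleration_mps2 = acceleration_mps2

    def seconds_to_reach(self, speed_kmh):
        speed_mps = speed_kmh / 3.6  # convert km/h to m/s
        return speed_mps / self.acceleration_mps2


class AccelerationAcceptance(unittest.TestCase):
    # Hypothetical figure the whole team agreed on for "satisfactory".
    MAX_SECONDS_TO_100_KMH = 5.0

    def test_full_throttle_reaches_100_kmh_within_agreed_time(self):
        car = SimulatedCar(acceleration_mps2=6.0)
        self.assertLess(car.seconds_to_reach(100), self.MAX_SECONDS_TO_100_KMH)
```

The vague word has become a number that anyone on the team can read, argue about, and check.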


What levels of fast, quick, fun, exciting, and safe are acceptable? What tests can we run to quickly assess whether or not our new car is ready for a demo? How will we know we have these features of the car fully completed, with acceptable levels of quality, so that we don't have to return to them and re-engineer them time and time again?

Once the acceptance tests pass (and, on a Scrum team, once the demo has been completed and the stories accepted by the Product Owner), they become part of the regression suite that keeps these "Ultimate Driving Machine" qualities from ever degrading.

Test-Driven Development 

Now the engineers start to build features into the car.  A quick architectural conversation at the whiteboard identifies the impact upon various subsystems, such as chassis, engine, transmission, environmental/comfort controls, safety features.

What would some unit tests (aka "microtests") look like?  Perhaps these would be examples (keep in mind that I'm a BMW customer, not a BMW engineer, and have little idea of what I'm talking about):
  • When the piston reaches a certain height, the spark plug fires.
  • When the brake pedal is pressed 75% of the way to the floor, the extra-bright in-your-face LED brake lights are activated.
  • When braking, and a wheel notices a lack of traction, it signals the Anti-Lock Braking system.
See the difference in focus?  Acceptance Tests are business-facing as well as team-guiding.  Microtests are tools that developers use to move towards completion of the Acceptance Tests.
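
As a rough sketch, the brake-light microtest from the list above might look like this in Python. The BrakeLights class and its attribute names are hypothetical; only the 75% threshold comes from the example.

```python
import unittest


class BrakeLights:
    """Hypothetical brake-light controller, invented for this example."""

    HARD_BRAKING_THRESHOLD = 0.75  # fraction of pedal travel

    def __init__(self):
        self.normal_on = False
        self.extra_bright_on = False

    def pedal_pressed(self, fraction):
        self.normal_on = fraction > 0
        self.extra_bright_on = fraction >= self.HARD_BRAKING_THRESHOLD


class BrakeLightsTest(unittest.TestCase):
    def test_gentle_braking_uses_normal_lights_only(self):
        lights = BrakeLights()
        lights.pedal_pressed(0.5)
        self.assertTrue(lights.normal_on)
        self.assertFalse(lights.extra_bright_on)

    def test_hard_braking_activates_extra_bright_lights(self):
        lights = BrakeLights()
        lights.pedal_pressed(0.75)
        self.assertTrue(lights.extra_bright_on)
```

Each test is tiny and speaks the engineer's language (pedal fractions and light states), not the customer's.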

I used to own a BMW. I couldn't do much to maintain it myself, except check the oil.  I would lift the hood, and admire the shiny engine, noting wistfully that cars no longer have carburetors, and I will probably never again perform my own car's tune-up.

Much of what makes a great car great is literally under the hood.  Out of sight. Conceptually inaccessible to Customers, Product Managers, Marketers...even most Test-Drivers. What makes the Ultimate Driving Machine work so well is found in the domain of the expert and experienced Engineer.

In the same way, unit tests are of, by, and for Software Developers.

What's the Difference?

In both cases, we write the tests before we write the solution code that makes the tests pass.  Though they look the same on the surface, and have similar names, they are not replacements for each other.

For TDD:
  • Each test pins down technical behavior.
  • Written by developers.
  • Intended for an audience of developers.
  • Run frequently by the team.
  • All tests pass 100% before commit and at integration.

For ATDD:
  • Each test pins down a business rule or behavior.
  • Written by the team.
  • Intended for the whole team as audience.
  • Run frequently by the team.
  • New tests fail until the story is done.  Prior tests should all pass.
Which practice, ATDD or TDD, should your team use? Your answer is embedded in this Sportscar Metaphor.*

Behavior Driven Development

For a long time no one could clearly express what "Behavior Driven Development" or BDD was all about. Dan North coined the term to try to describe TDD in a way that expressed what Ward Cunningham really meant when he said that TDD wasn't a testing technique.

Multiple coaches in the past (me included) have said that BDD was "TDD done right." This is unnecessarily narrow, and potentially insulting to folks who have already been doing it right for years, and calling it TDD.  Just because many people join Kung Fu classes and spend many months doing the forms poorly doesn't mean we need to rename Kung Fu. (Nor should we say that "Martial Arts" captures the uniqueness of Kung Fu.)

I witnessed a pair of courageous young developers who offered to provide a demo of BDD for a meetup.  They used rspec to write Ruby code test-first.  They didn't refactor away their magic numbers or other stink before moving on to other randomly-chosen functionality. "This can't be BDD," I thought, "because BDD is TDD done well."

TDD is TDD done well.  Nothing worth doing is worth doing incorrectly.  I had been using TDD to test, code, and design elegant software behaviors since 1998. I wanted to know what BDD adds to the craft of writing great software.

I can say with certainty that I'm a big fan of BDD, but I'm still not satisfied with any of the definitions (and I'm okay with that, since defining something usually ruins it).  A first-order approximation might be "BDD is the union of ATDD and TDD."  This still seems to be missing something subtle. Or, perhaps there is so much overlap that people will come up with their own myriad pointless distinctions.

However we try to define it in relation to TDD, BDD's value is in the attention, conversations, and analysis it brings to bear on software behaviors.

In hindsight, I have already seen a beautiful demo, by Elisabeth Hendrickson, of TDD, ATDD, and (presumably the spirit of) BDD techniques combined into one whole Agile engineering discipline.

She played all roles (Product, Quality, Development) on the Entaggle.com product, and walked us through the development and testing of a real user story.  She broke the story down into a small set of example scenarios, or Acceptance Tests. She wrote these in Cucumber, and showed us that they failed appropriately.  She then proceeded to develop individual pieces of the solution using TDD with rspec.

Then, once all the rspecs and "Cukes" were passing, she did a brief exploratory testing session (which, by definition, requires an intelligent and well-trained human mind, and thus cannot be automated). And she found a defect!  She added a new Cuke, and a new microtest, for the defect; got all tests to pass; and demonstrated the fully functioning user story for us.

All that without rehearsal, and all within about 45 minutes.  Beautiful!

* I have a draft post that further describes, compares, and contrasts the detailed practices that make up ATDD and TDD, along with a little historical perspective on the origins of each. For today, I wanted to share just the Sportscar Metaphor. It's useful for outlining which xDD practices to use, and how they differ.