08 October 2013

Solving the Legacy Code Problem

What is Legacy Code?


In his book, Working Effectively with Legacy Code, Michael Feathers defines legacy code very simply: Code without tests. He isn't trying to be inflammatory or to suggest that you are a bad person because you have written legacy code (perhaps yesterday). He's drawing a clear boundary between two types of code. James Shore (co-author of The Art of Agile Development) has an equivalent definition:
"Code we're afraid to change."
Simply put, if the code has tests that will keep us from breaking existing behavior while we add new behavior or alter the design, we can proceed with confidence. If not, we feel appropriately uneasy.

Of course, if you plan to never release a new version of your code, you won't need tests.  In my almost 40 years of programming, I've not seen that happen even once.  Code needs to change. Ergo, code needs tests.

Build Your Team's Safety Net


Your team may want to adopt this metaphor: Think of your whole suite of automated regression tests as a safety net that the team builds and maintains to keep its members from damaging the behavior they have already invested in.

If it takes two hours to run them all, you'll run them once per day, and if they catch something, you know that someone on the team broke something within the last 24 hours. If it takes one hour, you'll run them twice per day (once at lunch), and you've cut that window in half. That's probably better than 80% of the teams in the world, but it can be even better.

I'll give you a real-world example: I worked on a life-critical application in 2002. After two years of development, the product had a comprehensive suite of 17,000 tests that ran in less than 15 minutes. The team often took on new developers, and we gave them this simple message: "You break something, someone may die. But you have this safety net. You can make any change you believe is appropriate, but don't commit the change until you run the tests." In the time it took to walk to the café and buy a latte, I would know whether or not I was making a mistake that could cause someone to die.

We made changes up to a day before going live.

It can be that good. Of course, it takes effort to "pay down" the legacy code debt (and a lot of mock and fake objects, which are a topic for another day). But the longer you wait, the worse the debt becomes.

Characterization Tests


The product mentioned above was developed from the ground up with unit tests, written by a team that embraced unit-test-level Test-Driven Development (TDD). Nice work if you can get it. The rest of the world faces legacy code debt.

You don't have to pay it all down before you proceed. In fact, you mustn't try. Instead, be thoughtful about selecting high-risk areas: Code that breaks frequently, or is changed frequently, should be the first to be "covered" with "characterization" tests.
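
As one rough way to find frequently changed areas, a short Python sketch can count how many commits touched each file. It assumes the code lives in a Git repository; `change_counts` and `hotspots` are hypothetical helper names, and change frequency is only one signal (how often an area breaks would come from your bug tracker instead).

```python
import subprocess
from collections import Counter


def change_counts(log_text):
    """Count how many commits touched each file in `git log --name-only` output.

    With an empty --pretty format, git prints only the file names of each
    commit, separated by blank lines, so every non-blank line is one touch.
    """
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())


def hotspots(repo_path=".", top=10):
    """Return the `top` most frequently changed files as (path, count) pairs.

    Assumes `git` is installed and `repo_path` is inside a Git repository.
    """
    log_text = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return change_counts(log_text).most_common(top)
```

For example, `hotspots(".", top=5)` lists the five files with the most commits. The parsing is kept in its own function so it can be checked without a real repository.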

A "characterization test" is defined by its purpose, not by any particular type of tool. We often use the unit-testing framework, but we're not limited to it.

Like unit tests, these tests must be deterministic, independent, and automated. Unlike unit tests, though, they aim to "cover" as much system behavior as possible with the fewest tests and the least effort. When you write them, you are not bug-hunting but rather "characterizing," or locking down, existing behavior, for better or worse. It's tempting to fix production bugs as you go, but fixing a bug that has escaped into the wild could introduce another bug or break a hidden "feature" that a customer has come to rely on. It's fine to note or log the bug, but your characterization test should expect the buggy behavior and pass when the bug manifests itself. Name the test with a bug description or ticket number so the team can easily find it later.
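
To make this concrete, here is a minimal sketch using Python's `unittest`. The function `legacy_shipping_cost`, its pricing rule, and the ticket "BUG-1234" are all hypothetical stand-ins for real production code. The expected values come from what the code currently returns, and the known defect gets its own test, named after the ticket, that passes while the bug is present.

```python
import unittest


# Hypothetical legacy function standing in for real production code.
def legacy_shipping_cost(weight_kg):
    # Suspected bug: a parcel of exactly 10 kg gets the heavy rate,
    # although the (hypothetical) spec says "over 10 kg".
    if weight_kg >= 10:
        return 25.0
    return 5.0 + weight_kg * 1.5


class ShippingCostCharacterizationTest(unittest.TestCase):
    """Locks down what the code does today, not what it should do."""

    def test_light_parcel(self):
        # Expected value recorded by running the code, not taken from a spec.
        self.assertEqual(legacy_shipping_cost(2), 8.0)

    def test_heavy_parcel(self):
        self.assertEqual(legacy_shipping_cost(12), 25.0)

    def test_bug_1234_exactly_10kg_charged_heavy_rate(self):
        # Known defect, logged as hypothetical ticket BUG-1234.
        # This test passes while the bug is present; fixing the bug is a
        # separate, deliberate change that will update this expectation.
        self.assertEqual(legacy_shipping_cost(10), 25.0)
```

Run it with `python -m unittest` against the module that contains these tests. When the team later decides to fix BUG-1234, the test named after that ticket is the one that should change, deliberately, in the same step as the fix.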

Why not fix the production defect? Because the point of building this safety net is to give you the freedom to refactor. You may be refactoring so you can add new behavior more easily, or even so you can fix a bug more easily later, but refactoring and adding behavior are two distinct activities. In TDD, they are separate steps in the cycle. (Aside: Fixing a bug is effectively adding new behavior, because the system wasn't actually behaving the way we expected. You can use TDD for that.)

The unit-testing framework and developer IDE usually give us the most flexibility, plus the ability to mock dependencies and use built-in refactorings for safety. But to lock down large swaths of behavior, teams should think creatively. I've worked with teams who compared whole HTML reports, JPEG images, or database tables, or who rerouted standard input and output streams. The nature of the product and the size of the mess may dictate the best approach.
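
For the "reroute standard output" approach, a Python sketch might capture everything a routine prints and compare it with a previously recorded "golden master" file. `print_legacy_report`, `capture_stdout`, `matches_golden_master`, and the file layout are hypothetical; `contextlib.redirect_stdout` and `io.StringIO` are standard-library tools that let us capture the output without modifying the legacy routine.

```python
import contextlib
import io
from pathlib import Path


# Hypothetical legacy routine that writes a report straight to stdout.
def print_legacy_report(orders):
    print("ORDER REPORT")
    for order_id, amount in orders:
        print(f"{order_id:<8}{amount:>10.2f}")
    print(f"{'TOTAL':<8}{sum(a for _, a in orders):>10.2f}")


def capture_stdout(func, *args):
    """Run func and return everything it printed, without changing func."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args)
    return buffer.getvalue()


def matches_golden_master(actual, golden_path):
    """Record the output on the first run; compare against it afterwards."""
    path = Path(golden_path)
    if not path.exists():
        path.write_text(actual)  # first run: lock down current behavior
        return True
    return path.read_text() == actual
```

Recording on the first run and comparing on later runs is one common design choice; some teams prefer to fail the first run so that someone reviews the recorded output before it becomes the baseline.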

And don't aim for a fixed duration target, e.g., "15-minute test runs." Teams sometimes respond to arbitrary targets by sabotaging their own future to make the numbers, for example by deleting existing tests! Instead, aim for steady improvement by finding and attacking the greatest source of delay in your tests. Weigh a "huge refactoring" of the persistence layer against using an in-memory database. Is there no in-memory version of your database software? Use a solid-state drive. Developers are naturally creative problem-solvers, particularly when they collaborate.
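
As a sketch of the in-memory option, assuming for illustration that the persistence code speaks SQL that SQLite accepts, Python's built-in `sqlite3` module can open a database at the special path `":memory:"`. The `CustomerRepository` class and `make_test_repository` helper are hypothetical; the key move is that the repository receives its connection instead of opening one itself, so tests can hand it a fast, throwaway database.

```python
import sqlite3


class CustomerRepository:
    """Hypothetical persistence class that is given its connection
    rather than opening one itself, so tests can substitute a fast one."""

    def __init__(self, connection):
        self.conn = connection

    def create_schema(self):
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS customers "
            "(id INTEGER PRIMARY KEY, name TEXT)"
        )

    def add(self, name):
        cur = self.conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))
        self.conn.commit()
        return cur.lastrowid  # id assigned by SQLite

    def find_name(self, customer_id):
        row = self.conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


def make_test_repository():
    """Build a repository backed by a fresh, empty in-memory SQLite database."""
    repo = CustomerRepository(sqlite3.connect(":memory:"))
    repo.create_schema()
    return repo
```

Each call to `make_test_repository()` gets its own empty database that disappears when its connection is closed, which also keeps tests independent of one another. If your production SQL relies on vendor-specific features, this substitution may not be faithful, and the trade-off described above still applies.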

Resistance is Futile


Code written without tests often resists testing. When you write unit tests test-first, as TDD prescribes, they tend to be tiny, isolated, and simple (once you get the hang of it). With TDD it's actually easier and faster to write them alongside the code, even though you end up with more of them. Interestingly, if you write your unit tests after the code has been written, you are really writing characterization tests: They're harder to write, they're often a compromise that tests several behaviors at once, and they often give you the bad news that you made a mistake while coding. This is why most developers (myself included) hate writing "unit tests." We were doing it backwards.

That may make writing characterization tests seem unbearably painful, but it's really not. Once you collect a handful of simple, "surgical refactorings" for creating testable entry points into your behaviors, the legacy code problem becomes something of an archeological expedition: Find the important behaviors, carefully expose them, then cover them with a protective tarp. It can be rewarding all by itself. But the big payoff comes later, when it's time to change something.
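
One such surgical refactoring, sketched here in Python, turns a hidden dependency into an optional parameter; Feathers catalogs a technique along these lines as "Parameterize Method." The function `greeting` and its time-of-day rule are hypothetical. The point is that existing callers are untouched, while a test can now supply a fixed clock and reach the logic deterministically.

```python
from datetime import datetime

# Before (hypothetical): the current time is read inside the function,
# so its result changes from run to run and a test cannot pin it down.
#
# def greeting(name):
#     if datetime.now().hour < 12:
#         ...


def greeting(name, now=None):
    """After: the clock becomes an optional parameter.

    Existing callers keep calling greeting(name) and get the old behavior;
    tests pass a fixed datetime instead.
    """
    if now is None:
        now = datetime.now()  # production path is unchanged
    if now.hour < 12:
        return f"Good morning, {name}"
    return f"Good afternoon, {name}"
```

A characterization test can now call `greeting("Ada", datetime(2013, 10, 8, 9, 0))` and expect `"Good morning, Ada"` every time, without touching the system clock.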