Sunday, July 28, 2013

Technical Debt Strategies

For a new role I’m starting at work, I’ve been thinking about the concept that we in the software development business refer to as “technical debt.”

In software development, there is always a tension between two opposing forces: the desire to do it fast, and the desire to do it right.  I could probably write an entire blog post on just that topic, but for now I’ll settle for the short version.  If you do it right, then, later, when you want to extend it, or modify it, or use it as a jumping off point for branching out in a whole new direction (and you will always want to do these things eventually, if your software lives long enough), you can do so easily, with a solid foundation as a base.  The downside is that it will take longer.  If you do it fast, you get results faster, which means you can serve a customer’s needs before they change, fill a window of opportunity before it closes, or perhaps even beat your competitors to the market with your offering.  But when you have to modify it later (which you will), it will end up taking even more time to clean things up than if you’d just done it right in the first place.

You can see why we often call this “technical debt.”  You’re saving time now, but you’ll have to “pay it back” later, and the amount of extra time it takes is like the interest.  Primarily, we software people invented this analogy because it makes good sense to business people.  When you’re running a business, sometimes you need a piece of equipment.  You have two choices: you can borrow some money and buy it now, or you can save up and purchase it outright, later.  Buying it now allows you to use it now, thus saving you costs somewhere else or giving you new capabilities you can capitalize on.  But you’re going to have to pay for it eventually, and you’ll have to pay the interest too.  Buying it later saves money in the long run, but denies you the advantages of the equipment for however long it takes to save up enough money.

This analogy really works because, despite what some people tend to believe, neither choice is always right or always wrong.  Sometimes it makes sense to borrow the money and buy it now; sometimes it makes sense to wait.  Likewise with software, sometimes it makes sense to just get the damn thing programmed as quickly as possible, and sometimes it makes sense to take the time and do it right the first time.  And business people have to make that call, and expressing the choice in financial terms makes it easy for them to understand the trade-offs.

For instance, when a company is first starting up, “do it fast” is nearly always the right answer.  (David Golden has a great post which explores this in greater detail.)  Basically, there’s no point in worrying about how hard it will be to change your software tomorrow when you’re not even sure there’s going to be a tomorrow.  Once a company leaves the startup phase, though, it’s time to think about paying down some of that technical debt.  If it doesn’t, the lack of agility means it may not be able to respond to customer demands quickly enough to thrive in the marketplace.

Now, technical debt is in many ways unique to every individual company, and there’s no one technique or approach that will work for everyone.  But everyone faces the same basic problem: I need my development team to keep making new features while they’re also paying off this debt; how can I have them accomplish even more than they were doing before without burning out, or creating new problems as fast as they fix the existing ones?  Hiring more people helps, of course, but it isn’t the complete solution.  But there are some general strategies that will work, and can be used by nearly every company that finds itself in the position of having a fair amount of technical debt to pay off.  These five areas nearly always need some attention, and they’re excellent things to think about when looking for ways to make your programmers more efficient, which not only gives them time to work on refactoring messy areas of your codebase, but also gives them more time and confidence to avoid creating technical debt in the first place.

Unit Tests

Few companies have no unit tests, but few have enough either.  I’ve heard people worry about creating too many unit tests, but honestly this is like your average American worrying about eating too many vegetables: I suppose it’s technically possible, but it’s unlikely to be a problem you’ll ever face.  It is possible for your unit tests to be too messy, though.  Remember: your unit tests are supposed to give you confidence.  The confidence you need to get in there and clean up some of that technical debt without being afraid you’re going to break something.  But your tests can’t give you confidence unless you’re confident in your tests.

Now, many people will tell you that you should put as much time and effort into your tests as you do into your code (this is often, but not always, what people mean when they say “tests are code”).  I don’t necessarily agree with this.  I think that all the effort you put into your unit tests should go towards making your unit tests effortless.  I think your code should be elegant and concise: you don’t want it too simple, because it’s a short hop from “simple” to “simplistic.”  Imagine the Dr. Seuss version of War and Peace and how hard that would be to get through: that’s what I’m talking about.  You need to use appropriate levels of abstraction, and using too few can be as bad as using too many.  You have to strike a balance.

On the other hand, your tests should be simple.  Very simple.  The simpler they are, the more likely your developers are to write them, and that’s crucial.  Whether you’re using TDD or not (but especially if you’re not), devs don’t particularly like writing tests, and anything that gives them an excuse not to is undesirable.  You want your tests to read like baby code, because that way not only will people not have to expend any brainpower to write new tests, but reading them (and therefore maintaining them) becomes trivial.  This doesn’t work as well for your code because of the crucial difference between code and tests:  Code often needs to be refactored, modified, and extended.  Tests rarely do; you just add new ones.  The only time you modify a test is when it starts failing because of a feature change (as opposed to because you created a bug), and in many cases you’re just going to eliminate that test.  And the only time you should be refactoring tests is when you’re working on making them even simpler than they already are.

So, in addition to trying to extend your test coverage to the point where refactoring becomes feasible, you should probably work on creating underlying structures to make writing new tests simple, and to make reading existing tests even easier.
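As one possible illustration of such an "underlying structure," here is a minimal sketch in Python using the standard `unittest` module.  The `slugify` function, the `TableTestCase` helper, and the test data are all hypothetical, invented for this example; the point is only that a small shared helper can reduce each new test to a table of inputs and expected outputs.

```python
import unittest


def slugify(title):
    """Hypothetical function under test: turn a title into a URL slug."""
    words = "".join(c if c.isalnum() else " " for c in title.lower()).split()
    return "-".join(words)


class TableTestCase(unittest.TestCase):
    """Shared helper: a test supplies only a table of (input, expected) rows."""

    def check_table(self, func, cases):
        for given, expected in cases:
            # subTest reports every failing row instead of stopping at the first.
            with self.subTest(given=given):
                self.assertEqual(func(given), expected)


class SlugifyTests(TableTestCase):
    def test_basic_titles(self):
        self.check_table(slugify, [
            ("Hello World", "hello-world"),
            ("  Technical   Debt  ", "technical-debt"),
            ("Do it fast, or do it right?", "do-it-fast-or-do-it-right"),
        ])
```

Saved as a test module, this can be run with `python -m unittest`.  Adding a new case means adding one line to the table, which is about as close to "baby code" as a test can get.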

Code Reviews

There are still plenty of companies out there not doing this one at all, so, if you fall in that category, you know where to start.  But even if you’re already doing them, you could probably be doing them more efficiently.  Code reviews are tough to implement well, and doing so has as much to do with the process you adopt as with the software you choose.  Of course, the software you choose still matters a lot.  Code reviews, like unit tests, need to be simple to do, or people will find excuses not to do them.  Good software can really help with that.

Hopefully, like unit testing, code reviews are something that you don’t really need to be convinced that you should be doing.  But more people are resistant to code reviews than any other practice mentioned here.  They absolutely do require an investment of time and effort, and they absolutely do mean that you will deliver some features more slowly than you could without them.  Of course, if that extra time and effort catches a crucial customer-facing bug, you won’t be complaining.

But, honestly, catching bugs is not the primary benefit of code reviews ... that’s just the icing on the cake.  Code reviews just produce better software.  They’re the time and place where one developer on your team says to another: “hey, you know you could have used our new library here, right?”  (To which the response is inevitably: “we have a new library?”)  They’re fantastic for cross-training too: don’t think code has to be reviewed by someone who already knows that code.  Have it reviewed by someone who’s never seen the code before and they’ll learn something.  Have your junior people reviewed by your senior people, and have your senior people reviewed by your junior people: they’ll all learn something.  It makes your codebase more consistent and builds a team dynamic.  It diffuses business knowledge and discourages silo effects.  If it happens to catch a bug now and then ... that’s just gravy.

Once you have code reviews in place, though, you’ll probably still want to tweak the process to fit your particular organization.  If you require reviews to be complete before code can be deployed, then code reviews are a blocking process, which comes with its own hassles.  Then again, if you allow deployment with reviews still outstanding, then you risk making the review process completely ineffectual.  So there’s always work to be done here.

Code Deployment

I suppose there are still software folks out there for whom “deploy” means “create a new shrink-wrap package,” but more and more it means “push to the website.”  At one company I worked at, deployment was a two-hour affair, which had to be babysat the entire time in case some step failed, and, if a critical bug was discovered post-launch, rolling back was even more difficult.  From talking with developers at other companies, I don’t believe this was an exception.

If your organization is big enough to have a person whose entire job is dedicated to deployment (typically called a “release manager”), then you don’t have to worry about this as much.  If those two hours are being spent at the end of a long day of preparation by one of your top developers, you’ve got some room for improvement.  But even if you have a separate release manager, think how much you have to gain as your deployment gets shorter and simpler and more automated.  You can release more often, first of all, and that’s always a good thing.  And there are simple strategies for making rolling back a release almost trivial, and that’s absolutely a good thing.
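One such rollback strategy (not necessarily the one any particular shop uses) is to keep every release in its own directory and point a `current` symlink at the live one, so that rolling back is just re-pointing the link.  The sketch below assumes a POSIX filesystem; the directory layout, the date-based release names, and the `activate`, `rollback`, and `demo` functions are all hypothetical.

```python
import os
import tempfile


def activate(releases_dir, release_name, current_link):
    """Point `current_link` at the chosen release directory."""
    target = os.path.join(releases_dir, release_name)
    if not os.path.isdir(target):
        raise FileNotFoundError(target)
    # Build the new symlink beside the live one, then rename it over the
    # old link. On POSIX the rename swaps the link in a single step.
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    os.replace(tmp_link, current_link)


def rollback(releases_dir, current_link):
    """Re-activate the release just before the live one (sorted by name)."""
    releases = sorted(os.listdir(releases_dir))
    live = os.path.basename(os.path.realpath(current_link))
    index = releases.index(live)
    if index == 0:
        raise RuntimeError("no earlier release to roll back to")
    activate(releases_dir, releases[index - 1], current_link)
    return releases[index - 1]


def demo():
    """Deploy two made-up releases in a temp directory, then roll back."""
    with tempfile.TemporaryDirectory() as root:
        releases = os.path.join(root, "releases")
        current = os.path.join(root, "current")
        for name in ("2013-07-01", "2013-07-28"):
            os.makedirs(os.path.join(releases, name))
            activate(releases, name, current)
        after_deploy = os.path.basename(os.path.realpath(current))
        after_rollback = rollback(releases, current)
        return after_deploy, after_rollback
```

Because old releases stay on disk, a rollback costs one rename rather than a second two-hour deployment.  A real setup would also need to deal with database migrations and restarting services, which this sketch ignores.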

Now, I’ve never been a fan of “continuous deployment” (by which I mean deployment happening automatically, triggered by checking in code or merging it).  But I do believe that you should strive for being able to deploy at any time, even mere minutes after you just finished deploying.  Part of achieving this goal is reaching continuous integration, which I am a fan of.  Having those unit tests you worked on so hard being run automatically, all the time, is a fantastic thing, which can free your developers from the tedious chore of doing it manually.  Plus, continuous integration, unlike code reviews, really does catch bugs on a regular basis.

Configuration Management

Similar to deployment, and for the same reasons.  If building a new production server is a heinous chore, you’re not going to want to do it often, and that’s bad.  Horizontal scaling is the simplest way to respond to spikes in traffic, and you should be able to bring a new server online practically at the drop of a hat.  It should be completely automated, involve minimal work from your sysadmins, and zero work from your developers.  There are all kinds of tools out there to help with this (Puppet, Chef, etc.), and you should be using one.  Using virts goes a long way towards achieving this goal too.

For that matter, it should be trivial to bring up a new development server, or a new QA server.  Once that ceases to be a big deal, you’ll find all sorts of new avenues opening up.  Your developers and/or your QA engineers will start thinking of all sorts of new testing (particularly performance or scalability testing) that they’d never even considered before.  Figuring out how to handle database environments for arbitrary new servers can be a challenge for some shops, but it’s a challenge worth solving.

For Perl in particular, you also need to be able to reproduce environments.  You can’t spin up a new production web server alongside your existing ones if it’s going to be different than the rest.  If your server build process involves cloning a master virt, you don’t have to worry about this.  If you’re rebuilding a server by reinstalling Perl modules, you must expect that CPAN will have changed since you last built production servers, because CPAN is always changing.  For this reason, a tool like Pinto (or Carton, or CPAN::Mini, or something) is essential.
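To make the risk concrete, here is a small sketch of the idea behind pinning: record the module versions your existing servers were built with, then compare a fresh build against that record.  This is not how Pinto or Carton work internally; the `find_drift` function is hypothetical, and the module versions shown are invented for the example.

```python
def find_drift(pinned, installed):
    """Return {module: (pinned_version, installed_version)} for each mismatch.

    A module missing from `installed` is reported with None as its version.
    """
    drift = {}
    for module, wanted in sorted(pinned.items()):
        found = installed.get(module)
        if found != wanted:
            drift[module] = (wanted, found)
    return drift


# Hypothetical snapshot recorded when the existing servers were built.
pinned = {"Moose": "2.0802", "DBI": "1.627", "Plack": "1.0024"}
# Hypothetical result of a fresh, unpinned install some months later.
installed = {"Moose": "2.1005", "DBI": "1.627"}

drift = find_drift(pinned, installed)
for module, (wanted, found) in drift.items():
    print(f"{module}: pinned {wanted}, installed {found}")
```

A check like this only detects drift after the fact; the tools named above aim to prevent it by installing from a fixed snapshot in the first place.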

Version Control

In this day and age, I doubt seriously I need to convince anyone that they should start using version control ... although one does hear horror stories.  More likely though, you should be thinking about what VCS you’re using.  If you’re using CVS, you should probably be using Subversion, and, if you’re using Subversion, you should probably be using Git.  But along with thinking about which software to use and how to switch to a better one (preferably without losing all that great history you’ve built up), you also need to be thinking about how you’re using your VCS.

Are you using branches?  Should you be?  If you’re not, is it only because of the particular VCS you’re using?  If you’re using CVS or Subversion and telling me branching is a terrible idea, I can understand your point of view, but that doesn’t mean you shouldn’t be using branches; it just means you should probably be using Git.  Now, granted, branches aren’t always the right answer.  But they’re often the right answer, and they allow experimentation by developers in a way that fosters innovation and efficiency.  But it takes work.  You have to come up with a way to use branches effectively, and it’s not just your core developers who need to agree on it.  If you have separate front-end developers, for instance, they’re going to be impacted.  If you have a separate QA department, they’re definitely going to be impacted; in fact, they can often benefit the most from a good branching strategy.  Your sysadmins, or your ops team (or both), may also be affected.  If you’ve tried branches before and they didn’t work, maybe it wasn’t branching in general that was to blame, and maybe it wasn’t even the particular VCS ... maybe it was the strategy you chose.  Feature branches, ticket branches, developer branches: again, there’s no one right answer.  Different strategies will work for different organizations.

So these are some of the things you need to be considering as you look to reduce your technical debt.  None of these things address the debt directly, of course.  But, in my experience, they all play a part in streamlining a development team by increasing efficiency and cohesiveness.  And that’s what you really need in order to start paying down that debt.


  1. I am surprised you left off "Writing proper documentation" - I find that there are a TON of systems that are seriously in deficit due to poor/bad/out-dated documentation.

    When someone who's new to the system comes on board, they spend enormous amounts of time/effort getting "up to speed" due to the crappy docs.

  2. In the trade-off between speed and unit testing, I think there is a middle ground: creating testable code. The team can move fast by not spending too much time on creating tests, but they create enough tests to demonstrate testability.

    Many times, in the early stages, we don't know if customers even want this code, so why invest in high levels of test coverage? However, if it's successful, we have to maintain it for the long term, and unit tests become useful.

    How would you think about creating testable code, but not actually creating the tests? What guidelines would you use for testable code?

  3. This definition of technical debt seems wrong to me. Technical debt in my experience happens more often in organisations who do the opposite of what you say. Technical debt accumulates for every week that passes WITHOUT customer validation and feedback. The point about releasing early and often is to prevent the build up of huge amounts of invalid code and assumptions.

    Your definition of technical debt is actually a definition of bad coding practices. Write quickly and iteratively without the right toolset at your peril. Get your toolset right on the other hand (eg: using TDD, BDD, CI, CD etc) and you can develop quickly, iteratively but with stability and longevity.

    If you take your time, ponder your navel, and don't build fast, you will a) miss the opportunity, b) build invalid code based on assumptions rather than on valid customer feedback (which will require massive refactoring), c) not move quickly enough to keep pace with technology changes (and so end up in a technology refactor nightmare), and d) change your processes so many times before releasing anything that you end up in a process refactoring nightmare.

    Get a fluid, robust toolset and a procedure for developing and deploying quickly and iteratively using best practices and you're onto a winner.

  4. Right now I'm working in the software division at a medical device company, so unit testing and version control are taken very, very seriously. Our unit testing is comprehensive to the spec and formalized and probably more extensive than 99% of software outfits. That said, there are still poorly written tests, or (despite the fact that the test-writers are not the devs) tests that test the code as it was at a certain time as opposed to what the spec says it should be, or cases where the spec gets re-written in weird ways to accommodate the code. At times, we might be that rare fella who just eats too many vegetables.

    I definitely identify with the "technical debt" concept; I think I need to start throwing that around. We are spending a large part of our technical budget on debt service these days.