Saturday, September 12, 2015

Broken Windows and How to Fix Them

It's your first day on the job at a new client and you're excited and ready to get started.  They've got some cool problems they are trying to solve, the tech stack sounds interesting, and you've got work lined up for your team.  You get there and immediately notice that the build board is red.  Not just a little red, but bleeding red.  "Maybe it's just a single bad build" you think to yourself. "I'm sure it will get fixed right away."  

You try not to think about it too much while you get settled in and start pulling down the code to build it locally.  It compiles, but there are a number of failing tests.  You notice a number of other details about the code: domain logic is spread throughout, there is bleed-over between the different layers of abstraction, and there is A LOT of untested code.  Code duplication abounds, and some of the tests are flaky.
You check the build board again, and notice that not only is it still red, but someone has pushed on top with more changes.  When you ask one of the other developers about it they chuckle and reply that it has been red for a while, but it's OK, it's just some of the tests.  No one seems too bothered by this.  You start wondering if you've walked into some parallel universe where a red build is acceptable, or even expected.  How did it get this way? Doesn't this bother anyone? You realize you have your work cut out for you.

Broken Window Theory

So what happened?  How did the state of the project get to this point?  How can a build get into a red state and stay that way?  Unfortunately it is entirely too easy for this to happen.  It only takes one bad changeset and zero developers who care about fixing it.  This happens in software development with maddening frequency, and is an example of the Broken Window Theory.

Broken Window Theory goes like this:

Given a building with one or more broken windows which are not quickly repaired, the tendency is for more windows to be broken and for other acts of vandalism to occur.  People notice that no one cares about the building, and there is no social pressure to prevent the vandalism.  The building quickly falls into disrepair and stays that way.

In software development lots of little changes can contribute to creating an environment which tolerates a continually broken build.  The addition of code which is difficult to test leads to fewer tests. Flaky tests can degrade confidence in the build ("Oh yeah, that build is red because of a known flaky test, go ahead and push anyway!"). Just plain broken functionality is allowed to be pushed, and a general lack of design and maintenance can cause the codebase to become chaotic and difficult to work with.  Maybe there are developers on the team who simply don't know how to write tests, or how to refactor properly.

All of this together can make it disheartening to try to do the right thing, and it becomes easy to fall into the trap of "Well, this is just how the codebase is, we should just learn how to live with it." Obviously this is not the answer.

Fixing Broken Windows

So the windows have been broken and the build is red.  How do you fix the situation?  How do you get the build back to a green state and keep it that way?  The answer is simple, if not easy: get the team to care about the build.  Obviously this is easier said than done in many cases.  Often the team has been beaten down by the long-standing degradation of the build, so simply shouting about the problems in the codebase won't solve anything.  Concrete steps must be taken.

Stop Further Damage

To begin with, cordon off the building, draw a line in the sand, stop the bleeding (pick your metaphor here, there are a lot of them), but whatever you do stop the code from getting any worse than it already is.

Be prepared to put the brakes on and play the bad guy, because you're about to rock the boat and upset a lot of people who have become comfortable with The Way Things Are.

Your first order of business should be to get the build back to green as quickly as possible.  If that means no one pushes code until that happens then so be it.  Implement an Evergreen Policy which states that if a build goes red it is either fixed immediately, or the changeset is backed out.

The build is an indication of the health of the project, and ultimately it should tell you whether you are ready to deliver or not, so it should become a priority to get it green and keep it green.  The code should compile and all tests should pass every single time.
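
If it helps to make the Evergreen Policy concrete, here is a minimal sketch of the decision it implies.  Everything named here is an assumption for illustration: the 30-minute `FIX_WINDOW` stands in for "immediately", and the `push` / `wait` / `revert` labels are made up, not taken from any particular CI tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical grace period; pick whatever "immediately" means for your team.
FIX_WINDOW = timedelta(minutes=30)


@dataclass
class BuildStatus:
    green: bool
    red_since: Optional[datetime] = None  # when the build first went red
    fix_in_progress: bool = False         # someone has claimed the fix


def evergreen_action(status: BuildStatus, now: datetime) -> str:
    """Return what the team should do next under the Evergreen Policy."""
    if status.green:
        return "push"
    # A red build only earns patience if someone is actively fixing it
    # and we know it has not been red for longer than the grace period.
    if (
        status.fix_in_progress
        and status.red_since is not None
        and now - status.red_since <= FIX_WINDOW
    ):
        return "wait"
    # Otherwise the offending changeset gets backed out.
    return "revert"
```

The point of writing it down this way is that there is no branch where the build is red and people keep pushing on top of it.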

Make "Done" Include a Healthy Build

For a given unit of work (user story, task, whatever you want to call it) there should be a Definition of Done.  This definition should describe the requirements to be met before that unit of work can be considered complete, and the next unit of work is begun.  Whatever those requirements currently are, they should be updated to include a clean bill of health for the build.  Appropriate tests should be added and the entire build should complete successfully.  Nobody gets to pick up another piece of work until this happens.  Refuse to allow breaking changes into the build.

Increase Confidence

By now you've hopefully gotten the build green, and laid out a plan to keep it green.  Even so, there may be tests which you can't always trust.  These tests might flap occasionally, leaving developers unsure if they've broken something or not.  If you can't trust the build you can't be sure of the quality of the software you're building.  Investigate the root cause of the flapping tests and address it as quickly as possible.  In the meantime take steps to fix the tests themselves as well.  Maybe they need to be rewritten, or moved into another, more stable layer.  Maybe there are timing issues which can be solved by increasing the timeout values.  Maybe they should just be eliminated, as flapping tests are useless in terms of confidence in the build. Whatever the case, do whatever you have to do to create a reliable and consistent build.
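
One rough way to find out which tests are flapping is to run each suspect many times and see whether the result is consistent.  The sketch below assumes a test is just a Python callable that raises on failure; `classify_test` and the default of 20 runs are hypothetical choices, not part of any test framework.

```python
from typing import Callable


def classify_test(test: Callable[[], None], runs: int = 20) -> str:
    """Run `test` repeatedly and report whether its outcome is consistent."""
    failures = 0
    for _ in range(runs):
        try:
            test()
        except Exception:
            # Any exception counts as a failed run.
            failures += 1
    if failures == 0:
        return "stable-pass"
    if failures == runs:
        return "stable-fail"
    # Some runs passed and some failed: this is a flapping test.
    return "flaky"
```

A test that comes back `stable-fail` is a real bug to fix, while one that comes back `flaky` is a candidate for rewriting, moving, or deleting.  Keep in mind that a test which flaps only rarely can still pass 20 times in a row, so a `stable-pass` result here is evidence, not proof.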

Spread the Pain

Don't try to take on the world on your own. Instead, enlist everyone else to your cause.


Make everyone responsible for keeping the build green.  Chances are good you're going to need backing from the technical leadership to get everyone in line, as behavior doesn't change overnight and some developers will need an incentive to change.  Maybe this means a rotating responsibility in the beginning, but the goal is to make everyone responsible for the entire codebase, and at the very least responsible for the code they are pushing.

Educate and Train

A major cause of headache-inducing codebases is that many developers simply don't know how to do better than they already are.  They may not know how to properly unit test, or maybe they haven't been introduced to Test-Driven Development before.  Maybe they are simply inexperienced and need to be educated on common engineering practices and principles, such as SOLID, DRY, and the concepts of clean code.  Take this opportunity to help everyone step up their game.  Start holding a regular code club to practice these principles away from the production code.  Start a programming book club and encourage everyone to participate.  Do something to help everyone improve, as this will pay off for everyone, both in the short term and in the long term.

Communicate

It's amazing how often problems can be solved by simply talking about them.  Encourage the team to communicate about the issues they're having, especially when it concerns a broken build.  Often, simply acknowledging that the build is broken is enough to start a conversation about how to fix it.  Also encourage communication even when the build is not broken; make it a point to regularly get together and talk about what could be improved, both within the code and without.  It doesn't have to be an hour-long meeting; it could just as easily be 15 minutes or less, which is enough time to get the team thinking about the problem at hand.

Give It Time

Change doesn't happen immediately, but if you keep at it and keep everyone on their toes it will happen sooner or later.  Eventually you'll notice a difference; instead of the build being red and people just shrugging it off you'll start overhearing conversations about how to get it fixed and fast.  Just as people got used to the build being broken all the time they'll become used to the build being green all the time.  They won't tolerate broken windows, and they'll make an effort to fix them as soon as they happen.


