Cleaning up

September 04, 2008 - approaches

We all have different ideas on when we should clean things up. It could be a dirty room, cluttered desk, messy closet, or just too much stuff lying around. We also prefer to keep things a certain way, which can be very individualistic. From the outside, a desk with tons of paper lying around might look like it needs some rearranging, but the desk's owner might be able to find anything at a moment's notice.

Given how differently we all tolerate visual clutter, it's not too surprising that there's a wide range of opinions on when we should clean up our code. Many books have been written that focus entirely on this very issue, and in the end, I think it's still much more of an art than a science. In fact, I believe programming itself is much more of an art than a science, but that's the topic for a whole other discussion.

Then there's a time for spring cleaning too. And we're always surprised when we find some really dirty stuff in the nooks and crannies. But in the real world, we usually do end up cleaning it up. (That is, if you're not as lazy as I am.) In the virtual world, however, some cleanup tasks are truly daunting, and it may be easier to leave the dirty things the way they are. After all, it might not look pretty, but it works.

And similar to how the owner of a cluttered desk can find something for us quickly, old dirty code tends to work predictably as well. But if we change it a bit, we can no longer guarantee that it'll work the same way. Unit tests can help here, but we still can't make any guarantees. And when you move an old couch from one corner of the room to the other, you realize that there was a whole lot more dust under there than you thought. That dust also has to get cleaned up. And after some time cleaning up, we ask ourselves, was it really worth it? That couch really wasn't all that bad where it was.
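To make the unit test point concrete, here is a minimal sketch of a test that pins down what old code does today before anyone moves the couch. The function legacy_shipping_cost and its odd discount are made up for illustration; the expected values are simply what the current code returns, not what a spec says it should return.

```python
import unittest


def legacy_shipping_cost(weight_kg):
    # Hypothetical "old dirty code": the special cases are unexplained,
    # but callers may have come to depend on them.
    if weight_kg == 0:
        return 0
    cost = 5 + weight_kg * 2
    if weight_kg > 10:
        cost = cost - 3  # nobody remembers why this is here
    return cost


class LegacyShippingCostTest(unittest.TestCase):
    def test_current_behavior_is_pinned(self):
        # Expected values were recorded by running the existing code,
        # so the test documents behavior rather than intent.
        observed = {0: 0, 1: 7, 10: 25, 11: 24}
        for weight, expected in observed.items():
            self.assertEqual(legacy_shipping_cost(weight), expected)
```

A test like this only covers the inputs we thought to record, which is exactly why it helps without giving any guarantee.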

But the advantages gained by cleaning up can outweigh its costs and risks. After all, if something becomes simpler and easier to understand, we stand to gain every single time someone works with it. Add up the time saved across all these future interactions, and the refactoring can more than pay for itself.
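The "pays for itself" argument is just a break-even calculation. Here is a rough sketch, assuming a deliberately simple model in which every future interaction saves the same amount of time; the function name and the numbers are mine and purely illustrative.

```python
import math


def break_even_interactions(refactor_hours, hours_saved_per_interaction):
    """Return how many future interactions it takes to repay a refactoring.

    Assumes a constant saving per interaction, which real projects
    rarely offer. Returns None if the cleanup saves nothing.
    """
    if hours_saved_per_interaction <= 0:
        return None  # under this model the refactoring never pays off
    return math.ceil(refactor_hours / hours_saved_per_interaction)


# A 40-hour cleanup that saves half an hour per visit to the code
# needs 80 visits before it breaks even.
print(break_even_interactions(40, 0.5))
```

The hard part, of course, is that both inputs are guesses about the future.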

However, as weathermen and traders already know, it's hard to predict the future. (They'll never admit it though. What I think these guys really excel at is coming up with excuses. :) ) But it's very easy to predict the past. And although we may pay a small price each time we end up working around the dirt, we start learning where all the dirty areas are. This puts us in a better position to clean up more effectively in the future.

Refactoring usually alters the abstractions used. A new layer could get added that simplifies complex interactions by handling those details. Or extra layers that get in the way are removed to produce simpler code. But if a new problem comes along that doesn't fit into these new abstractions, then the game is up. Chances are we'll put in a hack to work around that "edge case", and those can start to pile up, especially if other programmers start putting their hands in the mix. And before you know it, we'll be talking about a new refactoring.
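As a small illustration of how this plays out, consider the sketch below. The function format_price and the currency rule are invented for this example: the refactored abstraction assumes every price is a symbol followed by an amount with two decimals, and then a case arrives that doesn't fit.

```python
def format_price(amount, currency):
    # The "clean" abstraction after a refactoring: every currency is
    # a symbol followed by an amount with two decimal places.
    symbols = {"USD": "$", "EUR": "€", "GBP": "£"}

    # The hack: a later requirement that doesn't fit the abstraction.
    # Yen has no minor unit, so it gets special-cased up front.
    if currency == "JPY":
        return "¥" + str(int(round(amount)))

    return symbols[currency] + format(amount, ".2f")


print(format_price(3.5, "USD"))
print(format_price(1200, "JPY"))
```

One special case is harmless. The trouble starts when the next programmer adds a second and a third branch above the lookup, and the abstraction ends up describing fewer and fewer of the real cases.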

I'm not arguing that we should never clean up code though. I'm just pointing out that there are often more factors involved in cleaning up than just making the code prettier. What cleaning up does seem to do is improve morale. Most programmers would much rather write 1,000 new lines of clean, sparkling code than try to figure out which ten lines in a 10,000-line messy codebase need to be modified. And when bold, noble undertakings like these are launched, I start to feel like Braveheart just gave me a speech before an impossible battle.

And what usually ends up happening is that the small splinter cell team beats the odds. But why? There's too much work and not enough time. Well, I think that the answer is simple: the programmers work harder. And the reason for that is that they are a whole lot more motivated with the prospect of a fresh start instead of trying to keep a big ball of mud together.

In the end, major refactorings are a tough call either way. And the technical issues might not be the whole story. Social issues can play a role in the decision too. Programmers tend to take attacks on their code personally, and arguments can turn into personal vendettas quickly.

But just like we're always able to find ways around a messy desk, we find ways around our technical problems too. And regardless of the decision, thinking about the problem gives us more insight into it. So ultimately, we're better off anyway after the exploration step. To use a cliche:

The journey is more important than the destination.

Comments (2):

k0s at Wed Feb 18 03:55:01 2009:

i fully agree that the issue is not cut and dried. when to refactor and how much to refactor is something i’ve learned from experience. the only real rule i’ve found is “don’t refactor at all unless you have a clear idea what you’re going for”, as it seems that violations of this rule lead to half-done refactors with significant worsening of the code base. that said, even that rule doesn’t apply if you just want to play around with code, which i consider a worthwhile task at times.

when i’m coding software by myself, i generally have a really good idea when to refactor. my rule is generally “refactor right before the refactoring is needed for a feature addition”. As an example, if you have one edge case, it’s probably not worth refactoring. But when you have a second one coming up that fits the same pattern, then it might be worth refactoring. Why? A couple of reasons:

* because two edge cases almost always mean three edge cases

* refactoring early means less work overall. if you wait until you have ten edge cases, this means you’ve gone out of your way ten times to work within an ill-suited API to produce code that will be thrown away, and when you do refactor it will be harder to catch all of the bad code smells you introduced

Refactoring is never absolute. Never think “Oh, this code is refactored so it’s done.” For any evolving software project, today’s spec will be inadequate for tomorrow’s tasks.

Also, refactoring is tempered by pragmatism. If I’m doing software for myself, then I generally keep the code very clean, because the person I’m trying to satisfy is me. If there are deadlines, then I may put off some needed refactoring to get something to demo. And again, even if you have ten horrible edge cases, if they don’t fit into any recognizable pattern, then refactoring doesn’t do you any good. You can move the problem around, but you don’t gain anything (and as the blog post points out, moving code around is a net loss in terms of fragility and programmer effort).

Individually, I feel I have a good feel for these issues. The harder problem is what to do in the case of several people working on the same code. If I start my sweeping refactor (even if it’s the right time to do it and I have a clear objective), then I might screw other developers who were relying on the legacy API. Communication helps, but ultimately the decision to refactor community code should be more than an individual’s decision.

Robert Marianski at Wed Feb 18 03:55:50 2009:

You make some interesting points. About the half refactorings where you don’t have a good idea up front, I would argue that they are always useful. While they may not always result in better code, they always result in more understanding of what’s going on. This is the exploration that I think is the most valuable part of any refactoring process.

When you talk about refactoring when you have to add a new feature and it falls into a pattern, I’m not so sure that it’s even that easy. If you refactor early, and the problems you face in the future all fit this mold, then you’ve done the right thing. This is the best case scenario for an early refactoring. However, if you guess wrong, and a new problem comes along that does not fit the new pattern, you’re stuck in a very sticky situation. Either you try to make it fit, i.e. hack this case to make it work, or you undo the previous refactoring, simplify things, and possibly draw upon this new knowledge to create a new abstraction. In this case, the early refactoring has worked against you.

The pragmatic issues you mention are all very important points that I didn’t touch upon in the post. There usually is a need to maintain backwards compatibility with legacy APIs. And the more mature a project is, the bigger this issue is.

And as you mention, the refactorings are not always obvious. In this case, attempting to refactor can lead you to producing a net negative result, making the code more difficult to work with. But, I would argue that even exploring the possibility of refactoring leaves you with a better understanding of the problems, which ultimately lets you make more informed decisions.
