The Cost of Abstraction

October 26, 2008

Most programmers follow a progression as their skills improve.

Initially, we have a "just make it work" mentality. In this phase, there is no structure and no attempt at creating abstractions. If functionality needs to be duplicated in a slightly different fashion, the lines of code get duplicated. We happily trudge along in this phase until we are met with a project of significant complexity. There the approach starts to break down, and we quickly learn how difficult the code is to maintain.

In the next phase, which I like to call "abstraction envy", we start to learn the different ways we can architect our applications. We begin to learn the designs, structures, and patterns that can ease the burden of maintaining applications. As we learn more and more patterns, however, we try to apply our newfound knowledge as widely as possible. Given that this is still a learning process, and much of programming is learned by doing, we often pick the wrong tool for the job. To use the popular metaphor, we use a hammer, but we're not hitting a nail. We slowly start to learn that there are downsides to these patterns as well, and that just because we understand them doesn't mean we should use them. If a little is good, that doesn't mean that a lot is better.
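
To make that concrete, here's a made-up sketch of what this phase tends to produce (all the names are invented for illustration): a textbook strategy pattern wrapped around what is really a one-line decision.

    # The pattern-happy version: three classes and a factory for one branch.
    class DiscountStrategy:
        def apply(self, price):
            raise NotImplementedError

    class NoDiscount(DiscountStrategy):
        def apply(self, price):
            return price

    class HolidayDiscount(DiscountStrategy):
        def apply(self, price):
            return price * 0.9

    def make_discount_strategy(is_holiday):
        return HolidayDiscount() if is_holiday else NoDiscount()

    total = make_discount_strategy(is_holiday=True).apply(100)

    # What the problem actually called for:
    def discounted(price, is_holiday):
        return price * 0.9 if is_holiday else price

Both compute the same number; only one forces the next reader to trace an inheritance hierarchy to find it.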

The next phase is the middle ground between the first two. This ideal phase is essentially a zen state of programming. Abstractions are only used when they are necessary. And the abstractions introduced don't exist just for code reuse; they create simple yet powerful ways of thinking about our problems. These abstractions are more useful because they closely match the conceptual, real-world problems. You'll know one when you see it, because it'll seem like the problem was created for the abstraction instead of the other way around. Besides creating the right abstractions for the parts of the application that need them, the flip side is just as important, if not more so: abstractions are not created for the parts of the application that don't need them. Simple things are left simple, and complicated things are possible.
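
One hypothetical example of that feeling: when the problem is conceptually "a stream of lines flowing through filters", Python generators match its shape so well that the code reads like the problem statement. (The file name and log format here are assumptions made for the sake of the sketch.)

    # Each stage is a lazy filter over a stream of lines.
    def errors(lines):
        return (line for line in lines if "ERROR" in line)

    def timestamps(lines):
        return (line.split()[0] for line in lines)  # assumes timestamp-first lines

    with open("app.log") as f:  # hypothetical log file
        for ts in timestamps(errors(f)):
            print(ts)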

I think most programmers are in that middle phase, abstraction envy. If you ask a programmer for advice on a particular problem, you'll probably get an answer, and you'll probably also get a lot of new problems that the answer creates along the way. Sometimes these problems require complicated solutions of their own, and the complexity cascades.

That brings us to the heart of the problem, really. Simple things should be easy to change. If they're not, something is wrong. But none of us writes code with the idea that we're making certain things harder. We always believe we're improving things. We see some boilerplate, and we think to ourselves, gee, wouldn't it be great if we didn't have to do that all the time? Let's find a way to eliminate it. We move along happily, proud of how much code we've reduced. The problem lies in the future, when we're thrown a curve ball. It doesn't fit in our strike zone, but we've got to hit a home run anyway. We look at the code, and we realize the change would have been a whole lot easier if we didn't have to deal with those abstractions we added earlier. We'll have to modify them significantly to make the new problem fit. In other words, the abstraction leaks, because we now have to understand its implementation.
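
As a hedged illustration (the schema and the helper are made up), imagine we got tired of database connection boilerplate and wrote ourselves a little save() helper. It works beautifully, right up until the curve ball arrives.

    import sqlite3

    # The helper that killed the boilerplate. It assumes a 'records' table
    # already exists, and it owns the whole connect/commit/close lifecycle.
    def save(record):
        conn = sqlite3.connect("app.db")
        try:
            conn.execute("INSERT INTO records VALUES (?, ?)", record)
            conn.commit()
        finally:
            conn.close()

    # The curve ball: save two records atomically. The helper commits
    # immediately and hides the connection, so a transaction spanning both
    # calls can't be expressed without digging into its implementation.
    save(("a", 1))
    save(("b", 2))  # if this fails, the first insert is already committed

To handle the new requirement we have to understand and rework the helper's internals, which is exactly the leak.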

And now we're faced with a tough decision. Do we make these modifications (modification is a generous word; it's usually a hack), or do we scrap the abstractions and go for a simpler approach? The hacks are usually easier, and we feel more confident that they'll work, but they increase the entropy of the system, which makes it harder to maintain. A bigger cleanup takes more time and is riskier, but in some cases it's a good long-term investment. Unit tests help here, but just as with good abstractions, I find it's hard to write or find tests at the right level of abstraction to pull this off.

So how can we avoid getting ourselves into a mess like this in the future? I think the real answer is that we have to get tangled in a few webs of anarchy before we can learn how to avoid them. But in an attempt to answer the question: instead of erring on the side of introducing abstractions, we can err on the side of leaving them out. Simpler code is usually easier to modify, although that's not true for large pieces of spaghetti, where a few well-placed abstractions trim out a lot of the complexity. But my personal opinion is that it's usually easier, and more fun, to add a layer of abstraction later than to break free from one that already exists. It also feels safer.

But perhaps what should guide us the most in these ambiguous cases is the principle of least astonishment. If the presence of a particular abstraction is surprising, it probably shouldn't be there. Another good question to ask ourselves is whether the new abstraction actually makes things simpler. If the code is easier to understand with the abstraction, awesome. If it's a puzzle to figure out what's going on, and it's just there to save on lines of code, change it.
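
A small, hypothetical example of the difference: both classes below hold the same configuration, but one of them saves a few lines at the cost of making "where does this attribute come from?" a research project.

    # Clever: fewer lines, but the attributes exist nowhere in the source.
    class DynamicConfig:
        def __init__(self, **kwargs):
            for key, value in kwargs.items():
                setattr(self, key, value)

    # Astonishment-free: slightly longer, instantly readable and greppable.
    class ExplicitConfig:
        def __init__(self, host, port, debug=False):
            self.host = host
            self.port = port
            self.debug = debug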

It's easy to make things. It's hard to make things simple.