Quantity vs Quality

September 14, 2008

When programmers discuss whether quality is more important than quantity, quality is often cited as the clear winner. The argument is that focusing on quantity only ends up hurting us in the long run. Sacrificing quality usually means taking so-called "shortcuts", which can lead to headaches later. When the shortcuts turn into dead ends, we end up having to take detours to get around them.

But does focusing completely on quality necessarily improve it? We justify spending more time on quality by saying that once something is done the right way, the problem is less likely to resurface. In other words, we spend more time on it now so that we don't have to later. But what does it actually mean to concentrate on the quality of an application? Just because I try to anticipate future uses of the code, remove all duplication, or document what I'm doing doesn't necessarily mean the code is of higher quality. If I eliminate known bugs, I haven't necessarily raised the bar quality-wise: I could have introduced other defects as side effects, or created security or performance issues. I can spend more time writing tests, but again, that alone doesn't improve the quality of the code itself.

I'm not going to attempt to define software quality myself. I only want to point out that it's not completely obvious, and that refactoring has its own risks. Instead, I'll summarize an interesting story I came across here.

In a ceramics class, half of the students were graded solely on the quantity of work produced. The other half was graded solely on quality. You'd think that the quantity students would produce tons of subpar work, while each quality student would produce one amazing piece. At least, that's what I thought.

I was wrong. The quantity group in fact ended up producing the works of higher quality. They were able to learn from their own mistakes. The quality group, for all their sound theories, failed to deliver.

This sounds very similar to programming. And unless you're this guy, it usually takes a few iterations to get something right.

We tend to learn more effectively through our own mistakes. Just as parents let their children make their own mistakes, programmers learn to avoid pitfalls by first falling into them. The better programmers are the ones who write more code.

You come to nature with all your theories, and she knocks them all flat. -Renoir

Computer science is a misnomer. There is no series of steps to follow that will lead to a great application. There is no methodology that guarantees success. The science that we do have is what I would call "low level". We can prove that quicksort runs in O(n log n) time in the average case. We can find the maximum of a set of numbers in O(n) time. But I can't prove that my web application is not going to crash.
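Here's a minimal sketch of the kind of claim we can actually reason about precisely (the function is just an illustration): finding the maximum of a list by looking at each element exactly once.

def find_max(numbers):
    # One pass over the list: each element is compared once,
    # so the running time grows linearly with the input size.
    maximum = numbers[0]
    for n in numbers[1:]:
        if n > maximum:
            maximum = n
    return maximum

print(find_max([3, 18, 7, 42, 9]))  # 42

That kind of claim we can make with confidence; a guarantee about an entire web application is another matter.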

What makes the software engineering discipline so different from the other engineering disciplines is that absolutely all of the work happens in the design phase. When designing a building, coming up with the blueprints is just the first step. But with programming, the blueprints are everything. Naturally, I mean extremely detailed blueprints. That's really what a program is though, right? It's just a set of detailed instructions that the computer executes for us. When we're creating a building, the building is done when it's built. When creating an application, it's done when we've come up with all of the instructions needed. It's one more step removed. That's why once we've created one copy of an application, it's trivial to create n copies.

Anyway, we slowly learn that there is no magic formula to write an effective application. There are guidelines, principles, best practices, but in the end, following all the rules doesn't guarantee a masterpiece.

This sounds a lot like the difficulties faced when trying to create a bestselling novel. There are guidelines and patterns for plot development that have worked well. But in the end, you can follow all the best practices that are out there and still not come up with a masterpiece.

I know I'm making it sound like both of these are very hit and miss. That's not true. Great authors consistently produce great works, and great programmers consistently produce great applications. While it's hard to define what makes the works great, it's usually much easier to recognize. What's more, most programmers can agree on who the great programmers are, yet when asked to quantify why, they find it hard to come up with an answer. We can all recognize greatness, but measuring it is difficult.

Universities, however, teach programming from a much more scientific point of view. We're taught the fundamentals: big O notation, operating system concepts, basic software engineering and process, and things along those lines. Yet it's rare to find a university (at least I haven't come across one) that studies the great masterpieces of programming, or one that takes the worst of programming and criticizes it.

I think that this scientific approach to teaching programming is fundamentally wrong. We should be taking the hint from the liberal arts schools. They all study the great works of the past, which are endlessly discussed, analyzed, and criticized in great detail. Various writing techniques are dissected and emulated. Students are encouraged to stray from the path, to explore.

Maybe the reason universities take this approach is that most software of the past has been closed source. It's only fairly recently that open source has exploded in a big way. But that's slowly changing. More and more open source software is getting written, and at a faster and faster pace.

But what I'm seeing happen is that we don't have to wait for the universities. We as programmers are coming up with our own ways to learn more effectively. After all, we control what software gets written, so a lot of it is geared towards making our own lives easier. This also includes learning from others, and coming up with better ways of sharing and connecting information. It's this trend, which seems to indicate that we're getting better faster, that leaves me hopeful for the future.

Game of insight

September 08, 2008

We've all had our frustrating moments with computers. We bang our heads against the wall for quite some time, and no matter what we try, the computer responds with a clever "I thought you might try that; here's your error." Then we try talking to someone else about it, and they usually have a brilliant idea that solves everything elegantly. We're left scratching our heads, wondering why we didn't think of that ourselves.

Lots of problems get solved like this simply because they are looked at from a fresh perspective. It's easy to get lost in the details of a problem. When we keep our heads down, it's hard to realize that we've been approaching the problem from the wrong angle.

Programming is a game of insight.

I think that sums up the essence of programming. The most significant gains often come from casting the problem in a new light.

But not all reevaluations of a problem lead to success. In fact, I would argue that most of them don't. The ones that do work, though, usually work in a big way. Given that programming is this give-and-take process, progress often isn't linear. For long stretches there isn't much progress, or it even looks like things are getting worse, and then suddenly there's a big jump.

This makes progress especially difficult to measure, assuming it's even possible to measure at all. Any formal attempt at measurement results in programmers optimizing for local maxima, which ends up detracting from productivity.

But regardless of whether we can keep track of it or not, it's important to foster an environment that encourages creativity. A single idea can change everything.

Laws of the land

September 07, 2008

We all go through programming disasters. It's hard, if not impossible, to always make the right decisions. And after we've steered the ship back on course and we're out of the storm, we try to review whether there was any way we could have prevented the storm in the first place. This reflection is crucial, and is one of the best ways we can improve ourselves.

What becomes dangerous, however, is when these reviews turn into policies or mandates. To explain why, I'll summarize an anecdote I came across in the Extreme Programming book.

A daughter was baking a ham with her mother and noticed that her mother cut the ends off before putting it in the oven. She asked why, to which the mom responded: "I don't know. That's the way my mother always did it. I'll ask her." So the mother asked the grandmother why the ends of the ham were cut off, and she said: "I don't know. That's the way my mother always did it. I'll ask my mother." And the great-grandmother's response was: "My oven was too small, so I had to cut off the ends to make it fit."

Blindly following policies can lead to extra steps that work against us. Rather than come up with new policies, it's better to come up with principles that can inform future decisions. It's more important to understand the reasons a policy would exist than to follow the policy itself.

On that note, dogma itself is dangerous. When we blindly follow rules, we become simple machines. Dogma tends to stifle creativity, which is the worst thing that can happen to programmers. Programming itself, after all, is a creative process.

I've seen a trend among programmers to be completely against any form of duplication. After all, there are extremely compelling motivations for this idea. We've all copy/pasted code to get something working, because it's usually much faster in the short term. But then, when that code needs to be updated, we've all forgotten to make the change in every place that required it. And bingo, we have a new bug.

Then we look back on it, and what's to blame? Not that we forgot to make the change everywhere; that was just the symptom. The problem was that it was possible to change one of the parts and forget to update the other. The code should have been refactored to remove the duplication, so that a change in one location would naturally affect all the other paths. That would have made the bug impossible in the first place.
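As a hypothetical sketch of that kind of refactoring (the function names and the discount rule are invented for illustration):

# Before: the same discount rule is copy/pasted into two code paths,
# so a change to the rule has to be remembered in both places.
def checkout_total(prices):
    total = sum(prices)
    return total * 0.9 if total > 100 else total
def invoice_total(prices):
    total = sum(prices)
    return total * 0.9 if total > 100 else total
# After: the rule lives in a single function, so changing it in one
# place changes it everywhere it's used.
def apply_bulk_discount(total):
    return total * 0.9 if total > 100 else total
def checkout_total(prices):
    return apply_bulk_discount(sum(prices))
def invoice_total(prices):
    return apply_bulk_discount(sum(prices))

Change the threshold or the discount in apply_bulk_discount, and both callers pick it up; forgetting one copy is no longer possible.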

But like most dogmas taken too far, this one can get you into trouble. Here's a very contrived example. Granted, most of us don't think this way, but I've heard somewhat similar arguments for avoiding duplication that I thought were just silly.

a = 7
b = 3 + 12
c = 18 - 2
d = 9 * 6

Try to pretend that these are real calculations, and not just hardcoded numbers. Can you spot the duplication? Normally you'd say there isn't any, but I can argue that there is. You have four assignments and three mathematical operations. Isn't that duplication? Here's the "reduced" version:

import operator
vars = ['a', 'b', 'c', 'd']
arguments = [(), (3, 12), (18, 2), (9, 6)]
ops = [lambda *x: 7, operator.add, operator.sub, operator.mul]  # the no-op lambda stands in for "a = 7"
for var, args, op in zip(vars, arguments, ops):
    globals()[var] = op(*args)

Notice how I've eliminated all the duplication? And wasn't it clever of me to turn the simple assignment from the first example into a no-op? The logic is now all in a single line, compared to the four up above. I can argue that although it's actually more lines of code, I can effectively move everything but the loop into configuration. Now people can add more variables to the global namespace, with an arbitrary operation performed on any number of arguments, all through configuration! What a wonderful and extensible system I've created!

The perceptive reader will have noticed that I was being a tad sarcastic. (My co-workers will all tell you that I'm subtle.) So which is easier to understand? Which would you rather maintain? Readers with no Python experience will probably understand the first snippet, but you probably need to know Python to even attempt to understand what's going on in the second.

So if you're going to follow dogmas, policies, or rules, then follow this one:

Always use your brain.

Cleaning up

September 04, 2008

We all have different ideas about when we should clean things up. It could be a dirty room, a cluttered desk, a messy closet, or just too much stuff lying around. We also prefer to keep things a certain way, and those preferences are very individual. From the outside, a desk with piles of paper all over it might look like it could use some rearranging, but the desk's owner might be able to find anything on it at a moment's notice.

Given how different we all are with tolerating visual clutter, it's not too surprising that there's a wide range of opinions on when we should clean up our code. Many books have been written completely focused on this very issue, and in the end, I think it's still much more of an art than a science. In fact, I believe programming itself is much more of an art than a science, but that's the topic for a whole other discussion.

Then there's a time for spring cleaning too. And we're always surprised when we find some really dirty stuff in the nooks and crannies. In the real world, we usually do end up cleaning it up (that is, if you're not as lazy as I am). In the virtual world, however, some cleanup tasks are truly daunting, and it may be easier to leave the dirty things the way they are. After all, it might not look pretty, but it works.

And just as the owner of a cluttered desk can find something for us quickly, old, dirty code tends to work predictably as well. But if we change it a bit, we can no longer guarantee that it'll work the same way. Unit tests can help here, but we still can't make any guarantees. And when you move an old couch from one corner of the room to the other, you realize that there was a whole lot more dust under there than you thought. That dust also has to get cleaned up. And after some time spent cleaning, we ask ourselves: was it really worth it? That couch really wasn't all that bad where it was.

But, the advantages gained by cleaning up can outweigh its costs and risks. After all, if something becomes simpler and easier to understand, we stand to gain every single time someone works with it. Add up the time for all these future interactions, and the refactoring can more than pay for itself.

However, as weathermen and traders already know, it's hard to predict the future. (They'll never admit it, though. What I think those guys really excel at is coming up with excuses. :) ) But it's very easy to predict the past. And although we may pay a small price each time we work around the dirt, we start learning where all the dirty areas are. That puts us in a better position to clean up more effectively in the future.

Refactoring usually alters the abstractions in use. A new layer might get added that simplifies complex interactions by handling the details, or extra layers that get in the way might be removed to produce simpler code. But if a new problem comes along that doesn't fit the new abstractions, the game is up. Chances are we'll put in a hack to work around that "edge case", and those hacks start to pile up, especially once other programmers get their hands in the mix. Before you know it, we'll be talking about another refactoring.
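Here's a small hypothetical sketch of how that tends to play out (the class and the "legacy report" case are invented for illustration): a tidy new abstraction, followed by the first edge case that doesn't fit it.

class Storage(object):
    """A new layer that hides where and how documents get saved."""
    def __init__(self):
        self._documents = {}
    def save(self, name, data):
        self._documents[name] = data
    def load(self, name):
        return self._documents[name]

def save_report(storage, name, data):
    # The first problem that doesn't fit the abstraction: legacy reports
    # must keep their old naming scheme. In goes the "temporary" hack.
    if name.startswith("legacy_"):
        name = name.replace("legacy_", "old/reports/")
    storage.save(name, data)

One hack like this is harmless; a dozen of them is how the next big refactoring gets justified.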

I'm not arguing that we should never clean up code. I'm just pointing out that there are often more factors involved in cleaning up than just making the code prettier. What cleaning up does seem to do is improve morale. Most programmers would much rather write a thousand new lines of clean, sparkling code than try to figure out which ten lines in a 10,000-line messy codebase need to be modified. And when bold, noble undertakings like these are launched, I start to feel like Braveheart just gave me a speech before an impossible battle.

And what usually ends up happening is that the small splinter cell team beats the odds. But why? There's too much work and not enough time. I think the answer is simple: the programmers work harder. And the reason is that they're a whole lot more motivated by the prospect of a fresh start than by trying to hold a big ball of mud together.

In the end, major refactorings are a tough call either way, and the technical issues might not be the whole story. Social issues can play a role in the decision too. Programmers tend to take attacks on their code personally, and arguments can quickly turn into personal vendettas.

But just as we always find ways around a messy desk, we find ways around our technical problems too. And regardless of the decision, thinking about the problem gives us more insight into it. So ultimately, we're better off for the exploration. To use a cliché:

The journey is more important than the destination.