Software Quality

February 3, 2014
First intention, then enlightenment. - Buddhist maxim

With software, quality is paramount. When quality begins to suffer, everything else follows suit: we see higher defect rates, longer delays in implementing new features, and, generally speaking, a more fragile code base. Poor quality also leads to a broken-windows effect, where subsequent modifications and updates tend to be poor in quality as well.

But what does it mean to say that a piece of software has high quality? I think it means the code is easy to maintain. And what does it mean to be maintainable? Maintainable code has the property that its intent is clear: it's easy to understand what the code is doing, what its goal is, and why it's there. Quality software has additional characteristics as well, but I believe most of these fall out of its maintainability. One litmus test is to read a piece of code and think about how you would add some features to it. If the path forward is clear, then the code itself is clear.

A different way of phrasing this is to say that high quality code doesn't violate the principle of least surprise. Code should behave like it looks. When it's surprising, that's a sign it's not as clear as it should be. A corollary is that two pieces of code that look the same should behave similarly, and two pieces that look different should behave differently.

So, how do we go about writing high quality code? In my opinion, the best way to improve software quality is simply to care about it. Great code is usually difficult to get right, and it often takes a few failed attempts before a cleaner design emerges. But these epiphanies only appear after deliberate thought, and these thoughts only come to us if we care about improvement in the first place.

Certainly, great code emerges from great effort. There are differing opinions on how to get there, but asking the right questions is the crucial part. It's the intention to improve that leads to enlightenment.

Usually, a problem can be solved in multiple ways, each an effective solution. The same holds for programming languages. Most problems can be solved in many languages; in fact, in theory, all Turing-complete languages can solve the same set of problems.

Nevertheless, we tend to stick to the language we're most comfortable with, whether that's the first language we learned or simply the one we currently work in most. Learning a new language requires investment, not just in the language itself, but in its libraries, idioms, and environments as well.

When we choose a solution to a problem, the choice can usually be boiled down to some combination of what I'm calling the language itself, the libraries, and the environment.

Language

Language here means the medium for expressing solutions to problems. It is the syntax, but more importantly the vocabulary of concepts that can be used to construct solutions. In the general sense, this is the most crucial element because, as with human language, it constrains the concepts available for solutions.

Libraries

These are the existing bodies of code that are available to be readily consumed. Most languages have a suitable standard library that can perform the necessities. Naturally, necessities vary a bit based on the audience and the problem, but certain agreed-upon operations are so ubiquitous that they're taken for granted. Still, the breadth and depth of the libraries around a language can vary tremendously.

Environment

By environment, I refer to the tooling surrounding the language and libraries. This could be any IDEs that are used, but more importantly, it's the ecosystem that supports programmers' daily operations: debugging, builds and package management, deployment, runtime characteristics, and many others.


The decision as to whether a language can be used involves weighing the combination of these three elements. We could have the perfect language, but it won't be picked up if there aren't libraries to support day-to-day operations. And if we have a great language and even better library support, it may still be dismissed if it can't run in our production environment. Note that the opposite can also be true: sometimes a library is selected because of what it can do, and the language and environment simply come along for the ride.

There are social elements to consider as well. If our team is particularly well versed in a certain area, chances are good that we won't veer too far from our domain of expertise. And if some required knowledge is highly esoteric, that can play a major role in the decision too.

What's interesting to me is that the decision is rarely just the sum of its parts. There's a certain minimum score required in each area. The minimum varies based on context, but nevertheless, a champion in one area that flops in the others will most likely not be chosen.

Although in the abstract there are many factors to consider, in practice a select few typically emerge as the important players. It's important to identify which of these are significant early on, and to focus on those when we make our choices.

Thoughts on Language

December 25, 2012

We use language as a means to communicate. It gives shape to how we express ourselves. But, it also feeds back and informs what we express, and how we express it. If a language lacks words to express a concept, we probably won't consider that concept when we think in that language.

Programming languages are no different. They are the tools we use to communicate with the computer, and also with each other. Like human languages, some have words to express certain concepts, and others don't. And as with human language, the programming language we use helps define the concepts that appear in the program itself.

Another parallel is that all languages have subtleties and nuances behind their expression. These lend themselves nicely to poetry or comedy, but with programming languages, multiple interpretations can have disastrous consequences. Computers need very precise instructions, and when there is ambiguity, they tend to prefer failure over resilience.

Clearly we need to use a higher level language than that of machine instructions. Although these lower level languages are very precise, they lack methods of expression that are natural for humans, and are tedious for us to use directly. But human language is too complex and vague for computers to make sense of. Our challenge, then, is to strike a balance: to express our intent naturally while offering a definition precise enough for the computer to execute.

Taking a quick survey of programming languages shows that we've created our own Tower of Babel. In retrospect, it's not surprising that we ended up here. Different problem domains call for different types of solutions. For some, performance is crucial, and it's imperative to have access to the lower level mechanics of the machine. For others, the user experience is vital and performance is a non-issue. Programming languages run the gamut from low level memory manipulation to higher level meta languages. And as our hardware has evolved, our languages have adapted by making the appropriate trade-offs as well.

Programming languages tend to be general by definition, offering general purpose constructs to solve a wide variety of problems. Coming up with effective general solutions is critical, and most problems call for them. But classes of problems do exist that demand certain particulars, and a language's ability to frame these solutions directly affects the shape the solution will take. Sometimes this means a domain specific language; sometimes it just means picking a different programming style. For example, some problems are elegantly solved using logic programming. Others benefit from a rules engine. Sometimes modeling the problem as states and following a transition graph is a good fit. Some of these styles can be bolted onto a language as a library, but languages offer different levels of support for them, and it's more natural to reach for these kinds of solutions in languages where the concepts are a better fit.

When evaluating languages, brevity is often used as a measure of expressiveness. Being able to model a solution with fewer parts generally makes it easier to reason about, which ultimately makes the system easier to maintain. We have to be careful here though, because the important property is that the system is easier to reason about, not that it's shorter. Just because something is terse doesn't mean it's easy to understand; it could be conceptually dense with implicit meaning, which is the sort of thing that leads to errors. If anything, we should strive to use constructs that prevent errors and encourage good practices. That said, solutions modeled with fewer constructs do tend to be more elegant, and they often achieve this by using concepts that better fit the problem.

One mistake I often make as I learn a new programming language is to think of it as the one true language that everything should be written in. Armed with my new hammer, I go looking for problems to apply it to, and end up using it in many situations where it isn't the best choice. I think programmers are guilty of this kind of thinking in general. Part of it is that learning a new language is a significant investment, and we attempt to justify that investment by using the language more. It's important to recognize when we have more faith in our language of choice than we should.

Particularly exciting for me is that we're now in a time when languages, tools, and libraries can evolve extremely rapidly and be shared seamlessly. Our tools for experimenting with languages and approaches are only improving, and many different paradigms are surfacing as a result. I look forward to seeing how our existing languages continue to evolve, and to the languages of the future.

The Success of Failures

December 31, 2011
Once a group of ten blind masseuses were travelling together in the mountains, and when they began to pass along the top of the precipice, they all became very cautious, their legs shook, and they were in general struck with terror. Just then the leading man stumbled and fell off the cliff. Those that were left all wailed, "Ahh, ahh! How piteous!" But the masseuse who had fallen yelled up from below, "Don't be afraid. Although I fell, it was nothing. I am now rather at ease. Before falling I kept thinking 'What will I do when I fall?' and there was no end to my anxiety. But now I've settled down. If the rest of you want to be at ease, fall quickly!"

To err is human. But we often think of mistakes as necessary evils: actions or situations that could have been avoided with enough foresight. After all, some mistakes can be painful. Yet I would argue that we learn best from our own mistakes. Even if it's not a conscious thought, when we face a similar situation or problem, our intuition helps us navigate around repeating the same mistake twice. If I touch a hot stove once, I'm probably not going to touch it again.

For easier problems, the cause and effect between the mistake and the outcome is readily apparent. But as things get more complicated, it's not always easy to see what actual mistake generated the failure. In software development, the underlying cause can elude us, and everybody can be left drawing their own conclusions about what happened. We need to be careful here, because the wrong lesson can guide us down the wrong path in the future. Once we're burnt by a hot stove, we'll never touch a hot stove again. But if we learn the wrong lesson, we may never touch a cold one either.

One interesting idea is that the problem space itself can dictate the strategy used to solve it. When all the variables are known, we simply use the answer for our given permutation. But some problems don't have an easy "if this, do that" answer. For these, we can set up fail-safe experiments, where each one attempts a solution from a different angle but no failure is catastrophic. Recovering from the failures is the key. In fact, many early failures can lead to a better outcome in the end, because each one informs the ultimate solution with what we learned.

From a business perspective, this can be a hard sell. How can you justify allocating resources to what you know will mostly end up in failure? Isn't that just waste? What we first need to admit is that we may not know enough about the particular problem to recommend a single solution with a good chance of success. The best way to learn more may be to attempt to solve it in multiple ways, many of which will fail. Naturally, nobody wants to hear this kind of news. The immediate reaction could be, "well, let me try to find somebody who knows more about this." But for problems that are relatively new, experts can be hard to come by.

Some will also try to rely on a process to get through the problem. For known problems, relying on best practice is a fine approach. By definition though, best practice is past practice, so we can't expect to have best practice for every situation, especially for new problems that we don't fully understand.

Accepting that failures occur is an important step. Instead of focusing on preventing them completely, we can create environments that are more tolerant of failure. And we shouldn't simply tolerate mistakes; we should accept them as an integral part of the process, and of how we continue to improve ourselves.

At the time when there was a council concerning the promotion of a certain man, the council members were at the point of deciding that promotion was useless because of the fact that the man had previously been involved in a drunken brawl. But someone said, “If we were to cast aside every man who had made a mistake once, useful men could probably not be come by. A man who makes a mistake once will be considerably more prudent and useful because of his repentance. I feel that he should be promoted.” Someone else then asked, “Will you guarantee him?” The man replied, “Of course I will.” The others asked, “By what will you guarantee him?” And he replied, “I can guarantee him by the fact that he is a man who has erred once. A man who has never once erred is dangerous.” This said, the man was promoted.

Being functional

January 10, 2011

Many have written about why functional programming matters. In particular, there's a great paper by John Hughes, "Why Functional Programming Matters," that I highly recommend reading. It describes the many benefits of code written in a functional style: it's far more composable, testable, and, in short, higher in quality.

But functional programming is also important for another reason. It's more awesome.

Factorial

fac n = foldl (*) 1 [1..n]

Haskell

Computing factorials is the classic recursive example; here the recursion is hidden inside a fold over the list [1..n].
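
For instance:

fac 5 -- evaluates to 120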

Fibonacci

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

Haskell

This corecursive example generates a lazy list of the Fibonacci numbers in linear time. The cool part here is the self-referential data structure. I like to call this the reverse ouroboros definition: the tail eating the head.
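
A quick check in GHCi:

take 10 fibs -- [0,1,1,2,3,5,8,13,21,34]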

Quicksort

qsort [] = []
qsort (p:xs) = qsort lesser ++ [p] ++ qsort greater
  where lesser  = filter (< p) xs
        greater = filter (>= p) xs

Haskell (copied from the Literate Programs wiki)

This is beautiful.
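
For example:

qsort [3,1,4,1,5,9,2,6] -- [1,1,2,3,4,5,6,9]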

Find all numbers that have only 2, 3, and 5 as non-trivial factors.

from itertools import tee, chain, islice, groupby
from heapq import merge

def hamming_numbers():
    # Generate "5-smooth" numbers, also called "Hamming numbers"
    # or "Regular numbers".
    # See: http://en.wikipedia.org/wiki/Regular_number
    # Finds solutions to 2**i * 3**j * 5**k for some integers i, j, k.
    def deferred_output():
        'Works like a forward reference to the "output" variable'
        for i in output:
            yield i
    result, p2, p3, p5 = tee(deferred_output(), 4)  # split streams
    m2 = (2*x for x in p2)                          # multiples of 2
    m3 = (3*x for x in p3)                          # multiples of 3
    m5 = (5*x for x in p5)                          # multiples of 5
    merged = merge(m2, m3, m5)
    combined = chain([1], merged)                   # prepend start
    output = (k for k, v in groupby(combined))      # eliminate dupes
    return result

Python (copied from ActiveState)

This cyclical iteration technique in Python is the same kind of idea as the Haskell Fibonacci example above. The output streams are fed back in to generate the final result.
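
The imported islice is handy for sampling the infinite stream:

list(islice(hamming_numbers(), 10))  # [1, 2, 3, 4, 5, 6, 8, 9, 10, 12]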

List flattening

(defun flatten (x)
  (labels ((rec (x acc)
             (cond ((null x) acc)
                   ((atom x) (cons x acc))
                   (t (rec (car x)
                           (rec (cdr x) acc))))))
    (rec x nil)))

Common Lisp

An example of a doubly recursive utility.
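
For example:

(flatten '(a (b (c d) e))) ; => (A B C D E)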

Function intersection

(defun fint (fn &rest fns)
  (if (null fns)
      fn
      (let ((chain (apply #'fint fns)))
        #'(lambda (x)
            (and (funcall fn x) (funcall chain x))))))

Common Lisp (copied from On Lisp)

An example of a function builder with a recursive definition. The result ANDs the predicates together, which allows us to say things like:

(find-if (fint #'signed #'sealed #'delivered) docs)
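
which finds the first document that is signed, sealed, and delivered, using a single composed predicate.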

Recursive function generators

(defun lrec (rec &optional base)
  (labels ((self (lst)
             (if (null lst)
                 (if (functionp base)
                     (funcall base)
                     base)
                 (funcall rec (car lst)
                          #'(lambda () (self (cdr lst)))))))
    #'self))

; copy-list
(lrec #'(lambda (x f) (cons x (funcall f))))
; remove-duplicates
(lrec #'(lambda (x f) (adjoin x (funcall f))))
; find-if, for some function fn
(lrec #'(lambda (x f) (if (fn x) x (funcall f))))
; some, for some function fn
(lrec #'(lambda (x f) (or (fn x) (funcall f))))

Common Lisp (copied from On Lisp)

If we find ourselves constantly building recursive functions to traverse a list, why not abstract out that pattern? Here we see a technique for doing just that. We define a function whose first argument represents the first element of the list, and whose second argument is a function to call to continue the recursion. We can also define traversers on subtrees, which the reference link explores.
Note that we can make this even more concise, but we need Lisp macros for that. But that's a topic for another discussion :)
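
As a quick sanity check, the copy-list builder above behaves like the real thing:

(funcall (lrec #'(lambda (x f) (cons x (funcall f)))) '(1 2 3)) ; => (1 2 3)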

Find the maximum profit from buying and selling a stock once

(defn max-profit [prices] (reduce max (map - prices (reductions min prices))))

Clojure

I just had to mention this one because of its brevity. It's almost as if the problem was created to fit the solution.
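
For example:

(max-profit [3 1 4 1 5 9 2 6]) ; => 8, buying at 1 and selling at 9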

20 questions

(defvar *nodes* (make-hash-table))

(defun defnode (name conts &optional yes no)
  (setf (gethash name *nodes*)
        (if yes
            #'(lambda ()
                (format t "~A~%>> " conts)
                (case (read)
                  (yes (funcall (gethash yes *nodes*)))
                  (t (funcall (gethash no *nodes*)))))
            #'(lambda () conts))))

Common Lisp (copied from On Lisp)

20 questions in a dozen lines. Not too shabby.
Here we see a technique for representing structure through closures. Traditionally this is modelled with a tree of yes/no decisions; in this case the functions themselves represent the decision tree, with the state captured in closures.
If the network will not change at run time, an optimization can be made: resolve the references to the yes/no functions at compile time instead of looking them up in a hash table. The reference link above explores this strategy.
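
For reference, On Lisp builds the network with calls like these and starts it by funcalling the root node:

(defnode 'people "Is the person a man?" 'male 'female)
(defnode 'male "Is he living?" 'liveman 'deadman)
(defnode 'penny 'lincoln) ; a leaf node simply returns its contents

(funcall (gethash 'people *nodes*))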