Thoughts on Language

December 25, 2012

We use language as a means to communicate. It gives shape to how we express ourselves. But, it also feeds back and informs what we express, and how we express it. If a language lacks words to express a concept, we probably won't consider that concept when we think in that language.

Programming languages are no different. They are the tools we use to communicate with the computer, and also with each other. And like human languages, some have words to express certain concepts, and others don't. Also like human language, the programming language used helps define the concepts that are used in the program itself.

Another parallel is that all languages have subtleties and nuances behind their expression. These lend themselves nicely to poetry or comedy. But with programming languages, these multiple interpretations can have disastrous consequences. Computers need very precise instructions, and when there is ambiguity, they tend to prefer failure over resilience.

Clearly we need to use a higher level language than that of machine instructions. Although these lower level languages are very precise, they lack methods of expression that are natural for humans, and are tedious for us to use directly. But, using human language is too complex and vague for computers to make sense of. Our challenge then is to strike a balance where we can more naturally express our intent, while simultaneously offering a precise definition for the computer to execute.

Taking a quick survey of programming languages shows that we've created our own Tower of Babel. In retrospect, it's not surprising that we ended up here. Different problem domains call for different types of solutions. For some, performance is crucial, and therefore it's imperative to have access to the lower level mechanics of the machine. For others, the user experience is vital and performance is a non-issue. Programming languages run the gamut from low level memory manipulation to higher level meta languages. And as our hardware has evolved, our languages have tried to adapt by making the appropriate trade-offs as well.

Programming languages tend to be general by definition, and offer general purpose constructs to solve a wide variety of problems. Coming up with effective general solutions is critical, and it is the norm to require these. But classes of problems do exist that demand certain particulars, and a language's ability to frame some of these solutions directly affects the shape that the solution will take. These can be domain specific languages, but they can also come from picking a different programming style. For example, some problems are elegantly solved using logic programming. Others benefit from using a rules engine. Sometimes modeling with states and then following a transition graph is a good fit. Some of these can be added to a language as a library, but languages offer different levels of support for these different styles, and it's more natural to reach for these kinds of solutions in languages where the concepts are a better fit.

When evaluating languages, brevity is often used as a measure of expressiveness. Being able to model a solution using fewer parts generally means that it is easier to reason about, which ultimately means the system is easier to maintain. We have to be careful here though, because the important piece is that the system is easier to reason about, not that it's shorter. Just because something is terse doesn't mean it's easy to understand. It could be conceptually dense with implicit meaning, which is the sort of thing that can lead to errors. If anything, we should strive to use constructs that prevent errors and encourage good practices. That being said, solutions modeled with fewer constructs tend to be more elegant. They often achieve this by using concepts that are better fits for the problem.

One mistake that I often make is that as I learn a new programming language, I start to think of it as the one true language that everything should be written in. Armed with my new hammer, I try to find problems that I can apply this language to, and end up using it in many situations where it isn't the best choice. I think that programmers are guilty of this kind of thinking in general. Part of it is just that learning a new language is a significant investment, and we attempt to justify our investment by using it more. It's important to recognize when we have more faith in our language of choice than we should.

Particularly exciting for me is that we are now in a time where languages, tools, and libraries can evolve extremely rapidly, and can be shared seamlessly. Our tools to facilitate experiments with languages and approaches are only improving, and many different paradigms are surfacing as a result. I look forward to how our existing languages will continue to evolve, and to the languages of the future.

The Success of Failures

December 31, 2011
Once a group of ten blind masseuses were travelling together in the mountains, and when they began to pass along the top of the precipice, they all became very cautious, their legs shook, and they were in general struck with terror. Just then the leading man stumbled and fell off the cliff. Those that were left all wailed, "Ahh, ahh! How piteous!" But the masseuse who had fallen yelled up from below, "Don't be afraid. Although I fell, it was nothing. I am now rather at ease. Before falling I kept thinking 'What will I do when I fall?' and there was no end to my anxiety. But now I've settled down. If the rest of you want to be at ease, fall quickly!"

To err is human. But we often think of mistakes as necessary evils, actions or situations that could have been avoided if we had the foresight. After all, some mistakes can be painful. Yet, I would argue that we learn best from our own mistakes. Even if it's not a conscious thought in our mind, when faced with a similar situation or problem, our intuition can help us navigate around repeating the same mistake twice. If I touch a hot stove once, I'm probably not going to touch it again.

For easier problems, the cause and effect between the mistake and the outcome is readily apparent. But as things get more complicated, it's not always easy to see what the actual mistake was that generated the failure. From a software development perspective, the actual underlying cause can elude us, and everybody can be left drawing their own conclusions as to what happened. We need to be careful here though, because the wrong lesson can guide us down the wrong path in the future. Once we are burnt by a hot stove, we'll never touch a hot stove again. But if we learn the wrong lesson, we may never touch a cold one either.

One interesting idea is that the problem space itself can dictate the strategy used to solve it. When all variables are known, we simply use the answer for our given permutation. However, some problems don't have an easy "if this, do that" answer. For these problems, we can set up fail-safe experiments, where each one is an attempt at a solution from a different angle, but their failures aren't catastrophic. Recovering from the failures is the key here. In fact, many failures initially can lead to a better outcome in the end, because they can each inform the ultimate solution based on what we learned from their failures.

From a business perspective, this can be a hard sell though. How can you justify allocating resources on what you know will mostly end up being a failure? Isn't that just a waste? What we need to admit first is that we may not know enough about the particular problem to be in a position where we can recommend a single solution that has a good chance of success. And the best way to learn more may be to attempt to solve it in multiple ways, many of which will fail. Naturally, nobody wants to hear this kind of news. The immediate reaction could be, "well, let me try to find somebody that knows more about this." But for problems that are relatively new, experts can be hard to come by.

Some will also try to rely on a process to get through the problem. And for known problems, it is a fine approach to rely on best practice. By definition though, best practice is past practice, so we can't expect to have best practice for all situations, especially for new problems that we don't fully understand.

Accepting that failures occur is an important step. Instead of focusing on preventing them completely, we can instead create environments that are more tolerant of our failures. And we shouldn't simply tolerate mistakes, but accept them as an integral part of the process, and how we continue to improve ourselves.

At the time when there was a council concerning the promotion of a certain man, the council members were at the point of deciding that promotion was useless because of the fact that the man had previously been involved in a drunken brawl. But someone said, “If we were to cast aside every man who had made a mistake once, useful men could probably not be come by. A man who makes a mistake once will be considerably more prudent and useful because of his repentance. I feel that he should be promoted.” Someone else then asked, “Will you guarantee him?” The man replied, “Of course I will.” The others asked, “By what will you guarantee him?” And he replied, “I can guarantee him by the fact that he is a man who has erred once. A man who has never once erred is dangerous.” This said, the man was promoted.

Being functional

January 10, 2011

Many have written about why functional programming matters. There's a great paper by John Hughes, "Why Functional Programming Matters", that I highly recommend reading. It talks about the many benefits of code written in a functional style: it's more composable, more testable, and, in short, higher in quality.

But functional programming is also important for another reason. It's more awesome.

Factorial

fac n = foldl (*) 1 [1..n]

Haskell

Computing factorials is the classic recursive example; here it's expressed as a fold over the list [1..n] rather than with explicit recursion.
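As a quick check in GHCi (assuming the definition above is loaded):

fac 5   -- 120
fac 10  -- 3628800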

Fibonacci

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

Haskell

This corecursive example generates a lazy list of the Fibonacci numbers in linear time. The cool part here is the self-referential data structure. I like to call this the reverse ouroboros definition: the tail eating the head.
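For instance, taking a prefix of the lazy list (assuming the definition above is in scope):

take 10 fibs  -- [0,1,1,2,3,5,8,13,21,34]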

Quicksort

qsort []     = []
qsort (p:xs) = qsort lesser ++ [p] ++ qsort greater
    where lesser  = filter (< p)  xs
          greater = filter (>= p) xs

Haskell (copied from literate programs wiki)

This is beautiful.

Find all numbers which have only 2, 3, & 5 as non-trivial factors.

from itertools import tee, chain, islice, groupby
from heapq import merge

def hamming_numbers():
    # Generate "5-smooth" numbers, also called "Hamming numbers"
    # or "Regular numbers".
    # See: http://en.wikipedia.org/wiki/Regular_number
    # Finds solutions to 2**i * 3**j * 5**k for some integers i, j, k.
    def deferred_output():
        'Works like a forward reference to the "output" variable'
        for i in output:
            yield i
    result, p2, p3, p5 = tee(deferred_output(), 4)  # split streams
    m2 = (2*x for x in p2)                          # multiples of 2
    m3 = (3*x for x in p3)                          # multiples of 3
    m5 = (5*x for x in p5)                          # multiples of 5
    merged = merge(m2, m3, m5)
    combined = chain([1], merged)                   # prepend start
    output = (k for k, v in groupby(combined))      # eliminate dupes
    return result

Python (copied from ActiveState)

This cyclical iteration technique in Python is the same kind of idea as the Haskell Fibonacci example above. The output streams are fed back in to generate the final result.
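As a rough usage sketch (islice is already imported above, and the prefix length is arbitrary), we can pull a finite prefix from the infinite stream:

list(islice(hamming_numbers(), 15))
# [1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24]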

List flattening

(defun flatten (x)
  (labels ((rec (x acc)
             (cond ((null x) acc)
                   ((atom x) (cons x acc))
                   (t (rec (car x) (rec (cdr x) acc))))))
    (rec x nil)))

Common Lisp

An example of a doubly recursive utility.
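A quick example call, with a made-up nested list (assuming the definition above):

(flatten '(1 (2 (3 4)) 5))  ; => (1 2 3 4 5)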

Function intersection

(defun fint (fn &rest fns)
  (if (null fns)
      fn
      (let ((chain (apply #'fint fns)))
        #'(lambda (x)
            (and (funcall fn x) (funcall chain x))))))

Common Lisp (copied from onlisp)

Example of a function builder, with a recursive definition. The result here is and'ing the functions together. This allows us to say things like:

(find-if (fint #'signed #'sealed #'delivered) docs)
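Since signed, sealed, delivered, and docs aren't defined here, a self-contained sketch with built-in predicates (positive, even, and less than 10) works the same way:

(remove-if-not (fint #'plusp #'evenp #'(lambda (n) (< n 10)))
               '(-4 2 7 8 12 16))
; => (2 8)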

Recursive function generators

(defun lrec (rec &optional base)
  (labels ((self (lst)
             (if (null lst)
                 (if (functionp base)
                     (funcall base)
                     base)
                 (funcall rec (car lst)
                              #'(lambda () (self (cdr lst)))))))
    #'self))

; copy-list
(lrec #'(lambda (x f) (cons x (funcall f))))
; remove-duplicates
(lrec #'(lambda (x f) (adjoin x (funcall f))))
; find-if, for some function fn
(lrec #'(lambda (x f) (if (fn x) x (funcall f))))
; some, for some function fn
(lrec #'(lambda (x f) (or (fn x) (funcall f))))

Common Lisp (copied from onlisp)

If we find ourselves constantly building recursive functions to traverse a list, why not abstract out that pattern? Here we see a technique for doing just that. We just need to define a function whose first argument represents the first element of the list, and whose second argument is a function to call to continue the recursion. We can also define traversers on subtrees, which the reference link explores.
Note that we can make this even more concise, but we need Lisp macros for that. But that's a topic for another discussion :)
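As a small usage sketch (the name our-copy-list is just for illustration), we can give one of the generated functions a name and call it:

(setf (symbol-function 'our-copy-list)
      (lrec #'(lambda (x f) (cons x (funcall f)))))

(our-copy-list '(a b c))  ; => (A B C)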

Find the maximum profit buying and selling a stock one day

(defn max-profit [prices] (reduce max (map - prices (reductions min prices))))

Clojure

I just had to mention this one because of its brevity. It's almost like the problem was created to fit the solution.
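A quick worked example with made-up prices, where the best move is to buy at 1 and sell later at 9:

(max-profit [3 1 4 1 5 9 2 6])  ; => 8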

20 questions

(defvar *nodes* (make-hash-table))

(defun defnode (name conts &optional yes no)
  (setf (gethash name *nodes*)
        (if yes
            #'(lambda ()
                (format t "~A~%>> " conts)
                (case (read)
                  (yes (funcall (gethash yes *nodes*)))
                  (t   (funcall (gethash no *nodes*)))))
            #'(lambda () conts))))

Common Lisp (copied from onlisp)

20 questions in a dozen lines. Not too shabby.
Here we see a technique for representing structure through closures themselves. Traditionally this is modelled using a tree to represent the yes/no decisions. In this case we're able to use the functions themselves to represent the decision trees, with the state captured through closures.
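A small hypothetical network to make this concrete (the node names and questions are made up for illustration); running the game is just calling the root closure:

(defnode 'people "Is the person male?" 'male 'female)
(defnode 'male "Is he alive?" 'living-man 'dead-man)
(defnode 'living-man "He's alive.")
(defnode 'dead-man "He's dead.")
(defnode 'female "She's female.")

(funcall (gethash 'people *nodes*))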
If the network will not change at run time, an optimization can be made to find the references to the yes/no functions at compile time instead of looking them up through a hash table. The reference link above explores this strategy.

Best practices are constantly evolving. Sometimes so much so that we're hearing the opposite today of what we heard yesterday.

But there are some fundamentals that tend to stay the same. One that most of us can agree on is that bugs can be expensive. More specifically, the later we find bugs, the more costly they are to fix. On a whiteboard, the cost of fixing a bug is just about nothing. It can be as simple as replacing one diagram with another. But in production, the costs can be catastrophic: losing current or potential customers, direct losses in financial applications, or even putting human lives at stake.

On one side of the spectrum, we have the school of thought that tries to solve as many problems initially as possible. After all, if it's cheaper to fix bugs earlier in the process, let's just spend more time earlier in the process. In practice, this doesn't work so well for a number of reasons. The biggest one is probably just the fact that requirements change during the process itself. There's not really anything we can do in this case. The other pitfall here is that most tough problems in the programming space are relatively new. We don't have much experience with these problems, so it's difficult to account for or even predict the tough spots. Usually we have much more insight into a problem after we've tried to solve it the wrong way a few times.

I think what typically ends up happening with this approach is that some areas get over designed, and others don't get the attention they deserve. This can lead to getting the abstractions wrong, and leaky abstractions can make bugs difficult to prevent. Not knowing what invariants need to hold for a system can be a big source of errors.

The extreme/agile approach looks at it from a different perspective. Instead of trying to imagine all the scenarios and details up front, let's just do what we can to discover them early in the process. To use a general oversimplification, it boils down to ignoring complexity in some areas in order to create a working prototype faster. The trick then is knowing which parts of the problem to ignore and which to focus on. Trying to tackle the hard parts is usually a solid strategy, except sometimes the hard parts aren't what we think they are.

Agile development also claims to engender quick changes. If the abstractions still fit, and we have a testing suite underneath us, then yes, the changes can be really fast. But if the abstractions fit, then any system is fast to change. When a requirement change breaks the current model though, or forces an interface change, then I would argue that making changes can be even slower, because they propagate out to more code. We not only have to refactor our logic, but the logic in all relevant tests as well. Some would argue that means the tests aren't written well or aren't testing at the right level. But writing tests at the right level is hard. It takes time to learn how to write effective tests, just as it takes time to learn how to write effective code.

But the biggest concern I have with agile development is that it can be used as an excuse for poor judgement. Being lazy is not agile, it's being lazy. Always taking the easy way out usually catches up to us, and then it can be painful to dig out of that hole.

Here are my takeaways:

  1. If we don't understand the problem, we can't process our way out of it. In other words, if we don't know what the issues really are, no amount of process can help us.
  2. We learn best from our own mistakes. This informs our future decisions much more so than any process could.
  3. Context matters most.

Addition by Subtraction

December 06, 2009

Generally speaking, when we think of adding value, we usually think of what we can add to make improvements. But another way we can make improvements is to remove what's not essential. By trimming out what's not needed, focus is shifted back towards what's important. In some cases, these subtractions can actually add up to a whole lot.

This perspective is usually directed towards design. And for good reason. Simpler designs are friendlier, easier to use, and generally yield better outcomes. But we don't always share this perspective when thinking about software. Frameworks, APIs, even programming languages themselves can all benefit from simpler perspectives. Strangely enough though, complexity here is often touted as an asset instead of a fault. In fact, the implicit reasoning is usually that things need to be complicated for them to be worthwhile.

Sometimes we even inflate simpler problems into harder ones. This serves as an excuse to implement more complex solutions. Unfortunately we can do this without even realizing it. And the more abstract the original problem is, the more room there is to introduce complexities. In some cases, it might be easier to err on the side of solving none of the abstract problems instead of trying to solve them all.

So if we find ourselves several layers removed from the original problem, it might be time to take a step back and evaluate how we got here. It might be the right path, or it might not. But if it's the wrong path, then we should turn around.