Here are notes as I read the wonderful and terrible monad paper by Phil Wadler. Wadler makes many good points on languages and Haskell in particular — enough to cause me to rekindle my interest in Haskell.

The opening sections introduce a simple tree walk problem and then proposes three different ways you might want to extend the program that would conventionally use mutation. He then shows how each of these can be done within the emergent monad pattern. Haskell compilers have come and gone since the paper was written and I collect here versions that work with the “Glasgow Haskell Compiler, Version 7.10.2” (“ghc” herein). Note that I do not approve of omitting trailing semicolons. Here are those results. I have written very little Haskell code and I am impressed with the quality of diagnostics I get from the compiler. Only once have I had one of these programs report a runtime error, and then it identified the culprit code. Otherwise if it compiles it gets the right answer.

I cannot fault the problem, the program or the three extensions. That the Haskell sampler code is readable to even an OCaml programmer is dubious however.

Simple code => 42
Variation one => "42; divide by zero"
Variation two => (42,2)
Variation three =>

eval (Con 1972) <= 1972
eval (Con 2) <= 2
eval (Div (Con 1972) (Con 2)) <= 986
eval (Con 23) <= 23
eval (Div (Div (Con 1972) (Con 2)) (Con 23)) <= 42

I have several things to say about these pieces of code.

Some may prefer this for variation one and yields “"Raise \"divide by zero\"; Return 42"”.

Variation two requires special attention. The counting here is done as (((((((0 + 1) + 1) + 1) + 1) + 1) + 1) + 1) whereas we could exploit associativity of + and do (((1 + 1) + 1) + ((1 + (1 + 1)) + 1)) as we follow the shape of the input tree, like this (=>(42,2)). This is a nice Haskell program but it does not fit the monad pattern introduced in this paper. This allows parallel computing as in variation three but some applications require the non-associative pattern. The sampler that Wadler gives us ignores associativity but allows application logic to control the order of additions. Sometime this serialization is logically necessary. This should not be done unless necessary but Wadler’s code shows us how to do it:

The third variation supports concurrent evaluation of the numerator and denominator if the compiler and hardware are willing and capable.

We collect the monads for comparison.

data M a = Raise Exception | Return a
type M a = State -> (a, State)
type M a = (Output, a)
In each case we have:
eval :: Term -> M Int
Note that in variation two when eval returns it is still a function that wants one more argument which is the State. Sort of cheaty but all’s fair in war and Haskell.

The intended audience for this paper is unclear but to follow the paper one must keep in mind that “a -> b -> c” is always to be read as “a -> (b -> c)” just because “f x y” is to be read as “(f x) y”. More notation is that “(*) a b” is equivalent to “a * b”.

Now I critically examine the sentence beginning “What sort of operations”. “M” is only metaphorically a type. More precisely it is informally a function (a ‘constructor’) that runs at compile time to map one type to another. But one learns by metaphors, so we proceed.

Now I work thru the critical expression “m*λa.n” for my sake and perhaps the sake of some readers of this note.

m*λa.n ≡ (*) m λa.n
from which we deduce that we expect m to be of type M a and λa.n to be of type a -> M b. (Don’t confuse the variable ‘a’ with the type of the same name.) Thus we can say a :: a. A table:
m       :: M a
a       :: a
n       :: M b
\a -> n :: a -> M b
(*)     :: M a -> ((a -> M b) -> M b)
(*) m   :: (a -> M b) -> M b
m*λa.n  :: Mb
This reminds me of balancing chemistry equations.

Regarding the parens in the third equation after the sentence with “easily rewritten”, I think that learning to grok the equation without parens is part of the monad initiation and I do not mean this in a pejorative sense.
eval (Div t u) = eval t * λa.eval u*λb.unit (a÷b)

This may be a faithful version of the code from section 2.6. Wadler seems to assume the constructor “I” as a primitive, perhaps so that he can later say “M = I” in characterizing the ‘identity monad’ but as far as I know such an equation is too meta for ghc. Beware the UTF-8 five pointed star (★); ghc does not approve of “*” in this role.

The code of 2.7 lives here and yields “"42; divide by zero"”, or here which yields “"Raise \"divide by zero\"; Return 42"”.

The code of 2.8 lives here and yields “(42,2)”. (OCaml & Scheme too)

The code of 2.9 lives here and yields:

eval (Con 1972) <= 1972
eval (Con 2) <= 2
eval (Div (Con 1972) (Con 2)) <= 986
eval (Con 23) <= 23
eval (Div (Div (Con 1972) (Con 2)) (Con 23)) <= 42

In the revisited version two there is the expression “let (b, z) = k a y in (b, z)” which means exactly what “k a y” means. I don’t know why Wadler uses the longer form. I doubt it was an oversight. Perhaps it better fits some pattern that I have not grasped.

We contrast the four definitions of the ★ operator now. In each case the type is:

(★) :: M a -> (a -> M b) -> M b
The respective definitions are:
a★k = k a
m★k = case m of Raise e -> Raise e; Return a -> k a
m★k = \x -> let (a, y) = m x in k a y
m★k = let (x, a) = m in let (y, b) = k a in (x ++ y, b)
These three definitions may cover most common monad patterns.

My Conclusions so far

Bill Frantz once said you can write Fortran in any language. I see now that you can write Fortran in Haskell. I guess that’s good. I think that the monad pattern forces you to write code that obscures some of the symmetry of the original problem and requires the compiler to be cleverer in finding parallelism. I see no way to avoid this.

Conforming to the monad pattern is error prone. As I adapted a Scheme program to this pattern, I had to resort to Scheme’s print statement.

See this too.
When I took physics I was impressed at the pattern of deriving, from simple fundamental equations, complex equations to predict some complex phenomenon. Then, using only math, one simplifies those equations, turns them into programs which give the right answer provided you did the math and code correctly. The intermediate expressions are often obscure or even opaque; intermediate formulae often have no physical interpretation that I can discover. Yet in the end your program predicts observed results. I find some Haskell programs like this. I have not learned the program transformations that correspond to those of physics.

Just now, a few days after writing the above, I was forced to do just such a mechanical transformation of physics equations and was aided by expressing the equations in functional form which I could code in Scheme and relate that Scheme code to the physics in my head. I then tediously transformed that Scheme code to a form that I could feed into the Wolfram indefinite integral server. I failed at the latter transformation when I tried to do it in my head. I left crumbs behind as I modified the definition of pqc here. Each line that now begins with a semicolon (comment) was tested against a non trivial test case during the transformation. More physics.

Haskell does not memoize for you. See Fibbobnaci which takes 111 ns per bottom call. I don’t know how it could have been feasibly otherwise. I can think of no general feasible memoizing plan.