Free Monads Just Got Cheaper

One of the more exciting papers I have read recently is “Reflection without Remorse: Revealing a hidden sequence to speed up monad reflection” by Ploeg & Kiselyov. It was presented at ICFP by one of the authors and the video is here. The paper, hereafter referred to as RWR, provides fiendishly a clever way to speed up the performance of programs that are structured using monads. In Haskell, that means lots of programs. Half algorithm, half technique, the paper presents a number of applications to specific monads. I was particularly interested in the case of free monads, which is covered in this post.

Inefficiency By Association

So what’s the problem being addressed? Even if you have just a little familiarity with Haskell you might remember learning about performance issues related to the list append operation, ++, and be able to tell which side of the following expression is more efficient:

last ((xs ++ ys) ++ zz)) == last (xs ++ (ys ++ zs))

To see why the right hand side is faster, take a look at these definitions from the Haskell Prelude.

(++) :: [a] -> [a] -> [a]
[]     ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)

last             :: [a] -> a
last [x]         =  x
last (_:xs)      =  last xs
last []          =  error "Prelude.last: empty list"

In order to get the last element of the list, last needs to traverse the entire list. That’s not the issue in question, but it does mean that all references to the recursively defined ++ must be dispensed with before last starts evaluating. In other words, we must completely build the list during the computation. Looking at the second definition of ++, you can see that the right hand input is left alone, but the left hand list is traversed. That means xs in the example will be traversed twice in the left hand side but only once on the right.

The list expressions in the example correspond to little trees built according to where we place the parentheses. The associative law lets us move parens around or even erase them, so that we can just write

xs ++ ys ++ zs

without even thinking about which little tree is actually used to build the list. The trouble is that some operations prefer the list to be built one way or vice versa. One bit of intuition is that the RWR solution takes the construction above seriously. It looks like a sequence with ++ as a “constructor”, and there are some fast data structures to build sequences that we don’t use enough in Haskell.

This performance question is not confined to lists and the ++ operation alone. The same issue crops up in other data structures like trees and monads. Known optimizations and workarounds can help speed up some applications, but often at the cost of slowing down others. Lists, trees and monads all share a similar structure. In fact, it’s all the same structure—lists are an extreme form of tree and monads are really a tree of computations, a point nicely made in a sigfpe blog post:

Instances of type class Monad can be thought of as trees describing ‘computations’. The Monad interface provides a way to graft subtrees into trees. There are as many types of ‘computation’ as there are interpreters for tree structures. In practice the interpretation is interleaved with the graft operation so that we don’t have separate tree-building and interpretation phases.

The grafting mentioned requires traversals just like in the simple example above, and it can lead to inefficiencies if they are nested the “wrong” way. Same as before, the meaning of the computation is not affected since the same tree can be built in multiple ways, thanks to the monadic version of the associative law:

(m >>= f) >>= g  ==  m >>= (\x -> f x >>= g)

The analogy between ++ and >>= is hopefully clear even though an abstraction (that lambda) is introduced.

So, what does the RWR solution do about all these equivalent trees? It took me a couple of times reading the paper for the solution to start to sink in. Feeling rather dense, I remembered this:

“In mathematics you don’t understand things. You just get used to them.” —John von Neumann

Even better, this is software, where we can edit the motto: “In computer science you don’t need to understand things, you just have to use them”. It turns out that can help us understand things too.

Free, as in Monads

Let’s go straight to the api for free monads.

Free monads are interesting as one way to create embedded domain specific languages (DSLs) in Haskell. There are several approaches ranging from deep embedding to shallow embedding. A free monad comes from the deep end of the embedding pool.

Deep embedding has the nice feature that you can implement interpreters that do all kinds of different things with the same program. You can have an interpreter that can evaluate your program, a second one that can pretty print it, and a third interpreter that translates the very same program to run on a cluster. Sexy, no?

Example DSL

We will look at a toy example DSL adapted from John Wiegley’s blog. If you have never seen the technique, go read that short piece or this one. Or, make do with the quick summary below.

To build your DSL, you start with a data type with multiple constructors. Each constructor represents a valid command in your DSL. Next, add a type parameter to each constructor for any command that can continue execution. Make the new type an instance of Functor. (The compiler can even do that for you automatically.) Finally, make the functor a free monad instance. This last step lets us use do-notation, and we can then structure our programs more like normal Haskell programs. We can optionally add functions that hide some of the plumbing, to make the programs look even nicer.

That plan plays out in the example below, a tiny imperative DSL that gives us an imperative robot control language with just three commands: go left, go right, and stop (forever). They appear as the constructors of FDirective.

import qualified Control.Monad.Free as F

-- Commands, for
data FDirective next = FL next | FR next | FS
  deriving (Functor, Show)

With just one more line we can make a free monad out of our data type, and then add a little syntactic sugar (left', right', shutdown') so our program will look nice.

-- Traditional Free Monad
type TFM a = F.Free FDirective a

left'     = F.liftF (FL ())
right'    = F.liftF (FR ())
shutdown' = F.liftF FS

We can already write an evaluation interpreter

interpret' :: TFM a -> IO ()
interpret' (F.Free (FL f)) = putStrLn "Going left"  >> interpret' f
interpret' (F.Free (FR f)) = putStrLn "Going right" >> interpret' f
interpret' (F.Free FS)     = putStrLn "Saw shutdown, stopping"
interpret' (F.Pure _)      = error "Improper termination"

And we can now write a small program.

instrs' :: F.Free FDirective a
instrs' = do
  left'
  right'
  left'
  shutdown'

How much work is it to apply the RWR solution? Not that much. I modified the code from the example above and followed the suggestion of using the LANGUAGE ViewPatterns pragma to make the code even easier to look at. Here’s the result.

type FM a = FreeMonad FDirective a

interpret :: FM a -> IO ()
interpret (toView -> Impure (FL f)) =
  putStrLn "Going left" >> (interpret f)
interpret (toView -> Impure (FR f)) =
  putStrLn "Going right" >> (interpret f)
interpret (toView -> Impure FS) = putStrLn "Saw shutdown, stopping"
interpret (toView -> Pure _ )  = error "Improper termination"

left :: FreeMonad FDirective ()
left = liftF $ FL () -- same as `fromView $ Impure FR``
right = liftF $ FR ()
shutdown = liftF $ FS

liftF :: Functor f => f a -> FreeMonad f a
liftF x = fromView $ Impure (fmap return x)

instrs :: FM ()
instrs =
 do left
    right
    left
    shutdown

main :: IO ()
main = interpret instrs

The output from either example is the same of course

λ> main
Going left
Going right
Going left
Saw shutdown, stopping

In the new version FreeMonad corresponds to F.Free from the first version. The only significant difference in the code is the appearance of the toView and fromView functions. The toView function is needed whenever your interpreter needs to pattern match on a command and the fromView is effectively just a constructor. That’s it!

Under the Covers

Under the covers, the RWR approach to monads is doing a lot. It factors the monad into a pair of mutually recursive data structures: a view and a builder. The builder is a sequence of views, and the view is a tree of builders. The view gives you partial access to the entire computation. If you need to pattern match part of the computation you must use the view. Part of the magic is contained in the data structure used for the sequence, which is the same for all monads, and the rest of the magic is in the conversion from builder to view, which needs to be figured out for each monad. That’s why RWR is half algorithm, half technique. Of course for free monads, the solution has been worked out for every free monad. Very handy.

Let’s take a peek under the covers at the library code. The relevant file is here and the key section appears below.

type TCQueue = FastTCQueue

newtype FC f a b = FC (a -> FreeMonad f b)

type FMExp f a b = TCQueue (FC f) a b

data FreeMonad f a =
   forall x. FM (FreeMonadView f x) (FMExp f x a)

data FreeMonadView f a 	= Pure a | Impure (f (FreeMonad f a))

As mentioned, a free monad value consists of a pair of calculations. The first one is a FreeMonadView that contains the part of the calculation in a form you might expect to see from the traditional free monad. The second part is a sequence of monadic functions (FMExp) that eventually get composed to make up the rest of the calculation. The sequence is implemented as a data structure with fast updates to either end of the sequence—one that has been generalized to handle elements of different types as can happen with all the statements in a DSL. Did I mention it was clever?

Enough peeking, we have morals, right? So let’s get back above the covers. It’s simple enough to try this on some other DSLs and compare performance, which I hope to do soon.

Getting the Code

The RWR paper offers code with all the monad solutions from the paper and more here. There’s no corresponding hackage library that I could find. Given the pace of development on hackage these days I could be wrong, or soon will be.

So, I forked the code, cabalized it and fixed a couple of things that were needed to make it compile. That code is here. The complete code for the toy DSL example is contained in this gist.

Tim Sears September 9, 2014
Subscribe: Atom Feed