Detecting use-cases for GADTs in OCaml

I often experience this when I’m learning more advanced features of programming languages and I’ve found that I personally find it easier to grasp language features when I have a clear understanding of what kinds of problems they’re meant to solve. One way to achieve this is start out by solving a specific problem without the language feature and then show how, when you add the feature, the solution becomes more elegant. I haven’t found any posts that do this with GADTs thus I set out to write this post. As I work through the example I will also try to point out symptoms in your code that might mean you’re better off modeling your problem using a GADT; this will hopefully help you find use-cases for GADTs in your own code-base.

In case you haven’t stumbled upon GADTs before reading this here is a very brief description of what they do; this is the most succinct description I’ve found and it was written by Ptival on reddit. I’ve modified it slightly, but it goes as follows:

In essence GADTs makes it possible to describe a richer relation between your data constructors and the types they inhabit. By doing so you give the compiler more information about your program and thus it’s able to type your programs more precisely.

Let that description dwell in the back of your mind as you read through the rest of the post.

The example that we’re going to use is similar to the canonical example that you can also find in the OCaml Users Guide section on GADTs. I later hope to write a blog post with lots of examples of GADTs to help solidify the concept - but that’s for another time.

The canonical example

The task is to write a small evaluator for a simple programming language. Now I know this might be quite different from the problems you would normally deal with at work but this is the canonical example for a good reason so bear with me.

The language is simple and only contains if-expressions, two operators (=, <) and two primitive types, int and boolean.

As I alluded to in the introduction we’ll first try to model this using a regular ADT²; Once we’ve seen the ADT implementation it’s easier to understand the benefits of using a GADT.

Using ADTs

type value =
  | Bool of bool
  | Int of int

type expr =
  | Value of value
  | If of expr * expr * expr
  | Eq of expr * expr
  | Lt of expr * expr

I’ve chosen to represent the abstract syntax tree using two variants: value which models the primitive values and expr that model the expressions.

Lets try and write a function that evaluates an expr into a value; this can be achieved with a straightforward recursive implementation.

let rec eval: expr -> value = function
  | Value v -> v
  | Lt (x, y) -> begin match eval x, eval y with
      | Int x, Int y -> Bool (x < y)
      | Int _, Bool _
      | Bool _, Int _
      | Bool _, Bool _ -> failwith "Invalid AST"
    end
  | If (b, l, r) -> begin match eval b with
      | Bool true -> eval l
      | Bool false -> eval r
      | Int _ -> failwith "Invalid AST"
    end
  | Eq (a, b) -> begin match eval a, eval b with
      | Int  x, Int  y -> Bool (x = y)
      | Bool _, Bool _
      | Bool _, Int  _
      | Int  _, Bool _ -> failwith "Invalid AST"
    end

An example of how to invoke this function is shown below.

eval (If ((Lt ((Value (Int 2)), (Value (Int 4)))),
          (Value (Int 42)),
          (Value (Int 0))))

- : value = VInt 42

An here’s what happens if we try to eval an invalid expr

eval (Eq ((Value (Int 42)), (Value (Bool false))))

Exception: Failure "Invalid AST".

Even though this implementation is correct (at least I hope it is) it leaves a lot to be desired. The first thing that springs to mind is that it’s possible to construct expr values that we can’t evaluate. This means we still need to cover these cases in our pattern matches in order for them to be exhaustive; we could use wildcard matches but then the compiler won’t be able to warn us about missing cases if we decide to add new data constructs later. Exhaustiveness checking is extremely helpful when working with larger code-bases so it shouldn’t be thrown away lightly. This gives us the first symptom you can look for when you’re considering use-cases for GADTs

symptom #1: When you need to to add extra cases for invalid states to make your pattern matches exhaustive

Now lets say we wanted to write an eval function that returned a proper OCaml int or bool rather than the wrapped value values.

To do this we would need to create a function for each primitive type, like so

let eval_int: value -> int = function
  | Int x -> x
  | Bool _ -> failwith "Got Bool, expected Int"

let eval_bool: value -> bool = function
  | Bool b -> b
  | Int _ -> failwith "Got Int, expected Bool"

Again, this solution works but it is rather unsatisfying to have boilerplate like that. It would be preferable if we could have a single function where its return type would depend on the input. This leads us to symptom #2:

symptom #2: You want the result of a function to depend on the data constructor used to create the data

With these two symptoms in mind lets see what the GADT implementation would look like.

Using GADT

Before we dive into the GADT implementation lets do a quick review of how to define GADTs in OCaml. Remember that we previously defined the value type like this

type value =
  | Bool of bool
  | Int of int

To define a GADT we need to use a slightly different syntax. The following syntax definition is taken from the OCaml Users Guide.

constr-decl ::= ...
              ∣ constr-name :  [ typexpr  { * typexpr } -> ]  typexpr

For the value type the GADT definition would look like this

type value =
  | Bool : bool -> value
  | Int : int -> value

Notice that we explicitly type the constructors. On its own this isn’t that exciting but it comes in handy when we introduce type parameters to the GADT as you will see shortly. Now lets give the full GADT implementation a go.

type _ value =
  | Bool : bool -> bool value
  | Int : int -> int value

type _ expr =
  | Value : 'a value -> 'a expr
  | If : bool expr * 'a expr * 'a expr -> 'a expr
  | Eq : 'a expr * 'a expr -> bool expr
  | Lt : int expr * int expr -> bool expr

We define value and expr GADTs which are parameterized with an anonymous type (notice the _) and each data constructor is explicitly typed with a type for this parameter, e.g. the Bool constructor takes a bool and gives you a bool typed value’.

The type parameter allows us to do two things:

We can associate a specific type with each data constructor, e.g. Bool is of type bool expr’.
We can restrict the input to functions using the type parameter of expr’, e.g. If requires that the first argument needs to be of type bool expr’.

Now that we’ve told the compiler about these restrictions lets see how the eval’ function turns out.

let rec eval : type a. a expr -> a = function
  | Value (Bool b) -> b
  | Value (Int i) -> i
  | If (b, l, r) -> if eval b then eval l else eval r
  | Eq (a, b) -> (eval a) = (eval b)
  | Lt (a,b) -> (eval a) < (eval b)

This is so wonderfully concise that the previously solution now looks horrific. Notice that this match is exhaustive and the return type is an actual ocaml primitive type. This is possible as OCaml now associates a specific type for the type parameter of each data constructor.

Lets give the eval function as go with an example

eval (If ((Eq ((Value (Int 2)), (Value (Int 2)))),
      (Value (Int 42)),
      (Value (Int 12))))

And an example where we return a bool rather than int

eval (If ((Eq ((Value (Int 2)), (Value (Int 2)))),
      (Value (Bool true)),
      (Value (Bool false))))

- : bool = true

Now if we give an invalid example as try, let’s see what happens

eval (If ((Value (Int 42)), (Value (Int 42)), (Value (Int 42))))

eval (If ((Value (Int 42)), (Value (Int 42)), (Value (Int 42))))
                  ^^^^^^
Error: This expression has type int value
       but an expression was expected of type bool value
       Type int is not compatible with type bool

Though it isn’t obvious from the output this is actually a compile-time error rather than the runtime error we got in the ADT example, that is, it is no longer possible to construct an invalid AST.

That’s it for this time. If you have any feedback catch me on twitter at @mads_hartmann.

Jan 20, 2018 Based on the feedback I’ve gotten over the years I’ve updated the code examples to make them more clear. Take that into consideration when you’re reading the comments below ;)

Footnotes

Generalized Algebraic Datatypes
Algebraic Datatypes