Implementing Co, a Small Language With Coroutines #1: The Parser

April 24, 2021
A twenty-five minute read
9 comments
3 🗣️ 19 ❤️ 4 🔁
Tags: programmingprogramming-languagesparsinghaskell

Many major programming languages these days support some lightweight concurrency primitives. The most recent popular ones are Goroutines in Go, Coroutines in Kotlin and Async in Rust. Let’s explore some of these concepts in detail by implementing a programming language with support for coroutines and Go-style channels.

This is the first post in a series of posts:

Lightweight Concurrency

Lightweight concurrency has been a popular topic among programmers and programming language designers alike in recent times. Many languages created in the last decade have support for them either natively or using libraries. Some example are:

These examples differ in their implementation details but all of them enable programmers to run millions of tasks concurrently. This capability of being able to do multiple tasks at the same time is called Multitasking.

Multitasking can be of two types:

Pre-emptive multitasking in which the tasks can be preempted so that other tasks can be run.
Cooperative multitasking^@1 in which the tasks voluntarily yield control to other tasks to be run.

Coroutines^@2 are computations that support cooperative multitasking¹. Unlike ordinary Subroutines that execute from start to end and do not hold any state between invocations, coroutines can exit in the middle by calling other coroutines and may later resume at the same point. They also hold state between invocations. They do so by yielding the control of the current running thread so that some other coroutine can be run on the same thread².

Coroutine implementations often come with support for Channels for inter-coroutine communication. One coroutine can send a message over a channel, and another coroutine can receive the message from the same channel. Coroutines and channels together are an implementation of Communicating Sequential Processes (CSP)^@5, a formal language for describing patterns of interaction in concurrent systems.

In this series of posts, we implement Co, a small dynamically typed imperative programming language with support for coroutines and channels. Haskell is our choice of language to write an interpreter for Co.

Introducing Co

Co has these basic features that are found in many programming languages:

Dynamic and strong typing.
Null, boolean, string and integer literals, and values.
Addition, subtraction, multiplication and integer division arithmetic operations.
String concatenation operation.
Equality and inequality checks on booleans, strings and numbers.
Less-than and greater-than comparison operations on numbers.
Variable declarations, usage and assignments.
if and while statements.
Function declarations and calls, with support for recursion.
First class functions and anonymous functions.
Mutable closures.

It also has these special features:

yield statement to yield the current thread of computation (ToC).
spawn statement to start a new ToC.
First class channels with operators to send and receive values over them.
sleep function to sleep the current ToC for a given number of milliseconds.

Let’s see some example code in Co for illustration:

// Fibonacci numbers
// using a while loop
var a = 0;
var b = 1;
var j = 0;
var temp = null;
while (j < 6) {
  print(a);
  temp = a;
  a = b;
  b = temp + b;
  j = j + 1;
}

// Fibonacci numbers
// using recursive function call
function fib(n) {
  if (n < 2) {
    return n;
  }
  return fib(n - 2)
    + fib(n - 1);
}

var i = 0;
while (i < 6) {
  print(fib(i));
  i = i + 1;
}

As you may notice, Co’s syntax is heavily inspired by JavaScript. The code example above computes and prints³ the first six Fibonacci numbers in two different ways, and demonstrates a number of features of Co, including variable declarations and assignments, while loops, if conditions, and function declarations and calls along with recursion.

We can save the code in a file and run it with the Co interpreter⁴ to print this (correct) output:

The next example shows the usage of coroutines in Co:

function printNums(start, end) {
  var i = start;
  while (i < end + 1) {
    print(i);
    yield;
    i = i + 1;
  }
}

spawn printNums(1, 4);
printNums(11, 16);

Running this code with the interpreter prints this output:

The printNum function prints numbers between the start and end arguments, but yields the ToC after each print. Notice how the prints are interleaved. This is because the two calls to the function printNums run concurrently in two separate coroutines.

The next example show the usage of channels in Co:

var chan = newChannel();

function player(name) {
  var n = null;
  while (true) {
    n = <- chan;
    if (n == "done") {
      print(name + " done");
      return;
    }

    print(name + " " + n);
    if (n == 0) {
      print(name + " done");
      "done" -> chan;
      return;
    }
    n - 1 -> chan;
  }
}

spawn player("ping");
spawn player("pong");
10 -> chan;

This is the popular Ping-pong benchmark for communication between ToCs. Here we use channels and coroutines for the same. Running this code prints this output:

ping 10
pong 9
ping 8
pong 7
ping 6
pong 5
ping 4
pong 3
ping 2
pong 1
ping 0
ping done
pong done

Lastly, here is Sleep sort in Co:

function sleepSort(a, b, c, d, e) {
  function printNum(num) {
    sleep(num);
    print(num);
  }
  spawn printNum(a);
  spawn printNum(b);
  spawn printNum(c);
  spawn printNum(d);
  spawn printNum(e);
}

sleepSort(5, 4, 3, 2, 1);

Running this code prints this output:

We’ll revisit these examples later while building our interpreter, and understand them as well as the code that implements them.

The Co Interpreter

The Co interpreter works in two stages:

Parsing: a parser converts Co source code to Abstract Syntax Tree (AST).
Interpretation: a tree-walking interpreter walks the AST, executes the instructions and produces the output.

Stages of the Co interpreter

In this post, we implement the parser for Co. In the second part, we create a first cut of the interpreter that supports the basic features of Co. In the third and fourth parts, we extend the interpreter to add support for coroutines and channels.

The complete code for the parser is here. You can load it in GHCi using stack (by running stack co-interpreter.hs), and follow along while reading this article⁵.

Let’s start with listing the extensions and imports needed⁶.

{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE RecordWildCards #-}
module CoInterpreter where

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar.Lifted
import Control.Monad (foldM, unless, void, when)
import Control.Monad.Base (MonadBase, liftBase)
import Control.Monad.Combinators.Expr (Operator (..), makeExprParser)
import Control.Monad.Cont (ContT, MonadCont, callCC, runContT)
import Control.Monad.Except (ExceptT, MonadError (..), runExceptT)
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.State.Strict (MonadState, StateT, evalStateT)
import qualified Control.Monad.State.Strict as State
import Data.Foldable (for_, traverse_)
import Data.IORef.Lifted
import qualified Data.Map.Strict as Map
import Data.Maybe (fromMaybe)
import qualified Data.PQueue.Prio.Min as PQ
import Data.Time.Clock.POSIX (getPOSIXTime)
import Data.Void (Void)
import System.Clock (Clock (Monotonic), fromNanoSecs, getTime, TimeSpec)
import System.Environment (getArgs, getProgName)
import System.IO (hPutStrLn, stderr)
import Text.Megaparsec hiding (runParser)
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L
import Text.Pretty.Simple
  ( CheckColorTty (..),
    OutputOptions (..),
    defaultOutputOptionsNoColor,
    pPrintOpt,
  )

Next, let’s take a look at the Co AST.

The Co AST

Since Co is an imperative programming language, it is naturally Statement oriented. A statement describes how an action is to be executed. A Co program is a list of top-level statements.

Statements have internal components called Expressions. Expressions evaluate to values at program run time. Let’s look at them first.

Expressions

We represent Co expressions as a Haskell Algebraic data type (ADT) with one constructor per expression type:

data Expr
  = LNull
  | LBool Bool
  | LStr String
  | LNum Integer
  | Variable Identifier
  | Binary BinOp Expr Expr
  | Call Expr [Expr]
  | Lambda [Identifier] [Stmt]
  | Receive Expr
  deriving (Show, Eq)

type Identifier = String

LNull: The literal null. Evaluates to the null value.
LBool Bool: The boolean literals, true and false. Evaluate to their counterpart boolean values.
LStr String: A string literal like "towel". Evaluates to a string.
LNum Integer: An integer literal like 1 or -5. Evaluates to an integer.
Variable Identifier: A variable named by an identifier like a1 or sender. Evaluates to the variable’s value at the point in the execution of code. An Identifier is a string of alphanumeric characters, starting with an alpha character.
Binary Op Expr Expr: A binary operation on two expressions. Example: 1 + 41 or 2 == "2". Supported binary operations are defined by the BinOp enum: addition/concatenation, subtraction, multiplication, integer division, equality and inequality checks, and less-than and greater-than comparisons.

data BinOp =
    Plus
  | Minus
  | Slash
  | Star
  | Equals
  | NotEquals
  | LessThan
  | GreaterThan
  deriving (Show, Eq)

Call Expr [Expr]

Calls the function obtained by evaluating the callee expression, with the given argument expressions. Examples: calcDistance(start, end) or getResolver(context)(template).

Lambda [Identifier] [Stmt]

A function with the given parameter names and body statements. Example: function (a, b) { return a*a + b*b; }.

Receive Expr

Receives a value from a channel. Examples:

// receive a value from the channel and prints it
print(<- someChannel);
// receive a value from the channel and assigns it
var x = <- someChannel;

Next, we see how statements are represented in the AST.

Statements

We represent the Co statements as a Haskell ADT with one constructor per statement type:

data Stmt
  = ExprStmt Expr
  | VarStmt Identifier Expr
  | AssignStmt Identifier Expr
  | IfStmt Expr [Stmt]
  | WhileStmt Expr [Stmt]
  | FunctionStmt Identifier [Identifier] [Stmt]
  | ReturnStmt (Maybe Expr)
  | YieldStmt
  | SpawnStmt Expr
  | SendStmt Expr Expr
  deriving (Show, Eq)

type Program = [Stmt]

ExprStmt Expr

A statement with just an expression. The expression’s value is thrown away after executing the statement. Example: 1 + 2;

VarStmt Identifier Expr

Defines a new variable named by an identifier and sets it to an expression’s value. In Co, variables must be initialized when being defined. Example: var a = 5;

AssignStmt Identifier Expr

Sets an already defined variable named by an identifier to an expression’s value. Example: a = 55 + "hello";

IfStmt Expr [Stmt]

Executes a list of statements if the condition expression evaluates to a truthy value. In Co, null and false values are non-truthy, and every other value is truthy. Also note that, there are no else branches for if statements in Co. Example:

if (a == 1) {
  print("hello");
}

WhileStmt Expr [Stmt]

Executes a list of statements repeatedly while the condition expression evaluates to a truthy value. Example:

var n = 0;
while (n < 5) {
  print(n);
  n = n + 1;
}

FunctionStmt Identifier [Identifier] [Stmt]

Defines a function with a name, a list of parameter names, and a list of body statements. Example:

function greet(greeting) {
  print(greeting + " world");
}

ReturnStmt (Maybe Expr)

Returns from a function, optionally returning an expression’s value. Example:

function square(x) {
  return x * x;
}

YieldStmt

Suspends the currently executing ToC so that some other ToC may run. The current ToC resumes later from statement next to the yield statement. Example:

function printNums(start, end) {
  var i = start;
  while (i < end + 1) {
    print(i);
    yield;
    i = i + 1;
  }
}

SpawnStmt Expr

Starts a new ToC in which the given expression is evaluated, which runs concurrently with all other running ToCs. Example:

spawn printNums(1, 4);
printNums(11, 16);

SendStmt Expr Expr

Sends an expressions’s value to a channel. Example:

1 + square(3) -> someChannel;

The Co AST is minimal but it serves our purpose. Next, let’s figure out how to actually parse source code to AST.

Parsing

Parsing is the process of taking textual input data and converting it to a data structure—often a hierarchal structure like AST—while checking the input for correct syntax. There are many ways of writing parsers⁷, Parser Combinators^@11 being one of them. Parser combinators libraries are popular in Haskell because of their ease of use and succinctness. We are going to use one such library, Megaparsec, to create a parser for Co.

Let’s start with writing some basic parsers for the Co syntax using the Megaparsec parsers.

type Parser = Parsec Void String

sc :: Parser ()
sc = L.space space1 lineCmnt blockCmnt
  where
    lineCmnt = L.skipLineComment "//"
    blockCmnt = L.skipBlockComment "/*" "*/"

lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc

symbol :: String -> Parser String
symbol = L.symbol sc

reserved :: String -> Parser ()
reserved w = (lexeme . try) $ string w *> notFollowedBy alphaNumChar

parens, braces :: Parser a -> Parser a
parens = between (symbol "(") (symbol ")")
braces = between (symbol "{") (symbol "}")

semi, identifier, stringLiteral :: Parser String
semi = symbol ";"
identifier = lexeme ((:) <$> letterChar <*> many alphaNumChar)
stringLiteral = char '"' >> manyTill L.charLiteral (char '"') <* sc

integer :: Parser Integer
integer = lexeme (L.signed sc L.decimal)

In the above code, Parser is a concise type-alias for our parser type⁸.

The sc parser is the space consumer parser. It dictates what’s ignored while parsing the sources code. For Co, we consider the Unicode space character and the control characters—tab, newline, carriage return, form feed, and vertical tab—as whitespaces. We use the space1 parser to configure that. sc also lets us configure how to ignore line comments and block comments while parsing.

The lexeme combinator is for parsing Lexemes while ignoring whitespaces and comments. It is implemented using the lexeme combinator from Megaparsec.

The symbol combinator is for parsing symbols, that is, operators like * and ;. It is implemented using the symbol combinator.

The reserved combinator is for parsing reserved keywords like if and while. It is implemented as a combination of lexeme combinator and string parser, while making sure that the reserved keyword is not a prefix of another word. It uses the try combinator to backtrack if that happens.

The parens and braces combinators are for parsing code surrounded by parentheses ( and ) and braces { and } respectively.

The semi parser matches a semicolon ;. The identifier parser is for parsing identifiers in Co. The stringLiteral parser matches a string literal like "polyphonic ringtone". integer is the parser for Co integers.

Let’s also write some functions to run the parsers and pretty-print the output in GHCi:

runParser :: Parser a -> String -> Either String a
runParser parser code = do
  case parse parser "" code of
    Left err -> Left $ errorBundlePretty err
    Right prog -> Right prog

pPrint :: (MonadIO m, Show a) => a -> m ()
pPrint =
  pPrintOpt CheckColorTty $
    defaultOutputOptionsNoColor
      { outputOptionsIndentAmount = 2,
        outputOptionsCompact = True,
        outputOptionsCompactParens = True
      }

That completes our basic setup for parsing. Let’s try them out in GHCi now:

> runParser identifier "num1 "
Right "num1"
> runParser stringLiteral "\"val\"  "
Right "val"
> runParser integer "1  "
Right 1
> runParser integer "-12  "
Right (-12)

They work as expected. Next, off to parsing Co expressions.

Parsing Expressions

Parsing Co expressions to AST requires us to handle the Associativity and Precedence of the operators. Fortunately, Megaparsec makes it easy with the makeExprParser combinator. makeExprParser takes a parser to parse the terms and a table of operators, and creates the expression parser for us.

Terms are parts of expressions which cannot be broken down further into sub-expressions. Examples in Co are literals, variables, groupings and function calls.
The table of operators is a list of Operator Parser Expr lists ordered in descending precedence. All operators in one list have the same precedence but may have different associativity.

This is a lot to take in but looking at the code makes it clear. First, the operator table:

operators :: [[Operator Parser Expr]]
operators =
  [ [Prefix $ Receive <$ symbol "<-"],
    [ binary Slash $ symbol "/",
      binary Star $ symbol "*"],
    [ binary Plus $ symbol "+",
      binary Minus $ try (symbol "-" <* notFollowedBy (char '>'))
    ],
    [ binary LessThan $ symbol "<",
      binary GreaterThan $ symbol ">"
    ],
    [ binary Equals $ symbol "==",
      binary NotEquals $ symbol "!="
    ]
  ]
  where
    binary op symP = InfixL $ Binary op <$ symP

The prefix operator <-, for receiving values from channels, is of highest precedence and hence the first in the table. Next are the binary operators / and * for integer division and multiplication respectively. Following it, are the binary + and - operators which are of same precedence. After that come the comparison operators < and >. Finally, we have the lowest precedence operators, the equality and inequality checks == and !=.

Each operator also contains the parser to parse the operator symbol in the source code. All of them are self-explanatory except the parser for the - operator. The - parser is slightly complicated because we want it to not parse ->, the channel send operator, the first character of which is same as the symbol for the - operator.

Moving on to the term parser next:

term :: Parser Expr
term = primary >>= call
  where
    call e =
      ( lookAhead (symbol "(")
        >> symbol "("
        >> Call e <$> sepBy expr (symbol ",") <* symbol ")"
        >>= call )
      <|> pure e

    primary = LNull <$ reserved "null"
          <|> LBool True <$ reserved "true"
          <|> LBool False <$ reserved "false"
          <|> LStr <$> stringLiteral
          <|> LNum <$> integer
          <|> Lambda
            <$> (reserved "function" *> parens (sepBy identifier $ symbol ","))
            <*> braces (many stmt)
          <|> Variable <$> identifier
          <|> parens expr

The term parser uses the primary parser to parse the primary terms, and the call parser to parse the function calls. The call parser is recursive because the callee itself can be a chain of more functions calls, for example x(1, 2)(y)("blah")(). It starts with the output of the primary parser, and then tries to parse a function call. If it succeeds, it calls itself again to parse the next function call in the chain. If it fails, it returns the output of the previous round⁹.

The primary parser is a combination of smaller parsers—one for each type of primary terms in Co—combined together using the Alternative instance of the parsers. It matches each parser one by one from the top until the match succeeds¹⁰. First, it tries to match for literals null, true, and false, failing which it matches for string and integer literals. Then it matches for anonymous functions, variables, and expressions grouped in parentheses—in that order¹¹.

That’s it! We finally write the expression parser:

expr :: Parser Expr
expr = makeExprParser term operators

Let’s play with it in GHCi:

> pPrint $ runParser expr "1 + a < 9 - <- chan"
Right
  ( Binary LessThan
    ( Binary Plus ( LNum 1 ) ( Variable "a" ) )
    ( Binary Minus ( LNum 9 ) ( Receive ( Variable "chan" ) ) ) )
> pPrint $ runParser expr "funFun(null == \"ss\" + 12, true)"
Right
  ( Call "funFun"
    [ Binary Equals LNull
      ( Binary Plus ( LStr "ss" ) ( LNum 12 ) )
    , LBool True ] )
> pPrint $ runParser expr "-99 - <- chan + funkyFun(a, false, \"hey\")"
Right
  ( Binary Plus
    ( Binary Minus ( LNum ( - 99 ) ) ( Receive ( Variable "chan" ) ) )
    ( Call "funkyFun" [ Variable "a", LBool False, LStr "hey" ] ) )
> pPrint $ runParser expr "(function (x) { print(x + 1); })(99)"
Right
  ( Call
    ( Lambda [ "x" ]
      [ ExprStmt
        ( Call
          ( Variable "print" )
          [ Binary Plus ( Variable "x" ) ( LNum 1 ) ] ) ] )
    [ LNum 99 ] )

Done! Onward, to parsing Co statements.

Parsing Statements

For parsing statements, we reuse the same trick we used for parsing expressions: combine smaller parsers for each statement type using Alternative.

stmt :: Parser Stmt
stmt =
  IfStmt <$> (reserved "if" *> parens expr) <*> braces (many stmt)
    <|> WhileStmt <$> (reserved "while" *> parens expr) <*> braces (many stmt)
    <|> VarStmt <$> (reserved "var" *> identifier) <*> (symbol "=" *> expr <* semi)
    <|> YieldStmt <$ (reserved "yield" <* semi)
    <|> SpawnStmt <$> (reserved "spawn" *> expr <* semi)
    <|> ReturnStmt <$> (reserved "return" *> optional expr <* semi)
    <|> FunctionStmt
      <$> try (reserved "function" *> identifier)
      <*> parens (sepBy identifier $ symbol ",")
      <*> braces (many stmt)
    <|> try (AssignStmt <$> identifier <*> (symbol "=" *> expr <* semi))
    <|> try (SendStmt <$> expr <*> (symbol "->" *> expr <* semi))
    <|> ExprStmt <$> expr <* semi

Most of the statements start with keywords like if, var, function, etc, so our parsers look for the keywords first using the reserved parser. If none of the keyword-starting statements match then we try matching for assignments, channel sends, and expression statements, in that order. The stmt parser uses the expr parser to parse expressions within statements. It also uses the combinators many, optional, try and sepBy from Megaparsec.

Finally, we put everything together to create the parser for Co:

program :: Parser Program
program = sc *> many stmt <* eof

The Co parser matches multiple top-level statements, starting with optional whitespace and ending with end-of-file.

It’s demo time. In GHCi, we parse a file and pretty-print the AST:

> readFile "coroutines.co" >>= pPrint . runParser program
Right
  [ FunctionStmt "printNums"
    [ "start", "end" ]
    [ VarStmt "i"
      ( Variable "start" )
    , WhileStmt
      ( Binary LessThan
        ( Variable "i" )
        ( Binary Plus ( Variable "end" ) ( LNum 1 ) ) )
      [ ExprStmt
        ( Call "print" [ Variable "i" ] )
      , YieldStmt
      , AssignStmt "i"
        ( Binary Plus ( Variable "i" ) ( LNum 1 ) ) ] ]
  , SpawnStmt
    ( Call "printNums" [ LNum 1, LNum 4 ] )
  , ExprStmt
    ( Call "printNums" [ LNum 11, LNum 16 ] ) ]

I’ve gone ahead and created a side-by-side view of the source code and corresponding AST for all three input files, aligned neatly for your pleasure of comparison. If you so wish, feast your eyes on them, and check for yourself that everything works correctly.

Source code and AST for fib.co

// Fibonacci numbers
// using a while loop
var a = 0;
var b = 1;
var j = 0;
var temp = null;
while (j < 6) {
  print(a);
  temp = a;
  a = b;
  b = temp + b;
  j = j + 1;
}

// Fibonacci numbers
// using recursive function call
function fib(n) {
  if (n < 2) {
    return n;
  }
  return fib(n - 2)
    + fib(n - 1);
}

var i = 0;
while (i < 6) {
  print(fib(i));
  i = i + 1;
}



[ VarStmt "a" ( LNum 0 )
, VarStmt "b" ( LNum 1 )
, VarStmt "j" ( LNum 0 )
, VarStmt "temp" LNull
, WhileStmt ( Binary LessThan ( Variable "j" ) ( LNum 6 ) )
  [ ExprStmt ( Call "print" [ Variable "a" ] )
  , AssignStmt "temp" ( Variable "a" )
  , AssignStmt "a" ( Variable "b" )
  , AssignStmt "b" ( Binary Plus ( Variable "temp" ) ( Variable "b" ) )
  , AssignStmt "j" ( Binary Plus ( Variable "j" ) ( LNum 1 ) )
  ]



, FunctionStmt "fib" [ "n" ]
  [ IfStmt ( Binary LessThan ( Variable "n" ) ( LNum 2 ) )
    [ ReturnStmt ( Just ( Variable "n" ) )
    ]
  , ReturnStmt ( Just ( Binary Plus ( Call "fib" [ Binary Minus ( Variable "n" ) ( LNum 2 ) ] )
                                    ( Call "fib" [ Binary Minus ( Variable "n" ) ( LNum 1 ) ] ) ) )
  ]

, VarStmt "i" ( LNum 0 )
, WhileStmt ( Binary LessThan ( Variable "i" ) ( LNum 6 ) )
  [ ExprStmt ( Call "print" [ Call "fib" [ Variable "i" ] ] )
  , AssignStmt "i" ( Binary Plus ( Variable "i" ) ( LNum 1 ) )
  ] ]

Source code and AST for coroutines.co

function printNums(start, end) {
  var i = start;
  while (i < end + 1) {
    print(i);
    yield;
    i = i + 1;
  }
}

spawn printNums(1, 4);
printNums(11, 16);

[ FunctionStmt "printNums" [ "start", "end" ]
  [ VarStmt "i" ( Variable "start" )
  , WhileStmt ( Binary LessThan ( Variable "i" ) ( Binary Plus ( Variable "end" ) ( LNum 1 ) )
    [ ExprStmt ( Call "print" [ Variable "i" ] )
    , YieldStmt
    , AssignStmt "i" ( Binary Plus ( Variable "i" ) ( LNum 1 ) )
    ]
  ]

, SpawnStmt ( Call "printNums" [ LNum 1, LNum 4 ] )
, ExprStmt ( Call "printNums" [ LNum 11, LNum 16 ] ) ]

Source code and AST for pingpong.co

var chan = newChannel();

function player(name) {
  var n = null;
  while (true) {
    n = <- chan;
    if (n == "done") {
      print(name + " done");
      return;
    }

    print(name + " " + n);
    if (n == 0) {
      print(name + " done");
      "done" -> chan;
      return;
    }
    n - 1 -> chan;
  }
}

spawn player("ping");
spawn player("pong");
10 -> chan;

[ VarStmt "chan" ( Call "newChannel" [] )

, FunctionStmt "player" [ "name" ]
  [ VarStmt "n" LNull
  , WhileStmt ( LBool True )
    [ AssignStmt "n" ( Receive ( Variable "chan" ) )
    , IfStmt ( Binary Equals ( Variable "n" ) ( LStr "done" ) )
      [ ExprStmt ( Call "print" [ Binary Plus ( Variable "name" ) ( LStr " done" ) ] )
      , ReturnStmt Nothing
      ]

    , ExprStmt ( Call "print" [ Binary Plus ( Binary Plus ( Variable "name" ) ( LStr " " ) ) ( Variable "n" ) ] )
    , IfStmt ( Binary Equals ( Variable "n" ) ( LNum 0 ) )
      [ ExprStmt ( Call "print" [ Binary Plus ( Variable "name" ) ( LStr " done" ) ] )
      , SendStmt ( LStr "done" ) "chan"
      , ReturnStmt Nothing
      ]
    , SendStmt ( Binary Minus ( Variable "n" ) ( LNum 1 ) ) "chan"
    ]
  ]

, SpawnStmt ( Call "player" [ LStr "ping" ] )
, SpawnStmt ( Call "player" [ LStr "pong" ] )
, SendStmt ( LNum 10 ) "chan" ]

That’s all for now. In about 200 lines of code, we have implemented the complete parser for Co. In the next part, we’ll implement a tree-walking interpreter for the Co AST that supports the basic features.

The full code for the parser can be seen here.

Acknowledgements

Many thanks to Steven Deobald for reviewing a draft of this article.

Bartel, Joe. “Non-Preemptive Multitasking.” The Computer Journal, no. 30 (May 2011): 37–38, 28. http://cini.classiccmp.org/pdf/HT68K/HT68K%20TCJ30p37.pdf.

Hoare, C A R. Communicating Sequential Processes. Prentice Hall, 1986. https://doi.org/10.1145/359576.359585.

Hutton, Graham. “Higher-Order Functions for Parsing.” Journal of Functional Programming 2, no. 3 (1992): 323–43. https://doi.org/10.1017/S0956796800000411.

Knuth, Donald E. “Coroutines.” In The Art of Computer Programming: Volume 1: Fundamental Algorithms, 3rd ed., 193–200. Addison Wesley, 1997.

Watson, Des. “Approaches to Syntax Analysis.” In A Practical Approach to Compiler Construction, 75–93. Springer International Publishing, 2017. https://doi.org/10.1007/978-3-319-52789-5_4.

Coroutines are often conflated with Fibers. Some implementations of coroutines call themselves fibers instead. Quoting Wikipedia:

Fibers describe essentially the same concept as coroutines. The distinction, if there is any, is that coroutines are a language-level construct, a form of control flow, while fibers are a systems-level construct, viewed as threads that happen to not run in parallel. It is contentious which of the two concepts has priority: fibers may be viewed as an implementation of coroutines, or as a substrate on which to implement coroutines.

↩︎
Green threads are another popular solution for lightweight concurrency. Unlike coroutines, green threads are preemtable. Unlike Threads, they are scheduled by the language runtime instead of the operating system.↩︎
print is a built-in function in Co. It behaves same as the console.log function in JavaScript, or the print function in Haskell.↩︎
The complete code for the interpreter is here.↩︎

If you are a Cabal user instead, the process is a little longer. First run:

cabal repl \
  --build-depends "megaparsec ^>= 9.0" \
  --build-depends "lifted-base ^>= 0.2" \
  --build-depends "clock ^>= 0.8" \
  --build-depends "pqueue ^>= 1.4" \
  --build-depends "pretty-simple ^>= 4.0"

Then in the started GHCi REPL, run :load co-interpreter.hs. You should be able to use the functions in the interpreter now.↩︎

We are using:
- the containers and pqueue libraries for data structures,
- the clock library for getting system time,
- the megaparsec library for parsing,
- the mtl and lifted-base libraries for the Monad stack and operations, and
- the pretty-simple library for pretty printing data structures.
↩︎
I explore parsing in more depth by writing a JSON parser from scratch in Haskell in one of my previous posts.↩︎
I followed this (outdated) Megaparsec tutorial for writing a parser for an imperative language. This tutorial is also helpful in learning the intricacies of Megaparsec.↩︎
term is a Recursive descent parser. If we directly include call parsing as an alternative in the primary parser, we would run into the Left recursion problem, causing the parser to hang in infinite loop.↩︎
In the parsing parlance this is called Backtracking. The alternative is to do a Lookahead^@14.↩︎
An earlier version of this post used the symbol parser for parsing keywords instead of the reserved parser. That would cause the parser to fail to parse variable names like nullx that start with a keyword. Using the reserved parser fixes this issue because of the built-in backtracking using try.↩︎

Like, repost, or comment

Posted by Abhinav Sarkar

Like this post? Subscribe to get future posts by email.

Got suggestions, corrections, or thoughts? Post a comment!

9 comments

graham

August 7, 2021

I’m tweaking a very old bison grammar for some ancient in house stuff for my current client and realise the world has moved on. That’s beautifully clear and concise parser code you have there. Thanks for sharing.

HuwCampbell

May 12, 2021

This is great. Cheers.

noodlesteak

May 10, 2021

Hey really looking forward to follow your posts on my own. Just finished learning and implementing parser combinators in Java and Rust, and as a big fan of FP and Elixir I always wanted to make such a language, plus you do it in Haskell which I planned to get into as well ! You came right at the right moment for me thank you :)

Abhinav Sarkar

May 12, 2021

Thanks 😊. I hope this is useful.

funny_falcon

May 10, 2021

Historical remark: lightweight thread concurrency is a very old concept. Stackless Python - is very young for example. Simula had actors, Scheme were born in the study of CSP, even C++ in the first versions (when it were called “C with classes”) had coroutines.

Netherless, interesting article.

gilmi

May 10, 2021

A very interesting post. Thank you!

I have one question and one comment:

Question: I don’t understand how the ping pong example works. You mention that using coroutines a process needs to yield the ToC, but I don’t see any yielding in the program. What’s going on?

Comment: I think this use of try will make some of the reported errors cryptic and inaccurate. Because if for example if there is a parse error in the third argument in a function call, the entire function call expression is given up on and instead the parser will try the identifier route.

Instead, you want to do left factoring: parse the common prefixes together and then choose between the function call route (which starts with parens) and simply returning the identifier. try a <|> b considered harmful talks about this topic in length.

Abhinav Sarkar

May 10, 2021

Thanks a lot for the kind words . I learned a lot about this topic from your Streama screencasts.

This is the first part of a three-part series. I’ll go over how coroutines and channels work in the third part. But in short, sending and receiving over channels implicitly yields.

I agree. I wrote it this way to keep things simple. I didn’t want to add extra mental overhead for the readers.

gilmi

May 10, 2021

I see! I’ll keep an eye for the next posts

Keeping it simple is good. I would suggest mentioning the article above just in case someone runs into issues.

I think concurrency in PLT is very interesting and important and there aren’t a lot of non academic articles about this topic, so it’s great that you are writing about it! The article and examples are pretty clear to me overall. Good job!

And I’m glad to hear my streams were helpful

Nevets D'laboed

May 10, 2021

as always, abhinav’s articles get straight to the point and do a lot of work in a tiny amount of haskell code.

i really enjoy reviewing these.

Lightweight Concurrency

Introducing Co

The Co Interpreter

The Co AST

Expressions

Statements

Parsing

Parsing Expressions

Parsing Statements

Acknowledgements

Like, repost, or comment

Like this post? Subscribe to get future posts by email.

Got suggestions, corrections, or thoughts? Post a comment!

9 comments

graham

HuwCampbell

noodlesteak

Abhinav Sarkar

funny_falcon

gilmi

Abhinav Sarkar

gilmi

Nevets D'laboed

26 Mentions