Implementing Co, a Small Language With Coroutines #1: The Parser
- A twenty-five minute read
- 9 comments
- 3 🗣️ 19 ❤️ 4 🔁
Many major programming languages these days support some lightweight concurrency primitives. The most recent popular ones are Goroutines in Go, Coroutines in Kotlin and Async in Rust. Let’s explore some of these concepts in detail by implementing a programming language with support for coroutines and Go-style channels.
This is the first post in a series of posts:
- Implementing Co #1: The Parser
- Implementing Co #2: The Interpreter
- Implementing Co #3: Adding Coroutines
- Implementing Co #4: Adding Channels
- Implementing Co #5: Adding Sleep
Lightweight Concurrency
Lightweight concurrency has been a popular topic among programmers and programming language designers alike in recent times. Many languages created in the last decade have support for them either natively or using libraries. Some example are:
- Goroutines in Go,
- Coroutines in Kotlin,
- Async in Rust, and
- core.async in Clojure.
These examples differ in their implementation details but all of them enable programmers to run millions of tasks concurrently. This capability of being able to do multiple tasks at the same time is called Multitasking.
Multitasking can be of two types:
- Pre-emptive multitasking in which the tasks can be preempted so that other tasks can be run.
- Cooperative multitasking@1 in which the tasks voluntarily yield control to other tasks to be run.
Coroutines@2 are computations that support cooperative multitasking1. Unlike ordinary Subroutines that execute from start to end and do not hold any state between invocations, coroutines can exit in the middle by calling other coroutines and may later resume at the same point. They also hold state between invocations. They do so by yielding the control of the current running thread so that some other coroutine can be run on the same thread2.
Coroutine implementations often come with support for Channels for inter-coroutine communication. One coroutine can send a message over a channel, and another coroutine can receive the message from the same channel. Coroutines and channels together are an implementation of Communicating Sequential Processes (CSP)@5, a formal language for describing patterns of interaction in concurrent systems.
In this series of posts, we implement Co, a small dynamically typed imperative programming language with support for coroutines and channels. Haskell is our choice of language to write an interpreter for Co.
Introducing Co
Co has these basic features that are found in many programming languages:
- Dynamic and strong typing.
- Null, boolean, string and integer literals, and values.
- Addition, subtraction, multiplication and integer division arithmetic operations.
- String concatenation operation.
- Equality and inequality checks on booleans, strings and numbers.
- Less-than and greater-than comparison operations on numbers.
- Variable declarations, usage and assignments.
if
andwhile
statements.- Function declarations and calls, with support for recursion.
- First class functions and anonymous functions.
- Mutable closures.
It also has these special features:
yield
statement to yield the current thread of computation (ToC).spawn
statement to start a new ToC.- First class channels with operators to send and receive values over them.
sleep
function to sleep the current ToC for a given number of milliseconds.
Let’s see some example code in Co for illustration:
// Fibonacci numbers
// using a while loop
var a = 0;
var b = 1;
var j = 0;
while (j < 6) {
print(a);
var temp = a;
= b;
a = temp + b;
b = j + 1;
j
}
// Fibonacci numbers
// using recursive function call
function fib(n) {
if (n < 2) {
return n;
}return fib(n - 2)
+ fib(n - 1);
}
var i = 0;
while (i < 6) {
print(fib(i));
= i + 1;
i }
As you may notice, Co’s syntax is heavily inspired by JavaScript. The code example above computes and prints3 the first six Fibonacci numbers in two different ways, and demonstrates a number of features of Co, including variable declarations and assignments, while
loops, if
conditions, and function declarations and calls along with recursion.
We can save the code in a file and run it with the Co interpreter4 to print this (correct) output:
0
1
1
2
3
5
0
1
1
2
3
5
The next example shows the usage of coroutines in Co:
function printNums(start, end) {
var i = start;
while (i < end + 1) {
print(i);
yield;
= i + 1;
i
}
}
printNums(1, 4);
spawn printNums(11, 16);
Running this code with the interpreter prints this output:
11
1
12
2
13
3
14
4
15
16
The printNum
function prints numbers between the start
and end
arguments, but yields the ToC after each print. Notice how the prints are interleaved. This is because the two calls to the function printNums
run concurrently in two separate coroutines.
The next example show the usage of channels in Co:
var chan = newChannel();
function player(name) {
while (true) {
var n = <- chan;
if (n == "done") {
print(name + " done");
return;
}
print(name + " " + n);
if (n == 0) {
print(name + " done");
"done" -> chan;
return;
}- 1 -> chan;
n
}
}
player("ping");
spawn player("pong");
spawn 10 -> chan;
This is the popular Ping-pong benchmark for communication between ToCs. Here we use channels and coroutines for the same. Running this code prints this output:
ping 10
pong 9
ping 8
pong 7
ping 6
pong 5
ping 4
pong 3
ping 2
pong 1
ping 0
ping done
pong done
Lastly, here is Sleep sort in Co:
function sleepSort(a, b, c, d, e) {
function printNum(num) {
sleep(num);
print(num);
}printNum(a);
spawn printNum(b);
spawn printNum(c);
spawn printNum(d);
spawn printNum(e);
spawn
}
sleepSort(5, 4, 3, 2, 1);
Running this code prints this output:
1
2
3
4
5
We’ll revisit these examples later while building our interpreter, and understand them as well as the code that implements them.
The Co Interpreter
The Co interpreter works in two stages:
- Parsing: a parser converts Co source code to Abstract Syntax Tree (AST).
- Interpretation: a tree-walking interpreter walks the AST, executes the instructions and produces the output.
In this post, we implement the parser for Co. In the second part, we create a first cut of the interpreter that supports the basic features of Co. In the third and fourth parts, we extend the interpreter to add support for coroutines and channels.
The complete code for the parser is here. You can load it in GHCi using stack (by running stack co-interpreter.hs
), and follow along while reading this article5.
Let’s start with listing the extensions and imports needed6.
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE RecordWildCards #-}
module CoInterpreter where
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar.Lifted
import Control.Monad (unless, void, when)
import Control.Monad.Base (MonadBase, liftBase)
import Control.Monad.Combinators.Expr (Operator (..), makeExprParser)
import Control.Monad.Cont (ContT, MonadCont, callCC, runContT)
import Control.Monad.Except (ExceptT, MonadError (..), runExceptT)
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.State.Strict (MonadState, StateT, evalStateT)
import qualified Control.Monad.State.Strict as State
import Data.Foldable (for_, traverse_)
import Data.IORef.Lifted
import qualified Data.Map.Strict as Map
import Data.Maybe (fromMaybe)
import qualified Data.PQueue.Prio.Min as PQ
import Data.Time.Clock.POSIX (getPOSIXTime)
import Data.Void (Void)
import System.Clock (Clock (Monotonic), fromNanoSecs, getTime, TimeSpec)
import System.Environment (getArgs, getProgName)
import System.IO (hPutStrLn, stderr)
import Text.Megaparsec hiding (runParser)
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L
import Text.Pretty.Simple
CheckColorTty (..),
( OutputOptions (..),
defaultOutputOptionsNoColor,
pPrintOpt, )
Next, let’s take a look at the Co AST.
The Co AST
Since Co is an imperative programming language, it is naturally Statement oriented. A statement describes how an action is to be executed. A Co program is a list of top-level statements.
Statements have internal components called Expressions. Expressions evaluate to values at program run time. Let’s look at them first.
Expressions
We represent Co expressions as a Haskell Algebraic data type (ADT) with one constructor per expression type:
data Expr
= LNull
| LBool Bool
| LStr String
| LNum Integer
| Variable Identifier
| Binary BinOp Expr Expr
| Call Expr [Expr]
| Lambda [Identifier] [Stmt]
| Receive Expr
deriving (Show, Eq)
type Identifier = String
data BinOp =
Plus
| Minus
| Slash
| Star
| Equals
| NotEquals
| LessThan
| GreaterThan
deriving (Show, Eq)
LNull
-
The literal
null
. Evaluates to the null value. LBool Bool
-
The boolean literals,
true
andfalse
. Evaluate to their counterpart boolean values. LStr String
-
A string literal like
"towel"
. Evaluates to a string. LNum Integer
-
An integer literal like
1
or-5
. Evaluates to an integer. Variable Identifier
-
A variable named by an identifier like
a1
orsender
. Evaluates to the variable’s value at the point in the execution of code. AnIdentifier
is a string of alphanumeric characters, starting with an alpha character. Binary Op Expr Expr
-
A binary operation on two expressions. Example:
1 + 41
or2 == "2"
. Supported binary operations are defined by theBinOp
enum: addition/concatenation, subtraction, multiplication, integer division, equality and inequality checks, and less-than and greater-than comparisons. Call Expr [Expr]
-
Calls the function obtained by evaluating the callee expression, with the given argument expressions. Examples:
calcDistance(start, end)
orgetResolver(context)(template)
. Lambda [Identifier] [Stmt]
-
A function with the given parameter names and body statements. Example:
function (a, b) { return a*a + b*b; }
. Receive Expr
-
Receives a value from a channel. Examples:
// receive a value from the channel and prints it print(<- someChannel); // receive a value from the channel and assigns it var x = <- someChannel;
Next, we see how statements are represented in the AST.
Statements
We represent the Co statements as a Haskell ADT with one constructor per statement type:
data Stmt
= ExprStmt Expr
| VarStmt Identifier Expr
| AssignStmt Identifier Expr
| IfStmt Expr [Stmt]
| WhileStmt Expr [Stmt]
| FunctionStmt Identifier [Identifier] [Stmt]
| ReturnStmt (Maybe Expr)
| YieldStmt
| SpawnStmt Expr
| SendStmt Expr Identifier
deriving (Show, Eq)
type Program = [Stmt]
ExprStmt Expr
-
A statement with just an expression. The expression’s value is thrown away after executing the statement. Example:
1 + 2;
VarStmt Identifier Expr
-
Defines a new variable named by an identifier and sets it to an expression’s value. In Co, variables must be initialized when being defined. Example:
var a = 5;
AssignStmt Identifier Expr
-
Sets an already defined variable named by an identifier to an expression’s value. Example:
a = 55 + "hello";
IfStmt Expr [Stmt]
-
Executes a list of statements if the condition expression evaluates to a truthy value. In Co,
null
andfalse
values are non-truthy, and every other value is truthy. Also note that, there are noelse
branches forif
statements in Co. Example:if (a == 1) { print("hello"); }
WhileStmt Expr [Stmt]
-
Executes a list of statements repeatedly while the condition expression evaluates to a truthy value. Example:
var n = 0; while (n < 5) { print(n); = n + 1; n }
FunctionStmt Identifier [Identifier] [Stmt]
-
Defines a function with a name, a list of parameter names, and a list of body statements. Example:
function greet(greeting) { print(greeting + " world"); }
ReturnStmt (Maybe Expr)
-
Returns from a function, optionally returning an expression’s value. Example:
function square(x) { return x * x; }
YieldStmt
-
Suspends the currently executing ToC so that some other ToC may run. The current ToC resumes later from statement next to the
yield
statement. Example:function printNums(start, end) { var i = start; while (i < end + 1) { print(i); yield; = i + 1; i } }
SpawnStmt Expr
-
Starts a new ToC in which the given expression is evaluated, which runs concurrently with all other running ToCs. Example:
printNums(1, 4); spawn printNums(11, 16);
SendStmt Expr Identifier
-
Sends an expressions’s value to a channel named by an identifier. Example:
1 + square(3) -> someChannel;
The Co AST is minimal but it serves our purpose. Next, let’s figure out how to actually parse source code to AST.
Parsing
Parsing is the process of taking textual input data and converting it to a data structure—often a hierarchal structure like AST—while checking the input for correct syntax. There are many ways of writing parsers7, Parser Combinators@11 being one of them. Parser combinators libraries are popular in Haskell because of their ease of use and succinctness. We are going to use one such library, Megaparsec, to create a parser for Co.
Let’s start with writing some basic parsers for the Co syntax using the Megaparsec parsers.
type Parser = Parsec Void String
sc :: Parser ()
= L.space space1 lineCmnt blockCmnt
sc where
= L.skipLineComment "//"
lineCmnt = L.skipBlockComment "/*" "*/"
blockCmnt
lexeme :: Parser a -> Parser a
= L.lexeme sc
lexeme
symbol :: String -> Parser String
= L.symbol sc
symbol
reserved :: String -> Parser ()
= (lexeme . try) $ string w *> notFollowedBy alphaNumChar
reserved w
braces :: Parser a -> Parser a
parens,= between (symbol "(") (symbol ")")
parens = between (symbol "{") (symbol "}")
braces
stringLiteral :: Parser String
semi, identifier,= symbol ";"
semi = lexeme ((:) <$> letterChar <*> many alphaNumChar)
identifier = char '"' >> manyTill L.charLiteral (char '"') <* sc
stringLiteral
integer :: Parser Integer
= lexeme (L.signed sc L.decimal) integer
In the above code, Parser
is a concise type-alias for our parser type8.
The sc
parser is the space consumer parser. It dictates what’s ignored while parsing the sources code. For Co, we consider the Unicode space character and the control characters—tab, newline, carriage return, form feed, and vertical tab—as whitespaces. We use the space1
parser to configure that. sc
also lets us configure how to ignore line comments and block comments while parsing.
The lexeme
combinator is for parsing Lexemes while ignoring whitespaces and comments. It is implemented using the lexeme
combinator from Megaparsec.
The symbol
combinator is for parsing symbols, that is, operators like *
and ;
. It is implemented using the symbol
combinator.
The reserved
combinator is for parsing reserved keywords like if
and while
. It is implemented as a combination of lexeme
combinator and string
parser, while making sure that the reserved keyword is not a prefix of another word. It uses the try
combinator to backtrack if that happens.
The parens
and braces
combinators are for parsing code surrounded by parentheses (
and )
and braces {
and }
respectively.
The semi
parser matches a semicolon ;
. The identifier
parser is for parsing identifiers in Co. The stringLiteral
parser matches a string literal like "polyphonic ringtone"
. integer
is the parser for Co integers.
Let’s also write some functions to run the parsers and pretty-print the output in GHCi:
runParser :: Parser a -> String -> Either String a
= do
runParser parser code case parse parser "" code of
Left err -> Left $ errorBundlePretty err
Right prog -> Right prog
pPrint :: (MonadIO m, Show a) => a -> m ()
=
pPrint CheckColorTty $
pPrintOpt
defaultOutputOptionsNoColor= 2,
{ outputOptionsIndentAmount = True,
outputOptionsCompact = True
outputOptionsCompactParens }
That completes our basic setup for parsing. Let’s try them out in GHCi now:
> runParser identifier "num1 "
Right "num1"> runParser stringLiteral "\"val\" "
Right "val"> runParser integer "1 "
Right 1> runParser integer "-12 "
Right (-12)
They work as expected. Next, off to parsing Co expressions.
Parsing Expressions
Parsing Co expressions to AST requires us to handle the Associativity and Precedence of the operators. Fortunately, Megaparsec makes it easy with the makeExprParser
combinator. makeExprParser
takes a parser to parse the terms and a table of operators, and creates the expression parser for us.
- Terms are parts of expressions which cannot be broken down further into sub-expressions. Examples in Co are literals, variables, groupings and function calls.
- The table of operators is a list of
Operator
Parser Expr
lists ordered in descending precedence. All operators in one list have the same precedence but may have different associativity.
This is a lot to take in but looking at the code makes it clear. First, the operator table:
operators :: [[Operator Parser Expr]]
=
operators Prefix $ Receive <$ symbol "<-"],
[ [Slash $ symbol "/",
[ binary Star $ symbol "*"],
binary Plus $ symbol "+",
[ binary Minus $ try (symbol "-" <* notFollowedBy (char '>'))
binary
],LessThan $ symbol "<",
[ binary GreaterThan $ symbol ">"
binary
],Equals $ symbol "==",
[ binary NotEquals $ symbol "!="
binary
]
]where
= InfixL $ Binary op <$ symP binary op symP
The prefix operator <-
, for receiving values from channels, is of highest precedence and hence the first in the table. Next are the binary operators /
and *
for integer division and multiplication respectively. Following it, are the binary +
and -
operators which are of same precedence. After that come the comparison operators <
and >
. Finally, we have the lowest precedence operators, the equality and inequality checks ==
and !=
.
Each operator also contains the parser to parse the operator symbol in the source code. All of them are self-explanatory except the parser for the -
operator. The -
parser is slightly complicated because we want it to not parse ->
, the channel send operator, the first character of which is same as the symbol for the -
operator.
Moving on to the term parser next:
term :: Parser Expr
= primary >>= call
term where
=
call e "(")
( lookAhead (symbol >> symbol "("
>> Call e <$> sepBy expr (symbol ",") <* symbol ")"
>>= call )
<|> pure e
= LNull <$ reserved "null"
primary <|> LBool True <$ reserved "true"
<|> LBool False <$ reserved "false"
<|> LStr <$> stringLiteral
<|> LNum <$> integer
<|> Lambda
<$> (reserved "function" *> parens (sepBy identifier $ symbol ","))
<*> braces (many stmt)
<|> Variable <$> identifier
<|> parens expr
The term
parser uses the primary
parser to parse the primary terms, and the call
parser to parse the function calls. The call
parser is recursive because the callee itself can be a chain of more functions calls, for example x(1, 2)(y)("blah")()
. It starts with the output of the primary
parser, and then tries to parse a function call. If it succeeds, it calls itself again to parse the next function call in the chain. If it fails, it returns the output of the previous round9.
The primary
parser is a combination of smaller parsers—one for each type of primary terms in Co—combined together using the Alternative
instance of the parsers. It matches each parser one by one from the top until the match succeeds10. First, it tries to match for literals null
, true
, and false
, failing which it matches for string and integer literals. Then it matches for anonymous functions, variables, and expressions grouped in parentheses—in that order11.
That’s it! We finally write the expression parser:
expr :: Parser Expr
= makeExprParser term operators expr
Let’s play with it in GHCi:
> pPrint $ runParser expr "1 + a < 9 - <- chan"
Right
( Binary LessThan
( Binary Plus ( LNum 1 ) ( Variable "a" ) )
( Binary Minus ( LNum 9 ) ( Receive ( Variable "chan" ) ) ) )> pPrint $ runParser expr "funFun(null == \"ss\" + 12, true)"
Right
( Call "funFun"
[ Binary Equals LNull
( Binary Plus ( LStr "ss" ) ( LNum 12 ) )
, LBool True ] )> pPrint $ runParser expr "-99 - <- chan + funkyFun(a, false, \"hey\")"
Right
( Binary Plus
( Binary Minus ( LNum ( - 99 ) ) ( Receive ( Variable "chan" ) ) )
( Call "funkyFun" [ Variable "a", LBool False, LStr "hey" ] ) )> pPrint $ runParser expr "(function (x) { print(x + 1); })(99)"
Right
( Call
( Lambda [ "x" ]
[ ExprStmt
( Call
( Variable "print" )
[ Binary Plus ( Variable "x" ) ( LNum 1 ) ] ) ] ) [ LNum 99 ] )
Done! Onward, to parsing Co statements.
Parsing Statements
For parsing statements, we reuse the same trick we used for parsing expressions: combine smaller parsers for each statement type using Alternative
.
stmt :: Parser Stmt
=
stmt IfStmt <$> (reserved "if" *> parens expr) <*> braces (many stmt)
<|> WhileStmt <$> (reserved "while" *> parens expr) <*> braces (many stmt)
<|> VarStmt <$> (reserved "var" *> identifier) <*> (symbol "=" *> expr <* semi)
<|> YieldStmt <$ (reserved "yield" <* semi)
<|> SpawnStmt <$> (reserved "spawn" *> expr <* semi)
<|> ReturnStmt <$> (reserved "return" *> optional expr <* semi)
<|> FunctionStmt
<$> (try $ reserved "function" *> identifier)
<*> parens (sepBy identifier $ symbol ",")
<*> braces (many stmt)
<|> try (AssignStmt <$> identifier <*> (symbol "=" *> expr <* semi))
<|> try (SendStmt <$> expr <*> (symbol "->" *> identifier <* semi))
<|> ExprStmt <$> expr <* semi
Most of the statements start with keywords like if
, var
, function
, etc, so our parsers look for the keywords first using the reserved
parser. If none of the keyword-starting statements match then we try matching for assignments, channel send, and expression statements, in that order. The stmt
parser uses the expr
parser to parse expressions within statements. It also uses the combinators many
, optional
, try
and sepBy
from Megaparsec.
Finally, we put everything together to create the parser for Co:
program :: Parser Program
= sc *> many stmt <* eof program
The Co parser matches multiple top-level statements, starting with optional whitespace and ending with end-of-file.
It’s demo time. In GHCi, we parse a file and pretty-print the AST:
> readFile "coroutines.co" >>= pPrint . runParser program
Right
[ FunctionStmt "printNums"
[ "start", "end" ]
[ VarStmt "i"
( Variable "start" )
, WhileStmt
( Binary LessThan
( Variable "i" )
( Binary Plus ( Variable "end" ) ( LNum 1 ) ) )
[ ExprStmt
( Call "print" [ Variable "i" ] )
, YieldStmt
, AssignStmt "i"
( Binary Plus ( Variable "i" ) ( LNum 1 ) ) ] ]
, SpawnStmt
( Call "printNums" [ LNum 1, LNum 4 ] )
, ExprStmt ( Call "printNums" [ LNum 11, LNum 16 ] ) ]
I’ve gone ahead and created a side-by-side view of the source code and corresponding AST for all three input files, aligned neatly for your pleasure of comparison. If you so wish, feast your eyes on them, and check for yourself that everything works correctly.
Source code and AST for fib.co
|
|
Source code and AST for coroutines.co
|
|
Source code and AST for ping.co
|
|
That’s all for now. In about 200 lines of code, we have implemented the complete parser for Co. In the next part, we’ll implement a tree-walking interpreter for the Co AST that supports the basic features.
The full code for the parser can be seen here.
Acknowledgements
Many thanks to Steven Deobald for reviewing a draft of this article.
Coroutines are often conflated with Fibers. Some implementations of coroutines call themselves fibers instead. Quoting Wikipedia:
Fibers describe essentially the same concept as coroutines. The distinction, if there is any, is that coroutines are a language-level construct, a form of control flow, while fibers are a systems-level construct, viewed as threads that happen to not run in parallel. It is contentious which of the two concepts has priority: fibers may be viewed as an implementation of coroutines, or as a substrate on which to implement coroutines.
Green threads are another popular solution for lightweight concurrency. Unlike coroutines, green threads are preemtable. Unlike Threads, they are scheduled by the language runtime instead of the operating system.↩︎
print
is a built-in function in Co. It behaves same as theconsole.log
function in JavaScript, or theprint
function in Haskell.↩︎If you are a Cabal user instead, the process is a little longer. First run:
cabal repl \ --build-depends "megaparsec ^>= 9.0" \ --build-depends "lifted-base ^>= 0.2" \ --build-depends "clock ^>= 0.8" \ --build-depends "pqueue ^>= 1.4" \ --build-depends "pretty-simple ^>= 4.0"
Then in the started GHCi REPL, run
:load co-interpreter.hs
. You should be able to use the functions in the interpreter now.↩︎We are using:
- the containers and pqueue libraries for data structures,
- the clock library for getting system time,
- the megaparsec library for parsing,
- the mtl and lifted-base libraries for the Monad stack and operations, and
- the pretty-simple library for pretty printing data structures.
I explore parsing in more depth by writing a JSON parser from scratch in Haskell in one of my previous posts.↩︎
I followed this (outdated) Megaparsec tutorial for writing a parser for an imperative language. This tutorial is also helpful in learning the intricacies of Megaparsec.↩︎
term
is a Recursive descent parser. If we directly include call parsing as an alternative in theprimary
parser, we would run into the Left recursion problem, causing the parser to hang in infinite loop.↩︎In the parsing parlance this is called Backtracking. The alternative is to do a Lookahead@14.↩︎
An earlier version of this post used the
symbol
parser for parsing keywords instead of thereserved
parser. That would cause the parser to fail to parse variable names likenullx
that start with a keyword. Using thereserved
parser fixes this issue because of the built-in backtracking usingtry
.↩︎
Got suggestions, corrections, or thoughts? Post a comment!
9 comments
graham
HuwCampbell
noodlesteak
Abhinav Sarkar
funny_falcon
gilmi
Abhinav Sarkar
gilmi
Nevets D'laboed