Implementing Co, a Small Language With Coroutines #2: The Interpreter
- A thirty-two minute read
- 15 comments
- 4 🗣️ 14 ❤️ 7 🔁
In the previous post, we wrote the parser for Co, the small language we are building in this series of posts. The previous post was all about the syntax of Co. In this post we dive into the semantics of Co, and write an interpreter for its basic features.
This is the second post in a series of posts:
- Implementing Co #1: The Parser
- Implementing Co #2: The Interpreter
- Implementing Co #3: Adding Coroutines
- Implementing Co #4: Adding Channels
- Implementing Co #5: Adding Sleep
Previously, on …
Here’s a quick recap. The basic features of Co that we are aiming to implement in this post are:
- Dynamic and strong typing.
- Null, boolean, string and integer literals, and values.
- Addition, subtraction, multiplication and integer division arithmetic operations.
- String concatenation operation.
- Equality and inequality checks on booleans, strings and numbers.
- Less-than and greater-than comparison operations on numbers.
- Variable declarations, usage and assignments.
if
andwhile
statements.- Function declarations and calls, with support for recursion.
- First class functions and anonymous functions.
- Mutable closures.
Note that some parts of code snippets in this post have been faded away. These are the part which add support for coroutines and channels. You can safely ignore these parts for now. We’ll go over them in the next post.
We represent the Co Abstract Syntax Tree (AST) as a pair of Haskell Algebraic Data Types (ADTs), one for Expressions:
data Expr
= LNull
| LBool Bool
| LStr String
| LNum Integer
| Variable Identifier
| Binary BinOp Expr Expr
| Call Expr [Expr]
| Lambda [Identifier] [Stmt]
| Receive Expr
deriving (Show, Eq)
type Identifier = String
And another for Statements:
data Stmt
= ExprStmt Expr
| VarStmt Identifier Expr
| AssignStmt Identifier Expr
| IfStmt Expr [Stmt]
| WhileStmt Expr [Stmt]
| FunctionStmt Identifier [Identifier] [Stmt]
| ReturnStmt (Maybe Expr)
| YieldStmt
| SpawnStmt Expr
| SendStmt Expr Expr
deriving (Show, Eq)
type Program = [Stmt]
Also, program
is the parser for Co programs. To parse code, run the program
parser with the runParser
function like this:
> runParser program "var x = 1 + s;"
Right [VarStmt "x" (Binary Plus (LNum 1) (Variable "s"))]
Now, off to the new stuff.
Running a Program
There are many ways to run a program. If the program is written in Machine Code, you can run it directly on the matching CPU. But machine code is too low-level, and writing programs in it is very tedious and error-prone. Thus, programmers prefer to write code in high-level programming languages, and turn it into machine code to be able to run it@1. Here’s where different ways of running code come in:
- We can run the high-level code through a Compiler to turn it into machine code to be able to run it directly. Example: compiling C++ using GCC.
- We can run the code through a compiler which turns it into a relatively lower-level programming language code, and then run that lower-level program through another compiler to turn it into machine code. Example: compiling Haskell into LLVM IR using GHC, which can then be run through the LLVM toolchain to generate machine code.
- We can run the code through a Transpiler (also called Source-to-source compiler) to turn it into code in a programming language that is of similar level, and then run the resultant code with that language’s toolchain. Example: transpiling Purescript into JavaScript, and running it with node.js.
- We can compile the source code to Bytecode and run the bytecode on a Virtual Machine. Example: Java virtual machine running Java bytecode compiled from Clojure source code by the Clojure compiler.
- We can parse the code to an AST, and immediately execute the AST using an AST Interpreter. Example: PHP version 3, Bash. 1
- We can also mix-and-match parts of the above options to create hybrids, like Just-in-time compilation to machine code within a virtual machine.
For running Co programs, we will implement an AST-walking interpreter. The interpreter implemented in this post will support only the basic features of Co. In the next post, we’ll extend it to support coroutines and channels.
The complete code for the interpreter is here. You can load it in GHCi using stack (by running stack co-interpreter.hs
), and follow along while reading this article.
Runtime Values
An AST-walking interpreter takes an AST as its input, and recursively walks down the AST nodes, from top to bottom. While doing this, it evaluates expressions to runtime values, and executes the statements to do their effects.
The runtime values are things that can be passed around in the code during the program run time. Often called “first-class”, these values can be assigned to variables, passed as function arguments, and returned from functions. If Co were to support data structures like lists and maps, these values could be stored in them as well. The Value
ADT below represents these values:
data Value
= Null
| Boolean Bool
| Str String
| Num Integer
| Function Identifier [Identifier] [Stmt] Env
| BuiltinFunction Identifier Int ([Expr] -> Interpreter Value)
| Chan Channel
Other than the usual values like null
, booleans, strings, and numbers, we also have functions as first-class runtime values in Co. We have a constructor Function
for the functions that programmers define in their Co code, and another constructor BuiltinFunction
for built-in functions like print
2.
We also write instances to show and check equality for these values:
instance Show Value where
show = \case
Null -> "null"
Boolean b -> show b
Str s -> s
Num n -> show n
Function name _ _ _ -> "function " <> name
BuiltinFunction name _ _ -> "function " <> name
Chan Channel {} -> "Channel"
instance Eq Value where
Null == Null = True
Boolean b1 == Boolean b2 = b1 == b2
Str s1 == Str s2 = s1 == s2
Num n1 == Num n2 = n1 == n2
== _ = False _
Note that only null
, booleans, strings and numbers can be checked for equality in Co. Also, only values of same type can be equals. A string can never be equal to a number3.
So, how do we go about turning the expressions to values, and executing statements? Before learning that, we must take a detour into some theory of programming languages.
Readers familiar with the concepts of environments, scopes, closures and early returns can skip the next sections, and jump directly to the implementation.
Environment Model of Evaluation
Let’s say we have this little Co program to run:
var a = 2;
function twice(x) { return x + x; }
print(twice(a));
We need to evaluate twice(a)
to a value to print it. One way to do that is to substitute variables for their values, quite literally. twice
is a variable, value of which is a function. And a
is another variable, with value 2
. We can do repeated substitution to arrive at a resultant value like this:
print(twice(a));
=> print(twice(2));
=> print(2 + 2);
=> print(4);
This is called the Substitution model of evaluation@5. This works for the example we have above, and for a large set of programs4. However, it breaks down as soon as we add mutability to the mix:
var a = 2;
function incA() {
var b = a + 1;
return b;
}print(incA());
= 3;
a print(incA());
Running this with the Co interpreter results in the output:
3
4
We can’t use the substitution model here because we can’t consider variables like a
to be substitutable with single values anymore. Now, we must think of them more as places in which the values are stored. Also, the stored values may change over the lifetime of the program execution. We call this place where the variable values are stored, the Environment, and this understanding of program execution is called the Environment Model of Evaluation@7.
A pair of a variable’s name and its value at any particular time is called a Binding. An Environment is a collection of zero-or-more bindings. To fully understand environments, first we have to learn about scopes.
Scopes
Let’s consider the twice
function again:
function twice(x) { return x + x; }
print(twice(1));
print(twice(2));
Calling twice
with different arguments prints different results. The function seems to forget the value of its parameter x
after each call. This may feel very natural to programmers, but how does it really work? The answer is Scopes.
A scope is a region of the program lifetime during which a variable name-to-value binding is in effect. When the program execution enters a scope, the variables in that scope become defined and available to the executing code5. When the program execution exits the scope, the variables become undefined and inaccessible (also known as “going out of scope”).
Lexical scoping is a specific style of scoping where the structure of the program itself shows where a scope begins and ends@9. Like most modern languages, Co is lexically scoped. A function in Co starts a new scope which extends over the entire function body, and the scope ends when the function ends6. Functions are the only way of creating new scopes in Co7.
That’s how repeated invocation of functions don’t remember the values of their parameters across the calls. Every time a new call is started, a new scope is created with the parameter names bound to the value of the arguments of the call. And when the call returns, this new scope is destroyed.
Scopes can be enclosed within other scopes. In Co, this can be done by defining a function inside the body of another function. All programs have at least one scope, which is the program’s top-level scope, often called the global scope.
Scopes are intimately related to the environment. In fact, the structure of the environment is how scopes are implemented@12.
An environment can be thought of as a stack of frames, with one frame per enclosed scope@13. A frame contains zero-or-more bindings. The bindings in enclosed scopes (frames higher in the environment stack) hide the bindings (called shadowing) in enclosing scopes (frames lower in the environment stack). Program’s global scope is represented by the lowermost frame in the stack.
The above diagram shows the frames of the two calls to the twice
function. The scope of the twice
function is enclosed in the global scope. To find the value of a variable inside the function, the interpret first looks into the topmost frame that represents the scope of the twice
function. If the binding is not found, then the interpreter goes down the stack of frames, and looks into the frame for the global scope.
What happens when a function body tries to access variables not defined in the function’s scope? We get Closures.
Closures
If a function body refers to variables not defined in the function’s scope, such variables are called Free Variables@14. In lexically scoped languages, the value of a free variable is determined from the scope in which the function is defined. A function along with the references to all its free variables, is called a Closure8.
Closures are prevalent in programming languages with first-class functions. Co—with its support for first-class functions—also supports closures. Closures in Co are mutable, meaning the values of the free variables of a function can change over time, and the changes are reflected in the behavior of the function9.
We already saw an example of closures earlier:
var a = 2;
function incA() {
var b = a + 1;
return b;
}print(incA());
= 3;
a print(incA());
This is how the frames exist over time for the two invocations of the incA
function:
a
is a free variable of the function incA
Here, a
is a free variable of the function incA
. Its value is not present in the scope of incA
, but is obtained from the global scope. When its value in the global scope changes later, the value returned by incA
changes as well. In other words, incA
and a
together form a closure.
The following example demonstrates a closure with a mutable free variable and enclosed scopes:
function makeCounter(name) {
var count = 0;
return function () {
= count + 1;
count print(name + " = " + count);
;
}
}
var countA = makeCounter("a");
var countB = makeCounter("b");
countA();
countA();
countB();
countA();
Here, both name
and count
are free variables referred in the returned function. While name
is only read, count
is changed in the body of the function.
Running the above code prints:
a = 1
a = 2
b = 1
a = 3
Note that the two functions countA
and countB
refer to two different instances of the count
variable, and are not affected by each other. In other words, countA
and countB
are two different closures for the same function.
Now for one last thing to learn about before we jump to the implementation: early returns.
Early Returns
Statement oriented programming languages often allow returning from a function before the entire function is done executing. This is called an Early return. We saw an example of this in the fibonacci function in the previous post:
function fib(n) {
if (n < 2) {
return n;
}return fib(n - 2)
+ fib(n - 1);
}
In the above code, when the input n
is less than 2, the code returns early from the function at the line 3.
Expression oriented programming languages, like Haskell, have no early returns. Every function is an expression in Haskell, and has to be evaluated entirely10 to get back a value. Since our AST-walking interpreter itself is written in Haskell, we need to figure out how to support early returns in the Co code being interpreted. The interpreter should be able to stop evaluating at an AST node representing a return
statement, and jump to the node representing the function’s caller.
One way to implement this is Exceptions. Exceptions let us abort the execution of code at any point of execution, and resume from some other point in the lower in the function call stack. Although Haskell supports exceptions as we know them from languages like Java and Python, it also supports exceptions as values using the Error monad. That’s what we will leverage for implementing early returns in our interpreter.
Finally, we are really to start implementing the interpreter.
The Interpreter
The interpreter is implemented as a Haskell newtype
over a stack of monad using the monad transformers and typeclasses from the mtl library:
newtype Interpreter a = Interpreter
runInterpreter ::
{ExceptT Exception
(ContT
(Either Exception ())
StateT InterpreterState IO))
(
a
}deriving
Functor,
( Applicative,
Monad,
MonadIO,
MonadBase IO,
MonadState InterpreterState,
MonadError Exception,
MonadCont
)
From bottom to top, the monad stack is comprised of:
- the
IO
monad to be able to print to the console, - the
State
monad transformer to track the state of the interpreter, and - the
Except
monad transformer to propagate exceptions while interpreting the code.
We model the environment as Map
of variable names to IORef
s of values:
type Env = Map.Map Identifier (IORef Value)
The immutable nature of Map
and the mutable nature of IORef
allow us to correctly model scopes, frames and closures in Co, as we see in the later sections of this post.
The interpreter state contains the environment used for interpretation. The state changes as variables come in and go out of scopes.
data InterpreterState = InterpreterState
isEnv :: Env,
{ isCoroutines :: Queue (Coroutine ())
}
initInterpreterState :: IO InterpreterState
= InterpreterState <$> builtinEnv <*> newQueue initInterpreterState
Initial interpreter state contains the built-in environment with bindings for the built-in functions like print
. In particular, print
is implemented by the executePrint
function, which we see in a later section. Note that, arity of built-in functions is also encapsulated in them.
builtinEnv :: IO Env
= Map.fromList <$> traverse (traverse newIORef) [
builtinEnv "print", BuiltinFunction "print" 1 executePrint)
( , ("newChannel",
BuiltinFunction "newChannel" 0 $ fmap Chan . const (newChannel 0))
, ("newBufferedChannel",
BuiltinFunction "newBufferedChannel" 1 executeNewBufferedChannel)
, ("sleep", BuiltinFunction "sleep" 1 executeSleep)
, ("getCurrentMillis",
BuiltinFunction "getCurrentMillis" 0 executeGetCurrentMillis)
]
When trying to interpret wrong code like 1 + true
, the interpreter throws runtime errors. We roll these errors along with early returns into an ADT for exceptions:
data Exception
= Return Value
| RuntimeError String
| CoroutineQueueEmpty
That’s it for defining the types for the interpreter. Next, we implement the functions to interpret Co programs, starting with functions to work with environments.
Manipulating Environments
In Co, variables must be initialized when being defined. Additionally, only the already defined variables can be referenced or assigned.
To define a new variable, we create a new IORef
with the variable’s value, insert it in the current environment map with the variable name as the key, and replace the interpreter state with the new environment map.
defineVar :: Identifier -> Value -> Interpreter ()
= do
defineVar name value <- State.gets isEnv
env if Map.member name env
then throw $ "Variable already defined: " <> name
else do
<- newIORef value
valueRef $ Map.insert name valueRef env
setEnv
setEnv :: Env -> Interpreter ()
= State.modify' $ \is -> is {isEnv = env} setEnv env
Note that trying to redefine an already defined variable throws a runtime error.
We also create a helper function setEnv
that we reuse in later sections.
To lookup and assign a variable, we get the current environment, lookup the IORef
in the map by the variable’s name, and then read the IORef
for lookup, or write the new value to it for assignment.
lookupVar :: Identifier -> Interpreter Value
=
lookupVar name >>= findValueRef name >>= readIORef
State.gets isEnv
assignVar :: Identifier -> Value -> Interpreter ()
=
assignVar name value >>= findValueRef name >>= flip writeIORef value State.gets isEnv
We use the helper function findValueRef
to lookup a variable name in the environment map. It throws a runtime error if the variable is not already defined.
findValueRef :: Identifier -> Env -> Interpreter (IORef Value)
=
findValueRef name env case Map.lookup name env of
Just ref -> return ref
Nothing -> throw $ "Unknown variable: " <> name
throw :: String -> Interpreter a
= throwError . RuntimeError throw
These functions are enough for us to implement the evaluation of expressions and execution of statements.
Evaluating Expressions
Co expressions are represented by the Expr
ADT. The evaluate
function below shows how they are evaluated to runtime values.
evaluate :: Expr -> Interpreter Value
= \case
evaluate LNull -> pure Null
LBool bool -> pure $ Boolean bool
LStr str -> pure $ Str str
LNum num -> pure $ Num num
Variable v -> lookupVar v
Lambda params body -> Function "<lambda>" params body <$> State.gets isEnv
@Binary {} -> evaluateBinaryOp binary
binary@Call {} -> evaluateFuncCall call
call Receive expr -> evaluate expr >>= \case
Chan channel -> channelReceive channel
val -> throw $ "Cannot receive from a non-channel: " <> show val
Literals null
, booleans, strings, and numbers evaluate to themselves. Variables are looked up from the environment using the lookupVar
function we wrote earlier. Anonymous functions are evaluated to function values that capture the current environment from the interpreter state. We learn more about function definitions and calls in the next section. Binary operations and function call expressions are handled by helper functions explained below.
evaluateBinaryOp :: Expr -> Interpreter Value
~(Binary op leftE rightE) = do
evaluateBinaryOp <- evaluate leftE
left <- evaluate rightE
right let errMsg msg = msg <> ": " <> show left <> " and " <> show right
case (op, left, right) of
Plus, Num n1, Num n2) -> pure $ Num $ n1 + n2
(Plus, Str s1, Str s2) -> pure $ Str $ s1 <> s2
(Plus, Str s1, _) -> pure $ Str $ s1 <> show right
(Plus, _, Str s2) -> pure $ Str $ show left <> s2
(Plus, _, _) -> throw $ errMsg "Cannot add or append"
(
Minus, Num n1, Num n2) -> pure $ Num $ n1 - n2
(Minus, _, _) -> throw $ errMsg "Cannot subtract non-numbers"
(
Slash, Num n1, Num n2) -> pure $ Num $ n1 `div` n2
(Slash, _, _) -> throw $ errMsg "Cannot divide non-numbers"
(
Star, Num n1, Num n2) -> pure $ Num $ n1 * n2
(Star, _, _) -> throw $ errMsg "Cannot multiply non-numbers"
(
LessThan, Num n1, Num n2) -> pure $ Boolean $ n1 < n2
(LessThan, _, _) -> throw $ errMsg "Cannot compare non-numbers"
(GreaterThan, Num n1, Num n2) -> pure $ Boolean $ n1 > n2
(GreaterThan, _, _) -> throw $ errMsg "Cannot compare non-numbers"
(
Equals, _, _) -> pure $ Boolean $ left == right
(NotEquals, _, _) -> pure $ Boolean $ left /= right (
To evaluate a binary operation, first we recursively evaluate its left and right operands by calling evaluate
on them. Then, depending on the operation and types of the operands, we do different things.
- The
+
operation can be used to either add two numbers, or to concat two operands when one or both of them are strings. In all other cases, it throws runtime errors. - The
-
,/
,*
,>
, and<
operations can be invoked only on numbers. Other cases throw runtime errors. - The
==
and!=
operations run corresponding Haskell operations on their operands.
That’s all for evaluating binary operations. Next, let’s look at how to execute statements. We come back to evaluating function calls after that.
Executing Statements
Co statements are represented by the Stmt
ADT. The execute
function below uses a case expression to execute the various types of statements in different ways:
execute :: Stmt -> Interpreter ()
= \case
execute ExprStmt expr -> void $ evaluate expr
VarStmt name expr -> evaluate expr >>= defineVar name
AssignStmt name expr -> evaluate expr >>= assignVar name
IfStmt expr body -> do
<- evaluate expr
cond $
when (isTruthy cond)
traverse_ execute body@(WhileStmt expr body) -> do
while<- evaluate expr
cond $ do
when (isTruthy cond)
traverse_ execute body
execute whileReturnStmt mExpr -> do
<- traverse evaluate mExpr
mRet . Return . fromMaybe Null $ mRet
throwError FunctionStmt name params body -> do
<- State.gets isEnv
env $ Function name params body env
defineVar name YieldStmt -> yield
SpawnStmt expr -> spawn expr
SendStmt expr chan -> evaluate chan >>= \case
Chan channel -> do
val <- evaluate expr
channelSend val channel
v -> throw $ "Cannot send to a non-channel: " <> show v
where
= \case
isTruthy Null -> False
Boolean b -> b
-> True _
Expressions in expression statements are evaluated by calling evaluate
on them, and the resultant values are discarded.
For variable definition and assignment statements, first we evaluate the value expressions, and then define or assign variables with the given variable names and the resultant values.
For if
statements, first we evaluate their conditions, and if conditions yield truthy11 values, we recursively execute the statement bodies. while
statements are executed in a similar fashion, except we recursively execute the while
statements again after executing their bodies.
For return
statements, we evaluate their optional return value expressions, and then throw the resultant values as exceptions wrapped with the Return
constructor.
Execution of function statements is more interesting. First thing that we do is to capture the current environment from the interpreter state. Then we define a new variable12 with the function’s name and a runtime function value that encapsulates the function’s name, parameter names, and body statements, as well as, the captured environment. This is how closures record the values of functions’ free variables from their definition contexts.
In the next section, we see how the captured environments and returns as exceptions are used to evaluate function calls.
Evaluating Function Calls
The capability of defining and calling functions is the cornerstone of abstraction in programming languages. In Co, functions are first-class, and are also the means of implementing scopes and closures. Named functions support recursion13 as well. Hence, this section is the most important and involved one.
We start by evaluating the callee expression of the function call.
evaluateFuncCall :: Expr -> Interpreter Value
~(Call callee argEs) =
evaluateFuncCall >>= \case
evaluate callee BuiltinFunction name arity func -> do
checkArgCount name argEs arity
func argEs@Function {} -> evaluateFuncCall' func argEs
func-> throw $ "Cannot call a non-function: " <> show callee <> " is " <> show val
val
checkArgCount :: Identifier -> [Expr] -> Int -> Interpreter ()
=
checkArgCount funcName argEs arity length argEs /= arity) $
when ($ funcName <> " call expected " <> show arity
throw <> " argument(s) but received " <> show (length argEs)
executePrint :: [Expr] -> Interpreter Value
=
executePrint argEs head argEs) >>= liftIO . print >> return Null evaluate (
If the resultant value is not a function, we throw a runtime error.
If we get a built-in function, we check that the count of arguments is same as the arity of the function by invoking checkArgCount
, failing which we throw a runtime error. Then, we invoke the corresponding implementation function. For print
, it is the executePrint
function, in which we evaluate the argument and print it using Haskell’s print
function.
If we get a user-defined function, we evaluate the function call with the helper function evaluateFuncCall'
. But before diving into it, let’s take a look at how the world looks from inside a function.
function makeGreeter(greeting) {
function greeter(name) {
var say = greeting + " " + name;
print(say);
}return greeter;
}
var hello = makeGreeter("hello");
var namaste = makeGreeter("namaste");
hello("Arthur");
namaste("Ford");
In the above Co code, the function greeter
has a free variable greeting
, a bound parameter name
, and a local variable say
. Upon executing the code with the interpreter, we get the following output:
hello Arthur
namaste Ford
The output makes sense when we understand the variables hello
and namaste
are closures over the function greeter
. The environment seen from inside greeter
when it is being executed is a mix of the scope (and hence, the environment) it is defined in, and the scope it is called in.
More specifically, the free variables come from the definition scope, and the parameters come from the caller scope. Local variables can be derived from any combinations of free variables and parameters. With this understanding, let’s see how we evaluate function calls:
evaluateFuncCall' :: Value -> [Expr] -> Interpreter Value
evaluateFuncCall'~func@(Function funcName params body funcDefEnv) argEs = do
length params)
checkArgCount funcName argEs (<- State.gets isEnv
funcCallEnv
setupFuncEnv<- executeBody funcCallEnv
retVal
setEnv funcCallEnvreturn retVal
where
= do
setupFuncEnv <- traverse evaluate argEs
args <- overrideVar funcDefEnv funcName func
env <- foldM (uncurry . overrideVar) env $ zip params args
env'
setEnv env'
=
overrideVar env name value <$> newIORef value <*> pure env
Map.insert name
=
executeBody funcCallEnv >> return Null) `catchError` \case
(traverse_ execute body Return val -> return val
-> setEnv funcCallEnv >> throwError err err
Let’s go over the above code, step by step:
evaluateFuncCall'
is called with the function to evaluate. We get access to the function’s name, its parameter names, body statements, and the environment it is defined in. We also get the argument expressions for the function call. (Line 2–3)- First, we check that the count of arguments is same as the count of the function parameter by invoking
checkArgCount
, failing which we throw a runtime error. (Line 4) - Then, we capture the current environment from the interpreter state. This is the function’s caller’s environment. (Line 5)
- Next, we set up the environment in which the function will be executed (line 6). In
setupFuncEnv
:- We evaluate the argument expressions in the current (caller’s) environment14. (Line 12)
- We bind the callee function itself to its name in its own environment. This lets our function to recursively call itself. (Line 13)
- We bind the argument values to their parameter names in the function’s environment. This lets the function body access the arguments being called with. (Line 14)
- We set the current environment in the interpreter state to the functions’s environment. (Line 15)
- With the function environment set up, we execute the function body in
executeBody
(line 7):- We execute each statement in the body, and return
null
in case there was no explicitreturn
in the function. (Line 21) - If the body contains a
return
statement, or if its execution throws a runtime error, we handle the exception in thecatchError
case statement.- For
return
, we pass along the return value. (Line 22) - For a runtime error, first we set the current environment back to the caller’s environment that we captured in step 3, and then we throw the error. The error is eventually handled in the
interpret
function described in the next section. (Line 23)
- For
- We capture the value returned from executing the body. (Line 7)
- We execute each statement in the body, and return
- We set the current environment back to the caller’s environment that we captured in step 3. (Line 8)
- We return the captured return value from
evaluateFuncCall'
. The function call is now complete. (Line 9)
Curious readers may wonder, why do we need to use State monad, Map
s, and IORef
s together, when all of them do similar work of storing and mutating variables? Because, together they let us implement function calls, scopes and closures, as described below:
- State monad lets us swap the current environment for a function’s definition environment when a function call is made, and to restore the calling environment after the call is complete.
- Immutable maps are perfect for implementing scopes. Adding variables in an immutable map returns a modified map without changing the original map. This lets us shadow variables defined in outer scopes when entering inner scopes, while also being able to easily restore the shadowed variables by just restoring the original map after the inner scopes end15. There is no need to use a stack of mutable maps, which is how environments are generally implemented in interpreters which do not use immutable maps.
- Lastly, putting
IORef
s as values of immutable maps lets us implement mutable closures. All closures of same function share the same references to theIORef
s. This allows variable mutations made from one closure to be visible to all others. If we had used just immutable maps, changes made to variable values would not propagate between closures because of immutability.
So that’s how function calls—the most crucial part of the interpreter—work. That completes the guts of our interpreter for the basic features of Co. In the next and last section, we put everything together.
Interpreting a Program
We are down to the last step. We interpret a program returned from the parser written in the previous post to run it.
interpret :: Program -> IO (Either String ())
= do
interpret program <- initInterpreterState
state <- flip evalStateT state
retVal . flip runContT return
. runExceptT
. runInterpreter
$ (traverse_ execute program >> awaitTermination)
case retVal of
Left (RuntimeError err) -> return $ Left err
Left (Return _) -> return $ Left "Cannot return from outside functions"
Left CoroutineQueueEmpty -> return $ Right ()
Right _ -> return $ Right ()
We run the list of statements in the program by running the execute
function on them. Then we run the monad transformer stack, layer by layer, to get the return value. Finally, we case match on the return value to catch errors, and we are done.
We package the parser and the interpreter together to create the runFile
function that takes a file path, reads and parses the file, and then interprets the AST:
runFile :: FilePath -> IO ()
= do
runFile file <- readFile file
code case runParser program code of
Left err -> hPutStrLn stderr err
Right program -> interpret program >>= \case
Left err -> hPutStrLn stderr $ "ERROR: " <> err
-> return () _
Finally, we can run the interpreter on the Co files:
> runFile "fib.co"
0
1
1
2
3
5
0
1
1
2
3 5
That’s all for now. We implemented the interpreter for the basic features for Co, and learned about how function calls, scopes and closures work. In the next part, we’ll extend our interpreter to add support for coroutines and channels in Co.
The full code for the interpreter can be seen here.
It’s hard to find examples of real-world programming languages that are run with AST interpreters. This is because AST interpreters are too slow for real-world usage. However, they are the easiest to understand and implement, and hence are widely using in teaching programming languages theory.↩︎
Since the user-defined and built-in functions are first-class, they can be assigned to variables, and passed as arguments to other functions. Thus, Co supports higher-order functions as well.↩︎
This is called Strong typing in programming languages parlance. JavaScript, on the other hand, is a weakly typed language. In JavaScript,
1 == '1'
evaluates totrue
, whereas in Co, it evaluates tofalse
.↩︎The property of being able to substitute expressions for their corresponding values without changing the meaning of the program is called Referential transparency@6. Pure functions—like
twice
here—that do not have any side-effects are referentially transparent.↩︎I’m being a little hand-wavy here because most programmers have at least an intuitive understanding of scopes. Read literature for accurate details.↩︎
This is in contrast to Dynamic scoping where the a variable’s scope is essentially global, and is defined by function’s execution context instead of definition context, as in lexical scoping.↩︎
Blocks are another widely used structure to support lexical scoping. Co doesn’t have blocks in the interest of simplicity of implementation.↩︎
The function is said to close its free variables over its closure. Hence, the name Closure.↩︎
Some programming languages like Java support a limited version of closures, which requires values of the free variables of functions to not change over time.↩︎
Well, not entirely, because Haskell is a lazily evaluated language.↩︎
In Co, only
null
andfalse
evaluate to false. All other values evaluate to true. This is implemented by theisTruthy
function.↩︎Functions are just variables in Co. That is to say, functions definitions and variable definitions share the same namespace. This is how it works in many programming languages like JavaScript and Python. But some languages like Common Lisp have separate namespaces for functions and variables.↩︎
Co does not support mutual recursion though. This is because a function in Co only sees the bindings done before its own definition. This can be fixed by either adding a special syntax for mutually recursive functions, or by hoisting all the bindings in a scope to the top of the scope, like how JavaScript does.
Anonymous functions do not support recursion at all, because they do not have names to refer to themselves in their bodies.↩︎
Evaluating function arguments before the function body is called the Strict evaluation strategy. Most of the modern programming languages work this way, for example, Java, Python, JavaScript, Ruby etc. This is in contrast to Non-strict evaluation in programming languages like Haskell, where the arguments to functions are evaluated only when their values are needed in the function bodies@21.↩︎
This is what the
overrideVar
function does in the code above.↩︎
Got suggestions, corrections, or thoughts? Post a comment!
15 comments
konnik
samuelUniversal
ByteEater
dpwiz
Abhinav Sarkar
Oldpug-
Abhinav Sarkar
Oldpug-
Abhinav Sarkar
Oldpug-
Ted
mullsork
Abhinav Sarkar
artemonster
gilmi