Solving Advent of Code ’23 “Aplenty” by Compiling
- A eighteen minute read
- 14 comments
- 2 🗣️ 11 ❤️ 7 🔁
Every year I try to solve some problems from the Advent of Code (AoC) competition in a not straightforward way. Let’s solve the part one of the day 19 problem Aplenty by compiling the problem input to an executable file.
The Problem
What the problem presents as input is essentially a program. Here is the example input:
Each line in the first section of the input is a code block. The bodies of the blocks have statements of these types:
- Accept (
A
) or Reject (R
) that terminate the program. - Jumps to other blocks by their names, for example:
rfg
as the last statement of thepx
block in the first line. - Conditional statements that have a condition and what to do if the condition is true, which can be only Accept/Reject or a jump to another block.
The problem calls the statements “rules”, the blocks “workflows”, and the program “system”.
All blocks of the program operates on a set of four values: x
, m
, a
, and s
. The problem calls them “ratings”, and each set of ratings is for/forms a “part”. The second section of the input specifies a bunch of these parts to run the system against.
This seems to map very well to a C program, with Accept
and Reject
returning true
and false
respectively, and jumps accomplished using goto
s. So that’s what we’ll do: we’ll compile the problem input to a C program, then compile that to an executable, and run it to get the solution to the problem.
And of course, we’ll do all this in Haskell. First some imports:
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE StrictData #-}
module Main where
import qualified Data.Array as Array
import Data.Char (digitToInt, isAlpha, isDigit)
import Data.Foldable (foldl', foldr')
import Data.Function (fix)
import Data.Functor (($>))
import qualified Data.Graph as Graph
import Data.List (intercalate, (\\))
import qualified Data.Map.Strict as Map
import System.Environment (getArgs)
import qualified Text.ParserCombinators.ReadP as P
import Prelude hiding (GT, LT)
The Parser
First, we parse the input program to Haskell data types. We use the ReadP parser library built into the Haskell standard library.
data Part = Part
partX :: Int,
{ partM :: Int,
partA :: Int,
partS :: Int
deriving (Show)
}
data Rating = X | M | A | S deriving (Show, Eq)
emptyPart :: Part
= Part 0 0 0 0
emptyPart
addRating :: Part -> (Rating, Int) -> Part
= case r of
addRating p (r, v) X -> p {partX = v}
M -> p {partM = v}
A -> p {partA = v}
S -> p {partS = v}
partParser :: P.ReadP Part
=
partParser
foldl' addRating emptyPart<$> P.between (P.char '{') (P.char '}')
`P.sepBy1` P.char ',')
(partRatingParser
partRatingParser :: P.ReadP (Rating, Int)
=
partRatingParser <$> ratingParser <*> (P.char '=' *> intParser)
(,)
ratingParser :: P.ReadP Rating
=
ratingParser >>= \case
P.get 'x' -> pure X
'm' -> pure M
'a' -> pure A
's' -> pure S
-> P.pfail
_
intParser :: P.ReadP Int
=
intParser -> n * 10 + d) 0 <$> P.many1 digitParser
foldl' (\n d
digitParser :: P.ReadP Int
= digitToInt <$> P.satisfy isDigit
digitParser
parse :: (Show a) => P.ReadP a -> String -> Either String a
= case P.readP_to_S (parser <* P.eof) text of
parse parser text "")] -> Right res
[(res, -> Left $ "Leftover input: " <> s
[(_, s)] -> Left $ "Unexpected output: " <> show out out
Part
is a Haskell data type representing parts, and Rating
is an enum for, well, ratings1.
Following that are parsers for parts and ratings, written in Applicative and Monadic styles using the basic parsers and combinators provided by the ReadP library.
Finally, we have the parse
function to run a parser on an input. We can try parsing parts in GHCi:
> parse partParser "{x=2127,m=1623,a=2188,s=1013}"
Right (Part {partX = 2127, partM = 1623, partA = 2188, partS = 1013})
Next, we represent and parse the program, I mean, the system:
newtype System =
System (Map.Map WorkflowName Workflow)
deriving (Show, Eq)
data Workflow = Workflow
wName :: WorkflowName,
{ wRules :: [Rule]
deriving (Show, Eq)
}
type WorkflowName = String
data Rule
= AtomicRule AtomicRule
| If Condition AtomicRule
deriving (Show, Eq)
data AtomicRule
= Jump WorkflowName
| Accept
| Reject
deriving (Show, Eq, Ord)
data Condition
= Comparison Rating CmpOp Int
deriving (Show, Eq)
data CmpOp = LT | GT deriving (Show, Eq)
A System
is a map of workflows by their names. A Workflow
has a name and a list of rules. A Rule
is either an AtomicRule
, or an If
rule. An AtomicRule
is either a Jump
to another workflow by name, or an Accept
or Reject
rule. The Condition
of an If
rule is a less that (LT
) or a greater than (GT
) Comparison
of some Rating
of an input part with an integer value.
Now, it’s time to parse the system:
systemParser :: P.ReadP System
=
systemParser System
. foldl' (\m wf -> Map.insert (wName wf) wf m) Map.empty
<$> workflowParser `P.endBy1` P.char '\n'
workflowParser :: P.ReadP Workflow
=
workflowParser Workflow
<$> P.many1 (P.satisfy isAlpha)
<*> P.between (P.char '{') (P.char '}')
`P.sepBy1` P.char ',')
(ruleParser
ruleParser :: P.ReadP Rule
=
ruleParser AtomicRule <$> atomicRuleParser) P.<++ ifRuleParser
(
ifRuleParser :: P.ReadP Rule
=
ifRuleParser If
<$> (Comparison <$> ratingParser <*> cmpOpParser <*> intParser)
<*> (P.char ':' *> atomicRuleParser)
atomicRuleParser :: P.ReadP AtomicRule
= do
atomicRuleParser : _ <- P.look
c case c of
'A' -> P.char 'A' $> Accept
'R' -> P.char 'R' $> Reject
-> (Jump .) . (:) <$> P.char c <*> P.many1 (P.satisfy isAlpha)
_
cmpOpParser :: P.ReadP CmpOp
= P.choice [P.char '<' $> LT, P.char '>' $> GT] cmpOpParser
Parsing is straightforward as there are no recursive data types or complicated precedence or associativity rules here. We can exercise it in GHCi (output formatted for clarity):
> parse workflowParser "px{a<2006:qkq,m>2090:A,rfg}"
Right (
Workflow {
wName = "px",
wRules = [
If (Comparison A LT 2006) (Jump "qkq"),
If (Comparison M GT 2090) Accept,
AtomicRule (Jump "rfg")
]
} )
Excellent! We can now combine the part parser and the system parser to parse the problem input:
data Input = Input System [Part] deriving (Show)
inputParser :: P.ReadP Input
=
inputParser Input
<$> systemParser
<*> (P.char '\n' *> partParser `P.endBy1` P.char '\n')
Before moving on to translating the system to C, let’s write an interpreter so that we can compare the output of our final C program against it for validation.
The Interpreter
Each system has a workflow named “in”, where the execution of the system starts. Running the system results in True
if the run ends with an Accept
rule, or in False
if the run ends with a Reject
rule. With this in mind, let’s cook up the interpreter:
runSystem :: System -> Part -> Bool
System system) part = runRule $ Jump "in"
runSystem (where
= \case
runRule Accept -> True
Reject -> False
Jump wfName -> jump wfName
= case Map.lookup wfName system of
jump wfName Just workflow -> runRules $ wRules workflow
Nothing ->
error $ "Workflow not found in system: " <> wfName
= \case
runRules : rest) -> case rule of
(rule AtomicRule aRule -> runRule aRule
If cond aRule ->
if evalCond cond
then runRule aRule
else runRules rest
-> error "Workflow ended without accept/reject"
_
= \case
evalCond Comparison r LT value -> rating r < value
Comparison r GT value-> rating r > value
= \case
rating X -> partX part
M -> partM part
A -> partA part
S -> partS part
The interpreter starts by running the rule to jump to the “in” workflow. Running a rule returns True
or False
for Accept
or Reject
rules respectively, or jumps to a workflow for Jump
rules. Jumping to a workflow looks it up in the system’s map of workflows, and sequentially runs each of its rules.
An AtomicRule
is run as previously mentioned. An If
rule evaluates its condition, and either runs the consequent rule if the condition is true, or moves on to running the rest of the rules in the workflow.
That’s it for the interpreter. We can run it on the example input:
> inputText <- readFile "input.txt"
> Right (Input system parts) = parse inputParser inputText
> runSystem system (parts !! 0)
True> runSystem system (parts !! 1)
False
The AoC problem requires us to return the sum total of the ratings of the parts that are accepted by the system:
solve :: Input -> Int
Input system parts) =
solve (sum
. map (\(Part x m a s) -> x + m + a + s)
. filter (runSystem system)
$ parts
Let’s run it for the example input:
> Right input <- parse inputParser <$> readFile "exinput.txt"
> solve input
19114
It returns the correct answer! Next up, we generate some C code.
The Control-flow Graph
But first, a quick digression to graphs. A Control-flow graph or CFG, is a graph of all possible paths that can be taken through a program during its execution. It has many uses in compilers, but for now, we use it to generate more readable C code.
Using the Data.Graph
module from the containers
package, we write the function to create a control-flow graph for our system/program, and use it to topologically sort the workflows:
type Graph' a =
Graph.Graph, Graph.Vertex -> (a, [a]), a -> Maybe Graph.Vertex)
(
cfGraph :: Map.Map WorkflowName Workflow -> Graph' WorkflowName
=
cfGraph system
graphFromMap. Map.toList
. flip Map.map system
$ \(Workflow _ rules) ->
flip concatMap rules $ \case
AtomicRule (Jump wfName) -> [wfName]
If _ (Jump wfName) -> [wfName]
-> []
_ where
graphFromMap :: (Ord a) => [(a, [a])] -> Graph' a
=
graphFromMap m let (graph, nLookup, vLookup) =
$ map (\(f, ts) -> (f, f, ts)) m
Graph.graphFromEdges in (graph, \v -> let (x, _, xs) = nLookup v in (x, xs), vLookup)
toposortWorkflows :: Map.Map WorkflowName Workflow -> [WorkflowName]
=
toposortWorkflows system let (cfg, nLookup, _) = cfGraph system
in map (fst . nLookup) $ Graph.topSort cfg
Graph'
is a simpler type for a graph of nodes of type a
. The cfGraph
function takes a the map from workflow names to workflows — that is, a system — and returns a control-flow graph of workflow names. It does this by finding jumps from workflows to other workflows, and connecting them.
Then, the toposortWorkflows
function uses the created CFG to topologically sort the workflows. We’ll see this in action in a bit. Moving on to …
The Compiler
The compiler, for now, simply generates the C code for a given system. We write a ToC
typeclass for convenience:
class ToC a where
toC :: a -> String
instance ToC Part where
Part x m a s) =
toC ("{" <> intercalate ", " (map show [x, m, a, s]) <> "}"
instance ToC CmpOp where
= \case
toC LT -> "<"
GT -> ">"
instance ToC Rating where
= \case
toC X -> "x"
M -> "m"
A -> "a"
S -> "s"
instance ToC AtomicRule where
= \case
toC Accept -> "return true;"
Reject -> "return false;"
Jump wfName -> "goto " <> wfName <> ";"
instance ToC Condition where
= \case
toC Comparison rating op val ->
<> " " <> toC op <> " " <> show val
toC rating
instance ToC Rule where
= \case
toC AtomicRule aRule -> toC aRule
If cond aRule ->
"if (" <> toC cond <> ") { " <> toC aRule <> " }"
instance ToC Workflow where
Workflow wfName rules) =
toC (
wfName<> ":\n"
<> intercalate "\n" (map ((" " <>) . toC) rules)
instance ToC System where
System system) =
toC (
intercalate"\n"
"bool runSystem(int x, int m, int a, int s) {",
[ " goto in;",
intercalate"\n"
map (toC . (system Map.!)) $ toposortWorkflows system),
("}"
]
instance ToC Input where
Input system parts) =
toC (
intercalate"\n"
"#include <stdbool.h>",
[ "#include <stdio.h>\n",
toC system,"int main() {",
" int parts[][4] = {",
",\n" (map ((" " <>) . toC) parts),
intercalate " };",
" int totalRating = 0;",
" for(int i = 0; i < " <> show (length parts) <> "; i++) {",
" int x = parts[i][0];",
" int m = parts[i][1];",
" int a = parts[i][2];",
" int s = parts[i][3];",
" if (runSystem(x, m, a, s)) {",
" totalRating += x + m + a + s;",
" }",
" }",
" printf(\"%d\", totalRating);",
" return 0;",
"}"
]
As mentioned before, Accept
and Reject
rules are converted to return true
and false
respectively, and Jump
rules are converted to goto
s. If
rules become if
statements, and Workflow
s become block labels followed by block statements.
A System
is translated to a function runSystem
that takes four parameters, x
, m
, a
and s
, and runs the workflows translated to blocks by executing goto in
.
Finally, an Input
is converted to a C file with the required includes, and a main
function that solves the problem by calling the runSystem
function for all parts.
Let’s throw in a main
function to put everything together.
main :: IO ()
= do
main <- head <$> getArgs
file <- readFile file
code case parse inputParser code of
Right input -> putStrLn $ toC input
Left err -> error err
The main
function reads the input from the file provided as the command line argument, parses it and outputs the generated C code. Let’s run it now.
The Compiler Output
We compile our compiler and run it to generate the C code for the example problem:
$ ghc --make aplenty.hs
$ ./aplenty exinput.txt > aplenty.c
This is the C code it generates:
#include <stdbool.h>
#include <stdio.h>
bool runSystem(int x, int m, int a, int s) {
goto in;
:
inif (s < 1351) { goto px; }
goto qqz;
:
qqzif (s > 2770) { goto qs; }
if (m < 1801) { goto hdj; }
return false;
:
qsif (s > 3448) { return true; }
goto lnx;
:
lnxif (m > 1548) { return true; }
return true;
:
pxif (a < 2006) { goto qkq; }
if (m > 2090) { return true; }
goto rfg;
:
rfgif (s < 537) { goto gd; }
if (x > 2440) { return false; }
return true;
:
qkqif (x < 1416) { return true; }
goto crn;
:
hdjif (m > 838) { return true; }
goto pv;
:
pvif (a > 1716) { return false; }
return true;
:
gdif (a > 3333) { return false; }
return false;
:
crnif (x > 2662) { return true; }
return false;
}
int main() {
int parts[][4] = {
{787, 2655, 1222, 2876},
{1679, 44, 2067, 496},
{2036, 264, 79, 2244},
{2461, 1339, 466, 291},
{2127, 1623, 2188, 1013}
};
int totalRating = 0;
for(int i = 0; i < 5; i++) {
int x = parts[i][0];
int m = parts[i][1];
int a = parts[i][2];
int s = parts[i][3];
if (runSystem(x, m, a, s)) {
+= x + m + a + s;
totalRating }
}
("%d", totalRating);
printfreturn 0;
}
We see the toposortWorkflows
function in action, sorting the blocks in the topological order of jumps between them, as opposed to the original input. Does this work? Only one way to know:
$ gcc aplenty.c -o solution
$ ./solution
19114
Perfect! The solution matches the interpreter output.
The Bonus: Optimizations
By studying the output C code, we spot some possibilities for optimizing the compiler output. Notice how the lnx
block returns same value (true
) regardless of which branch it takes:
:
lnxif (m > 1548) { return true; }
return true;
So, we should be able to replace it with:
:
lnxreturn true;
If we do this, the lnx
block becomes degenerate, and hence the jumps to the block can be inlined, turning the qs
block from:
:
qsif (s > 3448) { return true; }
goto lnx;
to:
:
qsif (s > 3448) { return true; }
return true;
which makes the if
statement in the qs
block redundant as well. Hence, we can repeat the previous optimization and further reduce the generated code.
Another possible optimization is to inline the blocks to which there are only single jumps from the rest of the blocks, for example the qqz
block.
Let’s write these optimizations.
Simplify Workflows
simplifyWorkflows :: System -> System
System system) =
simplifyWorkflows (System $ Map.map simplifyWorkflow system
where
Workflow name rules) =
simplifyWorkflow (Workflow name
$ foldr'
-> case rs of
( \r rs | ruleOutcome r == ruleOutcome r' -> rs
[r'] -> r : rs
_
)last rules]
[$ init rules
= \case
ruleOutcome If _ aRule -> aRule
AtomicRule aRule -> aRule
simplifyWorkflows
goes over all workflows and repeatedly removes the statements from the end of the blocks that has same outcome as the statement previous to them.
Inline Redundant Jumps
inlineRedundantJumps :: System -> System
System system) =
inlineRedundantJumps (System $
flip Map.delete) (Map.map inlineJumps system) $
foldl' (
Map.keys redundantJumpswhere
=
redundantJumps -> let ~(AtomicRule rule) = head $ wRules wf in rule)
Map.map (\wf . Map.filter (\wf -> length (wRules wf) == 1)
$ system
Workflow name rules) =
inlineJumps (Workflow name $ map inlineJump rules
= \case
inlineJump AtomicRule (Jump wfName)
| Map.member wfName redundantJumps ->
AtomicRule $ redundantJumps Map.! wfName
If cond (Jump wfName)
| Map.member wfName redundantJumps ->
If cond $ redundantJumps Map.! wfName
-> rule rule
inlineRedundantJumps
find the jumps to degenerate workflows and inlines them. It does this by first going over all workflows and creating a map of degenerate workflow names to the only rule in them, and then replacing the jumps to such workflows with the only rules.
Remove Jumps
removeJumps :: System -> System
System system) =
removeJumps (let system' =
flip $ Map.adjust removeJumpsWithSingleJumper) system $
foldl' (
toposortWorkflows systemin System
. foldl' (flip Map.delete) system'
. (\\ ["in"])
$ workflowsWithNJumpers 0 system'
where
Workflow name rules) =
removeJumpsWithSingleJumper (Workflow name $
init rules <> case last rules of
AtomicRule (Jump wfName)
| wfName `elem` workflowsWithSingleJumper ->
let (Workflow _ rules') = system Map.! wfName
in rules'
-> [rule]
rule
= workflowsWithNJumpers 1 system
workflowsWithSingleJumper
=
workflowsWithNJumpers n sys let (cfg, nLookup, _) = cfGraph sys
in map (fst . nLookup . fst)
. filter (\(_, d) -> d == n)
. Array.assocs
. Graph.indegree
$ cfg
removeJumps
does two things: first, it finds blocks with only one jumper, and inlines their statements to the jump location. Then it finds blocks to which there are no jumps, and removes them entirely from the program. It uses the workflowsWithNJumpers
helper function that uses the control-flow graph of the system to find all workflows to which there are n
number of jumps, where n
is provided as an input to the function. Note the usage of the toposortWorkflows
function here, which makes sure that we remove the blocks in topological order, accumulating as many statements as possible in the final program.
With these functions in place, we write the optimize
function:
optimize :: System -> System
=
optimize
applyTillUnchanged. inlineRedundantJumps . simplifyWorkflows)
(removeJumps where
applyTillUnchanged :: (Eq a) => (a -> a) -> a -> a
=
applyTillUnchanged f -> if f x == x then x else recurse (f x)) fix (\recurse x
We execute the three optimization functions repeatedly till a fixed point is reached for the resultant System
, that is, till there are no further possibilities of optimization.
Finally, we change our main
function to apply the optimizations:
main :: IO ()
= do
main <- head <$> getArgs
file <- readFile file
code case parse inputParser code of
Right (Input system parts) ->
putStrLn . toC $ Input (optimize system) parts
Left err -> error err
Compiling the optimized compiler and running it as earlier, generates this C code for the runSystem
function now:
bool runSystem(int x, int m, int a, int s) {
goto in;
:
inif (s < 1351) { goto px; }
if (s > 2770) { return true; }
if (m < 1801) { goto hdj; }
return false;
:
pxif (a < 2006) { goto qkq; }
if (m > 2090) { return true; }
if (s < 537) { return false; }
if (x > 2440) { return false; }
return true;
:
qkqif (x < 1416) { return true; }
if (x > 2662) { return true; }
return false;
:
hdjif (m > 838) { return true; }
if (a > 1716) { return false; }
return true;
}
It works well2. We now have 1.7x fewer lines of code as compared to before3.
The Conclusion
This was another attempt to solve Advent of Code problems in somewhat unusual ways. This year we learned some basics of compilation. Swing by next year for more weird ways to solve simple problems.
The full code for this post is available here.
Got suggestions, corrections, or thoughts? Post a comment!
14 comments
Samuel Chase
tlavoie
Abhinav Sarkar
nickgirardo
Abhinav Sarkar
Boojum
ak-coram
Abhinav Sarkar
ak-coram
770grappenmaker
topaz2078
Patzer26
aoanla
Kim Wallmark