Overview
In this blog we'll use a compiler development tool called Froley to write a transpiler that takes source code written in a toy programming language called Simple and cross-compiles it to C++ source code.
I'm the developer of Froley. I just released the all-new v3 and want to promote it a bit. I also want to try out blogging on Hashnode here. Why not both?!
This tutorial can be done on Mac, Linux, and Windows.
Test.simple (Transpiler Input)
Here's a sample input to our finished product.
println "Simple Test"
inches = 12
cm = inches * 2.54
println inches " inches = " cm " cm"
Test.cpp (Transpiler Output)
Here's what the corresponding C++ output will look like.
#include <iostream>
#include <string>
using namespace std;
int main()
{
double inches, cm;
cout << "Simple Test" << endl;
inches = 12.0;
cm = (inches * 2.54);
cout << inches << " inches = " << cm << " cm" << endl;
return 0;
}
Install Froley
- Install the morlock.sh package manager. This also installs the programming language Rogue and the build tool Rogo.
- Run
morlock install abepralle/froley
Refer to the Froley Wiki to help clarify Froley syntax and behavior.
Create a New Project
Create a new Froley starter project. From the command line:
mkdir Simple
cd Simple
froley --create
Simple> froley --create
Creating Build.rogue
Creating default .gitignore
Creating Source/Simple.froley
Creating Test.simple
Two of the files that are created are shown below.
Test.simple
This file is auto-generated by froley --create
.
println "Simple Test"
inches = 12
cm = inches * 2.54
println inches " inches = " cm " cm"
Source/Simple.froley
This file is auto-generated by froley --create
.
================================================================================
# Simple.froley
================================================================================
--------------------------------------------------------------------------------
tokens
--------------------------------------------------------------------------------
EOL(end of line)
IDENTIFIER identifier [content]
NUMBER number [content]
STRING string [content]
--------------------------------------------------------------------------------
tokens Symbols
--------------------------------------------------------------------------------
SYMBOL_CLOSE_PAREN ) [structural]
SYMBOL_EQUALS =
SYMBOL_MINUS -
SYMBOL_OPEN_PAREN (
SYMBOL_PLUS +
SYMBOL_SLASH /
SYMBOL_STAR *
--------------------------------------------------------------------------------
tokens Keywords
--------------------------------------------------------------------------------
KEYWORD_PRINT print
KEYWORD_PRINTLN println
--------------------------------------------------------------------------------
scanner
--------------------------------------------------------------------------------
- main
consume [ \r\t]* # whitespace
if consume("#" [^\n]*) restart # single-line comment
if (not hasAnother) halt
markPosition
if (consume('\n')) produce EOL
match
produceAny Symbols
endMatch
scan_id_or_keyword
scan_number
scan_string
syntaxError
- scan_id_or_keyword
if (not scan([_a-zA-Z][_a-zA-Z0-9]*)) return
# 'scan' fills 'buffer' with matching characters
match buffer
produceAny Keywords
others
produce IDENTIFIER
# Because type IDENTIFER has [content], 'buffer' will be copied
# as the resulting Token's .content string.
endMatch
- scan_number
if (not hasAnother) return
if (scan [0-9])
scan_integer # First character is now in 'buffer'
if (scan '.') scan_integer
produce NUMBER
elseIf (scan '.')
scan_integer
produce NUMBER
else
return
endIf
- scan_integer
scan [0-9]* # Fills 'buffer' with remaining digits
- scan_string
if (not consume('"')) return
while (hasAnother and not nextIs('\n'))
ch = read
if (ch == '"') produce STRING
elseIf (ch == '\\' and hasAnother) ch = read # Handle backslash escapes
collect ch
endWhile
syntaxError "Unterminated string."
--------------------------------------------------------------------------------
parser
--------------------------------------------------------------------------------
- ast
statements
produce AST(statements)
- statements
consume_eols
beginList
while (hasAnother and not nextHasAttribute(structural))
statement
consume_eols
endWhile
produceList Statements
- statement
on "print" arg_list -> Print(args)
on "println" arg_list -> Println(args)
expression
- consume_eols
while (consume(EOL)) noAction
- arg_list
beginList
while (hasAnother and not nextIs(EOL) and not nextHasAttribute(structural))
expression
endWhile
produceList Args
- expression
assign
- assign [rightBinary]
on "=" -> Assign
- add_subtract [binary]
on "+" -> Add
on "-" -> Subtract
- multiply_divide [binary]
on "*" -> Multiply
on "/" -> Divide
- negate [preUnary]
on "-" -> Negate
- term
on '(' expression ')': return
on IDENTIFIER -> Access(name=content)
on NUMBER -> Number(value=content:Real)
on STRING -> LiteralString(value=content)
syntaxError
Compiling and Running simple
Now you can type rogo
to compile the Simple compiler; it will be run with Test.simple
as the input:
Simple> rogo
Recompiling /Users/abe/Projects/Misc/Simple/Build.rogue...
roguec --target=C++,Console,macOS .rogo/Build.rogue --api --debug --main --output=.rogo/Build
Writing .rogo/Build.h...
Writing .rogo/Build.cpp...
SUCCESS (0.21 seconds)
g++ -Wall -std=gnu++11 -fno-strict-aliasing -Wno-invalid-offsetof .rogo/Build.cpp -o .rogo/Build-macOS
> froley --main Source/Simple.froley Source
Creating Source/Scanner.rogue
Creating Source/ScannerCore.rogue
Creating Source/Cmd.rogue
Creating Source/Parser.rogue
Creating Source/ParserCore.rogue
Creating Source/Visitor.rogue
Creating Source/CompileError.rogue
Creating Source/ScanTable.rogue
Creating Source/TokenType.rogue
Creating Source/Token.rogue
Creating Source/FrameStack.rogue
Creating Source/Simple.rogue
> roguec Source/Simple.rogue --debug --main --output=Build/Simple-macOS.cpp --target=Console,C++,macOS
Writing Build/Simple-macOS.h...
Writing Build/Simple-macOS.cpp...
SUCCESS (0.18 seconds)
> c++ -O0 -Wall -std=gnu++11 -fno-strict-aliasing -Wno-invalid-offsetof Build/Simple-macOS.cpp -o Build/Simple-macOS
> Build/Simple-macOS Test.simple
ast:AST(Statements[Println(Args[LiteralString(Simple Test)]),Assign(Access(inches),Number(12.0)),Assign(Access(cm),Multiply(Access(inches),Number(2.54))),Println(Args[Access(inches),LiteralString( inches = ),Access(cm),LiteralString( cm)])])
Examining the Process
Inspect the Auto-Generated Compiler Files
Take a look the additional .rogue
source files that Froley generated in the Source/
folder and get a feel for the infrastructure that Froley creates.
Scanner Output
The output of the Scanner is a list of Token
objects which is not shown on the console output. Here's another copy of the Test.simple
source file along with a diagram that illustrates the list of tokens that the Scanner creates.
println "Simple Test"
inches = 12
cm = inches * 2.54
println inches " inches = " cm " cm"
Parser Output
The output of the Parser is an AST (Abstract Syntax Tree) comprised of extended Cmd
nodes. Here is a diagram that illustrates the AST result of parsing Test.simple
.
Adding the Executable to the Binpath
Type rogo link
to utilize Morlock's ability to create a binpath link to any executable - it will execute morlock link <exe-path> simple
. Now you can run simple
from any folder, e.g. simple Test.simple
.
Add Additional Node Types
We'll need a few additional node types for our project. Instead of editing Cmd.rogue
and Visitor.rogue
by hand, let's write a new routine in the Simple.froley
definition that creates the new node types. The routine will never be called but Froley will generate the changes to Cmd.rogue
and Visitor.rogue
all the same. Add this to the end of Source/Simple.froley
:
- additional_nodes
create WriteVar(name:String,new_value)
create ReadVar(name:String)
Run rogo
to make sure the new changes compile. Nothing new will be shown yet but the new WriteVar
and ReadVar
node types will have been added to the end of Cmd.rogue
and additional code will have been added to the end of Visitor.rogue
.
Program.rogue
As we evaluate the AST (Abstract Syntax Tree) that the Parser produces, we'll need a centralized place to store meta-info - variable name and type info in particular.
Our Simple language will support two variable types: Number
and String
. We'll use implicit typing to figure out what type a variable is but we won't allow dynamic types - a variable can either store a Number
or a String
but not both.
Let's create a Program
singleton that tracks variable info. Add an $include
at the top of Source/Simple.rogue
:
$include "Program.rogue"
Now create Source/Program.rogue
with the following code:
module Simple
class Program [singleton]
PROPERTIES
vars = StringTable<<Var>>()
METHODS
method set_var_type( t:Token, var_name:String, var_type:String )
local existing = vars[ var_name ]
if (existing)
if (existing.type != var_type)
throw t.error( "Conflicting types $ and $ for variable '$'."...
(var_type,vars[var_name],var_name) )
endIf
else
vars[ var_name ] = Var( t, var_name, var_type )
endIf
endClass
class Var( t:Token, name:String, type:String );
Run rogo
to make sure the new changes compile. Nothing new will be shown yet.
CmdTypes.rogue
Let's give each node the ability to return its type as a string. A full-fledged compiler would have a Type
class but we'll just use string type names.
We could modify our generated Cmd.rogue
file directly without fear of losing our changes, but to keep things simple and clear let's use augmentation to add our node type info in a separate file.
In Source/Simple.rogue
, add the following:
$include "CmdTypes.rogue"
Create Source/CmdTypes.rogue
:
module Simple
augment Cmd
METHODS
method require_type->String
localize type
if (type) return type
throw t.error( "Value required." )
method require_value->this
require_type
return this
method assigned( new_value:Cmd )->Cmd
throw t.error( "Invalid assignment target." )
method type->String
return null
endAugment
augment Unary
METHODS
method type->String
return operand.type
endAugment
augment Binary
METHODS
method type->String
local left_type = left.type
local right_type = right.type
if (not left_type) return right_type
if (not right_type) return left_type
if (left_type == "String" or right_type == "String") return "String"
return "Number"
endAugment
augment LiteralString
METHODS
method type->String
return "String"
endAugment
augment Number
METHODS
method type->String
return "Number"
endAugment
augment ReadVar
METHODS
method type->String
local v = Program.vars[name]
if (not v) return null
return v.type
endAugment
augment WriteVar
METHODS
method type->String
local v = Program.vars[name]
if (not v) return null
return v.type
endAugment
augment Access
METHODS
method assigned( new_value:Cmd )->Cmd
return WriteVar( t, name, new_value )
endAugment
Note we also include a mechanism for a general variable Access
node to be transformed into a WriteVar
node when it is assigned a value. The assigned()
method will be utilized in our upcoming Resolver
.
Run rogo
to make sure the new changes compile. Nothing new will be shown yet.
Resolver.rogue
An AST is often analyzed with multiple passes to collect information and to organize, resolve, and validate nodes. In this case we can get away with a single "resolve" pass. We'll extend the Froley-generated Visitor
to do it. Refer to the Visitor API for additional details on the following.
Make the following two modifications to Source/Simple.rogue
:
$include "Resolver.rogue" # add this next to the existing include's.
...
# find this existing method.
method parse_files
forEach (arg in command//args)
local filepath = arg->String
local parser = Parser( File(filepath) )
local ast = parser.parse
Resolver().visit( ast ) # add this
@trace ast
endForEach
Define Source/Resolver.rogue
as follows:
module Simple
class Resolver : Visitor
METHODS
method on( cmd:Access )->Cmd
return visit( ReadVar(cmd.t,cmd.name) )
method on_leave( cmd:Args )
(forEach in cmd).require_value
method on( cmd:Assign )->Cmd
return visit( cmd.left.assigned(cmd.right) )
method on_leave( cmd:Binary )
cmd.left.require_value
cmd.right.require_value
method on_leave( cmd:Unary )
cmd.operand.require_value
method on_leave( cmd:Print )
if (cmd.args.count == 0)
throw cmd.t.error( "One or more values expected." )
endIf
method on_leave( cmd:WriteVar )
Program.set_var_type( cmd.t, cmd.name, cmd.new_value.require_type )
endClass
Run rogo
to build and run the new changes. You'll see a slightly different AST than before, where every Assign
+Access
has been replaced with a WriteVar
and every other Access
has been replaced with ReadVar
:
> Build/Simple-macOS Test.simple
ast:AST(Statements[Println(Args[LiteralString(Simple Test)]),WriteVar(inches,Number(12.0)),WriteVar(cm,Multiply(ReadVar(inches),Number(2.54))),Println(Args[ReadVar(inches),LiteralString( inches = ),ReadVar(cm),LiteralString( cm)]),Println(Args[Add(Number(3.0),Number(4.0))]),Println(Args[Subtract(Number(3.0),Number(4.0))]),Println(Args[Multiply(Number(3.0),Number(4.0))]),Println(Args[Divide(Number(3.0),Number(4.0))])])
Here is an updated AST diagram that reflects the changes made in the Resolver.
CPPWriter.rogue
We've resolved the AST and we've collected variable type info. The final step is to write out the AST in C++ format. The same principles apply to outputting any kind of language, including Machine Language and Assembly.
Let's make another extended Visitor
to traverse the tree and write out appropriate C++ for each node.
Make the following modifications to Source/Simple.rogue
:
$include "CPPWriter.rogue" # add this next to the existing include's.
...
# find this existing method.
method parse_files
forEach (arg in command//args)
local filepath = arg->String
local parser = Parser( File(filepath) )
local ast = parser.parse
Resolver().visit( ast )
#@trace ast # comment out this line
# Add these three lines.
local cpp_filepath = filepath.before_last('.') + ".cpp"
println "$ -> $"(filepath,cpp_filepath)
CPPWriter(cpp_filepath).visit( ast )
endForEach
Define Source/CPPWriter.rogue
as follows:
module Simple
class CPPWriter( filepath:String ) : Visitor
PROPERTIES
builder = StringBuilder()
METHODS
method on_enter( cmd:AST )
builder.println ...
@|#include <iostream>
|#include <string>
|using namespace std;
|
|int main()
|{
builder.indent += 2
# Declare variables
local number_vars = Program.vars.values.to_list.keeping($.type == "Number")
if (number_vars.count)
builder.print "double "
forEach (v at i in number_vars)
if (i > 0) builder.print ", "
builder.print v.name
endForEach
builder.println ";"
endIf
local string_vars = Program.vars.values.to_list.keeping($.type == "String ")
if (string_vars.count)
builder.print "string "
forEach (v at i in string_vars)
if (i > 0) builder.print ", "
builder.print v.name
endForEach
builder.println ";"
endIf
method on_leave( cmd:AST )
builder.indent -= 2
builder.println ...
@| return 0;
|}
File.save( filepath, builder )
method on_visit( cmd:Statements )
forEach (statement in cmd)
visit( statement )
builder.println ";"
endForEach
method on_visit( cmd:Binary )
builder.print '('
visit( cmd.left )
builder.print " $ "(cmd.t.type.symbol)
visit( cmd.right )
builder.print ')'
method on_enter( cmd:Negate )
builder.print '-'
method on_enter( cmd:Number )
builder.print cmd.value
method on_enter( cmd:LiteralString )
builder.print '"$"'(cmd.value.to_escaped_ascii('"'))
method on_visit( cmd:Print )
builder.print "cout"
forEach (arg in cmd.args)
builder.print( " << " )
visit( arg )
endForEach
method on_visit( cmd:Println )
if (cmd.args.count)
on_visit( cmd as Print )
builder.print " << endl"
else
builder.print "cout << endl"
endIf
method on_enter( cmd:ReadVar )
builder.print cmd.name
method on_enter( cmd:WriteVar )
builder.print( cmd.name ).print( " = " )
endClass
Run rogo
one last time and voilà! We get the compilable C++ output shown at the top of this tutorial.
The full set of finished code for this tutorial is available as Example 6 in the Froley repo.