Writing a Simple C++ Transpiler using Froley

Writing a Simple C++ Transpiler using Froley

·

10 min read

Overview

In this blog we'll use a compiler development tool called Froley to write a transpiler that takes source code written in a toy programming language called Simple and cross-compiles it to C++ source code.

I'm the developer of Froley. I just released the all-new v3 and want to promote it a bit. I also want to try out blogging on Hashnode here. Why not both?!

This tutorial can be done on Mac, Linux, and Windows.

Test.simple (Transpiler Input)

Here's a sample input to our finished product.

println "Simple Test"

inches = 12
cm = inches * 2.54
println inches " inches = " cm " cm"

Test.cpp (Transpiler Output)

Here's what the corresponding C++ output will look like.

#include <iostream>
#include <string>
using namespace std;

int main()
{
  double inches, cm;
  cout << "Simple Test" << endl;
  inches = 12.0;
  cm = (inches * 2.54);
  cout << inches << " inches = " << cm << " cm" << endl;
  return 0;
}

Install Froley

  1. Install the morlock.sh package manager. This also installs the programming language Rogue and the build tool Rogo.
  2. Run morlock install abepralle/froley

Refer to the Froley Wiki to help clarify Froley syntax and behavior.

Create a New Project

Create a new Froley starter project. From the command line:

  1. mkdir Simple
  2. cd Simple
  3. froley --create
Simple> froley --create
Creating Build.rogue
Creating default .gitignore
Creating Source/Simple.froley
Creating Test.simple

Two of the files that are created are shown below.

Test.simple

This file is auto-generated by froley --create.

println "Simple Test"

inches = 12
cm = inches * 2.54
println inches " inches = " cm " cm"

Source/Simple.froley

This file is auto-generated by froley --create.

================================================================================
# Simple.froley
================================================================================

--------------------------------------------------------------------------------
tokens
--------------------------------------------------------------------------------
EOL(end of line)
IDENTIFIER identifier [content]
NUMBER     number     [content]
STRING     string     [content]

--------------------------------------------------------------------------------
tokens Symbols
--------------------------------------------------------------------------------
SYMBOL_CLOSE_PAREN )  [structural]
SYMBOL_EQUALS      =
SYMBOL_MINUS       -
SYMBOL_OPEN_PAREN  (
SYMBOL_PLUS        +
SYMBOL_SLASH       /
SYMBOL_STAR        *

--------------------------------------------------------------------------------
tokens Keywords
--------------------------------------------------------------------------------
KEYWORD_PRINT   print
KEYWORD_PRINTLN println

--------------------------------------------------------------------------------
scanner
--------------------------------------------------------------------------------
- main
  consume [ \r\t]*               # whitespace
  if consume("#" [^\n]*) restart # single-line comment

  if (not hasAnother) halt
  markPosition

  if (consume('\n')) produce EOL

  match
    produceAny Symbols
  endMatch

  scan_id_or_keyword
  scan_number
  scan_string

  syntaxError

- scan_id_or_keyword
  if (not scan([_a-zA-Z][_a-zA-Z0-9]*)) return
  # 'scan' fills 'buffer' with matching characters

  match buffer
    produceAny Keywords
    others
      produce IDENTIFIER
      # Because type IDENTIFER has [content], 'buffer' will be copied
      # as the resulting Token's .content string.
  endMatch

- scan_number
  if (not hasAnother) return

  if (scan [0-9])
    scan_integer  # First character is now in 'buffer'
    if (scan '.') scan_integer
    produce NUMBER
  elseIf (scan '.')
    scan_integer
    produce NUMBER
  else
    return
  endIf

- scan_integer
  scan [0-9]*  # Fills 'buffer' with remaining digits

- scan_string
  if (not consume('"')) return
  while (hasAnother and not nextIs('\n'))
    ch = read
    if (ch == '"') produce STRING
    elseIf (ch == '\\' and hasAnother) ch = read  # Handle backslash escapes
    collect ch
  endWhile
  syntaxError "Unterminated string."

--------------------------------------------------------------------------------
parser
--------------------------------------------------------------------------------
- ast
  statements
  produce AST(statements)

- statements
  consume_eols

  beginList

  while (hasAnother and not nextHasAttribute(structural))
    statement
    consume_eols
  endWhile

  produceList Statements

- statement
  on "print"   arg_list -> Print(args)
  on "println" arg_list -> Println(args)
  expression

- consume_eols
  while (consume(EOL)) noAction

- arg_list
  beginList
    while (hasAnother and not nextIs(EOL) and not nextHasAttribute(structural))
      expression
    endWhile
  produceList Args

- expression
  assign

- assign [rightBinary]
  on "=" -> Assign

- add_subtract [binary]
  on "+" -> Add
  on "-" -> Subtract

- multiply_divide [binary]
  on "*" -> Multiply
  on "/" -> Divide

- negate [preUnary]
  on "-" -> Negate

- term
  on '(' expression ')': return
  on IDENTIFIER -> Access(name=content)
  on NUMBER     -> Number(value=content:Real)
  on STRING     -> LiteralString(value=content)
  syntaxError

Compiling and Running simple

Now you can type rogo to compile the Simple compiler; it will be run with Test.simple as the input:

Simple> rogo                     
Recompiling /Users/abe/Projects/Misc/Simple/Build.rogue...
roguec --target=C++,Console,macOS .rogo/Build.rogue --api --debug --main --output=.rogo/Build 
Writing .rogo/Build.h...
Writing .rogo/Build.cpp...
SUCCESS (0.21 seconds)
g++ -Wall -std=gnu++11 -fno-strict-aliasing -Wno-invalid-offsetof  .rogo/Build.cpp -o .rogo/Build-macOS 
> froley --main Source/Simple.froley Source
Creating Source/Scanner.rogue
Creating Source/ScannerCore.rogue
Creating Source/Cmd.rogue
Creating Source/Parser.rogue
Creating Source/ParserCore.rogue
Creating Source/Visitor.rogue
Creating Source/CompileError.rogue
Creating Source/ScanTable.rogue
Creating Source/TokenType.rogue
Creating Source/Token.rogue
Creating Source/FrameStack.rogue
Creating Source/Simple.rogue
> roguec Source/Simple.rogue --debug --main --output=Build/Simple-macOS.cpp --target=Console,C++,macOS
Writing Build/Simple-macOS.h...
Writing Build/Simple-macOS.cpp...
SUCCESS (0.18 seconds)
> c++ -O0 -Wall -std=gnu++11 -fno-strict-aliasing -Wno-invalid-offsetof Build/Simple-macOS.cpp -o Build/Simple-macOS
> Build/Simple-macOS Test.simple
ast:AST(Statements[Println(Args[LiteralString(Simple Test)]),Assign(Access(inches),Number(12.0)),Assign(Access(cm),Multiply(Access(inches),Number(2.54))),Println(Args[Access(inches),LiteralString( inches = ),Access(cm),LiteralString( cm)])])

Examining the Process

Inspect the Auto-Generated Compiler Files

Take a look the additional .rogue source files that Froley generated in the Source/ folder and get a feel for the infrastructure that Froley creates.

Scanner Output

The output of the Scanner is a list of Token objects which is not shown on the console output. Here's another copy of the Test.simple source file along with a diagram that illustrates the list of tokens that the Scanner creates.

println "Simple Test"

inches = 12
cm = inches * 2.54
println inches " inches = " cm " cm"

Tokens.png

Parser Output

The output of the Parser is an AST (Abstract Syntax Tree) comprised of extended Cmd nodes. Here is a diagram that illustrates the AST result of parsing Test.simple.

AST.png

Adding the Executable to the Binpath

Type rogo link to utilize Morlock's ability to create a binpath link to any executable - it will execute morlock link <exe-path> simple. Now you can run simple from any folder, e.g. simple Test.simple.

Add Additional Node Types

We'll need a few additional node types for our project. Instead of editing Cmd.rogue and Visitor.rogue by hand, let's write a new routine in the Simple.froley definition that creates the new node types. The routine will never be called but Froley will generate the changes to Cmd.rogue and Visitor.rogue all the same. Add this to the end of Source/Simple.froley:

- additional_nodes
  create WriteVar(name:String,new_value)
  create ReadVar(name:String)

Run rogo to make sure the new changes compile. Nothing new will be shown yet but the new WriteVar and ReadVar node types will have been added to the end of Cmd.rogue and additional code will have been added to the end of Visitor.rogue.

Program.rogue

As we evaluate the AST (Abstract Syntax Tree) that the Parser produces, we'll need a centralized place to store meta-info - variable name and type info in particular.

Our Simple language will support two variable types: Number and String. We'll use implicit typing to figure out what type a variable is but we won't allow dynamic types - a variable can either store a Number or a String but not both.

Let's create a Program singleton that tracks variable info. Add an $include at the top of Source/Simple.rogue:

$include "Program.rogue"

Now create Source/Program.rogue with the following code:

module Simple

class Program [singleton]
  PROPERTIES
    vars = StringTable<<Var>>()

  METHODS
    method set_var_type( t:Token, var_name:String, var_type:String )
      local existing = vars[ var_name ]
      if (existing)
        if (existing.type != var_type)
          throw t.error( "Conflicting types $ and $ for variable '$'."...
          (var_type,vars[var_name],var_name) )
        endIf
      else
        vars[ var_name ] = Var( t, var_name, var_type )
      endIf
endClass

class Var( t:Token, name:String, type:String );

Run rogo to make sure the new changes compile. Nothing new will be shown yet.

CmdTypes.rogue

Let's give each node the ability to return its type as a string. A full-fledged compiler would have a Type class but we'll just use string type names.

We could modify our generated Cmd.rogue file directly without fear of losing our changes, but to keep things simple and clear let's use augmentation to add our node type info in a separate file.

In Source/Simple.rogue, add the following:

$include "CmdTypes.rogue"

Create Source/CmdTypes.rogue:

module Simple

augment Cmd
  METHODS
    method require_type->String
      localize type
      if (type) return type
      throw t.error( "Value required." )

    method require_value->this
      require_type
      return this

    method assigned( new_value:Cmd )->Cmd
      throw t.error( "Invalid assignment target." )

    method type->String
      return null
endAugment

augment Unary
  METHODS
    method type->String
      return operand.type
endAugment

augment Binary
  METHODS
    method type->String
      local left_type = left.type
      local right_type = right.type
      if (not left_type) return right_type
      if (not right_type) return left_type
      if (left_type == "String" or right_type == "String") return "String"
      return "Number"
endAugment

augment LiteralString
  METHODS
    method type->String
      return "String"
endAugment

augment Number
  METHODS
    method type->String
      return "Number"
endAugment

augment ReadVar
  METHODS
    method type->String
      local v = Program.vars[name]
      if (not v) return null
      return v.type
endAugment

augment WriteVar
  METHODS
    method type->String
      local v = Program.vars[name]
      if (not v) return null
      return v.type
endAugment

augment Access
  METHODS
    method assigned( new_value:Cmd )->Cmd
      return WriteVar( t, name, new_value )
endAugment

Note we also include a mechanism for a general variable Access node to be transformed into a WriteVar node when it is assigned a value. The assigned() method will be utilized in our upcoming Resolver.

Run rogo to make sure the new changes compile. Nothing new will be shown yet.

Resolver.rogue

An AST is often analyzed with multiple passes to collect information and to organize, resolve, and validate nodes. In this case we can get away with a single "resolve" pass. We'll extend the Froley-generated Visitor to do it. Refer to the Visitor API for additional details on the following.

Make the following two modifications to Source/Simple.rogue:

$include "Resolver.rogue"  # add this next to the existing include's.
...
    # find this existing method.
    method parse_files
      forEach (arg in command//args)
        local filepath = arg->String
        local parser = Parser( File(filepath) )
        local ast = parser.parse
        Resolver().visit( ast )  # add this
        @trace ast
      endForEach

Define Source/Resolver.rogue as follows:

module Simple

class Resolver : Visitor
  METHODS
    method on( cmd:Access )->Cmd
      return visit( ReadVar(cmd.t,cmd.name) )

    method on_leave( cmd:Args )
      (forEach in cmd).require_value

    method on( cmd:Assign )->Cmd
      return visit( cmd.left.assigned(cmd.right) )

    method on_leave( cmd:Binary )
      cmd.left.require_value
      cmd.right.require_value

    method on_leave( cmd:Unary )
      cmd.operand.require_value

    method on_leave( cmd:Print )
      if (cmd.args.count == 0)
        throw cmd.t.error( "One or more values expected." )
      endIf

    method on_leave( cmd:WriteVar )
      Program.set_var_type( cmd.t, cmd.name, cmd.new_value.require_type )

endClass

Run rogo to build and run the new changes. You'll see a slightly different AST than before, where every Assign+Access has been replaced with a WriteVar and every other Access has been replaced with ReadVar:

> Build/Simple-macOS Test.simple
ast:AST(Statements[Println(Args[LiteralString(Simple Test)]),WriteVar(inches,Number(12.0)),WriteVar(cm,Multiply(ReadVar(inches),Number(2.54))),Println(Args[ReadVar(inches),LiteralString( inches = ),ReadVar(cm),LiteralString( cm)]),Println(Args[Add(Number(3.0),Number(4.0))]),Println(Args[Subtract(Number(3.0),Number(4.0))]),Println(Args[Multiply(Number(3.0),Number(4.0))]),Println(Args[Divide(Number(3.0),Number(4.0))])])

Here is an updated AST diagram that reflects the changes made in the Resolver.

Resolved-AST.png

CPPWriter.rogue

We've resolved the AST and we've collected variable type info. The final step is to write out the AST in C++ format. The same principles apply to outputting any kind of language, including Machine Language and Assembly.

Let's make another extended Visitor to traverse the tree and write out appropriate C++ for each node.

Make the following modifications to Source/Simple.rogue:

$include "CPPWriter.rogue"  # add this next to the existing include's.
...
    # find this existing method.
    method parse_files
      forEach (arg in command//args)
        local filepath = arg->String
        local parser = Parser( File(filepath) )
        local ast = parser.parse
        Resolver().visit( ast )
        #@trace ast  # comment out this line

        # Add these three lines.
        local cpp_filepath = filepath.before_last('.') + ".cpp"
        println "$ -> $"(filepath,cpp_filepath)
        CPPWriter(cpp_filepath).visit( ast )
      endForEach

Define Source/CPPWriter.rogue as follows:

module Simple

class CPPWriter( filepath:String ) : Visitor
  PROPERTIES
    builder = StringBuilder()

  METHODS
    method on_enter( cmd:AST )
      builder.println ...
        @|#include <iostream>
         |#include <string>
         |using namespace std;
         |
         |int main()
         |{
      builder.indent += 2

      # Declare variables
      local number_vars = Program.vars.values.to_list.keeping($.type == "Number")
      if (number_vars.count)
        builder.print "double "
        forEach (v at i in number_vars)
          if (i > 0) builder.print ", "
          builder.print v.name
        endForEach
        builder.println ";"
      endIf

      local string_vars = Program.vars.values.to_list.keeping($.type == "String ")
      if (string_vars.count)
        builder.print "string "
        forEach (v at i in string_vars)
          if (i > 0) builder.print ", "
          builder.print v.name
        endForEach
        builder.println ";"
      endIf

    method on_leave( cmd:AST )
      builder.indent -= 2
      builder.println ...
        @|  return 0;
         |}

      File.save( filepath, builder )

    method on_visit( cmd:Statements )
      forEach (statement in cmd)
        visit( statement )
        builder.println ";"
      endForEach

    method on_visit( cmd:Binary )
      builder.print '('
      visit( cmd.left )
      builder.print " $ "(cmd.t.type.symbol)
      visit( cmd.right )
      builder.print ')'

    method on_enter( cmd:Negate )
      builder.print '-'

    method on_enter( cmd:Number )
      builder.print cmd.value

    method on_enter( cmd:LiteralString )
      builder.print '"$"'(cmd.value.to_escaped_ascii('"'))

    method on_visit( cmd:Print )
      builder.print "cout"
      forEach (arg in cmd.args)
        builder.print( " << " )
        visit( arg )
      endForEach

    method on_visit( cmd:Println )
      if (cmd.args.count)
        on_visit( cmd as Print )
        builder.print " << endl"
      else
        builder.print "cout << endl"
      endIf

    method on_enter( cmd:ReadVar )
      builder.print cmd.name

    method on_enter( cmd:WriteVar )
      builder.print( cmd.name ).print( " = " )

endClass

Run rogo one last time and voilà! We get the compilable C++ output shown at the top of this tutorial.

The full set of finished code for this tutorial is available as Example 6 in the Froley repo.