I always wanted to create my own programming language.

It might be one of the programming projects with the highest floor-to-ceiling ratio: a simple interpreter can take up less than a hundred lines of code, but adding features and improving the speed quickly increases the complexity, until it reaches the heights of industrial strength compilers like gcc, with millions of lines of code.

At around 7k lines of code this project lies somewhere inbetween (at least on a logarithmic scale).

The language is a Lisp1 modeled after Scheme and runs on a simple bytecode-VM, so that it is possible to write non-trivial programs and have them run at a reasonable speed.

(defn fib (n)
(if (<= n 1)
n
(+ (fib (- n 1))
(fib (- n 2)))))

(println (fib 30))


Of course this is a pretty meaningless benchmark, but on my machine the recursive fibonacci fuction above takes 1.1s to execute in EulerLisp which gets makes it between 2x and 10x slower than a few other languages I tested:

• Ruby, 0.11s
• Python, 0.23s
• Elixir, 0.48s
• Chicken Scheme (interpreted), 0.68s
• Chicken Scheme (compiled), 0.07s
• C (gcc -O3), 0.08s

## Motivation

As an incentive to make the language fast and usable, I’m using it to work through some Project Euler math problems.

Up till now, I’ve solved around 200 of the problems and written around 7k lines of EulerLisp code.

TODO: List of favorite problems

## Data Types

The main data type is called Datum and has different variants:

• Bool
• Integer (64bit signed)
• Bignum (multiple precision)
• Rational (64bit numerator and denominator)
• Float (64bit)
• Char
• String
• Symbol (pointer into a symbol table)
• Vector
• Builtin (reference to a function implemented in rust)
• Closure (reference to a function in the target language)
• PriorityQueue (for performance reasons, should be implemented in lisp)
• Undefined
• Nil

## VM

EulerLisp runs on a stack-based bytecode VM.

### Structure

The VM has one register for storing values and one for storing function references.

There are three kinds of stacks (to avoid wrapping all values in the Datum type).

1. Main value stack
2. Environment reference stack
3. Program counter stack

Arguments are passed to functions through a Vec<Datum> called a “frame”.

### Instructions

Because the main usecase is solving math problems, a lot of the instructions are math related.

TODO: Describe instructions

#### Number Manipulation

• INC, DEC
• ADD, SUB, MUL, DIV, MOD, INTDIV

#### Cons-Cell Manipulation

Note: I’m using fst (first) and rst (rest) instead of the dated car and cdr. cadr can be written as frst.

• CONS
• FST,RST

#### Logic, Equality, Comparison, Predicates

• NOT
• EQUAL, EQ, NEQ
• GT, GTE, LT, LTE
• ISZERO, ISNIL

#### Vector Manipulation

• VECTORREF
• VECTORSET

#### Register / Stack Manipulation

• PUSHVALUE
• POPFUNCTION
• CONSTANT
• PUSHCONSTANT

#### Variable References

There are three types of variable references, global (in the outmost scope), shallow (in the current environment) or deep (in some parent of the current environment.

For each of these types, there is an instruction to load it into the main register, push it on the stack or set it to the value of the main register.

• CHECKEDGLOBALREF
• GLOBALREF
• PUSHCHECKEDGLOBALREF
• PUSHGLOBALREF
• GLOBALSET
• SHALLOWARGUMENTREF
• PUSHSHALLOWARGUMENTREF
• SHALLOWARGUMENTSET
• DEEPARGUMENTREF
• PUSHDEEPARGUMENTREF
• DEEPARGUMENTSET

#### Jumps

Two types of jumps would suffice (conditional & unconditional), the other ones exists because I wanted to play around with peephole-optimization.

• JUMP
• JUMPFALSE
• JUMPTRUE
• JUMPNIL
• JUMPNOTNIL
• JUMPZERO
• JUMPNOTZERO

#### Closures & Function Calls

Closures can accept either a fixed (fix closures) or a variable number of arguments (dotted closures).

Most functions take between zero and three arguments, to increase efficiency they have special call instructions.

• FIXCLOSURE
• DOTTEDCLOSURE
• STOREARGUMENT
• CONSARGUMENT
• FUNCTIONINVOKE
• PRESERVEENV
• RESTOREENV
• EXTENDENV
• UNLINKENV
• CALL1
• CALL2
• CALL3
• CALLN

#### Function Calls & Frames

• ALLOCATEFRAME
• ALLOCATEDOTTEDFRAME
• ALLOCATEFILLFRAME

#### Exit

• RETURN
• FINISH

## Resources

If this inspired you to start building your own programming language, here are some resources that helped me:

## TODO

### Garbage Collection

Currently I’m using rusts reference counted pointers Rc and RefCell as a substitute for implementing my own garbage collection.

This works fine in most cases, but because Scheme allows creating circular lists it it possible to create objects that are not reachable by some root but have non-zero reference counts.

(defn fill-memory ()
(let ((circular (cons 1 2)))
(set-rst! circular circular))
(fill-memory))

(fill-memory)


In the future it would be nice to write my own simple garbage collector.

Source code on Github: l3kn/rust_lisp