I always wanted to create my own programming language.
It might be one of the programming projects with the highest floor-to-ceiling ratio: a simple interpreter can take up less than a hundred lines of code, but adding features and improving the speed quickly increases the complexity, until it reaches the heights of industrial strength compilers like gcc, with millions of lines of code.
At around 7k lines of code this project lies somewhere inbetween (at least on a logarithmic scale).
The language is a Lisp1 modeled after Scheme and runs on a simple bytecode-VM, so that it is possible to write non-trivial programs and have them run at a reasonable speed.
(defn fib (n) (if (<= n 1) n (+ (fib (- n 1)) (fib (- n 2))))) (println (fib 30))
Of course this is a pretty meaningless benchmark, but on my machine the recursive fibonacci fuction above takes 1.1s to execute in EulerLisp which gets makes it between 2x and 10x slower than a few other languages I tested:
- Ruby, 0.11s
- Python, 0.23s
- Elixir, 0.48s
- Chicken Scheme (interpreted), 0.68s
- Chicken Scheme (compiled), 0.07s
- C (gcc -O3), 0.08s
As an incentive to make the language fast and usable, I’m using it to work through some Project Euler math problems.
Up till now, I’ve solved around 200 of the problems and written around 7k lines of EulerLisp code.
TODO: List of favorite problems
The main data type is called
and has different variants:
Rational(64bit numerator and denominator)
Symbol(pointer into a symbol table)
Builtin(reference to a function implemented in rust)
Closure(reference to a function in the target language)
PriorityQueue(for performance reasons, should be implemented in lisp)
EulerLisp runs on a stack-based bytecode VM.
The VM has one register for storing values and one for storing function references.
There are three kinds of stacks
(to avoid wrapping all values in the
- Main value stack
- Environment reference stack
- Program counter stack
Arguments are passed to functions through
Vec<Datum> called a “frame”.
Because the main usecase is solving math problems, a lot of the instructions are math related.
TODO: Describe instructions
Note: I’m using
fst (first) and
instead of the dated
cadr can be written as
Logic, Equality, Comparison, Predicates
Register / Stack Manipulation
There are three types of variable references, global (in the outmost scope), shallow (in the current environment) or deep (in some parent of the current environment.
For each of these types, there is an instruction to load it into the main register, push it on the stack or set it to the value of the main register.
Two types of jumps would suffice (conditional & unconditional), the other ones exists because I wanted to play around with peephole-optimization.
Closures & Function Calls
Closures can accept either a fixed (fix closures) or a variable number of arguments (dotted closures).
Most functions take between zero and three arguments, to increase efficiency they have special call instructions.
Function Calls & Frames
If this inspired you to start building your own programming language, here are some resources that helped me:
Algorithms & Data Structures
Currently I’m using rusts reference counted pointers
RefCell as a substitute for implementing my own garbage collection.
This works fine in most cases, but because Scheme allows creating circular lists it it possible to create objects that are not reachable by some root but have non-zero reference counts.
(defn fill-memory () (let ((circular (cons 1 2))) (set-rst! circular circular)) (fill-memory)) (fill-memory)
In the future it would be nice to write my own simple garbage collector.
Source code on Github: l3kn/rust_lisp