Looking back on Babble, my best attempt at a programming language yet

_2024-01-09

In late 2022, I built an interpreted programming language called Babble.

I’ve always been rather interested in language design. I’ve fallen in love with the fluidity and elegance of Ruby, and have recently been appreciating the type system and static guarantees of Rust for embedded systems work.

But even with several attempts over the years, I’d never implemented a language myself which I’d been satisfied with. Implementing each part of a compiler or interpreter isn’t an easy task!

Babble is the first language I’ve created where I’m rather happy with the overall result, with the implementation feeling reasonably polished and usable. I even solved about half of Advent of Code 2022 with it, which felt like a significant accomplishment by reaching a stage where the language was usable to solve problems expressively.

In this post, I want to reflect on the design of the language - what worked well, and what didn’t.

I’ll be focusing mostly on the language, rather than the implementation, as that’s where the interesting decisions lie - but as we’ll discuss later, the implementation does play a key role in the real-world usability of the language!

A tour of some Babble code

To build up some context, let’s get straight into what Babble looks like. Here’s some code to define a factorial function, then call it and print the result:

impl Integer {
    func factorial {
        self lessThanOrEquals: 1 $
            ifTrue: [ 1 ]
            else:   [ self * (self - 1) factorial ]
    }
}

Console println: 7 factorial.

Let’s walk through this bit-by-bit.

impl Integer {
    func factorial {

All types in Babble are open - you can extend them with new methods whenever you like. Rather than defining our factorial method in some new module, we’re extending the existing Integer type to include this method, using an impl construct.

        self lessThanOrEquals: 1 $

This is where the syntax may become a little unfamiliar, if you’re used to C-like languages. The syntax of Babble is inspired by Smalltalk, where “messages” (its equivalent of method calls) are composed of lists of keywords and corresponding arguments. (Babble’s name is trying to riff off Smalltalk, too!)

Methods in Smalltalk and Babble don’t have traditional method names. Instead, each method is identified by its named and ordered parameters. If we wanted to talk about this method, we’d probably call it lessThanOrEquals:, which is the name of its first and only parameter.

The $ at the end of the line allows us to chain a second method call afterwards:

            ifTrue: [ 1 ]
            else:   [ self * (self - 1) factorial ]

Here, we call a method on the Boolean returned by lessThanOrEquals:. This method has two parameters, ifTrue: and else:. Both parameter values are blocks, anonymous functions which can capture their environment.

If the boolean is true, then the ifTrue:else: method implementation will call the ifTrue: block, otherwise it’ll call the else: block.

This is a core aspect of Smalltalk, and by proxy Babble - control flow is implemented using methods, rather than specific language constructs.

Imagine if you wrote an if-statement in Python like this:

(self <= 1).if(
    true = lambda: 1
    else = lambda: self * factorial(self - 1)
)

Python’s syntax doesn’t lend itself well to this, in my opinion. But Smalltalk and Babble’s comparatively lightweight and terse notation helps method-driven control flow read quite nicely.

What do I like about Babble?

Mandatory keyword arguments

I’m a big fan of keyword arguments in languages. Code is read many more times than it’s written, and I find calls with explicitly-labelled arguments much easier to read - especially when your understanding of the wider codebase is limited.

This is one of the reasons why I’m so fond of Smalltalk’s syntax, where every method definition is uniquely identified by its keyword arguments. Borrowing this was one of my earliest design decisions for Babble.

I considered using the syntax seen in Swift and Ruby, which still use a traditional method name, followed by labelled arguments in parentheses. However, I wanted to require keyword arguments in all cases. If the syntax still requires a separate method name, then there could be awkward duplication between the method name and the first argument label. As an example, imagine a substring replacement function:

"foo bar baz" replace: "bar" with: "qux"   // Babble's approach

// Possible alternative, but there's no good name for this argument
//                    vvv
"foo bar baz" replace(str: "bar" with: "qux")

This method calling system also gives you optional arguments for free, because you can define additional methods which simply omit some of the keywords - ifTrue: and ifTrue:else: both exist as separate methods, depending on whether a conditional needs a falsey case.

Lightweight syntax

Because you don’t need parentheses to call a method, Babble’s syntax ends up being relatively light on symbols.

Chains of unary methods (those which don’t take any parameters) look particularly nice, in my opinion:

number = Console input strip toInteger.

The concise block syntax with [ ... ] is also an important aspect of making several aspects of the language look nice, particularly the functional programming utilities and control flow.

// Read a list of comma-separated numbers as input, and find the 3 largest
Console input split: ","
    $ map: [ |n| n strip toInteger ]
    $ sort reverse take: 3

Type model

Babble isn’t object-oriented, a deviation from Smalltalk as the source of inspiration. Instead, new data types are defined as product and sum types, called struct and enum respectively to match Rust’s intuitive terminology.

// Struct - has a fixed set of fields
struct Person name age.

// Enum - many variants with different fields
enum Shape {
    Rectangle width height.
    Circle radius.
    Square length.
}

I’ve come to prefer this type model over time compared to OOP. In Babble, deduplication of code between similar types is achieved using a mixin - you can implement methods on a mixin, and then use it in any number of other types. Mixins can’t define new data fields though, distinguishing them from a traditional OOP class hierarchy.

Complementing these types are specialised blocks called match blocks, which bake a pattern-matching system directly into a block’s parameter list. This implements pattern-matching without a dedicated control structure like match or switch. Match blocks wrap their result in an enum called Match, returning a Hit variant if the pattern is matched and the block is executed, or bailing immediately with a Miss if the pattern doesn’t match.

// ? indicates a match block
rectAreaCalc = ?[ | Shape#Rectangle width: w height: h | w * h ].

rectAreaCalc call: (Shape#Rectangle width: 4 height: 5). // => Match#Hit value: 20
rectAreaCalc call: (Shape#Circle radius: 10).            // Match#Miss

What don’t I like?

Precedence sadness

In the middle of all of this pleasant method syntax is an ugly wart: the $ operator.

Because there are no parentheses around method call arguments, it isn’t possible to deduce where one call ends and the next one starts:

array map: [ ... ] filter: [ ... ] reduce: [ ... ] acc: 0

As written, Babble would think we were trying to call a four-argument method named map:filter:reduce:acc:. But in fact, we want to call map:, then filter:, then reduce:acc:.

You can add parentheses around each individual call, but this undoes some of the syntactic elegance:

((array map: [ ... ]) filter: [ ... ]) reduce: [ ... ] acc: 0

My solution was the $ operator, which effectively breaks one method call to start another:

array map: [ ... ] $ filter: [ ... ] $ reduce: [ ... ] acc: 0

The syntax was borrowed from Haskell, which defines a $ operator for pretty much the same purpose (albeit with the opposite associativity).

I prefer this to wrapping everything in parentheses, but it still feels like an ugly hack, and it’s one part of the language which I’m not hugely satisfied with.

Unfortunately, I think every possible solution is a trade-off, weakening a different aspect of the language’s syntax. You could wrap related arguments in parentheses, but that makes calls noisier:

array (map: [ ... ]) (filter: [ ... ]) (reduce: [ ... ] acc: 0)

Or introduce some kind of consistently-required “end of argument list” operator, but that makes chains of unary calls look worse:

// Better?
array map: [ ... ]; filter: [ ... ]; reduce: [ ... ] acc: 0;

// Worse!
number = Console; input; strip; toInteger.

Implementation is hard

Stepping back from the language itself for a minute, let’s talk about my implementation. It’s both slow and memory inefficient!

Compiling the block-driven control flow into efficient bytecode turned out to be really complex, and the current implementation of the interpreter uses a sluggish stack-based virtual machine. There’s a lot of runtime overhead for dynamic method lookup and routing captured local variables around blocks.

For your interest, the bytecode for the factorial implementation at the beginning of this post looks like this:

get (self)
push 1
call lessThanOrEquals:
push block [ |  |
    push 1
]
push block [ |  |
    get (self)
    get (self)
    push 1
    call sub:
    call factorial
    call mul:
]
call ifTrue:else:

There is also the challenge of managing the call stack - even in simple Babble programs, the call stack quickly explodes in size because of all the block invocations used to handle control-flow.

In a language with traditional control-flow constructs, an if statement doesn’t touch the call stack at all. But in Babble, it adds at least two frames to the call stack - one calling ifTrue:else, and then another calling whichever block was executed based on the condition.

This was frequently a problem when trying to write recursive algorithms for Advent of Code!

I’ve no doubt there are clever optimisation techniques which could be used to make a Babble interpreter manageable, but my language development skills aren’t quite there yet.

Learn more

I hope you’ve enjoyed reading about Babble as much as I enjoyed developing it!

The process of imagining a language in my head, to actually being able to write programs in it, was an immensely rewarding one.

My gut feeling is that the Babble project is done now, and my next language will likely be something entirely new. I’d like to try writing a usable compiled language next, which I’m just starting to do for my Delta Null project.

If you’d like to learn more about Babble, then check out:

The GitHub repository
The language tour, a “learn X in Y minutes” style introduction
The API documentation

Aaron Christiansen