Aaron Christiansen

Elixir's bitstrings - the data type I didn't know I wanted

2021-12-17

After hearing a lot about it within the Ruby community, I’m working through this year’s Advent of Code in Elixir.

Overall, I’m enjoying Elixir so far! I’ve found that Ruby knowledge has helped me to get to grips with Elixir rather quickly - their syntax is very similar, and I’m able to make educated guesses on the names of Elixir functions coming from Ruby too. By far the most challenging part of learning Elixir has been solving the Advent of Code problems in a functional, immutable style.

But in this post, I want to focus on one specific part of Elixir: bitstrings. These were invaluable for solving Day 16 of Advent of Code, which involved parsing packets made up of tightly-packed bit-level structures.

Thinking About Strings

Let’s step back from bitstrings and think about regular old strings of text. Here’s one:

"hello"

In memory, a string will actually be a sequence of bytes (with a length attached somehow, not shown here):

" h    e    l    l    o    "
  0x68 0x65 0x6c 0x6c 0x6f

Many interpreted languages, such as Python, have separate str and bytes types for differentiating between text and an arbitrary list of bytes.

But, Elixir makes no such distinction! Strings are simply an instance of a type called a binary, which is a list of bytes with a length attached. You can create a binary using << >> syntax - notice that Elixir’s prints both of these inputs in the same way (using IO.inspect, Elixir’s equivalent of Ruby’s p or Python’s repr):

"hello"
# => "hello"

<< 0x68, 0x65, 0x6c, 0x6c, 0x6f >>
# => "hello"

That’s because they are the same underlying data type, and the double-quotes are really just a shortcut for creating a binary from some characters.

Elixir is printing it back to us as a string for our convenience, because it can see that the binary is made up of printable Unicode characters. If we create a binary of unprintable characters instead, it’ll display it as a binary rather than a string:

<< 1, 2, 3 >>
# => << 1, 2, 3 >>

OK, so we’ve got a type which can store some bytes. But how’s this special at all, when you can simply create an array or vector of bytes in most languages?

Introducing Bitstrings

In Elixir, binaries are not really their own data type. A binary is actually a special case of the real data type, called a bitstring. A bitstring is a binary if the number of bits is divisible by 8.

This is the special part! Most languages I’ve worked with don’t provide facilities for seamlessly working with data whose size isn’t a whole number of bytes, but in Elixir we can use bitstrings to work with any number of bits, not necessarily divisible by 8 into bytes.

Bitstrings are constructed using the same << >> syntax, but we can tag each component with its size in bits. For example:

<< 1 :: size(2), 6 :: size(4) >>
# => << 19::size(6) >>

The size can be omitted as a shorthand, allowing us to just enter << 1 :: 2, 6 :: 4 >>.

Elixir does concatenate these two components together into a single 6-bit field, but that doesn’t matter, since the bit-level representation in memory remains the same:

    Original:   1       3
               .-.   .-----.
               0 1   0 0 1 1
               '-----------'
Concatenated:       19

Working With Bitstrings

We can use bitstrings to easily construct data structures at the bit level, from individual components. But their real power comes from when we want to do the opposite, and deconstruct data into its bits.

This is the best part: bitstrings support Elixir’s pattern matching! Let’s try pulling apart that example from earlier using pattern matching:

<< x :: 2, y :: 4 >> = << 19 :: 6 >>
x # => 1
y # => 3

Just like that, we can extract arbitrarily-sized integers from bitstrings!

The equivalent using most programming language’s bitwise operators would be something like:

x = (19 & 0b110000) >> 4
y = 19 & 0b11

In my opinion, Elixir’s pattern matching approach shows intention much better and is less error-prone. (Elixir does have bitwise operators those, so you can fall back to them if you like!)

Pattern matching allows us to do even more advanced things. Instead of :: size(x) or :: x, we can use the :: bitstring specifier to match a chunk of any size. For example, given some input data of any unknown size, we could split apart the first two bits and the remaining bits:

<< first :: 2, rest :: bitstring >> = << 243, 48 >>
first # => 3
rest # => << 204, 48::size(6) >>

(This particular application was where they were most useful in Advent of Code!)

For a final example, let’s combine this with Elixir’s pattern matching on method definitions, to define a method which counts the number of 1 bits in a bitstring:

# If the next bit is 1, count it and recurse
def count_bits(<< 1 :: 1, rest :: bitstring >>), do: 1 + count_bits(rest)

# If the next bit is 0, don't count it and recurse
def count_bits(<< 0 :: 1, rest :: bitstring >>), do: count_bits(rest)

# Base case: if bits have run out, stop
def count_bits(<< >>), do: 0

# Let's try it out...
count_bits(<< 55 >>)
# => 5
# That's right! 55 is 0b00110111

Conclusion

I hope you’ve found this to be an interesting look at bitstrings, and that you can see their value for certain problems! Although libraries with similar functionality are available for other languages, I doubt any of them will manage to achieve the same level of language integration with pattern matching that Elixir accomplishes.

Even if you aren’t too familiar with Elixir, you may find it interesting to have a skim through my Advent of Code Day 16 solution to find how I’ve used bitstrings to deconstruct the puzzle input.

Thanks for reading!