Too big to compile

Hernán Rivas Acosta wrote this on August 05, 2015 under erlang.

What's the maximum size of a source file in Erlang?

So it's likely that this question is not relevant to your workflow (at least I hope it isn't), but it's still an interesting little question, and besides, how hard can it be to answer it?

Well, pretty hard in fact, especially because the question is way too vague. What do we even mean by biggest file?

Asking the right questions

So why not start with the opposite question: what's the smallest thing the Erlang compiler will accept?

What about an empty file?

$ touch a.erl
$ erl -compile a.erl 
a.erl:1: no module definition

Ok, so we need a module declaration, let's add one:

$ echo "-module(a)." > a.erl
$ erl -compile a.erl
$ ls -ltr a.beam
-rw-r--r--  1 hernan  staff  448 Aug  5 11:06 a.beam

So that's it: a module declaration is the smallest valid Erlang source file. So what about the largest file that consists of a module declaration and some trailing whitespace? We could just use binary search, adding and removing whitespace, to find out at what point it will break! Easy!
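The binary search idea can be sketched in a few lines of shell. This is only an illustration: `largest_ok` and `compiles_with_padding` are hypothetical names, and the sketch assumes that `erl -compile` exits with a non-zero status when compilation fails and that "compiles or not" is monotonic in the amount of padding.

```shell
#!/bin/sh
# Binary search for the largest size (in bytes) for which a predicate succeeds.
# usage: largest_ok LOW HIGH PREDICATE
# Assumes PREDICATE succeeds at LOW, fails at HIGH, and flips only once in between.
largest_ok() {
    lo=$1
    hi=$2
    pred=$3
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( lo + (hi - lo) / 2 ))
        if "$pred" "$mid"; then
            lo=$mid   # still works: the limit is at mid or above
        else
            hi=$mid   # breaks: the limit is below mid
        fi
    done
    echo "$lo"
}

# Hypothetical predicate: does a module followed by $1 spaces still compile?
# Assumes erl -compile returns a non-zero exit status on failure.
compiles_with_padding() {
    printf '%s\n' '-module(probe).' > probe.erl
    head -c "$1" /dev/zero | tr '\0' ' ' >> probe.erl
    erl -compile probe.erl > /dev/null 2>&1
}
```

Something like `largest_ok 0 1073741824 compiles_with_padding` would then need about 30 compilations to narrow down a limit below 1 GB, provided each one finishes in a reasonable time.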

Ok, it's not easy

Turns out compiling a simple module declaration with lots of whitespace takes a really, really long time! How long? Well, let's find out!

(At this point it's time to introduce this dummy code generation tool I quickly put together for this test)
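The tool itself is not reproduced here. As a rough stand-in for its whitespace mode, a shell function along these lines would produce a comparable file. The function name is made up, and it assumes the size argument means the total file size in bytes, which may not match what the real tool's `-s` flag does.

```shell
#!/bin/sh
# Hypothetical stand-in for the generator's whitespace mode.
# usage: gen_whitespace_module NAME TOTAL_BYTES
# Writes NAME.erl: a module declaration followed by spaces up to TOTAL_BYTES.
gen_whitespace_module() {
    name=$1
    total=$2
    header="-module($name)."
    # Bytes of padding left after the declaration and its newline.
    pad=$(( total - ${#header} - 1 ))
    if [ "$pad" -lt 0 ]; then
        echo "size too small for the module declaration" >&2
        return 1
    fi
    printf '%s\n' "$header" > "$name.erl"
    # Stream zero bytes and turn them into spaces, so large sizes
    # never have to fit in memory.
    head -c "$pad" /dev/zero | tr '\0' ' ' >> "$name.erl"
}
```

With that in place, `gen_whitespace_module w1 1048576` would write a one megabyte `w1.erl`.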

# One megabyte
$ ./build -m w -s 1048576 -n w1
$ time erl -compile w1.erl 
real    0m1.156s
user    0m0.686s
sys     0m0.555s
# 4 MB
$ ./build -m w -s 4194304 -n w2
$ time erl -compile w2.erl 
real    0m2.694s
user    0m1.595s
sys     0m1.430s
# Too early for a pattern, let's try 50 MB
$ ./build -m w -s 52428800 -n w3
$ time erl -compile w3.erl 
real    0m31.223s
user    0m19.140s
sys     0m15.548s
# Let's try 200 MB, now even the generation takes a while!
$ ./build -m w -s 209715200 -n w4
$ time erl -compile w4.erl
real    2m57.026s
user    2m4.062s
sys     1m11.894s
# Into GB territory
$ ./build -m w -s 1073741824 -n w5
$ time erl -compile w5.erl 
^C # Yes, my patience ran out at this point
real    138m28.150s
user    35m56.726s
sys     68m10.626s

So anything over a few hundred MB takes a really absurd amount of time (and RAM!), even though we are talking about just whitespace that we could expect the compiler to ignore. And for what it's worth, compiling an equally simple 1 GB C source file with GCC is basically instantaneous (note that this was timed while the large .erl file was still compiling, hence the absurd wall-clock time):

$ time gcc test.c -o test
real    0m21.832s
user    0m1.364s
sys     0m2.371s

So the maximum size remains somewhat of a mystery, but since it looks like the heat death of the Universe will come before I can use binary search to find it out, we'll have to settle for "whatever the OS is capable of handling" and call it a day.

Some more interesting tests

But what about more useful metrics, like the maximum number of functions?

$ ./build -m f -s 1000 -n f1
$ time erl -compile f1.erl
real    0m0.719s
user    0m0.544s
sys     0m0.201s

Ok, this works just fine. Let's keep adding functions: what about going over the maximum number of atoms (as specified in this document)?

$ ./build -m f -s 1048577
$ time erl -compile functions.erl

Crash dump was written to: erl_crash.dump
no more index entries in atom_tab (max=1048576)

real    6m9.100s
user    0m57.675s
sys     4m35.273s

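So it doesn't work, which is absolutely expected since Erlang is self-hosted: the compiler runs on the same VM, every function name is an atom, and so the compiler hits the VM's own atom table limit. Can we actually increase this limit?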

$ time erl -compile functions.erl +t 10485770
real    613m25.229s
user    583m36.049s
sys     10m36.511s

Well yes, we can, but again we are faced with the time constraints. Of course, loading this 54 MB .beam file into a VM that still has the default atom limit will throw the "no more index entries in atom_tab" error, but the file itself works.

And of course feel free to modify the .erl generator to quickly put together tests more relevant to your environment.
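As a starting point for that kind of experiment, here is a minimal sketch of a function-mode generator in shell. It is not the original tool: the function name, the `fN` naming scheme and the use of `-compile(export_all).` (to avoid one unused-function warning per function) are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for the generator's function mode.
# usage: gen_functions_module NAME COUNT
# Writes NAME.erl containing COUNT trivial functions f1/0 .. fCOUNT/0.
gen_functions_module() {
    name=$1
    count=$2
    {
        printf '%s\n' "-module($name)."
        # Export everything so the compiler does not report each function as unused.
        printf '%s\n' "-compile(export_all)."
        # Every function name is a distinct atom, which is what eventually
        # runs into the atom table limit.
        awk -v n="$count" 'BEGIN { for (i = 1; i <= n; i++) printf "f%d() -> ok.\n", i }'
    } > "$name.erl"
}
```

Swapping the `awk` line for a different template (longer bodies, more clauses, bigger literals) is an easy way to probe other limits.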

What we can actually learn from this

Well, first of all, the Erlang compiler could use an optimization or two. Compilation time seems to grow faster than linearly once the source file gets large: 50 MB of whitespace took about 31 seconds, 200 MB almost 3 minutes, and 1 GB was still running after 138 minutes (this gives us a clue as to why it takes so long, but that's the subject of another blog post). More importantly, we need to avoid having source files over a few MB.
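One rough way to look at the growth is to divide each wall-clock time from the whitespace runs above by the file size. The snippet below does just that with `awk`. The `per_mb` helper is a made-up name, the times are the `real` values converted to seconds, and the 1 GB figure is only a lower bound because that run was interrupted.

```shell
#!/bin/sh
# Reads "SIZE_MB SECONDS" pairs on stdin and prints the cost per megabyte.
per_mb() {
    awk '{ printf "%s MB: %.2f s/MB\n", $1, $2 / $1 }'
}

# Wall-clock times from the whitespace runs (last one was interrupted).
per_mb <<'EOF'
1 1.156
4 2.694
50 31.223
200 177.026
1024 8308.150
EOF
```

The cost per megabyte stays at or below roughly 1.2 seconds up to 200 MB and is above 8 seconds for the interrupted 1 GB run, so the slowdown mostly shows up at the very large end. These are single runs on one machine, with the system under heavy memory pressure, so they hint at a trend rather than establish one.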

However, if we absolutely must have big files for some reason (maybe they are generated, maybe we are just evil), knowing that even whitespace takes time to compile means that a generated module should contain as little of it as possible.