Too big to compile
What's the maximum size of a source file in Erlang?
So it's likely that this question is not relevant to your workflow (at least I hope it isn't), but it's still an interesting little question, and besides, how hard can it be to answer it?
Well, pretty hard in fact, especially because the question is way too vague. What do we even mean by biggest file?
Asking the right questions
So why not start with the opposite question: what's the smallest thing the Erlang compiler will accept?
What about an empty file?
$ touch a.erl
$ erl -compile a.erl
a.erl:1: no module definition
Ok, so we need a module declaration, let's add one:
$ echo "-module(a)." > a.erl
$ erl -compile a.erl
$ ls -ltr a.beam
-rw-r--r-- 1 hernan staff 448 Aug 5 11:06 a.beam
So that's it: a module declaration is the smallest valid Erlang source file. So what about the largest file that consists of a module declaration and some trailing whitespace? We could just use binary search, adding and removing whitespace, to find out at what point it will break! Easy!
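For reference, the binary search idea can be sketched in a few lines of Python. This is only an illustration: `compiles` is a hypothetical predicate (in practice it would generate a file of the given size and run the compiler on it), and the sketch assumes that once a size fails, every larger size fails too.

```python
def largest_passing_size(compiles, low, high):
    """Return the largest size in [low, high] for which compiles(size) is true.

    Assumes compiles(low) is true and that failures are monotonic:
    if a size fails, every larger size fails as well.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if compiles(mid):
            low = mid       # mid works, so the answer is mid or larger
        else:
            high = mid - 1  # mid fails, so the answer is strictly smaller
    return low


if __name__ == "__main__":
    # Stand-in predicate with a made-up limit, only to exercise the search.
    pretend_limit = 123_456
    print(largest_passing_size(lambda size: size <= pretend_limit, 1, 10**9))
```

The search itself needs only about 30 probes to cover a range of a billion sizes; the catch is how long each individual probe takes.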
Ok, it's not easy
Turns out compiling a simple module declaration with lots of whitespace takes a really really long time! How long? Well, let's find out!
(At this point it's time to introduce this dummy code generation tool I quickly put together for this test)
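The tool's source is not reproduced here, so the following Python sketch is only a guess at what its whitespace mode might do: write a module declaration and pad the file with spaces up to a requested size in bytes. The function name `write_padded_module` and the choice of spaces as padding are assumptions made for illustration.

```python
import os
import tempfile


def write_padded_module(name, total_bytes, directory="."):
    """Write <name>.erl: a module declaration padded with spaces to total_bytes."""
    header = f"-module({name}).\n".encode("ascii")
    if total_bytes < len(header):
        raise ValueError("total_bytes is smaller than the module declaration")
    path = os.path.join(directory, f"{name}.erl")
    remaining = total_bytes - len(header)
    chunk = b" " * (1024 * 1024)  # write in 1 MB pieces to keep memory use flat
    with open(path, "wb") as out:
        out.write(header)
        while remaining > 0:
            n = min(remaining, len(chunk))
            out.write(chunk[:n])
            remaining -= n
    return path


if __name__ == "__main__":
    # Demo in a temporary directory: a 1 MB file, like the first test case.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_padded_module("w1", 1048576, tmp)
        print(path, os.path.getsize(path))
```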
# One megabyte
$ ./build -m w -s 1048576 -n w1
$ time erl -compile w1.erl
real 0m1.156s
user 0m0.686s
sys 0m0.555s

# 4mb
$ ./build -m w -s 4194304 -n w2
$ time erl -compile w2.erl
real 0m2.694s
user 0m1.595s
sys 0m1.430s

# Too early for a pattern, let's try 50mb
$ ./build -m w -s 52428800 -n w3
$ time erl -compile w3.erl
real 0m31.223s
user 0m19.140s
sys 0m15.548s

# Let's try 200mb, now even the generation takes a while!
$ ./build -m w -s 209715200 -n w4
$ time erl -compile w4.erl
real 2m57.026s
user 2m4.062s
sys 1m11.894s

# Into gb territory
$ ./build -m w -s 1073741824 -n w5
$ time erl -compile w5.erl
^C
# Yes, my patience ran out at this point
real 138m28.150s
user 35m56.726s
sys 68m10.626s
So anything over a few hundred MB takes a really absurd amount of time (and RAM!), even though we are talking about just whitespace that we would expect the compiler to ignore. And for what it's worth, compiling an equally simple 1 GB C source file with GCC is basically instantaneous (note that this was timed while the large .erl file was still compiling, hence the absurd wall-clock time):
$ time gcc test.c -o test
real 0m21.832s
user 0m1.364s
sys 0m2.371s
So the maximum size remains somewhat of a mystery, but since it looks like the heat death of the Universe will come before I can use binary search to find it, we'll have to settle for "whatever the OS is capable of handling" and call it a day.
Some more interesting tests
But what about more useful metrics, like the maximum number of functions?
$ ./build -m f -s 1000 -n f1
$ time erl -compile f1.erl
real 0m0.719s
user 0m0.544s
sys 0m0.201s
Ok, this works just fine, so let's keep adding functions. What about going over the maximum number of atoms (as specified in this document)?
$ ./build -m f -s 1048577
$ time erl -compile functions.erl
Crash dump was written to: erl_crash.dump
no more index entries in atom_tab (max=1048576)
real 6m9.100s
user 0m57.675s
sys 4m35.273s
So it doesn't work, which is absolutely expected: the Erlang compiler is itself written in Erlang and runs on the same VM, so it is bound by the same atom table limit. Can we actually increase this limit?
$ time erl -compile functions.erl +t 10485770
real 613m25.229s
user 583m36.049s
sys 10m36.511s
Well yes, we can, but again we run into the time constraints. Of course, loading this 54 MB .beam file into a VM with the default atom limit will throw the "no more index entries in atom_tab" error, but the file itself works.
And of course feel free to modify the .erl generator to quickly put together tests more relevant to your environment.
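As a starting point, here is a minimal Python sketch of a generator for the "many functions" case. It is not the tool used above: the function name `module_with_functions`, the `f1`, `f2`, ... naming scheme, and the use of `-compile(export_all).` are all assumptions chosen to keep the example short. Each generated function name is a distinct atom, which is why the function count runs into the atom table limit.

```python
def module_with_functions(name, count):
    """Return Erlang source for a module with `count` trivial functions."""
    lines = [
        f"-module({name}).",
        "-compile(export_all).",  # export everything so no function is unused
    ]
    # f1() -> ok. ... fN() -> ok.  Every name is a separate atom.
    lines.extend(f"f{i}() -> ok." for i in range(1, count + 1))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("functions.erl", "w") as out:
        out.write(module_with_functions("functions", 1000))
```

Swapping the body of each function, or the number of clauses per function, is an easy way to test other shapes of generated code.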
What we can actually learn from this
Well, first of all, the Erlang compiler could use an optimization or two: compilation times seem to grow faster than linearly with the source file size once files get large (this gives us a clue as to why it takes so long, but that's the subject of another blog post). More importantly, we need to avoid having source files over a few MB.
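One rough way to look at how the whitespace timings scale is to compute, for each pair of consecutive measurements, the exponent k that would make time proportional to size^k on that interval. The Python sketch below uses the `real` times reported earlier; the 1 GB run was interrupted, so its figure is only a lower bound, and five data points are far too few to pin down the true growth law.

```python
import math

# (size in MB, wall-clock seconds), from the `real` times of the whitespace tests.
measurements = [(1, 1.156), (4, 2.694), (50, 31.223), (200, 177.026)]
# The 1 GB run was interrupted after 138m28.150s, so this is a lower bound.
interrupted = (1024, 138 * 60 + 28.150)


def local_exponents(points):
    """Exponent k between neighbours, assuming time ~ size**k on that interval."""
    return [
        math.log(t2 / t1) / math.log(s2 / s1)
        for (s1, t1), (s2, t2) in zip(points, points[1:])
    ]


points = measurements + [interrupted]
exponents = local_exponents(points)
for (size, _), k in zip(points[1:], exponents):
    print(f"up to {size} MB: exponent ~ {k:.2f}")
```

An exponent near 1 means roughly linear scaling. Here the exponent stays below or around 1 up to 50 MB and then keeps climbing, which may also reflect memory pressure rather than the compiler's algorithms alone.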
However, if we absolutely must have big files for some reason (maybe they are generated, maybe we are just evil), knowing that whitespace takes time to compile means that a generated module should contain as little of it as possible.