Too big to compile
What's the maximum size of a source file in Erlang?
This question is probably not relevant to your workflow (at least I hope it isn't), but it's still an interesting little question. Besides, how hard can it be to answer?
Well, pretty hard in fact, especially because the question is way too vague. What do we even mean by biggest file?
Asking the right questions
So why not start with the opposite question: what's the smallest thing the Erlang compiler will accept?
What about an empty file?
$ touch a.erl
$ erl -compile a.erl
a.erl:1: no module definition
Ok, so we need a module declaration, let's add one:
$ echo "-module(a)." > a.erl
$ erl -compile a.erl
$ ls -ltr a.beam
-rw-r--r-- 1 hernan staff 448 Aug 5 11:06 a.beam
So that's it: a module declaration is the smallest valid Erlang source file. So what about the largest file that consists of a module declaration and some trailing whitespace? We could just use binary search, adding and removing whitespace, to find out at what point it breaks. Easy!
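For the record, the binary search idea would look roughly like this. It's only a sketch in Python: the function name `largest_passing` and the stand-in predicate with its made-up limit are assumptions for illustration. The real predicate would have to generate a file with `n` bytes of whitespace and run `erl -compile` on it.

```python
from typing import Callable


def largest_passing(lo: int, hi: int, passes: Callable[[int], bool]) -> int:
    """Return the largest n in [lo, hi] for which passes(n) is True.

    Assumes passes is monotonic (True up to some threshold, False above it)
    and that passes(lo) is True.
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if passes(mid):
            lo = mid        # mid is accepted, so the limit is at least mid
        else:
            hi = mid - 1    # mid is rejected, so the limit is below mid
    return lo


# Stand-in predicate with a made-up limit, used only to exercise the search.
FAKE_LIMIT = 123_456
print(largest_passing(0, 1_000_000, lambda n: n <= FAKE_LIMIT))
```

The search itself only needs about log2(hi - lo) probes; the catch is how long each probe takes.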
Ok, it's not easy
Turns out compiling a simple module declaration with lots of whitespace takes a really really long time! How long? Well, let's find out!
(At this point it's time to introduce this dummy code generation tool I quickly put together for this test)
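The tool itself isn't reproduced here. To give an idea of what its whitespace mode (`-m w`) might do, here is a minimal Python sketch. Everything in it is an assumption rather than the actual tool: the function name, the use of spaces as padding, and treating the size as the number of padding bytes after the module declaration.

```python
import os
import tempfile


def write_whitespace_module(directory: str, name: str, padding_bytes: int,
                            chunk: int = 1 << 20) -> str:
    """Write <name>.erl: a module declaration followed by padding_bytes spaces."""
    path = os.path.join(directory, f"{name}.erl")
    with open(path, "w", encoding="ascii", newline="\n") as out:
        out.write(f"-module({name}).\n")
        remaining = padding_bytes
        while remaining > 0:
            n = min(chunk, remaining)   # write in chunks to keep memory use flat
            out.write(" " * n)
            remaining -= n
    return path


# Small demo in a temporary directory: 4 KB of padding.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = write_whitespace_module(tmp, "w_demo", 4096)
    print(os.path.basename(demo_path), os.path.getsize(demo_path))
```

Writing in chunks matters once the padding reaches hundreds of megabytes, since building the whole string in memory first would be wasteful.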
# One megabyte
$ ./build -m w -s 1048576 -n w1
$ time erl -compile w1.erl
real 0m1.156s
user 0m0.686s
sys 0m0.555s

# 4 MB
$ ./build -m w -s 4194304 -n w2
$ time erl -compile w2.erl
real 0m2.694s
user 0m1.595s
sys 0m1.430s

# Too early for a pattern, let's try 50 MB
$ ./build -m w -s 52428800 -n w3
$ time erl -compile w3.erl
real 0m31.223s
user 0m19.140s
sys 0m15.548s

# Let's try 200 MB, now even the generation takes a while!
$ ./build -m w -s 209715200 -n w4
$ time erl -compile w4.erl
real 2m57.026s
user 2m4.062s
sys 1m11.894s

# Into GB territory
$ ./build -m w -s 1073741824 -n w5
$ time erl -compile w5.erl
^C
# Yes, my patience ran out at this point
real 138m28.150s
user 35m56.726s
sys 68m10.626s
So anything over a few hundred MB takes a really absurd amount of time (and RAM!), even though we are talking about just whitespace that we could expect the compiler to ignore. And for what it's worth, compiling an equally simple 1 GB C source file with GCC is basically instantaneous (note that this was timed while the large .erl file was still compiling, hence the absurd wall-clock time):
$ time gcc test.c -o test
real 0m21.832s
user 0m1.364s
sys 0m2.371s
So the maximum size remains somewhat of a mystery, but since it looks like the heat death of the Universe will come before I can use binary search to find it out, we'll have to settle for "whatever the OS is capable of handling" and call it a day.
Some more interesting tests
But what about more useful metrics, like the maximum number of functions?
$ ./build -m f -s 1000 -n f1
$ time erl -compile f1.erl
real 0m0.719s
user 0m0.544s
sys 0m0.201s
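As a rough idea of what a module for this kind of test could contain, here is a Python sketch that emits N trivial functions. The naming scheme (`f1`, `f2`, ...), the `ok` bodies and the use of `-compile(export_all).` are assumptions made for illustration; the actual generator may well produce something different. Each distinct function name is a distinct atom, which is why the function count ties into the atom limit.

```python
from typing import Iterator


def function_module_lines(name: str, count: int) -> Iterator[str]:
    """Yield the lines of an Erlang module with `count` zero-arity functions."""
    yield f"-module({name})."
    # Export everything so the compiler does not flag the functions as unused.
    yield "-compile(export_all)."
    for i in range(1, count + 1):
        yield f"f{i}() -> ok."   # every function name is a new atom


def write_function_module(path: str, name: str, count: int) -> None:
    """Write the generated module to `path`, one line at a time."""
    with open(path, "w", encoding="ascii", newline="\n") as out:
        for line in function_module_lines(name, count):
            out.write(line + "\n")


# Show a tiny instance instead of writing a million functions to disk.
print("\n".join(function_module_lines("demo", 3)))
```

Yielding lines one by one keeps memory flat even when the count gets past a million.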
Ok, this works just fine. Let's keep adding functions: what about going over the maximum number of atoms (as specified in this document)?
$ ./build -m f -s 1048577
$ time erl -compile functions.erl
Crash dump was written to: erl_crash.dump
no more index entries in atom_tab (max=1048576)
real 6m9.100s
user 0m57.675s
sys 4m35.273s
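So it doesn't work, which is absolutely expected since Erlang is self-hosted: the compiler runs on the Erlang VM, so it hits the same atom table limit as any other Erlang program. Can we actually increase this limit?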
$ time erl -compile functions.erl +t 10485770
real 613m25.229s
user 583m36.049s
sys 10m36.511s
Well yes, we can, but again we are faced with the time constraints. Of course, loading this 54 MB .beam file into a VM that still has the default atom limit will throw the "no more index entries in atom_tab" error, but the file itself works.
And of course feel free to modify the .erl generator to quickly put together tests more relevant to your environment.
What we can actually learn from this
Well, first of all, the Erlang compiler could use an optimization or two. Compilation time grows faster than linearly once the source file gets large: going from 200 MB to 1 GB, about 5 times the size, took more than 45 times longer before I gave up (this gives us a clue as to why it takes so long, but that's the subject of another blog post). More importantly, we need to avoid having source files over a few MB.
However, if we absolutely must have big files for some reason (maybe they are generated, maybe we are just evil), then knowing that even whitespace takes time to compile means a generated module should contain as little of it as possible.
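For anyone who wants to check how the whitespace timings scale, one rough approach is to compute the local growth exponent between consecutive runs, that is, the k for which time behaves like size^k over that interval. The sketch below uses the wall-clock ("real") numbers from the runs above; the 1 GB entry is only a lower bound because that run was interrupted, and the variable and function names are mine.

```python
import math

MB = 1024 * 1024

# (source size in bytes, wall-clock seconds) for the completed whitespace runs.
runs = [
    (1 * MB, 1.156),
    (4 * MB, 2.694),
    (50 * MB, 31.223),
    (200 * MB, 2 * 60 + 57.026),
]
# Interrupted run: the time is a lower bound, so its exponent is one too.
interrupted = (1024 * MB, 138 * 60 + 28.150)


def local_exponents(points):
    """For each consecutive pair, return k such that t2/t1 == (s2/s1) ** k."""
    return [
        math.log(t2 / t1) / math.log(s2 / s1)
        for (s1, t1), (s2, t2) in zip(points, points[1:])
    ]


for (size, _), k in zip(runs[1:] + [interrupted], local_exponents(runs + [interrupted])):
    print(f"up to {size // MB:>4} MB: exponent {k:.2f}")
```

An exponent near 1 means roughly linear scaling. On these numbers it climbs from under 1 for the small files to over 2 for the interrupted 1 GB run, so the slowdown gets disproportionately worse as the file grows. These are single wall-clock measurements on a busy machine, so treat the exact values with some suspicion.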