Erlang REST Server Stack
My colleagues at Erlang Solutions are all fired up with a very interesting project which I’m not allowed to write about here. Nevertheless, an important aspect of it involves recognising and generalising some architectural pieces shared by the vastly different systems that we build. Since I’m Inaka’s Tech Lead, my main contribution to the project is to describe the usual stack and point out the characteristics of the systems built here. It is a really nice exercise and I think the results are worth sharing. Here we go…
So… what are the systems we build at Inaka?
The most common scenario here is for us to build end-to-end solutions for mobile or web apps. From very far away, most of our systems look like this:
- Some Clients, one for each desired technology.
- A Server, encapsulating most of the business logic and staying between the clients and the Database(s).
- An Admin Panel, a simple website for system owners to manage their system.
Server and Admin live in close vicinity, so they communicate by sharing a Database. Sometimes that’s not enough, and in those cases we open up an API on the server that the Admin can hit when needed.
The communication between the Server and the different Clients is much more interesting. For this part we use REST APIs. Well… maybe not exactly. We might not meet all the requirements for our APIs to be considered truly RESTful. We do respect the HTTP protocol and we use it to expose entities from the server that the Clients consume in a mostly CRUD fashion.
Sometimes, though, we can’t rely on the client to always start the conversation. We want to be able to send information from the server to the client on the server’s demand. To implement that, we use SSE (Server-Sent Events).
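For context, SSE is just a long-lived HTTP response with content type text/event-stream, where each event is a small block of `field: value` lines terminated by a blank line. A stream might look roughly like this (event names and payloads invented for illustration):

```
HTTP/1.1 200 OK
Content-Type: text/event-stream

id: 42
event: message_created
data: {"id": 42, "text": "hello"}

id: 43
event: message_deleted
data: {"id": 17}

```

The `id` field lets a reconnecting client resume from the last event it saw, via the Last-Event-ID request header.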
You might have noticed that I haven’t named any programming language whatsoever in the paragraphs above. That’s because we don’t always use the same languages for the same components on our systems:
- We build our Android Clients in Java, but we’re already starting to use Kotlin and we might end up using it consistently in the future.
- We used to build iOS Clients in Objective-C; last year we switched to Swift.
- For our web clients, sometimes we use Angular.js, sometimes React.js, and sometimes just plain old jQuery.
- We build some servers in Ruby, others in Erlang, and others in Elixir. It depends on the project requirements.
- We use PostgreSQL, but also Riak, and sometimes just Elastic. More often than not, we use a combination of those.
- Most of our Admin Panels are built in Ruby, but we’ve built some using Erlang+Angular.js too.
As you might have guessed by its name, Erlang Solutions is mostly interested in our Erlang Servers. So, let’s dissect one of them now…
What you see in the diagram above is the collection of libraries that we use to build REST servers in Erlang. I’ll start from the bottom right corner and go all the way up to the cloud on the top left. But before I begin, let me reiterate the goal of these servers: a server like this will share entities with its clients through a REST API and persist them in a database after applying the appropriate business logic. It will also provide a way for clients to receive notifications/events through an SSE endpoint.
So, starting from the bottom right: we have our database there. We might have chosen a SQL DB, a NoSQL DB or a combination of both. Our server will need a way to connect to it, and that’s the DB Driver. Libraries like epgsql or tirerl serve that purpose. Luckily for us, we have an abstraction on top of that: SumoDB. SumoDB helps us organise our apps in a proper way without having to think too much about where our data will be stored. Using SumoDB, we model our real-world entities using abstract data types and we use the repository pattern to keep our business logic organised.
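As a sketch of what a SumoDB model looks like, here is a hypothetical "message" entity implementing the sumo_doc behaviour (the module name, fields and attributes are invented for illustration, not taken from the original post):

```erlang
%% A hypothetical "message" entity modelled for sumo_db.
-module(messages).

-behaviour(sumo_doc).

-export([sumo_schema/0, sumo_sleep/1, sumo_wakeup/1]).
-export([new/2]).

%% Describe how the entity is stored.
sumo_schema() ->
  sumo:new_schema(messages,
    [ sumo:new_field(id,      integer, [id, auto_increment, not_null])
    , sumo:new_field(text,    string,  [{length, 255}, not_null])
    , sumo:new_field(user_id, integer, [index])
    ]).

%% Convert the runtime representation to what the store expects...
sumo_sleep(Message) -> Message.

%% ...and back. Here both representations are plain maps.
sumo_wakeup(Doc) -> Doc.

%% Constructor used by the business logic / repository layer.
new(Text, UserId) ->
  #{id => undefined, text => Text, user_id => UserId}.
```

With that in place, a repository module would persist and fetch messages through sumo’s generic API (e.g. `sumo:persist(messages, Message)`), without committing to a particular backing store.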
The same entities we persist in the database are usually the entities we want to share with the clients, and we want to do it through our API. That requires a web server, and that web server is none other than cowboy. Cowboy provides a really simple way to define REST endpoints: cowboy_rest. To provide a new endpoint (or set of endpoints) through your web server, you just need to create a cowboy_rest handler and add it to your server routes.
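A minimal cowboy_rest handler might look like the sketch below (cowboy 1.x conventions, which match the era of the libraries discussed here; the module name and route are illustrative):

```erlang
%% A minimal cowboy_rest handler exposing GET on an entity collection.
-module(messages_handler).

-export([init/3, rest_init/2, allowed_methods/2,
         content_types_provided/2, handle_get/2]).

%% Upgrade the plain HTTP handler to the REST protocol.
init(_Transport, _Req, _Opts) ->
  {upgrade, protocol, cowboy_rest}.

rest_init(Req, Opts) ->
  {ok, Req, Opts}.

allowed_methods(Req, State) ->
  {[<<"GET">>, <<"POST">>], Req, State}.

content_types_provided(Req, State) ->
  {[{<<"application/json">>, handle_get}], Req, State}.

handle_get(Req, State) ->
  %% In a real handler, the body would come from the sumo_db repository.
  {<<"[]">>, Req, State}.

%% Registered with cowboy roughly as:
%%   Dispatch = cowboy_router:compile([{'_', [{"/messages", messages_handler, []}]}]),
%%   cowboy:start_http(http, 100, [{port, 8080}], [{env, [{dispatch, Dispatch}]}]).
```

cowboy_rest then derives the correct HTTP semantics (405s, content negotiation, etc.) from the callbacks you choose to implement.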
But then again, we are building an API that our clients should use. We have to communicate its details to them in a nice way, right? One really good way of achieving that goal is to provide documentation using Swagger. To be able to do so on top of cowboy, we use cowboy-swagger, and that requires you to also use trails to define your routes and attach metadata to them. With these libraries, we get interactive documentation for our server APIs that is always up to date.
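Attaching that metadata boils down to giving your handler a trails/0 function. The sketch below shows the general shape (descriptions, tags and the handler name are illustrative; the metadata map follows swagger’s per-method path-spec layout):

```erlang
%% A trails/0 function attaching swagger metadata to a route.
trails() ->
  Metadata =
    #{ get =>
         #{ tags => ["messages"]
          , description => "Returns the list of messages"
          , produces => ["application/json"]
          }
     , post =>
         #{ tags => ["messages"]
          , description => "Creates a new message"
          , consumes => ["application/json"]
          , produces => ["application/json"]
          }
     },
  [trails:trail("/messages", messages_handler, [], Metadata)].
```

Once cowboy_swagger’s own handler is included among your trails, the same metadata that defines your routes also feeds the interactive documentation page.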
We could stop right there and we would already have a proper RESTful server that we can use to manage and persist our entities. But we’ve developed many of those, and we know the benefits of extracting common behaviours and encapsulating them in libraries in the style of OTP. That’s why we combined the power of trails’ metadata with sumo’s flexibility to develop SumoREST, a library/framework that lets us easily create RESTful servers like the ones above by almost exclusively describing our entities and the operations to perform on them.
I stated above that sometimes we need to send events from the server to the clients in a seemingly anti-http fashion. That’s when SSE comes in. Implementing an SSE handler for cowboy is no big deal, but you have to be familiar with the protocol if you want to do it right. Or… you can just use lasse, implement a lasse_handler, and only worry about which events to send to the clients and when.
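A lasse_handler really does reduce to deciding what to send and when. A sketch of one, following lasse’s documented callbacks (the module name and the shapes of the incoming messages are assumptions for illustration):

```erlang
%% A minimal lasse_handler: push received notifications as SSE events.
-module(events_handler).

-behaviour(lasse_handler).

-export([init/3, handle_notify/2, handle_info/2, handle_error/3, terminate/3]).

init(_InitArgs, _LastEventId, Req) ->
  %% A real handler could use LastEventId to replay missed events,
  %% and subscribe this process to a gen_event manager here.
  {ok, Req, #{}}.

%% Messages sent with lasse_handler:notify/2 arrive here.
handle_notify(#{event := Name, data := Data}, State) ->
  {send, #{event => Name, data => Data}, State}.

%% Plain Erlang messages sent to this process arrive here.
handle_info({entity_event, Data}, State) ->
  {send, #{data => Data}, State};
handle_info(_Ignored, State) ->
  {nosend, State}.

handle_error(_Event, _Error, State) ->
  State.

terminate(_Reason, _Req, _State) ->
  ok.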
And about that… if the events are tied to operations that happen in the database (creations, updates, deletions, etc.), the easiest way to trigger them will be to use SumoDB’s gen_event components with a simple gen_event handler to channel them through lasse.
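The gen_event handler doing that channeling can be very small. In the sketch below, the `{EntityName, EventName, Args}` event shape is an assumption about what SumoDB dispatches, and the subscriber pids would be the SSE handler processes:

```erlang
%% A gen_event handler that forwards store events to interested processes.
-module(events_forwarder).

-behaviour(gen_event).

-export([init/1, handle_event/2, handle_call/2, handle_info/2, terminate/2]).

%% State: the list of subscriber pids (e.g. SSE handler processes).
init(Subscribers) ->
  {ok, Subscribers}.

%% Assumed event shape: {EntityName, EventName, Args}.
handle_event({_Entity, _Event, _Args} = Event, Subscribers) ->
  [Pid ! {entity_event, Event} || Pid <- Subscribers],
  {ok, Subscribers};
handle_event(_Other, Subscribers) ->
  {ok, Subscribers}.

handle_call(_Request, Subscribers) ->
  {ok, ok, Subscribers}.

handle_info(_Info, Subscribers) ->
  {ok, Subscribers}.

terminate(_Arg, _Subscribers) ->
  ok.
```

The SSE handler then only has to turn those `{entity_event, …}` messages into events on the wire.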
In a nutshell, building a RESTful server the Inaka way requires only the following steps:
- Create a simple Erlang app with no supervision tree
- Add sumo_rest (and optionally lasse) as dependencies
- Add start_phases to your app to:
  - create sumo schemas
  - subscribe the gen_event handler(s) (if needed for lasse)
  - build the required trails and boot up a cowboy server with them
- Identify the system entities and, for each of them:
  - create a module that implements sumo_model and sumo_rest_doc behaviours
  - create one or two handlers using mixer to mix in the required functions from SumoREST’s sr_entities_handler or sr_single_entity_handler
  - appropriately define the trails/0 function in each of them with the metadata required by both SumoREST and swagger
  - add the handler names to your list of trails
- Generate a release, boot it up and hit the /api-docs url to check your brand new interactive documentation!
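The start_phases step above can be sketched as follows. All module, phase and handler names here are invented for illustration; the application still needs a (possibly empty) top supervisor to satisfy OTP:

```erlang
%% In the .app.src file, declare the callback module and the phases:
%%   {mod, {my_app, []}},
%%   {start_phases, [{create_schema, []}, {start_listeners, []}]}

-module(my_app).

-behaviour(application).

-export([start/2, start_phase/3, stop/1]).

start(_Type, _Args) ->
  my_app_sup:start_link().  %% a minimal, empty supervisor

%% Phases run in order, after start/2 succeeds.
start_phase(create_schema, _Type, []) ->
  sumo:create_schema();
start_phase(start_listeners, _Type, []) ->
  Trails = trails:trails([messages_handler, cowboy_swagger_handler]),
  trails:store(Trails),
  Dispatch = trails:single_host_compile(Trails),
  {ok, _} = cowboy:start_http(http, 100, [{port, 8080}],
                              [{env, [{dispatch, Dispatch}]}]),
  ok.

stop(_State) -> ok.
```

Including cowboy_swagger_handler among the trails is what exposes the interactive documentation endpoint.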
A Note on Client Libraries
Believe it or not, all of the above was just an introduction to this section. This is the actual assignment I got from the other Tech Leads. The idea is to describe the architecture of this kind of system and highlight the ways in which these systems deal with common issues/concerns that affect all types of Erlang projects. Below you’ll find a list of common concerns and requirements and, for each one, I’ll try to describe the impact it has on the kind of servers described above and the way we usually deal with it.
Authentication and Session Handling
In our systems, we usually use HTTP’s Basic Authentication with session tokens. We keep our user and session tables in our database and we have sumo models and SumoREST endpoints for both (we use POST /users for sign-up and POST /sessions for login). We generally add an authentication module that implements cowboy_rest’s is_authorized/2 callback using functions that we include in our users_repo or sessions_repo modules, and we mix that function into all of the handlers that need it. Every endpoint then requires a session token, except (of course) the sign-up and login endpoints themselves. Session tokens are generated when a new session is created, and they’re hashed and stored in the database using erlpass.
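A sketch of such a mixable is_authorized/2, under cowboy 1.x conventions (`sessions_repo:is_valid/2` is a hypothetical repository function standing in for the real session check):

```erlang
%% Basic-Auth check against stored sessions, meant to be mixed into handlers.
-module(auth).

-export([is_authorized/2]).

is_authorized(Req, State) ->
  case cowboy_req:parse_header(<<"authorization">>, Req) of
    {ok, {<<"basic">>, {UserId, Token}}, Req1} ->
      %% Inside the repo, the stored hash would be checked, e.g. with erlpass:match/2.
      case sessions_repo:is_valid(UserId, Token) of
        true  -> {true, Req1, State};
        false -> {{false, <<"Basic realm=\"api\"">>}, Req1, State}
      end;
    {ok, _Other, Req1} ->
      {{false, <<"Basic realm=\"api\"">>}, Req1, State}
  end.
```

Returning `{false, AuthHeader}` makes cowboy_rest reply 401 with the given WWW-Authenticate header, so unauthenticated requests never reach the entity logic.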
Storing and Replication of Session Data
Since our endpoints are basically stateless, we don’t actually manage session data (that is, data related to a particular session). Another way to see it is exactly the other way around: all data is session data. Our sessions start (they are created) when the user logs in on a new device, and they last until the session token expires. That way, when users open the app on their devices, they don’t have to log in again. The app just keeps using the session token that it already has.
Point-to-Point / Multi-Point Communication
Most of our apps don’t need to establish connections between two users/devices (point-to-point). Except for SSE, they don’t even keep a connection open with the server. They usually don’t require broadcasting (multi-point) either.
In cases when such a thing is needed, we use proper API endpoints to receive data from clients, SumoDB to persist that data and then trigger the proper events, and a Lasse handler to react to those events and send the data to the recipient(s) using SSE. If the recipient(s) are not connected to the SSE endpoint at the time, we usually send them a push notification (using Amazon SNS) and, next time they connect, our lasse handler will use SumoDB to retrieve the unread events from the DB and send them all together to the client.
Reliability / Single Points of Failure
Since we mostly use libraries and we don’t usually have supervision trees in our main apps, we delegate reliability concerns to the libraries we use:
- SumoDB uses WPool to maintain multiple connections to the database and therefore avoid long message queues and delays if a query is blocked or a connection is lost.
- Cowboy uses an independent process per request. That way, failures in one request do not affect the others. Thanks Erlang! Let it Crash!
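For reference, the pooling that SumoDB relies on looks roughly like this when used directly (pool name, size and worker module are invented for illustration):

```erlang
%% Start a pool of 10 workers; each worker is a gen_server-like process.
{ok, _Pid} = wpool:start_pool(db_pool,
                              [ {workers, 10}
                              , {worker, {my_db_worker, []}}
                              ]),

%% Calls are load-balanced across the workers, so one slow or blocked
%% connection doesn't build up a long message queue for everyone else.
Result = wpool:call(db_pool, {query, <<"SELECT 1">>}).
```

wpool also lets you pick a balancing strategy per call (round-robin, best available worker, etc.), which is exactly what you want in front of a database.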
To deal with Erlang node crashes, we start and monitor the nodes with external tools like monit, and we deploy them in Docker containers.
If possible, our databases (which can be considered single points of failure) run in Amazon RDS, a system that we trust in terms of reliability.
Most of our servers are built in a way that allows us to run as many independent nodes as we want (i.e. not using Erlang’s distribution features). Therefore, we usually deal with scalability issues by booting up or tearing down servers. We do that dynamically using Amazon ELB.
The usual bottleneck of our systems is the database. That’s why almost every system that we build contains a cache on top of SumoDB, implemented with epocxy’s cxy_cache. We still have to add this to SumoREST.
Distributed Architectural Patterns
As I stated above: our systems are not usually built around Erlang’s distributed architecture. But sometimes we do need to keep our multiple nodes connected in a proper Erlang fashion. In those scenarios, we usually add a node monitor: a process that uses net_kernel:monitor_nodes/1 to detect nodes going up or down and keeps track of current live nodes using a table (i.e. a SumoDB model and repo) in the database.
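Such a node monitor can be sketched as a small gen_server (the `nodes_repo` module below is a hypothetical SumoDB-backed repository; it stands in for whatever table you use to track live nodes):

```erlang
%% Tracks cluster membership using net_kernel:monitor_nodes/1.
-module(node_monitor).

-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

start_link() ->
  gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
  %% From now on, this process receives {nodeup, N} / {nodedown, N} messages.
  ok = net_kernel:monitor_nodes(true),
  {ok, #{}}.

handle_info({nodeup, Node}, State) ->
  nodes_repo:mark_up(Node),    %% hypothetical repo call
  {noreply, State};
handle_info({nodedown, Node}, State) ->
  nodes_repo:mark_down(Node),  %% hypothetical repo call
  {noreply, State};
handle_info(_Other, State) ->
  {noreply, State}.

handle_call(_Req, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
```

Note that net_kernel:monitor_nodes/1 only makes sense on a node started in distributed mode (i.e. with a node name).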
Since the messages that we need to send between nodes usually come from SumoDB events, we install special gen_event handlers in each node with the sole responsibility of broadcasting the received events to the gen_event dispatchers in the other nodes.
This is an area that we can and should improve, but it’s far from the most common scenario around here. The last time we had to build a system that required this was almost two years ago.
So, from the paragraphs above, it might seem that we build simple systems. And that might be true: we don’t build complex masterpieces of software. But not many years ago, building a web server (and I’m not even talking about REST here) was not an easy thing (thanks, Loïc!); modelling your system and persisting your entities was not only an indivisible mess, it was also hopelessly tied to the DB you’d chosen to work with (thank you, Chelo!); documenting your API required you to write markdown and keep it up to date by hand (thanks, Harry & Carlos!); and we used to implement the whole SSE protocol ourselves every time we needed it (thank you, Juan!). So, if we build simple systems now, that’s only because we made them simple!