My Year of Riak

Chad DePue wrote this on August 25, 2011 under databases, dev, nosql, riak.

Startups often ask my opinion on databases for their new application. In the past year we've launched a big CouchDB-based application and helped build a Riak-based Facebook app (not yet launched). We've also helped launch an Amazon SimpleDB-based application. Each of these has been an opportunity to further develop my philosophy about what makes a good database for a website or mobile app, and when to choose each one. I have a separate series coming on database tradeoffs, but since there isn't much information on Riak out in the wild yet, I will start by sharing my thoughts on this database. These are my opinions after a bit more than a year of using it at Inaka.


Riak is a key/value store. An afternoon spent reading about the behind-the-scenes architecture of Amazon S3 is a helpful primer on how the database works. Basho's website (Basho is the company that built Riak) is remarkably obtuse about explaining what's so great about it. However, if you've ever had to deal with the hassles of scaling MySQL or other data stores across multiple servers, you'll want to familiarize yourself with Riak. Those who know they need Riak have suffered the pain of scaling other databases, so they are a self-selecting group. Riak hasn't exactly gone mainstream yet, but it's the database all the cool kids are talking about, so it's good to know at least what makes it special.

Storing data

  • Data is stored in buckets of keys, just like Amazon S3.
  • Keys hold values, and values can be any type of data.
  • Each key has a content-type, which is set when the data is PUT or POSTed into Riak.
  • JSON is the most common type of data stored, as it's easy to query with JavaScript, but there's nothing magical about JSON with Riak - it's all data!
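
As a rough sketch of the points above, this is what storing a JSON value under a key in a bucket could look like over Riak's HTTP interface. The host, port, bucket name, and key are assumptions for illustration (8098 is the usual default HTTP port, and `/riak/<bucket>/<key>` is the URL layout in Riak releases of this era). The request is only built, not sent, so you can inspect it without a running node.

```python
import json
import urllib.request

# Assumed address of a local Riak node; adjust for your cluster.
RIAK_URL = "http://127.0.0.1:8098"


def build_store_request(bucket, key, value):
    """Build (but do not send) an HTTP PUT that stores `value` as JSON."""
    body = json.dumps(value).encode("utf-8")
    return urllib.request.Request(
        url=f"{RIAK_URL}/riak/{bucket}/{key}",
        data=body,
        method="PUT",
        # The content-type is stored alongside the value.
        headers={"Content-Type": "application/json"},
    )


# Hypothetical bucket and key names, purely for illustration.
request = build_store_request("customers", "alice", {"name": "Alice", "plan": "pro"})
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Swapping the content-type and body is all it takes to store something other than JSON, such as an image or plain text.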

Servers crash, hard drives fail

Servers will fail; recovering from failure should be automatic; data should not be lost during a failure. Riak does this by making copies of the data across different nodes. Replicas are made automatically when items are stored in Riak. When a node goes down, the cluster detects it and rebalances the data across the remaining nodes. The brilliance of Riak is that recovering from failure, and adding new nodes as you need more storage, is absolutely painless. It Just Works. There are tools for adding and removing nodes, and they couldn't be simpler to use.
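
To make the idea of automatic replicas concrete, here is a deliberately simplified toy in Python. It is not Riak's real partitioning scheme; the node names, the key, and the choice of three copies are all assumptions. It only shows the principle: a key is hashed to pick several distinct nodes, and when one of them disappears the key is still assigned a full set of homes among the survivors.

```python
import hashlib


def replica_nodes(key, nodes, copies=3):
    """Pick `copies` distinct nodes for a key by hashing it onto the node list."""
    if not nodes:
        raise ValueError("cluster has no nodes")
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    start = int.from_bytes(digest, "big") % len(nodes)
    count = min(copies, len(nodes))
    # Walk the list from the hashed position, wrapping around at the end.
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


cluster = ["node1", "node2", "node3", "node4", "node5"]
before = replica_nodes("cart:42", cluster)

# Simulate the failure of one of the nodes holding the key.
survivors = [n for n in cluster if n != before[0]]
after = replica_nodes("cart:42", survivors)

print("replicas before failure:", before)
print("replicas after failure: ", after)
```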


A weed-eater vs a V-8

The minimum recommended configuration is three nodes, and you can add as many as you would like. I've heard of clusters of up to 60 nodes, and I'm sure at this point there are bigger ones. Running a single Riak "node" is possible, but it's like running a 1-cylinder four-stroke engine, which would run rough, if at all. The whole design of the cylinder and its valves is to work in concert with 3 or 5 or 7 others, each one at a different point in the power cycle, all together producing the syncopated, low, rhythmic rumble that indicates the power underneath the hood. Riak is like a V-8: it's really designed to run as a cluster of nodes. Riak throughput goes up with additional nodes, and we have anecdotal evidence that, up to a point, response times improve as you add nodes as well.

Accessing data and client libraries

Riak provides two protocols for accessing data: HTTP and Protocol Buffers, a Google-created format for structuring data that is more efficient than HTTP.

Because each node communicates with all of the others, you can ask any one node for data. Some clients don't yet support round-robin requests to multiple nodes, but it's trivial to place an HTTP load balancer in front of the nodes to handle the round robin for them. Depending on your load balancer, though, that adds a potential single point of failure.

Clients are available in Ruby, Python, JavaScript, Erlang, and other languages. If you store your data as JSON-encoded documents, it's easy to interact with Riak from different languages.
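
If your client library doesn't rotate between nodes for you, the round-robin part is small enough to do yourself. The following is a minimal sketch in Python, assuming three hypothetical node addresses; it only picks the next node to talk to and does no health checking or retrying.

```python
import itertools

# Hypothetical node addresses; any node can answer a request.
NODES = [
    "http://riak1.example.com:8098",
    "http://riak2.example.com:8098",
    "http://riak3.example.com:8098",
]


def round_robin(nodes):
    """Return a function that yields the next node URL on each call."""
    cycle = itertools.cycle(nodes)
    return lambda: next(cycle)


next_node = round_robin(NODES)
picked = [next_node() for _ in range(4)]
print(picked)  # the fourth pick wraps around to the first node
```

Doing this in the client avoids the extra load balancer, at the cost of every client needing to know the node list.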

Querying data

When you want to get data out, you need to query for it. If you know the key, it's easy: just make an HTTP request for the data and you're done. But what about real queries, such as aggregating data or selecting a subset of items? There is currently only one way to run that kind of query in Riak, and that's Map/Reduce.

The easiest way to explain Map/Reduce is to say that it's like writing a simple piece of code to query a database, then running that query on all the rows of data, on all the servers where that data lives, and then collating the results. Think of Google's giant search index. It wouldn't be possible to build that index by bringing all the data of the billions of web pages to the server that builds it - the actual work of building the index must be completed close to where the data actually exists.

The classic example of Map/Reduce, found all over the web and in Basho's own demo, is the word count. There is a CouchDB version, and Ilya Grigorik has a good one as well.
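
For readers who haven't seen it, here is the word count simulated in plain Python. This is only a sketch of the data flow, not Riak code: in Riak the map and reduce functions are submitted to the cluster (typically written in JavaScript), whereas here the "nodes" are just entries in a dictionary, and the documents are made up.

```python
from collections import Counter
from functools import reduce

# Hypothetical documents, grouped by the node that stores them.
DOCS_BY_NODE = {
    "node1": ["the quick brown fox", "the lazy dog"],
    "node2": ["the fox jumps"],
}


def map_phase(document):
    """Map: turn one document into per-word counts."""
    return Counter(document.split())


def reduce_phase(left, right):
    """Reduce: merge two sets of counts into one."""
    return left + right


def word_count(docs_by_node):
    partials = []
    for documents in docs_by_node.values():
        # In a real cluster this loop body runs on the node holding the data.
        mapped = [map_phase(doc) for doc in documents]
        partials.append(reduce(reduce_phase, mapped, Counter()))
    # Only the small per-node summaries travel to the coordinator.
    return reduce(reduce_phase, partials, Counter())


totals = word_count(DOCS_BY_NODE)
print(totals["the"], totals["fox"])
```

The point to notice is that the documents never leave their node; only the per-node counts are combined at the end.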

So what do people actually use Riak for?

See Who is Using Riak for specific companies, but here's my short list:

  • For storing web session data that could grow indefinitely. Shopping carts that are always available.
  • Storing log data that could grow very large.
  • Write-heavy projects.
  • Documents where the schema between documents could be different.
  • When you absolutely can't have a database with a single point of failure.
  • Example: streaming video data to disk for later processing.
  • Example: storing streams of sensor data.

When do I want to use something other than Riak?

  • If you will be performing SQL-style set operations or your data is relational.
  • If you have budget constraints: every value is stored several times to ensure redundancy across the nodes, so disk storage requirements will be high.
  • If you don't like running your own servers, as there are not any hosted-Riak services that I would recommend (currently).
  • If absolute latency for response times of individual requests is a priority.
  • If you need to guarantee any read of a key will see the same value immediately, as nodes can take a while to guarantee writes. There are no transactions in Riak.
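
The storage point in the list above comes down to simple multiplication. Here is a back-of-the-envelope sketch in Python, assuming three copies of every value (Riak's usual default) and an invented 500 GB of application data spread evenly over five nodes.

```python
def raw_storage_gb(data_gb, copies=3):
    """Disk needed across the whole cluster when every value is stored `copies` times."""
    return data_gb * copies


def per_node_gb(data_gb, nodes, copies=3):
    """Average disk per node, assuming data spreads evenly."""
    return raw_storage_gb(data_gb, copies) / nodes


print(raw_storage_gb(500))        # total across the cluster, in GB
print(per_node_gb(500, nodes=5))  # average per node, in GB
```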

If I use Riak, I need to be comfortable with...

  • Complex queries in JavaScript that are significantly more difficult to write and debug than a SQL equivalent.
  • A potential tradeoff: possibly increased development complexity in exchange for massively decreased deployment complexity.
  • No practical ability to list keys, and therefore no equivalent to "select * from customers". (You can request the keys from a bucket, but it's currently an expensive operation that can block all other activities on the nodes. In other words, don't do it.)

How do I deal with things that need to be atomic like queues and counters?

For every application Inaka has built to date, we use Riak with Redis. For caching data, counters, quick set operations, and anything we would otherwise use Memcache for, we use Redis. For all the actual permanent data storage, we use Riak. Redis often becomes a single point of failure at the database level, but we're almost always dealing with other single points of failure anyway, and you can use read-slaves with Redis to mitigate this to some extent. Particularly if you're not using Redis for permanent storage, you can go a long way with two Redis servers and a Riak cluster. They're a great complement to each other.

Roadmap Features

Basho has demonstrated secondary indices which allow for querying across the database without having to write a Map/Reduce. I believe this will be a significant improvement to the product, though I'm not super convinced they have the right format for the query language yet - it feels a bit clumsy with type definitions in the HTTP query syntax.

Additionally, Basho has promised, down the road, a SQL-like syntax which would make interacting with the database much more powerful for the average developer. The roadmap looks bright and Basho is very responsive to community feature requests.

I talked with Mark Phillips of Basho, and he gave me the short list of 1.0 features coming this fall:

  • Secondary Indexing - as referenced above.
  • Lager - More traditional, Unix-friendly logging.
  • LevelDB Integration - A Google-created backend that offers different performance characteristics than the default backend, called Bitcask. One thing I intentionally didn't discuss is the pluggable datastores in Riak, as they're not that important for understanding the basics of the database. But since a new one is a roadmap item, I'll just say that it's a great feature: you can use the default or the MySQL backend, or any of a number of stores, even Redis. LevelDB seems to have some important characteristics such as built-in compression, instant snapshotting, and more.
  • Riak Pipe - The ability to set up phases of map/reduce jobs in a 'pipeline'. It's in beta and I've played around with it. The easiest way to explain it is that the Basho guys are thinking through how to make Map/Reduce less complex and more powerful, and this is the first step.
  • Search Integration - Search was a separate install with a Java dependency. That has been removed and search is a 'first class feature' of Riak now.

Final Thoughts

  • I only recommend Riak to clients that really understand their needs and can confidently walk through the list above. I have mixed feelings about recommending Riak today, because most people don't have the context to make the right decision. It's easy to pick Riak or another NoSQL store because you're worried about scale. But it's way, way more important to worry about those first 100 customers than it is to worry about your first million. Riak is catnip to sufferers of "Premature Database Optimization Syndrome" because it works. It does actually allow near-linear scale, but that's not usually the problem.
  • Map/Reduce databases are a brilliantly powerful way of turning database queries on their head, but currently Riak's implementation can be overloaded by complex JavaScript. With a big Map/Reduce query over a lot of keys, you either need to get the keys into the engine, which means POSTing them (and thereby sending them over the wire), or perform some sort of "bucket filter", which will cause a "list-keys" operation (see above) and will not scale well in production. This has the effect of narrowing the window of acceptable Map/Reduce scenarios and forcing painful workarounds such as batching operations outside of Riak using job systems like Resque.
  • Hosting can be expensive. We've seen poor performance running Amazon small nodes, so we generally recommend running on AWS large boxes. Running three m1.large AWS boxes will cost around $730 USD/month, which means starting out with Riak is a more expensive proposition than with some other databases in a cloud environment, and it is often worth considering dedicated hosting.
  • There are some commercial features available as well that I didn't mention. Probably the most important is site-to-site replication. This is not available in the open source version, but I assume it is the major draw for the big paying enterprise customers so far.
  • Net/net: I REALLY like Riak when I'm lying in bed at night not worried about a few node failures.
  • Overall, if you know you need Riak, it's a joy to use and easy to scale, and it's a powerful tool in your database arsenal.