Latest blog entries

Sorting by popularity like Reddit with Ruby, PostgreSQL and Elastic, part 1

First part of a post showing how we can sort items by popularity in a Ruby application following the Reddit style, using PostgreSQL and Elastic.

Mar 25 2015 : Flavio Granero

TDD, Coverage and Why Testing Exceptions

Why should I test the exceptions in my code?

Feb 24 2015 : Brujo Benavides

Galgo iOS Library

An iOS library to display logs as on-screen overlays.

Feb 10 2015 : Andres Canal

Announcing xref_runner

Making full use of the Erlang toolbox

Feb 10 2015 : Iñaki Garay

Weird List Comprehensions in Erlang

Some strange cases regarding List Comprehensions

Jan 13 2015 : Brujo Benavides

How to automatically generate different production and staging builds

Use gradle to build different flavors of APKs

Dec 22 2014 : Henrique Boregio

Galgo Android Library

An Android library to display logs as on-screen overlays

Nov 20 2014 : Henrique Boregio

ErloungeBA @ Inaka

ErloungeBA meeting @ Inaka Offices.

Nov 14 2014 : Inaka

Shotgun: HTTP client for Server-sent Events

Show usage of Shotgun and how consuming SSE can be simple.

Oct 20 2014 : Juan Facorro

Metaprogramming in Erlang: Writing a partial application function

The joys of metaprogramming and Erlang's abstract format

Oct 14 2014 : Hernán Rivas Acosta

Implementing an Android REST Client using Retrofit

Quickly create REST Clients from a simple java interface

Oct 10 2014 : Henrique Boregio

Worker Pool (for Erlang)

Introducing one of our open-source tools: Worker Pool

Sep 25 2014 : Brujo Benavides

The Fork Workflow in iOS

A clear way to apply modifications to your project dependencies

Sep 19 2014 : Pablo Villar

Launching Android Activities in a Separate Task

Launching Android Activities in a Separate Task

Sep 09 2014 : Henrique Boregio

Getting the right colors in your iOS app

How to keep consistency when picking and applying colors

Sep 05 2014 : Pablo Villar

The King of Code Style

Introducing our Erlang style guide and style-checking tool, Elvis

Sep 05 2014 : Iñaki Garay

Proud to announce our new home with Erlang Solutions

Inaka is proud to announce our new home with Erlang Solutions!

Aug 05 2014 : Chad DePue

IKJayma: A simple iOS Networking Library

Jul 21 2014 : Tom Ryan

Become an Erlang Cowboy and tame the Wild Wild Web - Part I

Erlang: From zero to coding a commenting system

Jun 23 2014 : Federico Carrone

Implementing a simple REST Client in Android

How to create a simple REST Client in Android

May 19 2014 : Henrique Boregio

Assisted Workflow: a CLI tool to speed up simple git workflows

Introducing the assisted_workflow gem, a CLI tool with useful commands to integrate a simple git workflow with your story tracker and GitHub pull requests

Mar 25 2014 : Flavio Granero

Cleaning Up Your GitHub Tree

How to clear all those stray branches

Feb 21 2014 : Pablo Villar

Friday Talks at Inaka

Lunch together, talk together

Dec 20 2013 : Inaka Blog

RubyConf Argentina 2013

Inaka represents at RubyConf 2013

Dec 18 2013 : Inaka Blog

Bounce Rate Bare-Bones Basics

An overview of bounce rate in broad strokes

Dec 05 2013 : Inaka Blog

Paintball: Inaka’s End-of-the-Year Party

Welts and bruises bring Inaka together

Nov 22 2013 : Inaka Blog

Inaka Product Review: Connection Minder

Making networking personal again

Nov 19 2013 : Inaka Blog

Canillita - Your First Erlang Server

Learn Erlang by example creating a simple RESTful server

Nov 06 2013 : Fernando "Brujo" Benavides

Landing Page Basics: What, How, and Why

The importance of a well-built landing page

Oct 29 2013 : Inaka Blog

Navigating Open Source Licensing

A comparison of common open source licenses

Oct 17 2013 : Inaka Blog

Git: Not Just for Devs

Sharing the Git love

Oct 07 2013 : Inaka Blog

Inaka Product Review: Go Dish

Go Dish brings good deals on good food

Oct 01 2013 : Inaka Blog

Reconsidering the Big Launch

Why big launches often disappoint, and what to do instead

Sep 23 2013 : Inaka Blog

Inaka Product Review: Whisper

Share secrets and meet new people with Whisper

Sep 16 2013 : Inaka Blog

Inaka Product Review: Ombu

Ombu combines the best of Bump and Scan to make sharing easy

Sep 11 2013 : Inaka Blog

Digitized Halloween Costumes

Morphsuits and Digital Dudz at your fingertips

Sep 09 2013 : Inaka Blog

7 Tactics to Build an App Without a Technical Cofounder: Part 3

Focusing on user experience through design

Sep 06 2013 : Inaka Blog

From Erlang to Java and Back Again: Part 1

My experience creating a Java/Erlang OTP application

Sep 05 2013 : Fernando "Brujo" Benavides

7 Tactics to Build an App Without a Technical Cofounder: Part 2

A realistic look at costs and business relations

Aug 30 2013 : Inaka Blog

7 Tactics to Build an App Without a Technical Cofounder: Part 1

Understanding the tools and processes of app development

Aug 28 2013 : Inaka Blog

Second-Screen App Round-Up

A round-up of network agnostic, network-based, and show-based apps

Aug 27 2013 : Inaka Blog

iOS Auto Layout

A review of Apple's Auto Layout technology

Aug 23 2013 : German Azcona

Core Data One-Way Relationships

Common design patterns in iOS applications

Mar 06 2013 : Tom Ryan

Everyday Erlang: Quick and effective caching using ETS

Using ETS for effective caching in Erlang

Mar 05 2013 : Marcelo Gornstein

Don't Under-Think It: SQL vs NoSQL

The effect of database choice on technical debt

Feb 26 2013 : Chad DePue

Erlang Event-Driven Applications

A thorough how-to on using events

Jan 21 2013 : Marcelo Gornstein

Don't Under-Think It: Making Critical Decisions When Building an iOS Application

How a few up-front decisions can make or break an app

Dec 06 2012 : Chad DePue

Some Erlang Magic for Beginners

Erlang tricks for beginners

Dec 03 2012 : Fernando "Brujo" Benavides

Inaka:Pong - DIY Sport

How to play Inaka:Pong, a new sport

Dec 03 2012 : Fernando "Brujo" Benavides

Every-day Erlang: Handling Crashes in Erlang

Handling crashes when calling gen_server:start_link outside a supervisor

Nov 29 2012 : Marcelo Gornstein

Inaka Friday lunches

Team building at Inaka

Nov 02 2012 : Chad DePue

Inaka is a proud sponsor of Erlang DC

The largest Erlang event on the East Coast

Oct 23 2012 : Jenny Taylor

Inaka proud to be a sponsor of RubyConf Argentina

The largest Ruby event in South America

Oct 23 2012 : Jenny Taylor

Inaka client featured on LifeHacker

Big press for the Heroku-powered Rails-based Gmail plugin

Feb 28 2012 : Chad DePue

Scaling Erlang

Scale testing a sample Erlang/OTP application

Oct 07 2011 : Fernando "Brujo" Benavides

Memory Management Changes in iOS 5

A review of Apple's new ARC technology

Sep 05 2011 : German Azcona

My Year of Riak

Thoughts on using Basho's Riak database in production.

Aug 25 2011 : Chad DePue

My Year of Riak

Chad DePue, Aug 25 2011

Startups often ask my opinion on databases for their new applications. In the past year we've launched a big CouchDB-based application, helped build a Riak-based Facebook app (not yet launched), and helped launch an Amazon SimpleDB-based application. Each has been an opportunity to further develop my philosophy about what makes a good database for a website or mobile app, and about when to choose each one. I have a separate series coming on database tradeoffs, but since there isn't much information on Riak out in the wild yet, I'll start by profiling my thoughts on the database. These are my opinions after a bit more than a year of using it at Inaka.


Riak is a key/value store database, and an afternoon spent reading Amazon's Dynamo paper (the behind-the-scenes architecture it is modeled on) is a helpful primer on how the database works. Basho's website (Basho is the company that built Riak) is remarkably vague about explaining what's so great about it. However, if you've ever had to deal with the hassles of scaling MySQL or other data stores across multiple servers, you'll want to familiarize yourself with Riak. Those who know they need Riak have usually suffered the pain of scaling other databases, so they are a self-selecting group. Riak hasn't exactly gone mainstream yet, but it's the database all the cool kids are talking about, so it's good to know at least what makes it special.

Storing data

  • Data is stored in buckets of keys, just like Amazon S3.
  • Keys hold values, and values can be any type of data.
  • Each key has a content type, set when the data is PUT or POSTed into Riak.
  • JSON is the most common type of data stored, since it's easy to query with JavaScript, but there's nothing magical about JSON to Riak - it's all data!
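
As a concrete sketch of this storage model, here's how an object might be PUT over the HTTP interface. This assumes a local node on Riak's default HTTP port (8098) and the `/riak/<bucket>/<key>` URL scheme of this era; the helper names are my own.

```python
import json
from http.client import HTTPConnection

RIAK_HOST, RIAK_PORT = "localhost", 8098  # assumed: a local node on the default HTTP port

def object_path(bucket, key):
    # Objects are addressed by bucket and key, much like S3.
    return "/riak/{0}/{1}".format(bucket, key)

def put_json(bucket, key, value):
    # PUT the value with an explicit Content-Type, since Riak stores
    # whatever bytes you give it and remembers the type you declared.
    conn = HTTPConnection(RIAK_HOST, RIAK_PORT)
    conn.request("PUT", object_path(bucket, key),
                 body=json.dumps(value),
                 headers={"Content-Type": "application/json"})
    status = conn.getresponse().status
    conn.close()
    return status

# No server needed just to see the addressing scheme:
print(object_path("carts", "user-42"))  # /riak/carts/user-42
```

A GET of the same path returns the bytes back, along with the content type you stored.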

Servers crash, hard drives fail

Servers will fail; recovering from failure should be automatic; data should not be lost during a failure. Riak achieves this by making copies of the data across different nodes: replicas are created automatically as items are stored. When a node goes down, the cluster detects the failure and rebalances the data across the remaining nodes. The brilliance of Riak is that all the hassle of recovering from failure, and of adding new nodes as you need more storage, is absolutely painless. It Just Works. There are tools for adding and removing nodes, and they couldn't be simpler to use.
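
To make the rebalancing idea concrete, here's a toy model (my own simplification, not Riak's actual ring code): each key hashes to a position on a ring and is replicated to the next N nodes clockwise, so when a node dies the same function simply maps keys onto the survivors.

```python
import hashlib

N_REPLICAS = 3  # Riak's default replication factor (n_val) is 3

def ring_position(name, ring_size=1024):
    # Hash a key or node name onto a fixed-size ring.
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % ring_size

def preference_list(key, nodes, n=N_REPLICAS):
    # Walk the ring clockwise from the key's position and take the
    # first n nodes -- a simplified stand-in for Riak's preference list.
    ordered = sorted(nodes, key=ring_position)
    pos = ring_position(key)
    suffix = [m for m in ordered if ring_position(m) >= pos]
    return (suffix + ordered)[:n]

nodes = ["riak1", "riak2", "riak3", "riak4"]
replicas = preference_list("user-42", nodes)
print(replicas)                      # three distinct nodes hold the value
survivors = [m for m in nodes if m != replicas[0]]
print(preference_list("user-42", survivors))  # the key remaps to survivors
```

Because every key independently lands somewhere on the ring, losing one node shifts only that node's share of the keys, which is the essence of the automatic rebalancing described above.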

A weed-eater vs a V-8

The minimum recommended configuration is three nodes, and you can add as many as you'd like. I've heard of clusters of up to 60 nodes, and I'm sure there are larger ones by now. Running a single Riak "node" is possible, but it's like running a one-cylinder four-stroke engine: it would run rough, if at all. A cylinder and its valves are designed to work in concert with 3 or 5 or 7 others, each at a different point in the power cycle, together producing the syncopated, low, rhythmic rumble that signals the power under the hood. Riak is like a V-8: it's really designed to run as a cluster of nodes. Throughput goes up with additional nodes, and we have anecdotal evidence that response times also improve, up to a point, as you add nodes.

Accessing data and client libraries

Riak provides two protocols for accessing data: HTTP and Protocol Buffers, a Google-created format for serializing structured data that is more efficient than HTTP.

Because each node communicates with all of the others, you can ask any one node for data. It's trivial to place an HTTP load balancer in front of the nodes to avoid making round-robin requests from your clients (some clients don't yet support round-robin requests to multiple nodes), but depending on your load balancer this adds a potential single point of failure.

Clients are available in Ruby, Python, JavaScript, Erlang, and other languages. If you store your data as JSON-encoded documents, it's easy to interact with Riak from different languages.
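
If your client library doesn't round-robin across nodes for you, doing it in application code is straightforward. A hypothetical sketch (the class name is mine, and it assumes the default HTTP port):

```python
from itertools import cycle

class RoundRobinNodes:
    """Rotate requests across cluster members when the client
    library can't, instead of fronting them with a load balancer."""

    def __init__(self, nodes):
        self._cycle = cycle(nodes)

    def next_base_url(self):
        # Each call hands back the next node's HTTP endpoint.
        host = next(self._cycle)
        return "http://{0}:8098".format(host)

pool = RoundRobinNodes(["riak1", "riak2", "riak3"])
urls = [pool.next_base_url() for _ in range(4)]
print(urls)  # wraps back to riak1 on the fourth request
```

This avoids the load balancer's single point of failure, at the cost of each client needing to know the node list.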

Querying data

When you want to get data out, you need to query for it. If you know the key, it's easy: just make an HTTP request for the data and you're done. But what about queries for aggregate data or a selection of records? There is currently only one way to get that kind of data out of Riak, and that's Map/Reduce.

The easiest way to explain Map/Reduce is to say that it's like writing a small piece of code to query a database, running that query on all the rows of data, on all the servers where that data lives, and then collating the results. Think of Google's giant search index. It wouldn't be possible to build that index by bringing all the data of billions of web pages to the server that builds it; the actual work of building the index must happen close to where the data actually lives.

The classic Map/Reduce example is all over the web, including Basho's own demos: the word count. CouchDB's documentation has one, and Ilya Grigorik has written a good one as well.
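
The word count translates naturally into code. Here's a toy, single-process simulation of the idea (not Riak's actual Map/Reduce API, which takes JavaScript function sources): the map phase runs independently against the documents each node holds, and the reduce phase collates the partial results.

```python
from collections import Counter
from functools import reduce

# Pretend each node holds a shard of the documents.
node_docs = {
    "riak1": ["the quick brown fox", "the lazy dog"],
    "riak2": ["the dog barks"],
}

def map_phase(docs):
    # Count words locally, close to where the data lives.
    return Counter(word for doc in docs for word in doc.split())

def reduce_phase(partials):
    # Merge the partial counts coming back from every node.
    return reduce(lambda a, b: a + b, partials, Counter())

partials = [map_phase(docs) for docs in node_docs.values()]
totals = reduce_phase(partials)
print(totals["the"])  # 3
```

Only the small per-node `Counter`s cross the wire, never the documents themselves, which is the whole point of pushing the map phase out to the data.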

So what do people actually use Riak for?

See Who is Using Riak for specific companies, but here's my short list:

  • For storing web session data that could grow indefinitely. Shopping carts that are always available.
  • Storing log data that could grow very large.
  • Write-heavy projects.
  • Documents where the schema between documents could be different.
  • When you absolutely can't have a database with a single point of failure.
  • Example: streaming video data to disk for later processing.
  • Example: storing streams of sensor data.

When do I want to use something other than Riak?

  • If you will be performing SQL-style set operations or your data is relational.
  • If you have budget constraints: to ensure redundancy across the nodes, the amount of data stored (and therefore the disk space required) will be high.
  • If you don't like running your own servers, as there are currently no hosted-Riak services that I would recommend.
  • If the absolute lowest latency on individual requests is a priority.
  • If you need a guarantee that any read of a key immediately sees the latest value: nodes can take a while to propagate writes, and there are no transactions in Riak.

If I use Riak, I need to be comfortable with...

  • Complex queries in JavaScript that are significantly more difficult to write and debug than their SQL equivalents.
  • A potential tradeoff: possibly increased development complexity in exchange for massively decreased deployment complexity.
  • No ability to list keys, and therefore no equivalent to "select * from customers". (You can request the keys in a bucket, but it's currently an expensive operation that can block all other activities on the nodes; in other words, don't do it.)

How do I deal with things that need to be atomic like queues and counters?

For every application Inaka has built to date, we use Riak with Redis. For caching data, counters, quick set operations, and anything we would use Memcache for, we use Redis. For all the actual permanent data storage, we use Riak. This often reintroduces a single point of failure at the database level, but we're almost always dealing with other single points of failure anyway, and you can use read slaves with Redis to mitigate this to some extent. Particularly if you're not using Redis for permanent storage, you can go a long way with two Redis servers and a Riak cluster. They're a great complement to each other.
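
The division of labor looks roughly like this sketch, with in-memory stand-ins for the two stores so it runs anywhere (a real application would use a Redis client for the counters and a Riak client for the documents; the class names here are my own):

```python
class CounterStore:
    """The Redis role: atomic counters, caches, quick set operations."""

    def __init__(self):
        self._counts = {}

    def incr(self, key):
        # Stands in for Redis's atomic INCR.
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]

class DocumentStore:
    """The Riak role: permanent, replicated key/value documents."""

    def __init__(self):
        self._docs = {}

    def put(self, bucket, key, value):
        self._docs[(bucket, key)] = value

    def get(self, bucket, key):
        return self._docs[(bucket, key)]

counters, documents = CounterStore(), DocumentStore()
documents.put("orders", "order-1", {"total": 42})  # durable record -> Riak
views = counters.incr("orders:views")              # hot counter -> Redis
print(views, documents.get("orders", "order-1"))
```

The durable records go to the store built to survive node failure; the things that must be atomic and fast go to the store built for that.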

Roadmap Features

Basho has demonstrated secondary indexes, which allow querying across the database without having to write a Map/Reduce job. I believe this will be a significant improvement to the product, though I'm not convinced they have the right format for the query language yet; it feels a bit clumsy, with type definitions embedded in the HTTP query syntax.

Additionally, Basho has promised, down the road, a SQL-like syntax which would make interacting with the database much more powerful for the average developer. The roadmap looks bright and Basho is very responsive to community feature requests.

I talked with Mark Phillips of Basho, and he gave me the short list of 1.0 features coming this fall:

  • Secondary Indexing - as referenced above.
  • Lager - more traditional, Unix-friendly logging.
  • LevelDB Integration - a Google-created backend with different performance characteristics than the default backend, Bitcask. One thing I intentionally didn't discuss is Riak's pluggable datastores, since it's not important for understanding the basics of the database; but since a new one is a roadmap item, I'll just say it's a great feature - you can use the default or the MySQL backend, or any of a number of stores, even Redis. LevelDB has some important characteristics such as built-in compression, instant snapshotting, and more.
  • Riak Pipe - the ability to set up phases of Map/Reduce jobs in a 'pipeline'. It's in beta and I've played around with it. The easiest way to explain it is that the Basho guys are thinking through how to make the complexity of Map/Reduce easier and more powerful, and this is the first step.
  • Search Integration - search used to be a separate install with a Java dependency. That dependency has been removed, and search is now a 'first-class feature' of Riak.

Final Thoughts

  • I only recommend Riak to clients who really understand their needs and can confidently walk through the list above. I have mixed feelings about recommending Riak today, because most people don't have the context to make the right decision. It's easy to pick Riak or another NoSQL store because you're worried about scale, but it's way, way more important to worry about your first 100 customers than about your first million. Riak is catnip to sufferers of "Premature Database Optimization Syndrome" because it works - it really does allow near-linear scaling - but that's not usually the problem.
  • Map/Reduce databases are a brilliantly powerful way of turning database queries on their head, but currently Riak's can be overloaded by complex JavaScript. For a big Map/Reduce query over a lot of keys, you either need to get the keys into the engine, which means POSTing them (and thereby sending them over the wire), or perform some sort of "bucket filter", which causes a "list-keys" operation (see above) and will not scale well in production. This narrows the window of acceptable Map/Reduce scenarios and forces painful workarounds, such as batching operations outside of Riak using job systems like Resque.
  • Hosting can be expensive. We've seen poor performance running Amazon small nodes, so we generally recommend running on AWS large boxes. Running three m1.large AWS boxes will cost around $730 USD/month, which means starting out with Riak is a more expensive proposition than with some other databases in a cloud environment, and it is often worth considering dedicated hosting.
  • There are some commercial features available as well that I didn't mention. Probably the most important is site-to-site replication. This is not available in the open source version but I'm assuming is the major draw for the big enterprise paying customers so far.
  • Net/net: I REALLY like Riak when I'm lying in bed at night not worried about a few node failures.
  • Overall, if you know you need Riak, it's a joy to use: easy to scale, and a powerful tool in your database arsenal.