My Year of Riak

Chad DePue, Aug 25 2011

Startups often ask my opinion on databases for their new application. In the past year we've launched a big CouchDB-based application and helped build stylesclub.com, a Riak-based Facebook app (not yet launched). We've also helped launch ming.ly, an Amazon SimpleDB-based application. Each of these has been an opportunity to further develop my philosophy about what makes a good database for a website or mobile app, and when to choose each one. I have a separate series coming on database tradeoffs, but since there isn't much information about Riak out in the wild yet, I'll start by profiling my thoughts on it. These are my opinions after a bit more than a year of using Riak at Inaka.

Riak

Riak is a key/value store, and an afternoon spent reading Amazon's Dynamo paper - the behind-the-scenes architecture that inspired Riak - is a helpful primer on how the database works. Basho's website (Basho is the company that built Riak) is remarkably opaque about explaining what's so great about it. However, if you've ever had to deal with the hassles of scaling MySQL or other data stores across multiple servers, you'll want to familiarize yourself with Riak. Those who know they need Riak have suffered the pain of scaling other databases, so they are a self-selecting group. Riak hasn't exactly gone mainstream yet, but it's the database all the cool kids are talking about, so it's good to know at least what makes it special.

Storing data

  • Data is stored in buckets of keys, just like Amazon S3.
  • Keys hold values, and values can be any type of data.
  • Each key has a content type, set when the data is PUT or POSTed into Riak.
  • JSON is the most common type of data stored, since it's easy to query with JavaScript, but there's nothing magical about JSON to Riak - it's all data! (A quick sketch of storing a JSON value follows this list.)

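To make that concrete, here's a minimal sketch of storing and fetching a JSON value over Riak's HTTP interface. It assumes a local node on the default HTTP port (8098) and the /riak/<bucket>/<key> URL scheme of the time; the bucket and key names are just for illustration.

    # A minimal sketch of storing and fetching a JSON value over Riak's HTTP
    # interface. Assumes a node on localhost:8098 and the /riak/<bucket>/<key>
    # URL scheme; the "sessions" bucket and "chad" key are hypothetical.
    import json
    import requests

    RIAK_URL = "http://localhost:8098/riak"

    # PUT a JSON document under the key "chad" in the "sessions" bucket.
    # The Content-Type header is how Riak learns the value's content type.
    doc = {"user": "chad", "cart": ["book", "coffee"]}
    resp = requests.put(
        f"{RIAK_URL}/sessions/chad",
        data=json.dumps(doc),
        headers={"Content-Type": "application/json"},
    )
    resp.raise_for_status()

    # GET it back; Riak returns the value with the content type we stored.
    resp = requests.get(f"{RIAK_URL}/sessions/chad")
    print(resp.json())   # {'user': 'chad', 'cart': ['book', 'coffee']}
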
Servers crash, hard drives fail

Servers will fail; recovering from failure should be automatic; data should not be lost during a failure. Riak achieves this by making copies of the data across different nodes, and the replicas are created automatically when items are stored. When a node goes down, the cluster detects the failure and rebalances the data across the remaining nodes. The brilliance of Riak is that all the hassle of recovering from failure, and of adding new nodes as you need more storage, is absolutely painless. It Just Works. There are tools for adding and removing nodes, and they couldn't be simpler to use.

A weed-eater vs a V-8

The minimum recommended configuration is three nodes, and you can add as many as you would like. I've heard of clusters of up to 60 nodes, and I'm sure at this point there are more. Running a single Riak "node" is possible, but it's like running a one-cylinder four-stroke engine: it would run rough, if at all. The whole design of the cylinder and its valves is to work in concert with 3 or 5 or 7 others, each at a different point in the power cycle, all together producing the syncopated, low rhythmic rumble that indicates the power under the hood. Riak is like a V-8: it's really designed to run as a cluster of nodes. Throughput goes up with additional nodes, and we have anecdotal evidence that response time also improves, up to a point, as you add nodes.

Accessing data and client libraries

Riak provides two protocols for accessing data: HTTP and Protocol Buffers, a Google-created binary serialization format. The Protocol Buffers interface is more efficient than the HTTP one.

Because each node communicates with all of the others, you can ask any node for data. It's trivial to place an HTTP load balancer in front of the nodes to avoid the hassle of making round-robin requests from your clients (some clients don't yet support round-robin requests to multiple nodes), but depending on your load balancer it adds a potential single point of failure.
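
If you'd rather skip the load balancer, a rough sketch of doing the round robin yourself from the client side looks like the following; the host names are hypothetical and any HTTP client will do.

    # A minimal sketch of client-side round-robin across Riak nodes, since any
    # node can answer for any key. The host names are hypothetical; a load
    # balancer in front of the cluster achieves the same thing at the cost of
    # a potential single point of failure.
    import itertools
    import requests

    NODES = itertools.cycle([
        "http://riak1.example.com:8098",
        "http://riak2.example.com:8098",
        "http://riak3.example.com:8098",
    ])

    def fetch(bucket, key):
        """Ask the next node in the rotation for bucket/key."""
        node = next(NODES)
        return requests.get(f"{node}/riak/{bucket}/{key}")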

Clients are available for Ruby, Python, JavaScript, Erlang, and other languages. If you store your data as JSON-encoded documents, it's easy to interact with Riak from different languages.

Querying data

When you want to get data out, you need to query for it. If you know the key, it's easy: just make an HTTP request for the data and you're done. But what about real queries, such as aggregating data or selecting a set of objects? Beyond direct key lookups, there is currently only one way to get data out of Riak, and that's Map/Reduce.

The easiest way to explain Map/Reduce is to say that it's like writing a simple piece of code to query a database, then running that query on all the rows of data, on all the servers where that data lives, and then collating the results. Think of Google's giant search index. It wouldn't be possible to build that index by bringing all the data of the billions of web pages to the server that builds it - the actual work of building the index must be completed close to where the data actually exists.

The classic Map/Reduce example, found all over the web (including Basho's own demo, a CouchDB version, and a good write-up by Ilya Grigorik), is the word count.
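
For a feel of what this looks like against Riak, here's a rough sketch of a Map/Reduce job POSTed to the HTTP /mapred endpoint, with the map and reduce functions written in JavaScript and shipped along with the request. The bucket, keys, and the "amount" field are made up, and treat the exact JSON job shape as an approximation of the format documented at the time.

    # A rough sketch of a Map/Reduce job against Riak's HTTP /mapred endpoint.
    # Bucket, keys, and the "amount" field are hypothetical; the map and reduce
    # functions are plain JavaScript sent along with the request.
    import json
    import requests

    job = {
        # Explicit bucket/key pairs keep us away from an expensive list-keys.
        "inputs": [["orders", "order1"], ["orders", "order2"], ["orders", "order3"]],
        "query": [
            {"map": {
                "language": "javascript",
                "source": "function(v) {"
                          "  var doc = JSON.parse(v.values[0].data);"
                          "  return [doc.amount];"
                          "}"}},
            {"reduce": {
                "language": "javascript",
                "source": "function(values) {"
                          "  return [values.reduce(function(a, b) { return a + b; }, 0)];"
                          "}"}},
        ],
    }

    resp = requests.post(
        "http://localhost:8098/mapred",
        data=json.dumps(job),
        headers={"Content-Type": "application/json"},
    )
    print(resp.json())   # e.g. [42] - the summed amounts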

So what do people actually use Riak for?

See Who is Using Riak for specific companies, but here's my short list:

  • For storing web session data that could grow indefinitely - shopping carts that are always available.
  • Storing log data that could grow very large.
  • Write-heavy projects.
  • Documents whose schemas may differ from one another.
  • When you absolutely can't have a database with a single point of failure.
  • Example: streaming video data to disk for later processing.
  • Example: storing streams of sensor data.

When do I want to use something other than Riak?

  • If you will be performing SQL-style set operations or your data is relational.
  • If you have budget constraints: storing enough copies of the data to keep it redundant across nodes drives disk requirements, and therefore cost, up.
  • If you don't like running your own servers, as there aren't any hosted Riak services I would recommend (currently).
  • If absolute latency for response times of individual requests is a priority.
  • If you need to guarantee that any read of a key immediately sees the latest value, since writes can take time to reach every replica. There are no transactions in Riak.

If I use Riak, I need to be comfortable with...

  • Complex queries, written in JavaScript, that are significantly more difficult to write and debug than their SQL equivalents.
  • A potential tradeoff: possibly increased development complexity in exchange for massively decreased deployment complexity.
  • No ability to list keys, and therefore no equivalent of "select * from customers". (You can request the keys in a bucket, but it's currently an expensive operation that can block all other activities on the nodes; in other words, don't do it.)

How do I deal with things that need to be atomic like queues and counters?

For every application Inaka has built to date, we use Riak with Redis. For caching data, counters, quick set operations, and anything we would use Memcache for, we use Redis. For all the actual permanent data storage, we use Riak. Redis does reintroduce a single point of failure at the database level, but we're almost always dealing with other single points of failure anyway, and you can use read slaves with Redis to mitigate this to some extent. Particularly if you're not using Redis for permanent storage, you can go a long way with two Redis servers and a Riak cluster. They're a great complement to each other.
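
As a rough sketch of that split, the counter below lives in Redis while the durable record goes to Riak. The key names and the record_page_view helper are hypothetical; it assumes the redis-py client and a Redis server and Riak node on localhost.

    # A minimal sketch of the Riak-plus-Redis split described above: Redis
    # handles the atomic counter, Riak keeps the permanent record. Key names
    # and the helper are hypothetical.
    import json
    import redis
    import requests

    r = redis.Redis(host="localhost", port=6379)

    def record_page_view(page_id, payload):
        # Atomic increment lives in Redis - cheap and race-free.
        views = r.incr(f"views:{page_id}")

        # The durable copy of the event goes to Riak.
        requests.put(
            f"http://localhost:8098/riak/page_views/{page_id}-{views}",
            data=json.dumps(payload),
            headers={"Content-Type": "application/json"},
        )
        return views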

Roadmap Features

Basho has demonstrated secondary indices, which allow for querying across the database without having to write a Map/Reduce job. I believe this will be a significant improvement to the product, though I'm not entirely convinced they have the right format for the query language yet - it feels a bit clumsy with type definitions in the HTTP query syntax.
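
To show what I mean, here's a rough sketch of the style being demonstrated. Since the feature hadn't shipped at the time of writing, treat the exact paths and header names as assumptions based on how secondary indexes later worked over HTTP; the field and values are made up. The type suffix (_bin here) riding along in the query URL is the clumsiness I'm referring to.

    # A rough, assumed sketch of the secondary-index style being demonstrated:
    # the index is attached as a header when storing, and the field's type
    # (_bin / _int) rides along in the query URL. Names are illustrative.
    import requests

    # Store an object with a binary ("_bin") secondary index on email.
    requests.put(
        "http://localhost:8098/riak/users/chad",
        data='{"name": "Chad"}',
        headers={
            "Content-Type": "application/json",
            "x-riak-index-email_bin": "chad@example.com",
        },
    )

    # Query by index value - no Map/Reduce required.
    resp = requests.get(
        "http://localhost:8098/buckets/users/index/email_bin/chad@example.com"
    )
    print(resp.json())   # {"keys": ["chad"]}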

Additionally, Basho has promised, down the road, a SQL-like syntax which would make interacting with the database much more powerful for the average developer. The roadmap looks bright and Basho is very responsive to community feature requests.

I talked with Mark Phillips of Basho, and he gave me the short list of 1.0 features coming this fall:

  • Secondary Indexing - as referenced above.
  • Lager - more traditional, Unix-friendly logging.
  • LevelDB Integration - a Google-created backend with different performance characteristics than the default backend, Bitcask. One thing I intentionally didn't discuss is Riak's pluggable datastores, since it's not that important for understanding the basics of the database, but since a new one is a roadmap item I'll just say it's a great feature: you can use the default, the MySQL backend, or any of a number of stores, even Redis. LevelDB seems to have some important characteristics such as built-in compression, instant snapshotting, and more.
  • Riak Pipe - the ability to set up phases of Map/Reduce jobs in a 'pipeline'. It's in beta and I've played around with it. The easiest way to explain it is that the Basho guys are thinking through how to make Map/Reduce both easier to use and more powerful, and this is the first step.
  • Search Integration - search used to be a separate install with a Java dependency. That has been removed, and search is now a 'first-class feature' of Riak.

Final Thoughts

  • I only recommend Riak to clients that really understand their needs and can confidently walk through the list above. I have mixed feelings about recommending Riak today, because most people don't have the context to make the right decision. It's easy to pick Riak or another NoSQL store because you're worried about scale. But it's way, way more important to worry about your first 100 customers than about your first million. Riak is catnip to sufferers of "Premature Database Optimization Syndrome" because it works - it really does allow near-linear scaling - but scale isn't usually the problem.
  • Map/Reduce databases are a brilliantly powerful way of turning database queries on their head, but currently Riak's can be overloaded by complex JavaScript. With a big Map/Reduce query over a lot of keys, you either need to get the keys into the engine, which means POSTing them (and thereby sending them over the wire), or perform some sort of "bucket filter", which will cause a "list-keys" operation (see above) and will not scale well in production. This narrows the window of acceptable Map/Reduce scenarios and forces painful workarounds, such as batching operations outside of Riak using job systems like Resque.
  • Hosting can be expensive. We've seen poor performance running Amazon small nodes, so we generally recommend running on AWS large boxes. Running three m1.large AWS boxes will cost around $730 USD/month, which means starting out with Riak is a more expensive proposition than with some other databases in a cloud environment, and it is often worth considering dedicated hosting.
  • There are some commercial features that I didn't mention. Probably the most important is site-to-site replication, which is not available in the open-source version but is, I assume, the major draw for the big paying enterprise customers so far.
  • Net/net: I REALLY like Riak when I'm lying in bed at night not worried about a few node failures.
  • Overall, if you know you need Riak, it's a joy to use: easy to scale and a powerful tool in your database arsenal.