The effect of database choice on 'technical debt'
Feb 26 2013 : Chad DePue
Don't Under-Think It: SQL vs NoSQL
Technical debt is the cost, in developer time or money, of a line of code due to poor engineering, sloppy programming, or cutting corners. It's everywhere in the technology world. I'm often asked to come in and solve problems due to massive technical debt.
Actual examples, from the obvious to the subtle:
- A marketing website whose creator wasn't a developer, didn't know that files could be included, and so copied the same code into every page of the site. The creator was making significant money from the site and did not care that it had no structure, as long as he could keep copying and pasting files to update it.
- A Ruby on Rails site with no administration pages. When the marketing managers wanted to change parts of the site, they used a database tool to go in and change individual lines in the database.
- An iPhone application that used a library no longer supported by its original developer. The library made calls that Apple subsequently disallowed, so the owner of the app could not submit updates until the library was replaced.
- A developer used Ruby on Rails and ActiveRecord and never optimized the SQL statements ActiveRecord generates. The site now runs on more than 40 servers, so even a small optimization at this point could significantly reduce operations costs.
In each example, an implicit or explicit choice was made: "Should I invest more time now in a) hiring a developer, b) building an admin, c) investigating a better-supported library, or d) optimizing my SQL?" In each case, given the information available at the time, the decision was that it wasn't worth it.
Make the best decision at the time with the information available.
Often developers look at technical debt as something to be avoided. But I picked the four examples above for a very specific reason - they were all VERY profitable startups. I have many more examples of startups that over-engineered their version 1 product and failed. Correlation does not equal causation, but there's something to the idea that shipping a product into customer hands is much more important than engineering a long-term solution to a problem you don't yet have.
How does that relate to NoSQL?
The decision to avoid MySQL, PostgreSQL or other SQL stores is often connected with a fear that the SQL store won't support the traffic or load that the developer believes the app will have. The idea is that if you don't use SQL now, down the road it will be easier to add additional hardware and simply scale up the application. This is because databases like MongoDB and Riak allow sharding, and CouchDB has a simple-to-configure replication system. Typically NoSQL arguments focus on the two most egregious aspects of using a SQL store:
- Addition of columns to an existing database table can be very slow and can cause downtime while it is happening. Even worse, on MySQL, these operations are not "transactional," meaning that you can't easily revert them if they don't go well.
- If your database grows larger than one server can handle, you have to start thinking of ways to redesign your app to "shard" the data into multiple databases by hand. This requires a lot of migration and application logic changes - and when you outgrow your first "shard", doing it again is even more painful.
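To make the second point concrete, here is a minimal sketch of hand-rolled sharding, assuming a simple hash-modulo router. The `shard_for` function and the `user:N` key format are hypothetical, chosen only for illustration; real applications often route differently.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard by hashing the key and taking it modulo the shard count."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


# Hypothetical keys, for illustration only.
keys = [f"user:{i}" for i in range(1000)]

# Where each key lives with two shards, and where it must live with three.
before = {key: shard_for(key, 2) for key in keys}
after = {key: shard_for(key, 3) for key in keys}

# Every key whose shard changed has to be migrated when the third shard is added.
moved = sum(1 for key in keys if before[key] != after[key])
print(f"{moved} of {len(keys)} keys must move to a different shard")
```

With this naive scheme, most keys change shards every time the shard count changes, which is one reason outgrowing a shard layout hurts so much.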
So, there are some real reasons to consider alternatives to SQL, particularly if we believe we will have a MASSIVE amount of data. And each of these databases can be the right choice in certain applications. They each have significant pluses:
- Riak makes adding additional space to your cluster as easy as adding another server (they call them nodes).
- MongoDB allows for relatively easy sharding of data onto multiple servers (not as easy as Riak, but still quite easy).
- CouchDB makes replicating data from one server to another - locally or across the internet - as easy as one HTTP POST.
- For all three, there's no need to design a database schema up-front, and the schema change that could cost minutes of downtime to add another column simply doesn't exist. Just start storing that new piece of data from your app.
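The schema point can be sketched as follows. This is an illustration under stated assumptions, not how any of these databases is actually driven: SQLite (from the Python standard library) stands in for the SQL store, a plain dictionary stands in for a schemaless store, and the `users` table and `nickname` field are hypothetical.

```python
import sqlite3

# SQL side: the new field needs a schema change before it can be stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")
conn.execute("ALTER TABLE users ADD COLUMN nickname TEXT")  # the migration step
conn.execute("INSERT INTO users (name, nickname) VALUES ('bob', 'bobby')")
sql_rows = conn.execute("SELECT name, nickname FROM users ORDER BY id").fetchall()

# Schemaless side (a dict standing in for a document store):
# the app simply starts writing the new field.
store = {}
store["user:1"] = {"name": "alice"}
store["user:2"] = {"name": "bob", "nickname": "bobby"}
nicknames = [doc.get("nickname") for doc in store.values()]

print(sql_rows)
print(nicknames)
```

On a small in-memory table the `ALTER TABLE` is instant. The cost described above only shows up on large production tables.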
The problem is that all of these come at a cost to the day-to-day productivity of the team. You can't query any of these the same way a SQL database is queried. An ad-hoc query on any of these that might take a minute to write in SQL will take far longer:
- For CouchDB, ad-hoc queries are strongly discouraged, and queries must be pre-calculated across the whole database. For a big dataset, this can take hours. Certain operations can trigger a refresh of these pre-calculated queries, which may make the server unresponsive at any point.
- Riak Map/Reduce jobs across a substantial set of data (i.e. more than a few thousand keys) are not supported.
- MongoDB specifically discourages real-time Map/Reduce jobs and doesn't guarantee performance.
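The difference in effort can be sketched with a toy example. It assumes SQLite as the SQL store and an in-memory dictionary of documents as a stand-in for a NoSQL store. The `orders` data and the `map_order` and `reduce_totals` helpers are hypothetical, and a real Map/Reduce job would also have to be deployed to and run on the cluster.

```python
import sqlite3
from collections import defaultdict

# Hypothetical order data: (user name, order total).
orders = [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)]

# SQL: the ad-hoc question "how much has each user spent?" is one statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_name TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
sql_totals = dict(
    conn.execute("SELECT user_name, SUM(total) FROM orders GROUP BY user_name")
)

# Key/value stand-in: the same question needs a hand-written map and reduce.
documents = {
    f"order:{i}": {"user_name": user, "total": total}
    for i, (user, total) in enumerate(orders)
}


def map_order(doc):
    """Map phase: emit one (user, total) pair per order document."""
    yield doc["user_name"], doc["total"]


def reduce_totals(pairs):
    """Reduce phase: sum the emitted totals per user."""
    totals = defaultdict(float)
    for user, total in pairs:
        totals[user] += total
    return dict(totals)


mapped = (pair for doc in documents.values() for pair in map_order(doc))
mr_totals = reduce_totals(mapped)

print(sql_totals == mr_totals)
```

Both paths give the same totals. The difference is how much code stands between the question and the answer.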
In each case, if I under-think my database decision, I will be paying a daily tax per developer, per query, per feature of my app to avoid a theoretical longer-term debt of a difficult-to-scale SQL database.
I like to think of these little day-to-day Map/Reduce hassles as debt payments on a debt I haven't yet incurred.
What does that mean for the typical new application? When should we consider SQL and when should we consider an alternative?
- If I am going to be doing "joins" between my data in any way, SQL is much, much easier to develop with. Example: users who have accounts and orders, devices that have photos, friends who share with each other, etc.
- If I just need a key/value store and don't care about searching that datastore, EVER, NoSQL is much more powerful. Example: Web sessions or per-device data that is always associated with a 'master key'.
- If I think I might have to 'fan out' a lot of data (pushing subscriptions to a lot of users), using SQL with NoSQL can be powerful. Keep the list of subscriptions in SQL and the "inbox" in NoSQL. Example: Twitter-like applications with subscriptions.
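The third case, subscriptions in SQL and the "inbox" in NoSQL, might look roughly like this. It is a sketch under assumptions: SQLite stands in for the SQL store, a dictionary stands in for the key/value store, and the `subscriptions` table, the `publish` function and the `inbox:<user>` key format are all hypothetical.

```python
import sqlite3
from collections import defaultdict

# Relational side: who follows whom is easy to query and join in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (follower TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?)",
    [("bob", "alice"), ("carol", "alice"), ("alice", "bob")],
)

# Key/value side: one inbox per user, only ever read by its key.
inboxes = defaultdict(list)


def publish(author, message):
    """Fan a message out to the inbox of every follower of the author."""
    rows = conn.execute(
        "SELECT follower FROM subscriptions WHERE author = ? ORDER BY follower",
        (author,),
    )
    followers = [row[0] for row in rows]
    for follower in followers:
        inboxes[f"inbox:{follower}"].append({"from": author, "text": message})
    return followers


delivered = publish("alice", "hello")
print(delivered)
print(inboxes["inbox:bob"])
```

The searchable data (subscriptions) stays where ad-hoc queries are cheap, and the high-volume data (inboxes) is only ever fetched by its master key.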
It doesn't make sense to make debt payments before you have any debt.
When we think NoSQL is the right tool, we generally recommend Riak because its community and enterprise support are so deep, and we believe in the tradeoffs that Riak makes to keep data safe.
We used to recommend NoSQL "out of the gate". We no longer do so because of the hidden costs of day-to-day development. It just doesn't make sense to make debt payments before you've incurred any debt.
This is loosely related to an earlier post. Read more here.