Sorting by popularity like Reddit with Ruby, PostgreSQL and Elastic, part 2

Flavio Granero wrote this on April 01, 2015 under dev, elastic, postgres, ruby.

In the first part of this post, we showed how to create a ranking of items sorted by popularity in the Reddit style, using Ruby and a SQL function published by the Reddit team on GitHub. The problem with our initial application is that it does not scale well. In this second and final part, we'll show how to update the code to prevent performance issues as the number of published posts grows, by caching the value of hot_score() in an Elastic index so that it is not recalculated on every request. Then we will solve a common problem with dynamic rankings: when the items are paginated, an item's position can change between requests. For that we will use the scroll feature, also available in Elastic.

Building a hot_score cache for scalability reasons

As the number of posts grows, sorting items by the hot_score() SQL function becomes very costly for the database, because it has to calculate the value for every item each time the ordered list is requested (see the first part of this post for a more detailed analysis).
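To make that cost concrete, here is a minimal Ruby sketch of the kind of calculation involved. It assumes the SQL function from the first part mirrors the hot formula in Reddit's open-source code (a logarithm of the net votes plus an age term); the Ruby method and the sample posts below are illustrative and are not part of the application.

```ruby
# Sketch of a Reddit-style hot score. Assumption: the SQL hot_score()
# function follows the formula published in Reddit's source code.
REDDIT_EPOCH = 1_134_028_003 # reference timestamp (seconds) used by that formula

def hot_score(up_score, down_score, created_at)
  net = up_score - down_score
  order = Math.log10([net.abs, 1].max) # votes count on a logarithmic scale
  sign = net <=> 0                     # -1, 0 or 1
  seconds = created_at.to_i - REDDIT_EPOCH
  (sign * order + seconds / 45_000.0).round(7)
end

# Made-up sample data, for illustration only.
posts = [
  { name: "old but popular", up: 500, down: 20, created_at: Time.utc(2015, 3, 30) },
  { name: "fresh",           up: 10,  down: 1,  created_at: Time.utc(2015, 4, 1) }
]

# Sorting like this evaluates the formula once per post on every call,
# which is what the database does when it orders by hot_score().
ranking = posts.sort_by { |post| -hot_score(post[:up], post[:down], post[:created_at]) }
```

With this sample data the newer post ranks first despite having far fewer votes, because the age term outweighs the logarithm of the vote difference.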

With Elastic, we can store each Post's hot_score in the index, building a sort of cache. To do that in our example Rails application, we will use the searchkick gem.

Once you have it properly installed, let's set up our Post model with the following code:

# app/models/post.rb
class Post < ActiveRecord::Base
  belongs_to :user
  # ...
  searchkick
  # method required by searchkick to build the post json stored into the index
  def search_data
    if attributes.keys.include?("hot_score")
      hot_score = self["hot_score"]
    else
      select_sql = "hot_score(up_score, down_score, created_at) as hot_score"
      hot_score = Post.select(select_sql).find(id).try(:[], "hot_score")
    end
    {
      name: name,
      user_id: user_id,
      hot_score: hot_score.to_f,
      created_at: created_at
    }
  end

  # class function used by searchkick to load items from db
  def self.search_import
    select("id, name, image_url, user_id, created_at, "\
      "hot_score(up_score, down_score, created_at) as hot_score")
  end
end

Next, we need to keep the cache up to date. Searchkick provides a handy rake task to rebuild the index:

rake searchkick:reindex CLASS=Post

On the production server, it's a good idea to set up a cron job that updates the Post index periodically, every 5 minutes for instance. We recommend a gem called whenever, because it makes managing cron tasks very easy in a Rails application. With it, we can create a schedule.rb file with the following content:

# config/schedule.rb
every 5.minutes do
  rake "searchkick:reindex CLASS=Post"
end

You may be asking yourself why this is necessary, since the searchkick gem can automatically update index entries whenever a Post is created or edited. The reason is that the value of hot_score changes over time, even if a Post is not changed. Remember that the SQL function we're using receives the post's creation date to calculate the score value, which generates the logarithmic scale we want. This cron job ensures that the ranking is refreshed every 5 minutes.

From now on, our application code must replace database queries with Elastic searches:

# before: ActiveRecord query
Post.ranking.limit(10)
# after: Elastic search
Post.search("*", order: {hot_score: :desc}, per_page: 10)

Naturally, searchkick and Elastic give us many more search options, but here we are only showing how to replace the ActiveRecord queries.

Paginating the ranking items with ElasticSearch

Another problem with this sort of ranking is how to properly paginate the sorted list when it is returned by an application API, for example.

Let's analyse a real situation: a smartphone application asks the server for the top 10 posts ordered by popularity, using our Rails application's API. Then the user scrolls the list and the application requests the second page of items. It sounds like nothing can go wrong, but what happens if the order of the items on the server changes between the first and second requests? This is a very common scenario, because new posts are being created and new votes are being counted.
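The scenario can be reproduced with a toy simulation in plain Ruby. The `page` helper and the single-letter "posts" are made up for illustration; the point is only to show what offset-based pagination does when the ranking shifts between two requests.

```ruby
# Toy simulation of offset-based pagination over a ranking that changes
# between two requests. All names and data here are illustrative.
def page(ranking, number, per_page)
  ranking.each_slice(per_page).to_a[number - 1] || []
end

ranking = %w[a b c d e f]

first_page = page(ranking, 1, 3)       # the client receives a, b, c

# Before the second request arrives, a new post "z" jumps to the top.
ranking = ["z"] + ranking

second_page = page(ranking, 2, 3)      # the client receives c, d, e

duplicated = first_page & second_page  # "c" was delivered twice
```

Here "c" shows up on both pages and "z" is never delivered at all. The opposite shift, where an item moves up from the second page to the first between requests, makes the client skip an item instead.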

How can we return the next items in the ranking without missing or duplicating items?

The answer lies in a nice Elastic feature called scroll. It works much like a database cursor, but with better performance and, most importantly, fewer machine resources.

In summary, Elastic maintains a snapshot of the search result for a limited time. We just need to add a scroll parameter when running a normal search query. This parameter specifies how long the snapshot should be kept alive. Let's look at an example request to the Elastic service using curl:

curl -XGET 'localhost:9200/twitter/tweet/_search?scroll=1m' -d '
{
    "query": {
        "match" : {
            "title" : "elastic"
        }
    }
}'

In the example above, Elastic returns a scroll_id value along with the search response. The snapshot is kept alive for 1 minute, and to retrieve the next results we just make a new request that passes the scroll_id to a URL like the one shown below (assuming the returned scroll_id was c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1):

curl -XGET  'localhost:9200/_search/scroll?scroll=1m' -d 'c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1'
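To build some intuition for what the two requests above do, here is a toy in-memory model of the snapshot idea in plain Ruby. It is only a sketch: the `SnapshotScroller` class, its methods and the injectable clock are hypothetical, and this is not how Elastic implements scrolling internally.

```ruby
require "securerandom"

# Toy model of a scroll: the ordering is frozen at the first request and
# later pages are served from that frozen copy until it expires.
class SnapshotScroller
  Snapshot = Struct.new(:items, :offset, :expires_at)

  def initialize(per_page:, clock: -> { Time.now })
    @per_page = per_page
    @clock = clock
    @snapshots = {}
  end

  # First request: freeze the current ranking and return the first page.
  def start(ranking, keep_alive:)
    scroll_id = SecureRandom.hex(8)
    @snapshots[scroll_id] = Snapshot.new(ranking.dup.freeze, 0, @clock.call + keep_alive)
    [scroll_id, next_page(scroll_id, keep_alive: keep_alive)]
  end

  # Follow-up requests: read from the snapshot, never from the live ranking.
  def next_page(scroll_id, keep_alive:)
    snapshot = @snapshots[scroll_id]
    if snapshot.nil? || snapshot.expires_at < @clock.call
      @snapshots.delete(scroll_id)
      raise KeyError, "scroll expired or unknown"
    end
    items = snapshot.items[snapshot.offset, @per_page] || []
    snapshot.offset += items.size
    snapshot.expires_at = @clock.call + keep_alive # each request renews the timeout
    items
  end
end

scroller = SnapshotScroller.new(per_page: 3)
ranking = %w[a b c d e f]
scroll_id, first_page = scroller.start(ranking, keep_alive: 60)
ranking.unshift("z") # the live ranking changes between requests
second_page = scroller.next_page(scroll_id, keep_alive: 60)
```

Because the second page is read from the frozen copy, the client gets d, e, f with nothing repeated or skipped, even though the live ranking changed in between. The trade-off is that "z" stays invisible until a new scroll is started.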

To run the same Elastic search from Ruby, we need to extend the searchkick gem a bit, subclassing its Query class to add support for the scroll parameter (it is automatically removed in the standard version). The extension can be placed in an initializer file:

# config/initializers/searchkick.rb
module Searchkick
  class QueryWithScroll < Query
    def params
      params = super
      params.merge!(scroll: options[:scroll]) if options[:scroll]
      params
    end
  end
end

Now our application code is updated to instantiate a QueryWithScroll object:

# loading the top posts in the ranking and requesting a new scroll_id
query = Searchkick::QueryWithScroll.new(Post, "*", load: true, scroll: "5m", order: {hot_score: :desc}, per_page: 10)
search = query.execute
@posts = search.results
@scroll_id = search.response["_scroll_id"]

The code above requests a 5-minute snapshot and stores the post entries, along with the scroll_id required to get the next page, in instance variables.

# loading the next ranking page sending a provided scroll_id
response = Searchkick.client.scroll({ scroll_id: params[:scroll_id], scroll: "5m" })
search = Searchkick::Results.new(Post, response, {})
@posts = search.results
@scroll_id = search.response["_scroll_id"]

You may notice that searchkick does not provide easy access to the scroll endpoint either, so we are accessing the internal client directly and wrapping the raw Elastic result in a Searchkick::Results instance, making it look like our initial search (the one without a scroll_id). Keep in mind that every new search generates a new scroll_id, and that is the one that must be used to retrieve the next page of results.
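The rule about always passing along the latest scroll_id can be captured in a small loop. The sketch below assumes a `client` object that responds to `search` and `scroll` with the same arguments and response shape used in this post (for instance the object returned by Searchkick.client); `each_scroll_page` is a hypothetical helper name, not part of searchkick or of the Elastic client.

```ruby
# Walks through every page of a scroll, yielding the raw hits of each page.
# Assumption: `client` behaves like the Elastic Ruby client used above.
def each_scroll_page(client, index:, body:, scroll: "1m")
  response = client.search(index: index, scroll: scroll, body: body)
  loop do
    hits = response["hits"]["hits"]
    break if hits.empty? # an empty page means the scroll is exhausted
    yield hits
    # Always continue with the scroll_id from the most recent response.
    response = client.scroll(scroll_id: response["_scroll_id"], scroll: scroll)
  end
end
```

In an API like the one in our example, the loop would be spread across several HTTP requests instead, with the client application sending back the last scroll_id it received each time.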

Conclusion

Ordering items by hot_score is a widely used technique in applications where new postings are added very often and are sorted by a calculation based on the votes they receive. To give recent postings a chance to be well positioned, social aggregators like Reddit have created special score algorithms that keep attracting visitors interested in fresh, quality content.

This article concludes our example, showing how a SQL function and Elastic can be combined inside a Ruby on Rails application to build a popularity ranking that scales without performance issues and returns paginated results without missing or duplicated items.

Flavio Granero

Full Stack Developer