
edit: all this is based on retwis.antirez.com memory usage.

Ok, just did some math. In order to run Twitter using just Redis 1.001, without using any new feature that may allow for memory savings, and guessing that Twitter currently holds 4,000,000,000 tweets (assuming they keep the full history for all users, and that the recent 32-bit ID overflow means they roughly have 4 billion messages), 30 Linux boxes with 128 GB of RAM each are needed.
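To make the arithmetic behind that estimate explicit (the tweet count and box sizes are the guesses above, not Twitter's real figures):

```ruby
tweets          = 4_000_000_000        # ~2^32, suggested by the recent 32-bit id overflow
total_ram_bytes = 30 * 128 * 1024**3   # 30 boxes with 128 GB of RAM each
bytes_per_tweet = total_ram_bytes / tweets
puts bytes_per_tweet                   # => 1030, i.e. roughly 1 KB per tweet including overhead
```

So the estimate works out to about 1 KB of RAM per tweet, keys and Redis bookkeeping included.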

Is this crazy? I don't know honestly as I don't know how many servers they may be using currently for the DB backend.

Btw the whole point is, IMHO, that many times keeping the full dataset in Redis is not needed. For instance in Twitter only recent messages are accessed frequently, together with user data, so it's probably a good idea to keep only the latest N messages of every user in Redis (with background jobs moving old messages to disk incrementally), and keep all the rest in MySQL or another on-disk solution suitable for accessing stuff by id.

So when you want to get a message from Redis, and from time to time get a NULL back when accessing message:<id>, you can run the same query against MySQL to get the data. That's something like this:

    def getMessageById(id)
        m = redis.get("message:#{id}")       # nil if the key is not in Redis
        m = getMessageFromMySQL(id) if !m    # fall back to the on-disk store
        return m
    end
In this context it is very simple to move old messages from a Redis server to a MySQL server: since the messages of a user are in a Redis list, it's possible to RPOP the oldest elements whenever LLEN (list length) reports that this user has more tweets than we want to keep in the "fast path".
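A sketch of that background job, assuming a hypothetical save_message_to_mysql helper and an illustrative key layout (the real Retwis key names may differ):

```ruby
# Keep at most MAX_FAST recent tweets per user in Redis; everything
# older is popped from the tail of the list and moved to MySQL.
MAX_FAST = 1000

def archive_old_messages(redis, user_id)
    key = "uid:#{user_id}:posts"          # illustrative key name
    while redis.llen(key) > MAX_FAST
        id = redis.rpop(key)              # the oldest tweet id sits at the tail
        save_message_to_mysql(id, redis.get("message:#{id}"))  # hypothetical helper
        redis.del("message:#{id}")        # reclaim the memory in Redis
    end
end
```

Run periodically per user, this keeps the Redis working set bounded while the full history accumulates on disk.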

Also note that Redis supports expires on keys. So old messages fetched from MySQL can be set as expiring keys, so that a message that got linked from some front page won't stress MySQL too much.
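The read path with that re-caching step could look like this (CACHE_TTL and getMessageFromMySQL are illustrative names; SET plus EXPIRE is used because SETEX doesn't exist in Redis 1.x):

```ruby
CACHE_TTL = 3600    # one hour; pick whatever fits the traffic

def getMessageById(redis, id)
    m = redis.get("message:#{id}")
    return m if m
    m = getMessageFromMySQL(id)          # hypothetical helper, as above
    if m
        # Re-cache with an expire so a suddenly popular old tweet is
        # served from Redis instead of hitting MySQL on every request.
        redis.set("message:#{id}", m)
        redis.expire("message:#{id}", CACHE_TTL)
    end
    return m
end
```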

This is just to give a feeling about scaling a pretty big service using Redis as main DB without caching layers.



There's a big difference between sharding across 30 redis nodes, where your application has to be shard-aware, and your ops team has to manually handle failover, etc, and using a database that looks to the app like a single system. In other words redis's story here isn't really any better than sharding a relational db, and everyone knows how much that sucks.

So saying on the home page that "Redis can do [sharding] like any other key-value DB, basically it's up to the client library" is inaccurate. Distributed key-oriented databases like cassandra, voldemort, dynomite, riak handle all of that so it's totally invisible to your app, including (at least in Cassandra's case, and I think dynomite) adding nodes to the cluster.


Hello jbellis,

it's really a matter of design. I like the idea that the Redis servers are dumb, and that it's up to the client logic to handle sharding. For instance the Ruby client supports this feature in a way that is mostly transparent to the application.
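What "sharding in the client" means in practice is just this: hash the key, pick a node, talk only to that node. The node list and the plain modulo hashing below are illustrative; real client libraries typically use consistent hashing.

```ruby
require 'zlib'

NODES = ["redis-1:6379", "redis-2:6379", "redis-3:6379"]

# Every client computes the same key -> node mapping, so no central
# dispatcher is needed.
def node_for(key)
    NODES[Zlib.crc32(key) % NODES.length]
end
```

The trade-off of the naive modulo scheme is that changing NODES remaps most keys; consistent hashing keeps that reshuffling small.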

In traditional databases sharding is hard not because they are not good at it from the point of view of the feature set (like in Redis vs Cassandra), but because the data model itself is not right for working with data split across different servers. If you use an SQL DB just with tables accessed by IDs, and without queries more complex than lookups by primary key, then sharding starts to become simpler.

Even if Redis ever gets server-side sharding, I'll code another process that handles this issue instead of putting the logic inside Redis itself.

Btw how is it possible to build something really horizontally scalable without using client-level sharding?

What you want is to have N web servers and M databases, without any single dispatch node. At least this is how I'm used to thinking about it.

Without any kind of client help I guess there is some kind of master node handling the dispatching of requests. Maybe I missed the point, please give me a hint.


Oh... I just found this on the High Scalability web site (http://highscalability.com/scaling-twitter-making-twitter-10...):

    Update 6: Some interesting changes from Twitter's Evan
    Weaver: everything in RAM now, database is a backup; peaks 
    at 300 tweets/second; every tweet followed by average 126 
    people; vector cache of tweet IDs; row cache; fragment 
    cache; page cache; keep separate caches; GC makes Ruby 
    optimization resistant so went with Scala; Thrift and HTTP 
    are used internally; 100s internal requests for every 
    external request; rewrote MQ but kept interface the same; 
    3 queues are used to load balance requests; extensive A/B 
    testing for backwards capability; switched to C memcached 
    client for speed; optimize critical path; faster to get the 
    cached results from the network memory than recompute them locally.


I'm investigating this issue more; it could be very interesting to know how many tweets Twitter itself is currently indexing, in order to do a precise estimation.



