Saturday, January 12, 2013

Graph Database Resources


I got this from my collaborator Joey Gonzalez:
A paper that summarizes the state of graph databases that might be worth reading:
   http://swp.dcc.uchile.cl/TR/2005/TR_DCC-2005-010.pdf
A nice paper describing how database systems are built. In particular, it discusses the isolation of storage and computation dependencies in a database:
  http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf
Regarding the actual performance of databases for graphs, I got an interesting link from my collaborator Yucheng Low:
I found an interesting benchmark comparing MySQL NDB against Memcached you may be interested in.
Summary: MySQL NDB faster than Memcached. http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html
Really only faster if the entire NDB table can fit in memory (and disk write flushes are disabled). If HDD IO is necessary, it slows down quite a lot.  Of course, MySQL sharding+replication can be used to keep things running instead of going to disk.

Another interesting resource, from my collaborator Aapo Kyrola, concerns FlockDB, the graph database implemented at Twitter:
The blog post by the Twitter engineering team discusses in considerable detail how they extract so much performance from MySQL. It is worth a read: http://engineering.twitter.com/2010/05/introducing-flockdb.html
Our goals were: 
  • Write the simplest possible thing that could work. 
  • Use off-the-shelf MySQL as the storage engine, because we understand its behavior — in normal use as well as under extreme load and unusual failure conditions. 
  • Give it enough memory to keep everything in cache. 
  • Allow for horizontal partitioning so we can add more database hardware as the corpus grows. 
  • Allow write operations to arrive out of order or be processed more than once. (Allow failures to result in redundant work rather than lost work.)
FlockDB was the result.
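
To make the last goal concrete, here is a minimal sketch of an edge store whose writes can arrive out of order or be applied more than once without changing the final state. This is not FlockDB's code: the names EdgeWrite and EdgeStore, the writer-assigned integer timestamp, and the tie-break rule are all assumptions made for illustration. The idea is simply to keep, per edge, the write with the highest timestamp.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class EdgeWrite:
    """A hypothetical write record: add or remove one directed edge."""
    source: int
    dest: int
    present: bool    # True = add the edge, False = remove it
    timestamp: int   # assumed to be assigned by the writer


class EdgeStore:
    """Keeps only the newest write per edge, so replays and reordering are harmless."""

    def __init__(self) -> None:
        self._latest: Dict[Tuple[int, int], EdgeWrite] = {}

    def apply(self, write: EdgeWrite) -> None:
        key = (write.source, write.dest)
        current = self._latest.get(key)
        # Newest timestamp wins; on a tie, "present" beats "removed" so the
        # outcome does not depend on arrival order.
        if current is None or (write.timestamp, write.present) > (
            current.timestamp,
            current.present,
        ):
            self._latest[key] = write

    def edges(self) -> Set[Tuple[int, int]]:
        return {key for key, write in self._latest.items() if write.present}


def replay(writes: Iterable[EdgeWrite]) -> Set[Tuple[int, int]]:
    """Apply a sequence of writes to a fresh store and return the live edges."""
    store = EdgeStore()
    for write in writes:
        store.apply(write)
    return store.edges()


log = [
    EdgeWrite(1, 2, True, timestamp=1),
    EdgeWrite(1, 2, False, timestamp=2),
    EdgeWrite(1, 3, True, timestamp=1),
    EdgeWrite(1, 2, True, timestamp=3),
]

in_order = replay(log)
reversed_order = replay(reversed(log))
with_duplicates = replay(log + log[:2])
```

Because apply only ever keeps the maximum under a fixed ordering, replaying the log in order, reversed, or with duplicated entries yields the same set of edges. A failed batch can therefore be retried blindly, which is the "redundant work rather than lost work" trade-off described in the list.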

Carlos Guestrin sent me an overview of SQL vs. NoSQL data stores.

