Benchmarks, SisoDb and RavenDb

This post is an extract from my recent post, in which I reply to criticism of SisoDb from a member of the RavenDb team. You can read it here: http://daniel.wertheim.se/2012/03/11/ranting-is-good-for-you/

First, let’s make it clear: I have never had any intention of creating a RavenDb vs SisoDb scene. These benchmarks are a result of Mr Itamar Syn-Hershko’s recent criticism on Twitter, where I feel he hasn’t got all the information and is mostly talking about the old design of SisoDb.

Results

By using well-known infrastructure in SQL Server and creating data-type-specific indexes and sets of data for simple key-values, SisoDb is fast. And, as it seems, actually faster than RavenDb in the measured scenarios below.

Note about deserialization of matches

NOTE! With the medium and large sets of data, when querying for Trips, SisoDb returns all the actual matches, which is a much bigger result than what RavenDb seems to return; hence you can’t compare those times fairly. It’s RavenDb’s 128 returned and deserialized records vs SisoDb’s 16668 returned and deserialized records.

RavenDb – returning 128 matches

SisoDb – returning all 16668 matches
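
A likely explanation for the 128 is paging: as far as I know, the RavenDb client returns at most 128 documents per query unless a page size is specified. The sketch below uses neither library. `queryPage` is a hypothetical stand-in for any store that caps its page size, and `FetchAll` is an assumed helper that keeps requesting pages until one comes back short, so that both sides would end up deserializing the same number of records.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Keeps requesting pages until a page comes back short.
    // queryPage(skip, take) is a hypothetical stand-in for a paged store query.
    public static List<T> FetchAll<T>(Func<int, int, IList<T>> queryPage, int pageSize)
    {
        var all = new List<T>();
        while (true)
        {
            var page = queryPage(all.Count, pageSize);
            all.AddRange(page);
            if (page.Count < pageSize)
                return all;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // In-memory stand-in for 16668 matching trips.
        var matches = Enumerable.Range(1, 16668).ToList();

        // A store that never hands out more than 128 items per request.
        Func<int, int, IList<int>> cappedQuery =
            (skip, take) => matches.Skip(skip).Take(Math.Min(take, 128)).ToList();

        var firstPageOnly = cappedQuery(0, 128);
        var everything = Paging.FetchAll(cappedQuery, 128);

        Console.WriteLine(firstPageOnly.Count); // 128
        Console.WriteLine(everything.Count);    // 16668
    }
}
```

Timing the paged loop on one side against a single unpaged query on the other would still not be a like-for-like comparison, but at least the number of deserialized records would match.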

Memory consumption

This is just looking at the client application that runs the tests, when running with the large sets (100.000 customers and 100.000 trips):

SisoDb 116 MB

RavenDb 1.16 GB

Differences

RavenDb and SisoDb are different. RavenDb has lots of concepts, like dynamic vs static indexes, stale data, not returning all matching records etc. SisoDb aims at being simple. E.g. it does all the work up front and makes every leaf in your object hierarchy indexed. It doesn’t use any proxies or the like for change tracking. SisoDb is trying to be a fast, non-magical, lightweight document-oriented provider over SQL Server. Just take a look at the memory footprints above.
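
To make "every leaf in your object hierarchy indexed" a bit more concrete, here is a minimal sketch of how an object graph can be flattened into path/value pairs with reflection. It is not SisoDb's actual code: `LeafFlattener` and its leaf rules are assumptions made for illustration, the model is a trimmed-down version of the test model, and collections and nullable types are deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public class Address
{
    public string Zip { get; set; }
    public string City { get; set; }
}

public class Customer
{
    public int CustomerNo { get; set; }
    public string Firstname { get; set; }
    public Address DeliveryAddress { get; set; }
}

// Hypothetical helper, not part of SisoDb.
public static class LeafFlattener
{
    // Walks the public properties of an object and yields one
    // (path, value) pair per leaf. Nested objects extend the path.
    public static IEnumerable<KeyValuePair<string, object>> Flatten(object item, string prefix = "")
    {
        if (item == null)
            yield break;

        foreach (PropertyInfo property in item.GetType().GetProperties())
        {
            string path = prefix.Length == 0 ? property.Name : prefix + "." + property.Name;
            object value = property.GetValue(item, null);

            if (IsLeaf(property.PropertyType))
                yield return new KeyValuePair<string, object>(path, value);
            else
                foreach (var nested in Flatten(value, path))
                    yield return nested;
        }
    }

    // Simplification: primitives, enums, strings, decimals, dates and guids count as leaves.
    private static bool IsLeaf(Type type)
    {
        return type.IsPrimitive || type.IsEnum
            || type == typeof(string) || type == typeof(decimal)
            || type == typeof(DateTime) || type == typeof(Guid);
    }
}

public static class Program
{
    public static void Main()
    {
        var customer = new Customer
        {
            CustomerNo = 42,
            Firstname = "Daniel",
            DeliveryAddress = new Address { Zip = "525", City = "Stockholm" }
        };

        // Prints one "path = value" line per leaf, e.g. "DeliveryAddress.Zip = 525".
        foreach (var leaf in LeafFlattener.Flatten(customer))
            Console.WriteLine(leaf.Key + " = " + leaf.Value);
    }
}
```

Each pair produced this way is something a provider could store and index at insert time, which is where the up-front cost on inserts comes from.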

When using RavenDb in the tests below, I have not forced it to wait for non-stale results inside the timer that measures the operations. Hence the real insert time, including indexing, would actually be longer than indicated.

Both providers are used out of the box, but with warm-up behavior: before timing any operations, inserts, queries and deletes have been executed so that each system gets a chance to build up cached plans etc. Note that I’m not a skilled RavenDb user. I just used it out of the box, that is, with no tweaking whatsoever with static indexes etc. I couldn’t find any batch insert API in RavenDb, which SisoDb of course has.
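
The warm-up-then-measure pattern described above can be sketched roughly like this. It is not the actual test harness from the GitHub repository: `Timing.Measure` and the sleeping workload are assumptions used only to show the shape of it.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper, not taken from the benchmark code.
public static class Timing
{
    // Runs the action a few times untimed, then returns the elapsed time of one timed run.
    public static TimeSpan Measure(Action action, int warmupRuns)
    {
        for (int i = 0; i < warmupRuns; i++)
            action();

        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}

public static class Program
{
    public static void Main()
    {
        int calls = 0;

        // Stand-in workload; in the real tests this would be an insert or a query.
        Action workload = () => { calls++; Thread.Sleep(10); };

        TimeSpan elapsed = Timing.Measure(workload, 2);

        Console.WriteLine("calls: " + calls); // 3: two warm-up runs plus one timed run
        Console.WriteLine("timed run: " + elapsed.TotalMilliseconds + " ms");
    }
}
```

Whatever should count towards the measured time, such as waiting for indexing to finish, has to happen inside the action that is passed in; anything done outside it is invisible to the stopwatch.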

Test machine

The machine is a simple laptop running both the server and client.
Windows 7 Ultimate, SQL2012
Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz
8GB RAM
237 GB SSD with 190 GB free, Samsung PM810

Versions

RavenDb: v1.0.701
SisoDb: v10.4.2

Scenarios

The test code is available at GitHub. The scenarios don’t have much to do with a real-life situation, and you should run the tests in your own environment.

The test has three modes, and each mode works with two different types of documents.

  • Small set of data (1.000 items)
  • Medium set of data (10.000 items)
  • Large set of data (100.000 items)

Test cases

public interface ITestCases
{
	void Warmup(
            Expression<Func<Customer, bool>> customerPredicate, 
            Expression<Func<Trip, bool>> tripPredicate);

	void BatchInsertCustomers(int numOfCustomers);
	void BatchInsertTrips(int numOfTrips);
	void SingleInsertCustomer();
	void SingleInsertTrip();

	long QueryCustomers(Expression<Func<Customer, bool>> predicate);
	long QueryTrips(Expression<Func<Trip, bool>> predicate);

	//Count methods only used after profiling to get number of items inserted
	long CountCustomers();
	long CountTrips();
}

Customer model

public class Customer
{
	public Guid Id { get; set; }
	public int CustomerNo { get; set; }
	public string Firstname { get; set; }
	public string Lastname { get; set; }
	public ShoppingIndexes ShoppingIndex { get; set; }
	public DateTime CustomerSince { get; set; }
	public Address BillingAddress { get; set; }
	public Address DeliveryAddress { get; set; }

	public Customer()
	{
		ShoppingIndex = ShoppingIndexes.Level0;
		BillingAddress = new Address();
		DeliveryAddress = new Address();
	}
}

public class Address
{
	public string Street { get; set; }
	public string Zip { get; set; }
	public string City { get; set; }
	public string Country { get; set; }
	public int AreaCode { get; set; }
}

public enum ShoppingIndexes
{
	Level0 = 0,
	Level1 = 10,
	Level2 = 20,
	Level3 = 30
}

Trip model

public class Trip
{
	public int Id { get; set; }
	public Transport Transport { get; set; }
	public Accommodation Accommodation { get; set; }
	public decimal Price { get; set; }
}

public class Transport
{
	public string DepartureCode { get; set; }
	public string DestinationCode { get; set; }
	public DateTime DepartureDate { get; set; }
	public int Duration { get; set; }
}

public class Accommodation
{
	public string HotelCode { get; set; }
	public string RoomCode { get; set; }
	public DateTime CheckinDate { get; set; }
	public int Duration { get; set; }
}

Customer query predicates
The queries for customers used in the tests.

//Small set
CustomerPredicate = c =>
	c.CustomerNo >= 500 && c.CustomerNo <= 550
	&& c.DeliveryAddress.Zip == "525",

//Medium set
CustomerPredicate = c =>
	c.CustomerNo >= 5000 && c.CustomerNo <= 5500
	&& c.DeliveryAddress.Zip == "5250",

//Large set
CustomerPredicate = c =>
	c.CustomerNo >= 50000 && c.CustomerNo <= 55000
	&& c.DeliveryAddress.Zip == "52500",

Trip query predicate
The query for trips used in the tests.

//Same for all test sets
var dateFrom = TestConstants.BaseLine.AddDays(10);
var dateTo = dateFrom.AddDays(5);

TripPredicate = t =>
	(t.Transport.DepartureDate >= dateFrom
	&& t.Transport.DepartureDate <= dateTo)
	&& t.Accommodation.Duration > 8

Screenshots of each test run are included in the main branch at GitHub.

//Daniel

Ranting is good for you

Yet again I’ve ended up on the receiving end of some ranting, and yet again I say “thank you”. Criticism is most welcome. Now let’s have a look at some of the opinions.

As you might know, I run an open-source project (http://sisodb.com), which is somewhat of a document-oriented provider over SQL Server. It has a provider model and as of now supports SQL2008, SQL2012 and SQLCE4. But wait, that’s for an RDBMS, right? Yes, and that’s mostly what the ranting is all about. Hence, to get some people to pipe down, I guess I could go and write one for MongoDb, but perhaps that isn’t good enough.

Benchmarks

This post was never about benchmarking RavenDb and SisoDb, but of course I have had to show some numbers. Detailed info about the benchmarks is available here: http://daniel.wertheim.se/2012/03/12/benchmarks-sisodb-and-ravendb/

It all started in May

In May last year I got some most welcome criticism (available as PDF here) from Mr Itamar Syn-Hershko, to which I wrote a reply. He’s a nice guy from the RavenDb team, whom I have had the honor to meet and discuss RavenDb and SisoDb with in real life. And let’s make it clear: RavenDb seems like an awesome product and has gained great popularity, and what else would you expect from a product from Mr Ayende & Co.

No, SisoDb is not looking for world domination

SisoDb is not looking for world domination. Features are added as we need them, it’s not a commercial project, and I hope it’s not near RavenDb in features; that would be scary. SisoDb is trying to be a simple solution, and it’s up to you to decide if it’s enough for some of your scenarios.

One or two things have happened since 16th of May 2011

Since May a lot of things have happened. I have changed my haircut; I’ve gotten kid number two and SisoDb has gone from v2.1.1 to v10.x using semantic versioning; and has been rewritten: http://sisodb.com/wiki/release-notes

This is something I have pointed out in the comments on Itamar’s blog (as well as in real life). But never mind, facts about how a system actually is designed are probably irrelevant.

March 6th, 2012 – It happened again

A person started to test SisoDb for their needs and, while doing some initial testing, dared to tweet about it. Then I, as well as some others, retweeted, and after a while the discussion was started.


[1]: https://twitter.com/#!/CodingInsomnia/status/177041826103037952
[2]: https://twitter.com/#!/CodingInsomnia/status/177048144025100289

A quick pause to reflect on the task of trying things out

Before continuing: doing what @CodingInsomnia does is excellent. Whatever solution you choose as your data access component (DAC), you should evaluate it for your X number of contexts. In one it might shine and in another it might be totally off. Perhaps you should use a document-oriented solution in scenario A, but in B you might be better off with an OR/M or a key-value store. That is, don’t you dare to ever select only one and make it fit every solution just because it shines in one. Not with Entity Framework, not with NHibernate, RavenDb, MongoDb, SisoDb… I guess you get the point. Build your own set of facts for your scenarios.

Let’s continue…

So, the discussion was started. Now let’s just have a quick look at some of the criticizing tweets.


[3]: https://twitter.com/#!/synhershko/status/177140124310704130

Great, neither is SisoDb, but in a concurrent world it doesn’t hurt, and if I really wanted to focus on it, I would probably make background indexing available as an option to those comfortable with eventual consistency and stale data, as in RavenDb. But then again, a document is not really inserted until the data for its indexes has been inserted, because before that it’s not queryable. Hence SisoDb, as of now, makes all leaves in your object graph indexed and queryable up front. So when the insert has executed, everything is in place with SisoDb. Of course, you can tell it not to index everything.

Side note on the JSON serializers

BTW, if RavenDb is truly about performance, why JSON.Net? I guess it’s because it’s flexible and feature-rich. SisoDb, on the other hand, relies on ServiceStack.Text. Compare benchmarks here: http://theburningmonk.com/benchmarks/ These stats compare the v4.07 release of JSON.Net, and as of now there’s a v4.08 release, and perhaps it for once outperforms ServiceStack.Text. I would be surprised though.


[4]: https://twitter.com/#!/synhershko/status/177150025175019520

SisoDb not being a commercial project and all, I would be scared of myself if SisoDb had all the features that RavenDb does. But still, SisoDb does support includes in the same query as well, and yes, it will require a join to that referenced structure. But if you really have a use case with loading lots of documents that refer to each other in one query, maybe you shouldn’t use documents at all. As with N+1, should that actually be a case in a document DB? Could it be that you instead should have designed your documents for each scenario and accepted duplication of data? Don’t know. But a real-life document would probably extract parts of the document it refers to, and if needed you could follow the reference and dig deeper. And if it’s a problem to do that ONE extra request on ID, then you have a problem.

But yes, as of today, when selecting out your documents, you have to do your projection and aggregation in a yielded data flow. But again, you can model this in your code: have a denormalizer that works against a document that is designed for your aggregation needs. And BTW, you can use both stored procedures and raw SQL, both within and outside SisoDb. So there’s a solution for aggregations etc.
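
Aggregating "in a yielded data flow" could look something like the sketch below. It does not use SisoDb's API: `StreamTrips` is a hypothetical stand-in for a query that yields documents one at a time, and the trips and destination codes are made up. The point is only that the client can fold the stream into totals without first materializing every document in a list.

```csharp
using System;
using System.Collections.Generic;

public class Trip
{
    public string DestinationCode { get; set; }
    public decimal Price { get; set; }
}

public static class TripReport
{
    // Hypothetical stand-in for a store query that yields documents one at a time.
    public static IEnumerable<Trip> StreamTrips()
    {
        yield return new Trip { DestinationCode = "ARN", Price = 100m };
        yield return new Trip { DestinationCode = "LHR", Price = 250m };
        yield return new Trip { DestinationCode = "ARN", Price = 300m };
    }

    // Projects and aggregates on the client while the documents stream past.
    public static Dictionary<string, decimal> TotalPricePerDestination(IEnumerable<Trip> trips)
    {
        var totals = new Dictionary<string, decimal>();
        foreach (var trip in trips)
        {
            decimal current;
            totals.TryGetValue(trip.DestinationCode, out current); // stays 0 when the key is missing
            totals[trip.DestinationCode] = current + trip.Price;
        }
        return totals;
    }
}

public static class Program
{
    public static void Main()
    {
        var totals = TripReport.TotalPricePerDestination(TripReport.StreamTrips());

        Console.WriteLine(totals["ARN"]); // 400
        Console.WriteLine(totals["LHR"]); // 250
    }
}
```

For heavy aggregations over large sets, the stored procedure or raw SQL route mentioned above is probably the better fit, since the work then stays in the database.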


[5]: https://twitter.com/#!/synhershko/status/177150234886021120

Anyone believing any data access solution is THE SOLUTION is always off. You should always evaluate per context, and if you don’t, I guess it’s a good thing that RavenDb is the perfect solution and the solution to every problem concerning data access. By saying that, it then doesn’t only outshine small projects like SisoDb but bigger ones too. Like: StarCounterDb, MongoDb, Cassandra, Redis, CouchDb…


[6]: https://twitter.com/#!/synhershko/status/177150380965244928

So SQL Server is the most unoptimized database? If we assume this is true, that the guys at Microsoft have made a complete failure in assembling their database, couldn’t it be that it’s enough in some cases? I do bet my current car (yeah, I know it’s only a Ford Mondeo) that somewhere in the world there’s a system that is somewhat complex and actually has good performance in its SQL Server.


[7]: https://twitter.com/#!/synhershko/status/177150934961487872


[8]: https://twitter.com/#!/synhershko/status/177151141249953793

As I said, lots of things have happened with SisoDb, and I’m sure that’s the case with RavenDb as well. And it’s a good thing SisoDb really is normalized then: http://daniel.wertheim.se/2012/01/17/sisodb-v9-0-released/, http://sisodb.com/wiki/core-concepts I don’t really see how I should normalize it more. I mean, as of now every indexed property gets a key-value with an index optimized for its data type. Also, to clarify for some readers: when using an RDBMS like SQL Server you sometimes really have to denormalize your data. It could be setting up a denormalized table or view for the sake of not having too heavy joins when e.g. doing lots of aggregations on large sets of data.

So I guess SisoDb sucks?

With all these alleged design flaws from Mr Itamar, I do guess that SisoDb sucks. Surely this must be the case, right? Surely there can’t be a case where SisoDb performs well enough. A case where the enterprise needs/wants to stick with SQL Server because they are familiar and pleased with all the great infrastructure that comes with it. I do respect DBAs, and I hope that SisoDb gives them the opportunity to e.g. partition tables, suggest replication to get separate read and write models, or perhaps let a BI person set up an SSIS job to do transformations of the JSON to fact and leaf tables.

A user was also kind enough to point out that SisoDb works great in shared hosting scenarios as well, since SisoDb in itself doesn’t demand lots of resources but relies on the SQL Server. Now, I don’t know how that is for other solutions out there, but at least SisoDb works in that environment.

Taken too much time

Now, this has taken too much of my time and I really have better things to do. Now, go and improve your projects. If that means stop using SisoDb, then do it. Me, I will go and make things better in SisoDb and other projects.

//Daniel

Writing my own NoSql DB?

Yesterday I had a thought:

Why not write something very simple that can store object graphs without mappings and other fuss?

Yes I know there’s MongoDb, RavenDb and several others, but it’s always a great deal of fun to write something of your own. So, inspired by Ayende’s technology choices, I spent a few hours last night just fiddling around with Lucene.Net and Json.Net. The result:

A simple model

public class Address
{
    public string Street { get; set; }
    public string Zip { get; set; }
    public string City { get; set; }
    public string Country { get; set; }
}

public class Customer
{
    [Key]
    public Guid? Id { get; set; }

    [Index]
    public string Firstname { get; set; }

    [Index]
    public string Lastname { get; set; }
        
    [Index]
    public int ShoppingIndex { get; set; }

    public Address BillingAddress { get; private set; }
    public Address DeliveryAddress { get; private set; }

    public Customer()
    {
        BillingAddress = new Address();
        DeliveryAddress = new Address();
    }
}

Consuming a Storage-provider

var customer = new Customer
{
    Id = Guid.NewGuid(),
    Firstname = "Daniel",
    Lastname = "Wertheim",
    ShoppingIndex = 99
};
customer.DeliveryAddress.Country = "Sweden";

var store = new LuceneStructureStore();
store.Insert(customer);

var refetched = store.GetByKey<Customer>(customer.Id.ToString());
...
...
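
How a store might pick up the [Key] and [Index] markers from the model above can be sketched with plain reflection. This is not the actual `LuceneStructureStore` code: the attribute classes are redeclared here only so the example compiles on its own, and `StructureSchema` is a hypothetical helper.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Redeclared for the sake of a self-contained example.
[AttributeUsage(AttributeTargets.Property)]
public sealed class KeyAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Property)]
public sealed class IndexAttribute : Attribute { }

public class Customer
{
    [Key]
    public Guid? Id { get; set; }

    [Index]
    public string Firstname { get; set; }

    [Index]
    public string Lastname { get; set; }

    [Index]
    public int ShoppingIndex { get; set; }
}

// Hypothetical helper, not part of the store shown above.
public static class StructureSchema
{
    // Returns the single property marked with [Key]; throws if there is none or more than one.
    public static PropertyInfo GetKeyProperty(Type type)
    {
        return type.GetProperties()
            .Single(p => Attribute.IsDefined(p, typeof(KeyAttribute)));
    }

    // Returns the names of all properties marked with [Index].
    public static string[] GetIndexedNames(Type type)
    {
        return type.GetProperties()
            .Where(p => Attribute.IsDefined(p, typeof(IndexAttribute)))
            .Select(p => p.Name)
            .ToArray();
    }
}

public static class Program
{
    public static void Main()
    {
        var customer = new Customer { Id = Guid.NewGuid(), Firstname = "Daniel" };

        PropertyInfo key = StructureSchema.GetKeyProperty(typeof(Customer));
        Console.WriteLine(key.Name);                                               // Id
        Console.WriteLine(key.GetValue(customer, null));                           // the generated Guid
        Console.WriteLine(StructureSchema.GetIndexedNames(typeof(Customer)).Length); // 3
    }
}
```

With the key and the indexed properties known, a store has what it needs to decide which value identifies the document and which fields to make searchable; the rest of the object can simply be serialized as JSON.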

Maybe it will grow into something useful. In the meantime I will continue my work with my MongoDB provider, Simple-MongoDB.

//Daniel