RavenDB – yet another NoSQL DBMS … or not?
Non-relational database management systems are increasingly prominent, especially in the PaaS / IaaS field: DynamoDB & SimpleDB (on Amazon), MongoDB, Apache Cassandra, Microsoft Azure Table Storage, CouchDB and others. In the .NET world, what alternatives do we have to Azure’s NoSQL storage? Many of the most popular NoSQL DBMSs interface with .NET through web-oriented APIs, Thrift interfaces, COM interop or LINQ adapters built on top of services. This article focuses instead on a .NET-native one, built for the .NET platform in terms of API, deployment and underlying technologies: RavenDB. It drew our attention because it is stated to be transactional, and in the NoSQL world perhaps the biggest challenge is enforcing transactional writes, which in general are not supported.
For the busy ones who only want to scratch the surface of RavenDB, here are the conclusions we drew from a hands-on approach: RavenDB is an easy-to-use NoSQL DBMS for those familiar with .NET APIs, it fully supports LINQ for both queries & indexes, it scales out easily and, above all, it’s ACID. Even though it’s open source, check whether the licensing model suits you. If you plan on building an application which:
- needs transactional support
- handles big data, so scale-out capabilities are a must
- requires advanced search capabilities
- runs on .NET
it’s worth taking a look at RavenDB. The support is well structured and you can have it up and running in no time.
For those who want to dive into the details, read on.
Under the hood
RavenDB is a document database management system built on top of Lucene.NET for search and ESENT for storage. Lucene.NET is the .NET port of the well-known Apache Lucene text search engine. ESENT is Microsoft’s Extensible Storage Engine (also known as JET Blue, used in products such as Exchange Server and Active Directory), an ISAM storage engine optimized for fast data access.
Documents in RavenDB are handled by the DBMS as JSON, and binary data is stored as attachments. (A single attachment can be stored across multiple instances of RavenDB, called shards.)
Some of the cool features
The DBMS relies on indexes to serve queries: every submitted query needs an index from which the data is retrieved. The user can define indexes (called static indexes) and save them on the server, but that’s not mandatory. If no suitable index exists, the server creates a temporary index for the submitted query and caches it for you. If that index is used multiple times, it is promoted to a permanent index. It’s highly recommended that you design the indexes yourself (there is a nice interface based on lambda expressions for building the map & reduce functions). Even so, if you rely on ad-hoc queries, you will see performance improve as they hit the DBMS and RavenDB warms up.
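To make the temporary-to-permanent promotion concrete, here is a minimal toy model in Python. It is not RavenDB’s API or implementation: the `ToyStore` class, its method names and the `PROMOTE_AFTER` threshold are all assumptions made for illustration, and the toy only indexes a single field at a time.

```python
from collections import defaultdict

PROMOTE_AFTER = 3  # hypothetical threshold; the real promotion rule is not given here


class ToyStore:
    """Toy document store in which every query is answered from an index."""

    def __init__(self, docs):
        self.docs = docs                # document id -> dict
        self.indexes = {}               # field name -> {value: [document ids]}
        self.status = {}                # field name -> "static" | "temporary" | "permanent"
        self.uses = defaultdict(int)    # field name -> number of queries served

    def _build(self, field):
        # Scan all documents once and group their ids by the value of `field`.
        index = defaultdict(list)
        for doc_id, doc in self.docs.items():
            if field in doc:
                index[doc[field]].append(doc_id)
        return index

    def define_static_index(self, field):
        # An index the user designed up front.
        self.indexes[field] = self._build(field)
        self.status[field] = "static"

    def query(self, field, value):
        # No index yet: build a temporary one for this ad-hoc query and cache it.
        if field not in self.indexes:
            self.indexes[field] = self._build(field)
            self.status[field] = "temporary"
        self.uses[field] += 1
        # A temporary index that keeps being used gets promoted.
        if self.status[field] == "temporary" and self.uses[field] >= PROMOTE_AFTER:
            self.status[field] = "permanent"
        return sorted(self.indexes[field].get(value, []))


store = ToyStore({
    "users/1": {"name": "Ana", "city": "Cluj"},
    "users/2": {"name": "Dan", "city": "Iasi"},
    "users/3": {"name": "Eva", "city": "Cluj"},
})
first_result = store.query("city", "Cluj")
status_after_first = store.status["city"]
store.query("city", "Iasi")
store.query("city", "Cluj")
status_after_third = store.status["city"]
```

The first ad-hoc query pays for building the index; later queries on the same field reuse it, which is the warm-up effect described above.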
Unit of work pattern:
When you design your application in an OO manner, you expect data manipulation to follow the same pattern. Relational databases by their nature don’t comply with that, so ORMs are needed to fill the gap. In the .NET world a relational DBMS is a common choice, hence support for ORMs is everywhere. Usually those ORMs follow the unit of work pattern (which relies on the ACIDness of the underlying DBMS). With RavenDB you get the same approach without requiring an ORM: you open a session and, as you interact with the database, documents are cached in memory, changes to those documents are made in memory, and everything is persisted on demand with a single atomic call to SaveChanges(). Object identity is also preserved: different calls that retrieve the same document are served the same instance (per shard).
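The session behaviour described above can be sketched with a small toy model in Python. This is not the RavenDB client: `ToySession` and its method names are hypothetical, and a plain dictionary stands in for the server.

```python
import copy


class ToySession:
    """Toy unit of work: tracks loaded documents and writes changes in one batch."""

    def __init__(self, database):
        self._db = database      # dict standing in for the server: id -> document
        self._tracked = {}       # identity map: id -> the single in-memory instance

    def load(self, doc_id):
        # The first load copies the document into the session; later loads
        # of the same id return that same in-memory instance.
        if doc_id not in self._tracked:
            self._tracked[doc_id] = copy.deepcopy(self._db[doc_id])
        return self._tracked[doc_id]

    def store(self, doc_id, doc):
        # Register a new document; nothing is written to the database yet.
        self._tracked[doc_id] = doc

    def save_changes(self):
        # The toy writes all tracked documents with one update call.
        # A real engine would make this step atomic and durable.
        batch = {doc_id: copy.deepcopy(doc) for doc_id, doc in self._tracked.items()}
        self._db.update(batch)


database = {"orders/1": {"status": "new", "total": 10}}
session = ToySession(database)

order = session.load("orders/1")
same_order = session.load("orders/1")      # served from the identity map
order["status"] = "paid"                   # changed in memory only
session.store("invoices/1", {"order": "orders/1"})

before_save = copy.deepcopy(database)      # the database is still untouched here
session.save_changes()
```

Until `save_changes()` runs, the database keeps its old state; afterwards both the modified order and the new invoice are visible.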
Since RavenDB delegates search to the Lucene.NET search engine, you can supply one of Lucene.NET’s built-in analyzers, which are used to tokenize the text to be indexed. This lets you benefit from some advanced search features out of the box, such as free text search with an English thesaurus.
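As a rough illustration of what an analyzer does at indexing time, the sketch below lowercases text, splits it into tokens, drops a few stop words and feeds the result into an inverted index. It is a simplified Python stand-in, not Lucene.NET code: the function names and the stop-word list are assumptions, and real analyzers do considerably more.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "the", "of", "is"}  # tiny illustrative list


def analyze(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_inverted_index(docs):
    """Map every analyzed token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every analyzed query token."""
    tokens = analyze(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


docs = {
    "posts/1": "The Raven and the Document",
    "posts/2": "A document database",
    "posts/3": "Relational tables",
}
index = build_inverted_index(docs)
tokens = analyze("The Raven and the Document")
hits_one_word = search(index, "THE document")
hits_two_words = search(index, "document database")
```

Because the same analyzer runs on both the indexed text and the query, a query such as "THE document" matches regardless of case or stop words.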
Partial document update:
Often you want to update only part of a document (a subset of the properties of its JSON structure). RavenDB has the concept of patching, which supports editing a portion of a stored document without having to load the whole document into memory, change it and save it back.
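The idea behind patching can be shown with a toy example in Python: the client sends a short list of operations and the server applies them to the stored document. The `apply_patch` function and the operation names `set`, `add` and `unset` are hypothetical and chosen for this sketch; they are not RavenDB’s patch API.

```python
def apply_patch(document, operations):
    """Apply a list of (op, field, value) patch operations to one document in place."""
    for op, field, value in operations:
        if op == "set":
            document[field] = value                          # create or overwrite a property
        elif op == "unset":
            document.pop(field, None)                        # remove a property if present
        elif op == "add":
            document.setdefault(field, []).append(value)     # append to an array property
        else:
            raise ValueError(f"unknown patch operation: {op}")
    return document


# The dictionary stands in for the documents stored on the server.
server = {
    "users/1": {"name": "Ana", "email": "ana@old.example", "tags": ["admin"]},
}

# Only this small patch travels from the client, never the whole document.
patch = [
    ("set", "email", "ana@new.example"),
    ("add", "tags", "beta"),
    ("unset", "legacy_flag", None),
]
apply_patch(server["users/1"], patch)
```

Properties that the patch does not mention, such as `name`, are left untouched.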
The database engine can be hosted within your own application, as a separate application hosted in IIS, or as a Windows service. The client talks to the server over HTTP when the server is remote, and through direct calls when the server is embedded in the caller’s process.
Depending on where you host the RavenDB engine, you can set up the first layer of authentication. On the second level, RavenDB supports authentication using OAuth, which integrates well with the RESTful API, and it also has a plugin for authorization at document level.
RavenDB is by nature a distributed DBMS: your documents can be split across different instances of the DBMS, called shards, which run on different machines. It supports a feature called autosharding, which takes the job of splitting documents across shards off the designer’s shoulders. Even so, it’s recommended that you partition your documents yourself, keeping in mind application factors such as multitenancy (documents specific to one regional location are better kept on the same shard) and transactional operations (a transaction should only affect documents stored on the same shard, so that MSDTC does not kick in and bring its performance concerns).
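A minimal sketch of the partitioning advice, in Python: documents are routed to a shard by a rule the designer controls (here, a `region` property), and a helper reports how many shards a set of documents touches. The `ShardedStore` class, the `by_region` rule and the shard names are all assumptions for illustration, not RavenDB’s sharding API.

```python
class ShardedStore:
    """Toy sharded store: each document is routed to a shard by a caller-supplied rule."""

    def __init__(self, shard_names, shard_for):
        self.shards = {name: {} for name in shard_names}   # shard name -> {id: document}
        self.shard_for = shard_for                          # function: document -> shard name

    def put(self, doc_id, doc):
        self.shards[self.shard_for(doc)][doc_id] = doc

    def shards_touched(self, docs):
        # A transaction over `docs` stays local only if this set has one element.
        return {self.shard_for(doc) for doc in docs}


def by_region(doc):
    """Hypothetical routing rule: keep all documents of a region on one shard."""
    return "shard-" + doc["region"]


store = ShardedStore(["shard-eu", "shard-us"], by_region)

order_a = {"region": "eu", "total": 5}
order_b = {"region": "eu", "total": 7}
order_c = {"region": "us", "total": 9}
store.put("orders/1", order_a)
store.put("orders/2", order_b)
store.put("orders/3", order_c)

local_tx = store.shards_touched([order_a, order_b])    # one shard involved
spanning_tx = store.shards_touched([order_a, order_c]) # two shards involved
```

With this routing, a transaction over the two European orders stays on one shard, while one that mixes regions spans two machines and would need distributed coordination.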
RavenDB’s implementation follows a pluggable architecture (built with MEF, the Managed Extensibility Framework), which means that the core engine can be extended with different features, called bundles. In fact, some of the core features of RavenDB have been developed as bundles, e.g. Sharding & Replication, Authorization and Cascading Deletes.
Cross-platform & vendor lock-in concerns
RavenDB was built for the .NET world, with everything that means. Although Mono is out there for running .NET code on different platforms, MSDTC (Microsoft Distributed Transaction Coordinator) is a prerequisite for transactional storage of multiple documents across multiple shards (read: across separate machines), hence this will not run outside Windows. So if you opt for RavenDB, it’s highly probable that you will be tied to Windows.
RavenDB is open source, but depending on your project you’ll have to pay for a suitable license (it’s free for OSS projects).
When talking about scalability, we often think of deployment in the cloud. RavenDB was designed for scaling in the .NET world, but it is not ready out of the box for Microsoft’s PaaS. Additional setup with Azure Cloud Drive is needed to enable RavenDB’s persistent storage.
Reporting is also not its strong point, since there is a lack of tooling in this area. Although you can pull out whatever aggregate information you need directly through a query using map-reduce techniques, you’ll have to find or build reporting tools that can pull data through REST APIs.
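As an illustration of the kind of aggregate a map-reduce query can produce, here is a small Python sketch that totals orders per country. The document shape and the function names are assumptions; in RavenDB the map and reduce steps would be defined on an index rather than run over a list in memory.

```python
from collections import defaultdict


def map_order(order):
    """Map step: emit one (key, value) pair per order document."""
    yield order["country"], order["total"]


def reduce_totals(pairs):
    """Reduce step: sum the emitted values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


orders = [
    {"id": "orders/1", "country": "RO", "total": 10},
    {"id": "orders/2", "country": "NL", "total": 5},
    {"id": "orders/3", "country": "RO", "total": 20},
]

# Run the map step over every document, then reduce the emitted pairs.
pairs = [pair for order in orders for pair in map_order(order)]
report = reduce_totals(pairs)
```

A reporting tool would then only need to fetch the small `report` structure instead of every order.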
There’s no such thing as a free lunch: RavenDB’s support for transactions relies on MSDTC, which can raise performance issues for transactions that span different servers. From this point of view, extra attention needs to be paid when designing the model.
Another article will follow this one, where we will share insights from one of the experiments we are running at Yonder Labs with RavenDB, so stay tuned.