How RDF Databases Differ from Other NoSQL Solutions

2010/04/22 by Arto

This started out as an answer at Semantic Overflow on how RDF database systems differ from other currently available NoSQL solutions. I've expanded the answer somewhat here and added some general-audience context.

RDF database systems are the only standardized NoSQL solutions available at the moment, being built on a simple, uniform data model and a powerful, declarative query language. These systems offer data portability and toolchain interoperability among the dozens of competing implementations that are available at present, avoiding any need to bet the farm on a particular product or vendor.

In case you're not familiar with the term, NoSQL ("Not only SQL") is a loosely-defined umbrella moniker for describing the new generation of non-relational database systems that have sprung up in the last several years. These systems tend to be inherently distributed, schema-less, and horizontally scalable. Present-day NoSQL solutions can be broadly categorized into four groups:

RDF database systems form the largest subset of this last NoSQL category. RDF data can be thought of in terms of a decentralized directed labeled graph wherein the arcs start with subject URIs, are labeled with predicate URIs, and end up pointing to object URIs or scalar values. Other equally valid ways to understand RDF data include the resource-centric approach (which maps well to object-oriented programming paradigms and to RESTful architectures) and the statement-centric view (the entity-attribute-value or EAV model).
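To make these three views concrete, here is a minimal sketch in plain Python, with no RDF library involved. The example.org URIs and the helper names (outgoing_arcs, as_resources) are made up for illustration, and the sketch simplifies by representing both URIs and scalar values as ordinary strings, which a real RDF store would keep distinct.

```python
from collections import defaultdict

# Hypothetical example data: the example.org URIs are invented for this sketch.
# The FOAF vocabulary namespace is real, but any vocabulary would do.
EX = "http://example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Statement-centric view: the data is simply a list of
# (subject, predicate, object) rows.
triples = [
    (EX + "alice", FOAF + "name", "Alice"),
    (EX + "alice", FOAF + "knows", EX + "bob"),
    (EX + "bob", FOAF + "name", "Bob"),
]


def outgoing_arcs(graph, node):
    """Graph view: the labeled arcs that start at the given subject node."""
    return [(p, o) for s, p, o in graph if s == node]


def as_resources(graph):
    """Resource-centric view: group properties by subject, like object fields."""
    resources = defaultdict(lambda: defaultdict(list))
    for s, p, o in graph:
        resources[s][p].append(o)
    return resources


if __name__ == "__main__":
    for predicate, obj in outgoing_arcs(triples, EX + "alice"):
        print(predicate, "->", obj)
    print(dict(as_resources(triples)[EX + "bob"]))
```

The same three statements back all three views; only the way of reading them changes.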

Without extolling the virtues of RDF as a data model too much just now, the key differentiator here is that RDF database systems embrace and build upon the W3C's Linked Data technology stack and are the only standardized NoSQL solutions available at the moment. This means that RDF-based solutions, when compared to run-of-the-mill NoSQL database systems, have benefits such as the following:

From the preceding points it follows that RDF-based NoSQL solutions enjoy some very concrete advantages such as:

RDF-based systems also offer unique advantages such as support for globally-addressable row identifiers and property names, web-wide decentralized and dynamic schemas, data modeling standards and tooling for creating and publishing such schemas, metastandards for being able to declaratively specify that one piece of information entails another, and inference engines that implement such data transformation rules.
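As a rough illustration of what it means for one piece of information to entail another, the sketch below applies a single RDFS entailment rule: if a resource has type C and C is a subclass of D, then the resource also has type D. The rdf:type and rdfs:subClassOf URIs are the standard ones; the example.org class names and the infer_types function are hypothetical. Real inference engines implement many more rules, and far more efficiently.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"  # hypothetical namespace for this sketch


def infer_types(triples):
    """Naive forward chaining: apply the subclass rule until nothing new appears."""
    inferred = set(triples)
    while True:
        subclass_pairs = [(s, o) for s, p, o in inferred if p == RDFS_SUBCLASS_OF]
        new = {
            (s, RDF_TYPE, sup)
            for s, p, o in inferred
            if p == RDF_TYPE
            for sub, sup in subclass_pairs
            if o == sub
        }
        new -= inferred
        if not new:
            return inferred  # fixpoint reached
        inferred |= new


# Hypothetical schema and instance data.
data = {
    (EX + "Dog", RDFS_SUBCLASS_OF, EX + "Mammal"),
    (EX + "Mammal", RDFS_SUBCLASS_OF, EX + "Animal"),
    (EX + "rex", RDF_TYPE, EX + "Dog"),
}

closure = infer_types(data)

if __name__ == "__main__":
    for triple in sorted(closure - data):
        print(triple)
```

Because class and property names are globally addressable URIs, the schema statements here could just as well come from a different publisher than the instance data.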

All these features are mainly due to the characteristics and capabilities of RDF's data model, though, and have already been amply described elsewhere, so I won't go into them further here. If you wish to learn more about RDF in general, a great place to start would be the excellent RDF in Depth tutorial by Joshua Tauberer.

And should you be interested in the growing intersection between the NoSQL and Linked Data communities, you will be certain to enjoy the recording of Sandro Hawke's presentation Toward Standards for NoSQL (slides, blog post) at the NoSQL Live in Boston conference in March 2010.

