My Vision for an Enhanced Linked Data Architecture – PART 3
In my previous post, I discussed and began to address what I believe to be a current stumbling block to full adoption of Linked Data integration concepts and patterns in today’s Information Interoperability / Information Sharing space. I attempted to address this stumbling block by identifying additional components (the Enhanced parts) of my proposed Enhanced Linked Data Architecture and its current incarnation as the Enhanced Linkeddata Architecture for Persistent Sharing Environments (ELAPSE)™. In this post, I will discuss and begin to address a third major stumbling block.
The third major stumbling block to full adoption of these concepts is poor RDF store performance: sluggish data loading times and less-than-ideal SPARQL Protocol and RDF Query Language (SPARQL) query-response and update times against distributed and federated models. These characteristics are commonly associated with the majority of today’s RDF Stores (i.e. Triple Stores or Graph DBs) and LinkedData Framework solutions. Fortunately, these types of performance issues are being addressed on a daily basis by several software development organizations and their supporting open-source communities.
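To make "query-response results against federated models" a little more concrete, here is a minimal Python sketch of how one might time a federated SPARQL query. It is only an illustration under stated assumptions: the endpoint URLs and all function names are hypothetical and not part of ELAPSE™, and it uses only the Python standard library and the standard SPARQL 1.1 SERVICE keyword.

```python
import statistics
import time
import urllib.parse
import urllib.request

# Hypothetical endpoint URLs, used only for illustration.
LOCAL_ENDPOINT = "http://localhost:8890/sparql"
REMOTE_ENDPOINT = "http://example.org/sparql"


def federated_query(remote_endpoint, limit=10):
    """Build a SPARQL 1.1 query that delegates one pattern to a remote endpoint."""
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        f"  SERVICE <{remote_endpoint}> {{ ?s ?p ?o }}\n"
        "}\n"
        f"LIMIT {limit}"
    )


def request_url(endpoint, query):
    """Encode the query as a SPARQL Protocol GET request URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_over_http(endpoint, query, timeout=30):
    """Send the query and return the raw response body (needs a live endpoint)."""
    req = urllib.request.Request(
        request_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def time_query(execute, query, runs=5):
    """Call execute(query) `runs` times and summarise wall-clock latency in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        execute(query)
        samples.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

Against a live store, something like `time_query(lambda q: run_over_http(LOCAL_ENDPOINT, q), federated_query(REMOTE_ENDPOINT))` would give a rough latency summary. Passing the executor in as a function keeps the timing logic independent of any particular RDF store.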
A Linked Open Data 2 (LOD2) project “Big Data RDF Store Benchmarking Experience” blog entry recently recorded a general observation that:
“RDF stores have made significant advances in architecture (cluster-ready) and functionality (Business Intelligence queries), as well as in performance and scalability. By now, we can truly conclude that Big Data projects can make use of RDF technology, and that is a win.”
A recent posting on the blog of Orri Erling of OpenLink Software, Inc. stated:
“To get much further in performance, physical storage needs to adapt to the data. Thus, in the long term, we see RDF as a lingua franca of data interchange and publishing, supported by highly scalable and adaptive databases that exploit the structure implicit in the data to deliver performance equal to the best in Structured Query Language (SQL) data warehousing. When we get the schema from the data, we have schema-last flexibility and schema-first performance. The genie is back in the bottle, and data models are unified.”
Updated on June 24, 2013, the “NoSQL Databases for RDF: An Empirical Evaluation” page links to some very telling benchmark results comparing five different NoSQL stores for RDF processing.
When focused on open source and open standards, some additional existing components (the Enhanced parts) of my proposed ELAPSE™ Architecture that will help address these performance concerns are:
Virtuoso Open-Source Edition – scalable cross-platform server that combines Relational, Graph, and Document Data Management with Web Application Server and Web Services Platform functionality (GitHub and interview with founder)
MonetDB – pioneered column-store solutions for high-performance data warehouses for business intelligence and eScience since 1993 (Downloads)
My next post will discuss other concerns associated with today’s RDF Stores and LinkedData Framework solutions that could also be addressed via this ELAPSE™ Architecture.
=david.l.woolfenden