Making Command Query Responsibility Separation, RDBMS, and SOLR Work

Command Query Responsibility Separation (CQRS) is a mechanism by which the Query object model that a client uses for retrieving information is different from the Command object model that a client uses to modify the information.

Benefits of using CQRS include:

Distinct object models for Commands and Queries may make a complicated system easier to create and maintain.
The ability to choose technologies that are optimized for either Command or Query operations provides better performance for query intensive use cases.
The ability to independently scale the systems responsible for supporting Queries and Commands. For example, if you have a system that supports a high search volume, the Query infrastructure can be scaled up without touching the Command infrastructure.

In this article, we describe an architecture that uses a relational database and SOLR to implement the CQRS architectural pattern. We also provide an example application that requires both full-text search and frequent data modifications that leverages this architecture.

Example Application

A library wants to implement a system that stores its book catalog and patron checkout history. Some features that the system should support include:

Catalog Search: Search for books in the library by title, author, genre, etc.
Patron’s Current Checkout List: List all of the books that a patron has currently checked out. Filter the list by book metadata.
Patron History: View a paginated history of books that the user has checked out within a date range.
Checkout Books: Allow a user to checkout books. Ensure that two patrons cannot checkout the same book at the same time.
Catalog Update: Add books to the catalog or change incorrect catalog information.

Logical Architecture

To implement our system, we will use a relational database, such as Postgres, to support our Command model. The Command data store is the source of record for patron data and checkout history. By using a relational database for Command operations, we can easily ensure that two patrons cannot checkout the same book at the same time.

Since Postgres does have full-text search capabilities, we could also use it to provide catalog search functionality, however, we’ve found that Postgres full-text search can be difficult to configure and to maintain. Instead, we are using SOLR to support our Query model. SOLR is an enterprise search platform that provides high-performance full-text and faceted search. Our SOLR deployment is not the source of record for any data, but, instead, is more like a real-time reporting database.

Our logical architecture shows that the front-end connects to a set of webservers, which, depending on what the user is doing, will either perform Command or Query operations. There is a process that synchronizes catalog updates and checkout data from Postgres to SOLR. An alternative architecture might be to allow the front-end to connect to SOLR directly for Query operations.

Domain Models

Command Model

Our simplified Command model consists of six types of objects.

Patron contains information about users. Every time a Patron checks out a Book, a Checkout object is created.
Book contains all of the metadata that a Patron would like to search. Every Book has a unique identifier. For this simple application, we assume that there is only one copy of each book available for checkout.
Checkout associates a Patron with a Book.
System Revision represents a change to the system. For example, a new Checkout or the return of a book.
Book History associates a particular System Revision to each Book creation or metadata update.
Checkout History associates a particular System Revision to each Checkout creation or update.

The Command model translates well to a relational database that can maintain the relational integrity between objects in the model.

Query Model

Our simplified Query model consists of two objects.

Book has all the metadata that allows for searching the library’s catalog.
Current Checkout indicates who, if anyone, has currently checked out the Book.

It is trivial to represent these two objects in a single SOLR document.

Use Cases

Let’s look at how the data models support the application’s use cases.

Catalog Search

The catalog search functionality is supported entirely by the Query data store. SOLR provides filtering, pagination, and faceting out of the box. In fact, that’s its entire reason for being. Since our Query data store also includes current checkout information, we can indicate which books are currently available, and/or we can allow the user to filter their results to only include available books.

Patron’s Current Checkout List

Like the catalog search, the current checkout list is supported entirely by the Query data store. We simply query SOLR for all of the documents where the current checkout matches the patron.

Patron History

To view a patron’s history, we actually have to query the Command store. Since this query doesn’t need a text search, we wouldn’t get any benefit from attempting to support this query directly via SOLR anyway. Once a web server retrieves a page of patron history results, it can then decorate those results with catalog metadata from the Query data store.

Checkout Books

When a patron wants to checkout a book, we insert a record into the Checkout table of the Command data store. The act of checking out a book does not result in any queries to the Query data store, although there will be an asynchronous update of the Query store as soon as possible.

Catalog Update

Catalog update involves filling out a form that contains book metadata. Like checking out a book, updating the catalog does not result in any queries to the Query data store.

Implementation details

Preventing Duplicate Checkouts

When multiple users attempt to check out the same book, we must be sure that only one of them can do so. Because our system has separate Query and Command data stores that are not immediately consistent, this situation may happen relatively frequently. The Query store could report a book as available even when it is not.

Various relational databases have different mechanisms that allow us to enforce our single Checkout constraint. For example, with Postgres, we create a partial unique index so that only one Checkout may exist for a particular book where the Return Date is null.

Data Store Synchronization

Depending on what a front-end user is doing, they will need to retrieve data from the Query data store, the Command data store, or both the Query and Command data stores. Because our solution uses multiple data stores, we have to perform a one-way synchronization between our Command data store and our Query data store to make sure that the user is able to retrieve the information they need.

Our implementation uses Envers to track the changes that a user has made and a custom process to perform the synchronization. Envers makes it very easy to track when a user has made a change to the Command store that we need to synchronize to the Query store. With Envers, every change to an “audited” table such as the Book table or Checkout table results in a new equivalent history record and a new revision. We can convert those revisions into system states and then track which system states still need to be synchronized. A future blog post will describe in detail how to use Envers with a READ COMMITTED relational database to support system state tracking and synchronization.

Our custom process checks periodically for any changes to the Command store system state table and then updates SOLR with those changes. Note that, if multiple systems needed to know about changes to the Command store, we might add a message broker such as RabbitMQ to our system.

Updating the Query Data Store

Updating the query data store is simple. The synchronization process just submits a new document to SOLR and that will replace the existing document.

Notice that the System State is included in the Current Checkout information. We use SOLR’s Optimistic Concurrency capabilities to ensure that, if multiple threads attempt to modify the same SOLR document, only the latest information (largest system state ID) will be used regardless of the order in which the threads perform their updates.

Data Retrieval

In our logical architecture, a set of web servers provides a Catalog API that supports both Command and Query operations. Depending on the use case, the Catalog API will do one of the following things:

Catalog Search: Request data from the Query data store
Current Checkout List: Request data from the Query data store
Patron History: Request data from the Command data store and decorate the results with data from the Query data store
Checkout Books: Issue a Command to the Command data store
Catalog Update: Issue a Command to the Command data store

This could also be implemented as Command and Query APIs provided by distinct sets of web servers. In that case, the front-end would be responsible for decorating patron history with catalog metadata.

Summary

In this blog post, we’ve described a simple application that uses the Command Query Responsibility Separation architectural pattern. In a real-world application, we’d have many more issues to consider. For example, the source of record for our book metadata might be a separate data store, or we might have multiple copies of each book available for checkout. With a little tweaking, the system we’ve described can easily handle both of these situations. While CQRS is certainly not appropriate for all applications, it can be very useful for a system with full-text search and frequent data modifications that don’t require immediate consistency.