Blog
Build a Stateful System with Envers

Some applications require full tracking of system state.  For example, for auditing purposes, a financial system may need to know the order in which every system change occurred.

Full tracking of system state provides the following capabilities:

  • Historical Analysis: It is possible to understand exactly what a system looked like at any given time
  • Providing Snapshots: Third-party systems can create their own “snapshot” of a system by getting the system state.
  • Auditing: Determine which user made a given change to the system

This post will show how to use Envers to

  • implement a system that provides full tracking of system state
  • make it easy for external systems to create a snapshot of the stateful system

What is Envers?

Envers is a module of Hibernate ORM that provides auditing/versioning of entities that you have mapped to database tables.    For an entity that you would like to audit, Envers will create a separate revision for the initial creation of the entity, for each change to the entity, and optionally for the deletion of the entity.   These records are stored in an audit table corresponding to the main table for the entity.

Envers also tracks in a “revision” table every system transaction involving at least one audited entity.  Each record in an audit table has a column that references a record in the revision table.  In this manner, we know exactly which system revisions resulted in the changes to a particular entity.

Finally, you can configure Envers to store the types of entities changed during a given transaction.  If you have lots of different entity types that you are auditing, this makes it simpler to search for the entities that changed – you don’t have to check the audit table for every entity type to know what changed.

In addition to providing mechanisms for storing audit information, Envers also provides built-in mechanisms for searching for audit information.

Example Data Model

We’ll demonstrate how Envers works by using a data model that tracks checkout data for a library.  This is similar to the data model referenced in our earlier blog post: “Command Query Responsibility Separation with a Relational Database and SOLR”.

sample data model

In the data model above, we have tables corresponding to three types of entities.

  • BOOK stores all metadata associated with a book entity
  • PATRON stores all metadata associated with a library patron entity
  • CHECKOUT stores an individual checkout of a book by a patron

We are auditing two of these three entities – BOOK and CHECKOUT.

  • A record is added to BOOK_HISTORY every time a BOOK record is changed. This includes the addition of a new book to the catalog, an update to a book’s metadata, or the removal of a book from the catalog.
  • A record is added to CHECKOUT_HISTORY every time a CHECKOUT record is changed. This includes new checkouts, renewals, and book returns.

Notice that each HISTORY table has SYSTEM_REVISION_ID and REVISION_TYPE columns.  SYSTEM_REVISION_ID is a foreign key to the equivalent value in the SYSTEM_REVISION table.  REVISION_TYPE indicates whether the audit record was created because of an insertion, and update, or a deletion.

To facilitate auditing, Envers uses two additional tables.

  • SYSTEM_REVISION contains a record for every transaction in which at least one audited entity was changed. There will be a single SYSTEM_REVISION record regardless of how many entities were changed in the transaction.
  • REVISION_CHANGES maps a transaction to the types of entities that were created, updated, or deleted in that transaction. For example, if a patron checks out three books, a single REVISION_CHANGES record will be added indicating that at least one checkout entity audit record was created.

Finally, there is a separate SYSTEM_STATE table that tracks the actual system state.   We need this table, because we want to provide an API for other systems to retrieve the catalog system state.  If we just use Envers, there is a race condition between when SYSTEM_REVISION_IDs are assigned and when they are available to be read by other database transactions.  We’ll provide more details about this race condition later.

Envers Implementation Details

Basic Configuration

Envers gives you a lot of options for configuring how to audit your data. Envers documentation contains details for all of the various configuration options so we’ll just provide the configuration options that we used for our data model.  Most of these options are self-explanatory.

Property Value Notes
org.hibernate.envers.revision_type_field_name revision_type The name of the column that stores whether a change was an add, update, or delete.  This column is found in each HISTORY table.
org.hibernate.envers.revision_field_name system_revision_id The name of the column that stores the system revision number.  This column is found in every table populated by Envers.
org.hibernate.envers.audit_table_suffix _history
org.hibernate.envers.track_entities_changed_in_revision true By setting this property to true, we will make sure that the REVISION_CHANGES table is populated.
spring.jpa.properties.org.hibernate.envers.revision_listener com.tenmilesquare.CustomRevisionListener The default Envers revision listener only populates the SYSTEM_REVISION_ID, and TIMESTAMP columns.  We use a custom listener to populate additional data such as the user that made the change to the system.
org.hibernate.envers.store_data_at_delete true Without this, we would only have a history record created when an entity was added or updated.
org.hibernate.envers.revision_on_collection_change false We don’t want to create a revision of a parent entity every time any of its children change.  For example, if our Book entity had a Set of checkouts and in order to save the checkout that was a book return, we cascaded a save of a the parent book, we don’t want to create a BOOK_HISTORY record.  We only want to create a CHECKOUT_HISTORY record.

Specifying which Entities and Fields to Audit

We define which entities we want to audit using Envers annotations.  Above the class declaration of each entity that we want to audit, we simply add the annotation @Audited.  By default, a change to any non-collection field in that entity that is mapped via JPA will result in a new history record.  We can indicate that we don’t want to add a history record for changes to a certain field by adding the @NotAudited annotation on the field’s getter method.  Envers can also be configured programmatically to audit only when certain conditions are met, but we are not using that feature here.

The code below ensures that the Book entity is audited.  For brevity, we’re only showing three fields.
@Entity
@Table(name="BOOK")
@Audited
public class Book {
 
    private Long bookId;
    private String title;
    private Integer optlock;
 
    // Other fields
 
    @Id
    @Column(name = "book_id")
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    public Long getBookId() { return bookId; }
    public void setBookId(Long bookId) { this.bookId = bookId; }
 
    @Basic
    @Column(name = "title")
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
 
    @Version
    @Column(name = "optlock")
    public Integer getOptlock() { return optlock; }
    private void setOptlock(Integer optlock) { this.optlock = optlock; }
 
    // Other accessor methods, equals(), hashCode(), etc.
}

Defining which information is stored with each system transaction

We’ve configured our application to use a custom revision listener – com.tenmilesquare.CustomRevisionListener.  We need to do two things to support this configuration option. First, we need to create the entity that maps to the SYSTEM_REVISION table.  The Java code for this entity is below.
@Entity
@Table(name="SYSTEM_REVISION")
 
// Mark the class as the entity to be created
// whenever a new revision is generated
 
@RevisionEntity(CustomRevisionListener.class)
 
// The base class ensures that the REVISION_CHANGES
// table will be populated.  It also provides the
// SYSTEM_REVISION_ID and TIMESTAMP JPA mappings. 
// Be default, the sequence used to populate SYSTEM_REVISION_ID
// is named "HIBERNATE_SEQUENCE", so we have to create that
// sequence in our database.  If we wanted to use a different
// sequence name, we could have extended
// SequenceIdTrackingModifiedEntitiesRevisionEntity instead.
 
public class SystemRevision
    extends DefaultTrackingModifiedEntitiesRevisionEntity {
    @Column(name = "username")
    private String username;
    @Column(name = "user_action")
    private String userAction;
    public String getUsername() { return username; }
    public void setUsername(String username) { this.username = username; }
    public UserAction getUserAction() { return userAction;
public void setUserAction(UserAction userAction) { this.userAction = userAction; }
}

Second, we need to create the com.tenmilesquare.CustomRevisionListener class.   The Java code for this class is below.
public class CustomRevisionListener implements RevisionListener {
 
    @Override
    public void newRevision(Object o) {
        SystemRevision revision = (SystemRevision) o;
 
        // Our application will use Spring as an IoC container. 
        // Spring manages all of our services.  However,
        // CustomRevisionListener is instantiated directly by
        // Envers and so we need to create a utility class
        // "ApplicationContextLookup" in order to get access
        // to any Spring beans.
 
        final CurrentUserService userService =
            ApplicationContextLookup.getBean(CurrentUserService.class);
        revision.setUsername(userService.getCurrentUser().getUsername());
 
        // Envers uses the same thread that rest of the
        // Hibernate transaction uses. Since the newRevision()
        // method doesn’t take any parameters, we add the
        // user action to the UserAction ThreadLocal earlier
        // in the transaction and access that ThreadLocal here.
 
        revision.setUserAction(UserAction.getUserAction());
    }
}

Using Envers to Support a Snapshotting API

For our application, we want to allow client applications to create snapshots of what the catalog looked like at any given time.  To enable this, we’ll provide a web service with the following functionality:

  1. Given a particular system state ID and an entity type, get the next system state/revision in which an entity of that type was added, modified, or deleted
  2. Given a particular revision and an entity type, get all of the entities of that type that changed in the revision

We can use built-in Envers audit searching capabilities to retrieve much of the information returned by the web service.  However, a quick aside before we get into Envers searching…

The Race Condition

Imagine that a client has a snapshot of our system as of revision R0.  There is no reason for the client to retrieve the data for any revisions before R0 because it already has all of that data.  It only cares about getting data for revisions R1, R2, etc.

Due to the nature of READ COMMITTED databases, if we use Envers revisions to directly track system states, it is possible for the database to make revision R2 available for read before revision R1.  As a result, when a client asks for the next revision after revision R0, the API will respond with R2 and the client will never know about revision R1.  The client will not have a true snapshot of our system.  The timeline below shows how this could happen.

Race Condition

To resolve this issue, we use a separate SYSTEM_STATE table to track system state.  Periodically, we get any SYSTEM_REVISION_IDs from the SYSTEM_REVISION table that are not yet in the SYSTEM_STATE table and assign system states to those revisions.  We use a uniqueness constraint on the SYSTEM_STATE table to prevent a revision from being assigned to multiple states (in the case that multiple threads are looking for new revisions).

In this way, we are guaranteed that when a client asks for the state after S0, it will always get state S1.   This is true even if the number of the revision R2 corresponding to state S1 is greater than the number of the revision R1 corresponding to state S2.

We use Hibernate optimistic locking to ensure that any entity changed in revision R2 (committed first) is not also changed in revision R1 (committed second).

The timeline below shows how using a separate SYSTEM_STATE table resolves the race condition.

Race Condition Resolved

API Details

Now that we’ve resolved the race condition, we can see how Envers can help us support a system state API. Let’s say that a client wants to get all of the Checkouts that occurred in the system after state 1001.  The client wants to get 50 checkouts per API call, so if there are more than 50 Checkouts in the state after 1001, the client must make multiple calls.  The calls that the client makes and the responses from the system are listed below:

Request Response
1. Get the next state after 1001 that has Checkouts /state?since=1001&entity=checkout State 1003, Revision 265
2. Get the first 50 checkouts changed in revision 265 /checkout-hist?changed-in=265&offset=0&limit=50 First 50 checkouts
3. Get the next 50 checkouts change in revision 265 /checkout-hist?changed-in=265&offset=50&limit=50

Last 25 checkouts

API Call #1 – Getting the next system state

To get the next state after 1001 that has Checkouts we need to query based on system state.  Unfortunately, since Envers searching only works on audited entities and system state is not directly part of those entities, we can’t use Envers to get the information for this API call.  However, as long as we use Hibernate to map a SystemState entity to the SYSTEM_STATE table, it is trivial to use HQL or the Criteria API to get the information that we need.  Ultimately, the SQL we want is:
SELECT ss.system_state_id, ss.system_revision_id
FROM system_state ss
JOIN revision_changes rc ON ss.system_revision_id = rc.system_revision_id
WHERE ss.system_state_id > 1001
AND rc.entity_name = 'com.tenmilesquare.Checkout'
ORDER BY ss.system_state_id
LIMIT 1;

API Calls #2 and #3 – Getting entity history

We can use Envers’ search capabilities to get the entities changed in a particular revision.  In the method below, we use Envers’ AuditReader and AuditReaderFactory classes along with a Hibernate session to retrieve Checkout history data.
public List<Checkout> getCheckoutsForRevision(
    int revsionId, int offset, int limit, Session session) {
    AuditReader reader = AuditReaderFactory.get(session);
    List<Checkout> checkouts =
        reader.createQuery().
            forEntitiesModifiedAtRevision(Checkout.class, revisionId).
            setFirstResult(offset).setMaxResults(limit).getResultList();
    return checkouts;
}

Summary

In this blog post, we’ve shown how to use Envers to implement a system that provides full tracking of system state.  We leveraged Envers’ write capabilities to store auditing information and Envers’ search capabilities to retrieve history for clients of a system state API. While we briefly discussed many of Envers’ capabilities, there is a lot more that Envers can do.  Take a look yourself at http://hibernate.org/orm/envers.

Categories: Blog, Software Development, Technology Strategy, Uncategorized

Tags: , ,

Jay Gelman
15 Mar, 2016


Leave a Reply

Your email address will not be published. Required fields are marked *