Blog

Managing Complex Data with JSON in PostgreSQL, Part 3

In the first post of this series, we explained why you absolutely need to use jsonb columns in the PostgreSQL tables right now, and mildly chided you if you aren’t already.

In the second post of this series, we demonstrated how to use triggers and constraints to make one read/write table with JSON data and an automatically-maintained read-only table for high-speed joins and to attach the foreign-key constraints to. This provides performance and data integrity but at a speed penalty of insertion speed as well as some on-disk data duplication.

In this post, we show how to create a read/write view that exposes relational data as JSON data (mixed with other JSON data), with the view being a thin layer on top of two tables maintained by triggers. One table holds the data we had before, including some JSON data, but with all of the references missing. The other table is used as an associative table and houses only the references and related metadata, complete with foreign-key constraints in place.

Many-branched tree in a colorful sunset

Views, Revisited

In part 2 we briefly demonstrated a view that partially deconstructs JSON but noted that views are read-only without some additional work. Now we’re going to see what it will take to make a read/write view, with the view showing the data constructed as jsonb columns, but with the database backing the view across at least two tables.

The goals for this schema is:

  • All of the relationships are fully managed by the database itself using traditional foreign-key constraints.
  • The data is only in one table, not duplicated as in the previous form.
  • All views and tables can safely be read or written.
  • There is no additional speed penalty for insertion into the backing-store tables (beyond the normal penalty for constraint checking, index updating, etc.).
  • There is only a minor penalty for changing rows through the view due to the triggers deconstructing the JSON and calling the relevant INSERT, UPDATE, or DELETE statements.
  • There is a minor penalty for reads of the view equal to the SELECT statement that makes up the view with joins and functions to construct the jsonb columns on the fly.

Setup

In the previous article we had two tables: people and enemy_association. We’ll convert people into a view, keep the same data stored in enemy_association as before, and create a new table called people_raw. This new table will have the same definition as the people table had before, complete with a characteristics jsonb column that will contain the same JSON data as before except enemies. The new people view will gather the data for the enemies key from enemy_association and merge it into the characteristics column.

Let’s create the two backing tables:

Note that two inserts are required. In the enemy_association insert, we had to specify the column list in order for the enemy_association.id column to get default values. Alternatively, we specified the person.id values.

Now we want a people view that returns the enemies list in the characteristics column, so let’s make that:

And we can select from the view and get the data we’re after:

last_namecharacteristics
Davola

Making an Updatable View

We can update people now by manipulating people_raw and enemy_association. However, we’ll need to make the view “editable” so that we interact with the data by manipulating the JSON. We’ll do this by adding an INSTEAD OF trigger function on the view. Note that we sprinkled optional RAISE NOTICE lines in the code so the code flow can be tracked.

Here’s what the general format will look like, which is actually very similar to what was in Part 2:

For a view, we need an INSTEAD OF triger, where on a table we use a BEFORE or AFTER trigger. This mechanism is how we make the view handle INSERT, UPDATE, and DELETE like a table.

Implementing DELETE

Deleting from people will simply delete from people_raw and we can let the foreign-key constraints cascade (or not, as configured in the constraints) to handle the rest.

That wasn’t so bad.

Implementing INSERT

Before we do an insertion or an update, we need to check the enemies list is a valid array. We provide more checks, but we’ll keep it simple for now:

Now to do the insertion itself, and we’ll come back for the rest of the update processing. There are two tricky parts to watch out for:

  1. The insertion of the people_raw table has to happen before the enemies_list is added to, so we don’t violate the foreign keys briefly. (We could defer them for the rest of the session, but that’s messy and presumptive.)
  2. The id column might be omitted in the INSERT call, which on a normal table INSERT requests a DEFAULT value be used. (In this case DEFAULT is a newly generated UUID.) Because of this optional column, we need to have two INSERT calls, one with id and one without.

Implementing UPDATE

Like the update handling in Part 2, the update handling is the most complicated. This is all data manipulation, where we have to remove enemy_association entries that are removed, add entries that are added, and leave the rest alone.

The ARRAY[(enemy->>'id')::uuid] <@ enemy_ids part is using the array-contains-array operator <@ to test if the given enemy->>'id' (a text value cast to a UUID) is in the enemy_ids list. Since we only have a single value, we need to put it in an array with the ARRAY[...] syntax.

The full trigger code with all the parts above put together can be found in this GitHub Gist.

Testing It

To sum up the functionality we have created:

  • We can perform CRUD (Create/INSERT, Read/SELECT, Update, or Delete) on the tables people_raw and enemy_association directly, and the view will immediately reflect any changes.
    • Insertions support defaulted id column population, with the default defined in the table definition as a generated UUID.
    • References from enemy_association to people_raw are guaranteed by the database using foreign-key constraints.
    • Changes that would violate those constraints are either prevented or cascaded to remove those referencing rows, controlled with normal constraint management syntax in PostgreSQL.
    • Both tables are fully indexable to get the maximum performance out of PostgreSQL.
  • We can perform CRUD on the people view and it will manipulate the people_raw and enemy_association tables according to the characteristics->'enemies' column value, ensuring that written data will read back the same.
    • This allows both reading and writing a complete enemies relationship in a single query to the view.
  • An example of data-shape checking is provided by ensuring that characteristics->'enemies' is an array if it exists.

Let’s demonstrate a few simple manipulations:

Summary

On top of this foundation, we could add:

  • Add a BEFORE trigger on people_raw to ensure proper handling of an enemies list provided, either by blocking it or stripping it off and updating enemy_association. Blocking would be cleaner, as read and write would remain symmetric.
  • More data-shape checks could be added, such as ensuring that other characteristics follow the correct schema.
    • For example: create a third table of allowed keys and their acceptable values, and have the trigger read that table for dynamically controlled data-shape checks. That would allow adding to the JSON “schema” by manipulating a table, preventing the need to actually change the database schema.

Over the last three blog posts, we covered why using JSON in PostgreSQL makes sense and its mechanics. We also covered how to create a changelog with triggers and JSON, making views read/write with triggers, as well as a few other tips and tricks like using LATERAL joins.

For more help or information about this or any other aspect of your enterprise tech stack and practices, contact us for a free consultation.

Categories: Blog

25 May, 2021