
Proper Pagination with GraphQL

GraphQL has many advantages over REST interfaces, such as the ability to request specific fields, with relationships, and get only the data you asked for. (For a detailed explanation of why you should use GraphQL, see my other post on just that.) The popular and freely available tooling for the GraphQL domain specific language (DSL), such as type generation for TypeScript, is an added bonus. However, there are still a few things that are left as an exercise for the developer, such as pagination.

The format of a request in GraphQL, along with its parameters, allows you to specify the exact shape you want returned. This limits which fields ("columns," but hierarchical) are returned, but it won't limit the quantity or number of "rows" returned. For example, you can request the First Name, Last Name, ID, and Birthdate of People. You can also provide a parameter to be used as a search term, such as a date to filter the results by Birthdate. Now if there are many records to return for this query (e.g., there are millions of people in the database that match the search), you may get back millions of records, or just the first n, or it may simply return a "result too large" error. To handle this situation, there must also be a means of saying what portion (or page) of the results you want. And remember, GraphQL results are not flat rows and columns but deep data structures, where you may request one item (e.g., a Person) but still get back a large data set (e.g., every song they've streamed).

This is why pagination is important. The topic has been covered in a few places, including GraphQL.org's post on Pagination, which builds on (and links to) a specification from the Relay folks: the GraphQL Cursor Connections Specification. Both recommend cursor-based pagination, and this is the approach we recommend as well. While it has its downsides (for example, there is no inherent ability to jump to a specific page), overall it is more efficient and easier to use for both the front end and the back end.

However, even the specification leaves out a few non-trivial things. After implementing a few systems, I've decided it is time to document those issues and offer guidance on how to handle them. We will also provide sample code that can be easily loaded with the 10mi2 projen module.

Overview of how Cursor-based pagination works

Here’s a greatly simplified version of how cursor-based pagination works:

  1. Make a normal GraphQL request with whatever parameters you need to specify what you’re looking for.
  2. The resolver on the server will make the request to its data source.
  3. The resolver will construct a “cursor” for each item returned that encapsulates the minimum identifying information for this item.
  4. The resolver places the cursor from the last item (or first, if paginating in reverse) in the pageInfo of the results.
  5. To get the next page, the client will use the cursor from the pageInfo of the previous request in the next request.
  6. Repeat until the pageInfo indicates that there are no more pages.
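The six steps above can be traced in a small, self-contained TypeScript sketch. It is an illustration under simplifying assumptions, not a GraphQL server. `fetchUsers` is a hypothetical stand-in for a resolver over an in-memory list sorted by id. The `first` and `after` names mirror the arguments used by the Relay Cursor Connections Specification. The cursor is just the last id, left unencoded to keep the flow readable.

```typescript
// Hypothetical in-memory data source, already sorted by id.
interface User { id: number; name: string; }

const USERS: User[] = Array.from({ length: 7 }, (_, i) => ({ id: i + 1, name: `user${i + 1}` }));

interface Page { items: User[]; endCursor: string | null; hasNextPage: boolean; }

// Steps 2-4: the "resolver" queries its data source, derives a cursor from
// the last item, and reports it along with whether another page exists.
function fetchUsers(first: number, after?: string): Page {
  // Decode the cursor (here simply the id of the last item already seen).
  const afterId = after === undefined ? 0 : Number(after);
  // Fetch one extra row so we can tell whether another page exists.
  const rows = USERS.filter((u) => u.id > afterId).slice(0, first + 1);
  const items = rows.slice(0, first);
  const lastItem = items[items.length - 1];
  return {
    items,
    endCursor: lastItem ? String(lastItem.id) : null,
    hasNextPage: rows.length > first,
  };
}

// Steps 5-6: the client feeds each endCursor into the next request
// until the server reports that there are no more pages.
function fetchAll(pageSize: number): User[] {
  const all: User[] = [];
  let after: string | undefined = undefined;
  for (;;) {
    const page = fetchUsers(pageSize, after);
    all.push(...page.items);
    if (!page.hasNextPage || page.endCursor === null) break;
    after = page.endCursor;
  }
  return all;
}

console.log(fetchAll(3).map((u) => u.id));
```

Fetching one extra row (`first + 1`) is a common way to compute `hasNextPage` without a separate count query. With a page size of 3, the loop above makes three requests to collect all seven users.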

Supporting pagination changes the shape of the results. Instead of returning a simple list of User objects, for example, it instead returns a single UserConnection object that has pageInfo and edges fields, where edges is a list of UserEdge objects that contain the cursor for that User and (finally) the node that is the User object.

The upside of this structure is that, instead of altering the User object to convey the cursor and pagination information, that information is embedded in a very standard, mechanically easy-to-handle structure. If you don't need the cursor for each edge, don't request it in your query.
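On the TypeScript side (for instance, in types generated from the schema), that shape can be modeled roughly as below. The generic `Connection<T>` and `Edge<T>` types and the `toConnection` helper are hypothetical names for this sketch. The `pageInfo` field names (`hasNextPage`, `hasPreviousPage`, `startCursor`, `endCursor`) are the ones defined by the Relay Cursor Connections Specification.

```typescript
// Relay-style page metadata.
interface PageInfo {
  hasNextPage: boolean;
  hasPreviousPage: boolean;
  startCursor: string | null;
  endCursor: string | null;
}

// An edge (e.g. UserEdge) wraps one node together with its cursor.
interface Edge<T> {
  cursor: string;
  node: T;
}

// A connection (e.g. UserConnection) is the list of edges plus pageInfo.
interface Connection<T> {
  edges: Edge<T>[];
  pageInfo: PageInfo;
}

interface User { id: number; lastName: string; }

// Hypothetical helper: wrap already-fetched nodes in the connection shape.
function toConnection<T>(
  nodes: T[],
  cursorFor: (node: T) => string,
  hasNextPage: boolean,
  hasPreviousPage: boolean,
): Connection<T> {
  const edges = nodes.map((node) => ({ cursor: cursorFor(node), node }));
  return {
    edges,
    pageInfo: {
      hasNextPage,
      hasPreviousPage,
      startCursor: edges.length > 0 ? edges[0].cursor : null,
      endCursor: edges.length > 0 ? edges[edges.length - 1].cursor : null,
    },
  };
}

const userConnection: Connection<User> = toConnection(
  [{ id: 1, lastName: "Seinfeld" }, { id: 2, lastName: "Benes" }],
  (u) => String(u.id), // placeholder cursor; real cursors would be encoded
  true,
  false,
);
```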

Note: The cursor should be opaque, meaning the client shouldn't be able to construct or modify one. Cursors should also be considered ephemeral unless there's a good reason to assume they might be (or need to be) stable. For these reasons, we recommend converting cursors into strings that contain an encoding of the data (JSON is often easiest, but not terribly compact) and, other than during debugging, further encoding that data with Base64.
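A minimal encoder and decoder for such cursors might look like this. It assumes a Node.js runtime (for `Buffer`) and a payload that is plain JSON, and the function names are hypothetical. Base64 only obscures the contents. It does not stop a determined client from decoding and re-encoding a cursor, so the server still has to validate everything it decodes.

```typescript
// Encode a JSON-serializable cursor payload as an opaque Base64 string.
function encodeCursor(payload: unknown): string {
  return Buffer.from(JSON.stringify(payload), "utf8").toString("base64");
}

// Decode a cursor string back to its payload.
// JSON.parse throws if the string does not decode to valid JSON.
function decodeCursor(cursor: string): unknown {
  const json = Buffer.from(cursor, "base64").toString("utf8");
  return JSON.parse(json);
}

const token = encodeCursor([{ key: "id", value: "100" }]);
console.log(token, decodeCursor(token));
```

During debugging, logging the JSON before it is Base64-encoded keeps cursors readable without changing what clients receive.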

Issue 1: Sorting

The specification noted at the beginning leaves out what happens when you want to change the sort order of the data being paginated. Once you start implementing it, you soon see that it has a non-trivial impact on the schema and resolvers.

Let's start with the easiest way to implement this: simply add an enum sortBy parameter to the query. In your resolver, you then change the query to the back end to sort the values based on the new parameter. You now have sorting, but you've broken pagination.

For example, consider sorting User objects by lastName instead of id. When sorting by id, you could limit the results to items whose id is greater (or less) than the value from the cursor. When sorting by lastName, you can no longer limit based on id. You now have to limit relative to lastName, and lastName is unlikely to be a unique column in your dataset, so you'll need to sort by other columns as well and limit based on those same columns. Let's say we want the lastName sort to sort by last name (ascending), then first name (ascending). Even then, last + first names are not necessarily unique, so we need a tiebreaker; let's add the id column. So the order is last name, first name, then id. Here's where it gets tricky: with a cursor in hand, we cannot simply limit our results to lastName > cursor.lastName, since there may be two people with the same last name. Nor can we simply use >=, since the cursor may point to a position in the middle of the group that shares that last name.

The solution is a compound condition that walks the sort elements in order. A row comes after the cursor if its first element is greater than the cursor's, or if the first element is equal and the second is greater, and so on, until we reach the unique element, id in this case. For the three-element "last name" sort, that reads: "where lastName > cursor.lastName OR (lastName = cursor.lastName AND firstName > cursor.firstName) OR (lastName = cursor.lastName AND firstName = cursor.firstName AND id > cursor.id)".
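That condition can be generated mechanically from the list of sort keys. The sketch below builds a SQL WHERE fragment with PostgreSQL-style `$n` placeholders and the matching ORDER BY. `buildKeysetWhere` and `SortKey` are hypothetical names, the column names are illustrative, and NULL values are not handled. Column names must come from a server-side whitelist, never from the cursor or other client input.

```typescript
type Direction = "asc" | "desc";

// One sort key plus the value the cursor recorded for it.
interface SortKey {
  column: string;         // from a server-side whitelist only
  value: string | number; // value taken from the decoded cursor
  direction?: Direction;  // defaults to ascending
}

// Builds (k1 > $1) OR (k1 = $1 AND k2 > $2) OR ... plus the matching ORDER BY.
// The last key must be unique (e.g. id) so every row has a distinct position.
function buildKeysetWhere(keys: SortKey[]): { sql: string; params: (string | number)[]; orderBy: string } {
  const clauses: string[] = [];
  for (let i = 0; i < keys.length; i++) {
    const parts: string[] = [];
    // Every earlier key must equal the cursor's value...
    for (let j = 0; j < i; j++) {
      parts.push(`${keys[j].column} = $${j + 1}`);
    }
    // ...and this key must be strictly past the cursor in its sort direction.
    const op = keys[i].direction === "desc" ? "<" : ">";
    parts.push(`${keys[i].column} ${op} $${i + 1}`);
    clauses.push(`(${parts.join(" AND ")})`);
  }
  const orderBy = keys
    .map((k) => `${k.column} ${k.direction === "desc" ? "DESC" : "ASC"}`)
    .join(", ");
  return { sql: clauses.join(" OR "), params: keys.map((k) => k.value), orderBy };
}
```

For the "last name" sort, `buildKeysetWhere([{ column: "lastName", value: "Seinfeld" }, { column: "firstName", value: "Jerome" }, { column: "id", value: 100 }])` yields exactly the three-part condition above, to be used as `... WHERE <sql> ORDER BY <orderBy> LIMIT n`. When every key sorts in the same direction, some databases (PostgreSQL, for one) also accept the shorter row-value form `(lastName, firstName, id) > ($1, $2, $3)`.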

Then we have an issue where the cursor for one sort order might not logically make sense for another. For example, the cursor for a page of users sorted by id starting with id of 100 won’t make sense to use as a page-start cursor when sorted by “last name.” The simplest solution to this is to put the sort order (explicitly or implicitly) into the cursors.

Note: The example cursors shown below are incomplete, and only show the parts necessary for the topic at hand. See the end of the article for complete cursor examples.

A cursor that implicitly contains the sort order may look like:

  • sort by id: [{"key": "id", "value": "100"}]
  • sort by id descending: [{"key": "id", "value": "100", "order": "d"}]
  • sort by "last name": [{"key": "lastName", "value": "Seinfeld"}, {"key": "firstName", "value": "Jerome"}, {"key": "id", "value": "100"}]

Of course, one could make it more compact ("k" instead of "key", or something other than JSON encoding), and, as mentioned above, the final values would be Base64 encoded. Using {"key": "lastName", "value": "blah"} instead of {"lastName": "blah"} makes the validation code easier to write and prevents name collisions, such as when sorting by a key called "sort".
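The key/value form also makes it easy to reject a cursor that was produced for a different sort order, the situation described a few paragraphs up. The sketch below makes a few assumptions:

- The cursor has already been decoded from its Base64 JSON.
- `sortBy` is the enum argument from the query.
- The `order` field is omitted for brevity.

`SORT_KEYS` and `validateCursor` are hypothetical names.

```typescript
// One element of a key/value cursor, e.g. {"key": "lastName", "value": "Seinfeld"}.
interface CursorEntry {
  key: string;
  value: string | number;
}

// Hypothetical server-side list of supported sorts and the keys each one
// needs, in order, ending with the unique tiebreaker.
const SORT_KEYS = new Map<string, string[]>([
  ["id", ["id"]],
  ["lastName", ["lastName", "firstName", "id"]],
]);

// Accept a decoded cursor only if its keys match the requested sort exactly.
function validateCursor(decoded: unknown, sortBy: string): CursorEntry[] {
  const expected = SORT_KEYS.get(sortBy);
  if (expected === undefined) throw new Error(`Unknown sort: ${sortBy}`);
  if (!Array.isArray(decoded) || decoded.length !== expected.length) {
    throw new Error("Cursor does not match the requested sort order");
  }
  return decoded.map((entry: unknown, i: number) => {
    const e = entry as { key?: unknown; value?: unknown } | null;
    if (
      typeof e !== "object" || e === null ||
      e.key !== expected[i] ||
      (typeof e.value !== "string" && typeof e.value !== "number")
    ) {
      throw new Error("Cursor does not match the requested sort order");
    }
    return { key: expected[i], value: e.value as string | number };
  });
}
```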

A cursor that explicitly contains the sort order, and is more compact, would look like:

  • sort by id: {"user": "100", "sort": "id"}
  • sort by id descending: {"user": "100", "sort": "id", "order": "d"}
  • sort by "last name": {"user": "100", "sort": "lastName"}

Note that we only specify the user by id here. To use that cursor, we either need to make a call to the database to get that user's data or, when using a relational database, add a join to our queries to compare the values against that user. (The latter isn't practical with many ORMs, like Prisma, where you'd have to write the query by hand, defeating the point of the ORM.) So it's more compact, but it has a performance impact. It is also less stable: if someone changes properties of the user at the end of the page you just loaded and you then load the next page, you will get rather unpredictable results.

In other words: don’t use the object ID and sort order name as the cursor, unless the object at that ID is completely immutable.

Issue 2: Multi-level Pagination

In the case of hierarchical data, such as in the typical blogging example, you have Users who author Posts. When you request a particularly prolific user, it may return a very large list of posts. So naturally we’ll paginate the list of posts returned as part of the User object. Now say we’re scanning the list of users, which is also naturally paginated. This means we have paginated users which contain paginated posts. Luckily, GraphQL handles parameters anywhere within the requested structure, so we can make a query like so just fine:

Note that all of the parameters are optional, and some have defaults set, so you can make the first request without any knowledge. You could easily set postsFirst to a small value, like five, in order to only get the first five posts of each user. We have a few options for how to facilitate grabbing more posts from a specific user:

  1. A query in the schema for just getting a paginated list of posts, with a means of filtering to just the posts of a given user
  2. A query in the schema for just getting a single user and the associated posts (paginated)
  3. Reuse the same query, adding a means of “pinning” the results to just that user so we can paginate through that user’s posts (✅ this is the right answer)

There may be reasons outside the scope of this discussion to make a query that gets just the posts or just a specific user, but in this case I suggest keeping the same query. This means less code to test on the client side and reduces the likelihood that the client has to merge data from different top-level queries, which simplifies caching configuration as well. For example: if you have a users query that contains a paginated list of each user's posts and a posts query to get paginated posts (optionally limited to a specific user), do not expect the client to get some of the posts from the users query and the rest from the posts query, or to use the posts query at all. This adds confusion and violates the principles behind why GraphQL exists in the first place.

In order to get a second or later page of posts for a user, we need to "pin" the user being requested. To do this, we add arguments to the outer query that limit the response to just that item; in many cases this is simply an optional id argument. With that in mind, to get the second page of posts for a specific user, we would use this query, which is mostly the same but with the user pagination options replaced with id, and which requests less detail about the user (since we already have that):

In the first request we asked for the id of each user, giving it the alias of userId, which is what we’ll use for the value of id. Now we can paginate through the posts of that specific user, and we didn’t have to create or use another query.
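The resolver side of this single-query approach can be sketched with in-memory data. This is a hypothetical TypeScript sketch, not the 10mi2 sample code. `resolveUsers`, the `first`/`after`/`postsAfter` argument names, and the default page sizes are assumptions; only `postsFirst` and `id` come from the text above. Cursors are plain ids here rather than the encoded, named cursors discussed elsewhere in this article.

```typescript
interface User { id: number; name: string; }
interface Post { id: number; authorId: number; title: string; }

// Hypothetical in-memory data: two users whose posts alternate by id.
const USERS: User[] = [
  { id: 1, name: "Jerry" },
  { id: 2, name: "Elaine" },
];
const POSTS: Post[] = Array.from({ length: 12 }, (_, i) => ({
  id: i + 1,
  authorId: (i % 2) + 1,
  title: `Post ${i + 1}`,
}));

// Hypothetical arguments for the single `users` query.
interface UsersArgs {
  first?: number;      // users per page
  after?: number;      // last user id already seen
  id?: number;         // pins the result to a single user
  postsFirst?: number; // posts per user
  postsAfter?: number; // last post id already seen (only honored when pinned)
}

interface UserResult { user: User; posts: Post[]; hasMorePosts: boolean; }

// Return up to `first` id-sorted items after `after`, plus whether more remain.
function page<T extends { id: number }>(items: T[], first: number, after = 0): { rows: T[]; hasMore: boolean } {
  const rows = items.filter((x) => x.id > after).slice(0, first + 1);
  return { rows: rows.slice(0, first), hasMore: rows.length > first };
}

function resolveUsers(args: UsersArgs): UserResult[] {
  const { first = 10, postsFirst = 5 } = args;
  const pinned = args.id !== undefined;
  // Pinned: return only that user and ignore the user-level pagination arguments.
  const users = pinned ? USERS.filter((u) => u.id === args.id) : page(USERS, first, args.after).rows;
  return users.map((user) => {
    const own = POSTS.filter((p) => p.authorId === user.id);
    // Only a pinned request may continue a specific user's post list.
    const posts = page(own, postsFirst, pinned ? args.postsAfter : undefined);
    return { user, posts: posts.rows, hasMorePosts: posts.hasMore };
  });
}
```

A first request along the lines of `resolveUsers({ postsFirst: 2 })` returns both users with two posts each. The follow-up `resolveUsers({ id: 1, postsFirst: 2, postsAfter: 3 })` returns only user 1, with posts 5 and 7.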

Issue 3: Cursor-type Cross-pollination

One potential issue falls in the "user error" category, where we have to handle bad input. Since cursors are strings, it is easy to pass one type of cursor in the context of another, such as a User cursor for a page of Posts. We could define different Scalar types, but since they are all passed as strings in JSON, it won't make much difference.

The solution is to add a name key to each cursor type to distinguish cursors from different queries when they're not compatible. One example is a cursor for paginated posts when scrolling all posts versus one for paginated posts of a specific user. Of course, the name also prevents you from passing a user cursor where a post cursor is expected.

In addition, cursors that can only be used in certain contexts need to contain information about that context. For example, a cursor from a user's posts list can only be used for that user. So we not only name it a userPost cursor, but also add context containing the user id, so we can ensure we are looking at posts for the correct user.
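A check along these lines could run before any decoded cursor is used. It is a sketch that assumes the cursor JSON has already been parsed into the `NamedCursor` shape; in real code you would validate that shape too. The type and function names are hypothetical. The `name`, `value`, and `context` fields follow the cursor examples later in this article.

```typescript
interface KeyValue { key: string; value: string | number; }

// Parsed cursor: which list it belongs to, its sort values, and any context.
interface NamedCursor {
  name: "user" | "post" | "userPost";
  value: KeyValue[];
  context?: KeyValue[];
}

// Throws unless the cursor was issued for this kind of list and, for
// context-bound cursors, for this same parent (e.g. the same author).
function checkCursor(
  cursor: NamedCursor,
  expectedName: NamedCursor["name"],
  expectedContext: KeyValue[] = [],
): void {
  if (cursor.name !== expectedName) {
    throw new Error(`Expected a ${expectedName} cursor, got ${cursor.name}`);
  }
  const context = cursor.context ?? [];
  // Compare values as strings so "22" and 22 refer to the same id.
  const sameContext =
    context.length === expectedContext.length &&
    context.every(
      (c, i) => c.key === expectedContext[i].key && String(c.value) === String(expectedContext[i].value),
    );
  if (!sameContext) throw new Error("Cursor was issued for a different context");
}
```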

Cursor Examples

There are multiple ways to achieve the requirements above, but in order to provide a concrete starting point, here are a few example cursor structures and how they would be interpreted. Note that these are shown in JSON with the Base64 encoding removed. JSON is not required, but these should be passed as strings. It is recommended to Base64-encode them or otherwise make it clear that they are not to be edited or generated on the client side.

A user cursor where we are sorting by name. The page should start with the first item whose name is "Jerry" and whose id is greater than 4, or whose name is greater than "Jerry". Results are sorted by name first, then id:

{"name":"user","value":[{"key":"name","value":"Jerry"}, {"key":"id","value":4}]}

A post cursor where we are sorting by title. The page should start with the first item whose title is "Another way to sort posts" and whose id is greater than 4, or whose title is greater than "Another way to sort posts". Results are sorted by title first, then id:

{"name":"post","value":[{"key":"title","value":"Another way to sort posts"}, {"key":"id","value":4}]}

A userPost cursor from a post item of a users listing, where we are sorting by title. It should start a page of posts for the user with id 22 (the author of this post). That page begins with the first post whose title is "Another way to sort posts" and whose id is greater than 4, or whose title is greater than "Another way to sort posts". Results are sorted by title first, then id:

{"name":"userPost","value":[{"key":"title","value":"Another way to sort posts"}, {"key":"id","value":4}],"context":[{"key":"authorId","value":"22"}]}

About IDs

Just as we are converting our cursors into Base64-encoded JSON strings, we could use the same technique for IDs throughout the system, making them both universally unique and easily typed. For example, in our users and posts example, a user ID's unencoded value might be {"user": 123} and a post ID's unencoded value might be {"post": 123}. Both have the value 123, but we can see what is being asked for. This can be used to verify that a client doesn't accidentally use the wrong ID, and it also provides a mechanism for use with Global Object Identification.
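A sketch of that idea, assuming a Node.js runtime and only two object kinds. `toGlobalId` and `fromGlobalId` are hypothetical helper names, not part of any library.

```typescript
// The kinds of objects that get global IDs in this hypothetical schema.
type Kind = "user" | "post";

// Encode a typed ID, e.g. {"user":123}, as an opaque Base64 string.
function toGlobalId(kind: Kind, id: number): string {
  return Buffer.from(JSON.stringify({ [kind]: id }), "utf8").toString("base64");
}

// Decode a global ID, insisting it refers to exactly one object of the expected kind.
function fromGlobalId(globalId: string, expected: Kind): number {
  const parsed: unknown = JSON.parse(Buffer.from(globalId, "base64").toString("utf8"));
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Malformed ID");
  }
  const entries = Object.entries(parsed);
  if (entries.length !== 1 || entries[0][0] !== expected || typeof entries[0][1] !== "number") {
    throw new Error(`Not a ${expected} ID`);
  }
  return entries[0][1];
}
```

With this in place, a post ID passed where a user ID is expected fails loudly instead of silently loading user 123.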

About Cursor Size

Cursors formatted like above can become somewhat large. Here are a few ideas for how to conserve space:

  • Encode them with a more efficient encoding than JSON – since they’re being Base64-encoded, binary formats like BSON are an option.
    • Downsides: This requires more dependencies, and the data is only shrinking by a relatively small amount.
  • Store the actual contents of the cursor in a database table, and return the unique ID of the row that contains the cursor
    • Downsides: More impact on the database and the table would need occasional cleanup as cursors expire.
    • Upsides: These can be easily cached in the server, as each one is effectively immutable, and it provides a mechanism for explicit expiration of cursors.
  • Compress the cursors
    • Brotli compression reduces the cursor size by about 17-18% for those between 100 and 150 bytes, but as the cursor size increases, the compression gets better.
    • Downside: This makes inspecting the encoded cursors more difficult for debugging for not much of an improvement in size.
    • Upside: This significantly discourages clients from attempting to generate or modify cursors.
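For the compression option, Node.js ships Brotli in its built-in `zlib` module, so a compressed cursor can be produced without extra dependencies. The helper names below are hypothetical, and the size savings will depend on your own cursors.

```typescript
import { brotliCompressSync, brotliDecompressSync } from "node:zlib";

// JSON -> Brotli -> Base64.
function encodeCompressedCursor(payload: unknown): string {
  const json = Buffer.from(JSON.stringify(payload), "utf8");
  return brotliCompressSync(json).toString("base64");
}

// Base64 -> Brotli -> JSON; throws if the cursor was tampered with or truncated.
function decodeCompressedCursor(cursor: string): unknown {
  const json = brotliDecompressSync(Buffer.from(cursor, "base64")).toString("utf8");
  return JSON.parse(json);
}
```

Comparing the length of `encodeCompressedCursor(x)` with the plain Base64 length for a few real cursors is a quick way to decide whether the saving is worth losing readable debug output.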

Wrap Up

An example project can be easily generated using projen from the 10mi2 projen repo, or just the relevant sample code can be viewed there as well. If you need help implementing this approach in a larger project, contact us for a free consultation.
