Why you should use GraphQL

GraphQL was created in 2012 and open-sourced by Facebook in 2015 to relieve issues with the interfaces of the time, particularly for mobile devices with limited or flaky internet connections. Now managed by a foundation, GraphQL continues to provide an interface that allows making a single call that aggregates statically typed data from multiple resources without returning excess data, has provisions for the evolution of the schema without breaking clients, and has a large and healthy ecosystem of tools and libraries to use it.

What is GraphQL?

GraphQL is a query language for your API, and a server-side runtime for executing queries using a type system you define for your data.

Introduction to GraphQL

In practice, GraphQL is a simple Domain-Specific Language (DSL) that facilitates HTTP-based API calls. GraphQL, in the same nomenclature, describes:

  1. The schema of an API described completely down to the scalar-valued leaf nodes.
    • This includes not only the types of objects, but also the queries, mutations, and subscriptions available.
  2. A query, or request for data, complete with any predicates and the requested return data structure tree, also described down to the leaf nodes.
    • Any leaf node not requested is not provided in the return values.
    • Both predicates and return data are type-checked against the schema for type safety.
    • Queries can include built-in GraphQL types to introspect the schema itself, making the API self-documenting and discoverable.
  3. A mutation, or request to change or create data, with the data provided (in JSON alongside the mutation) described (you guessed it) down to the leaf nodes.
    • A query is included as part of a mutation, to describe the expected return value.
    • Provided data is also type-checked against the schema.
  4. A subscription, or a query for changes to data, which, like a query, is complete with optional predicates and a return-value definition.
    • Subscriptions often use WebSockets, and may return multiple values at various times for as long as the connection is held open.

GraphQL also has some great open-source runtime implementations that can be used on both the client and server side.

One of the most powerful aspects of GraphQL is the ability for middleware libraries on the client-side (such as in a Web Worker) to also parse and modify the queries. This enables powerful caching, query optimization, and API stitching (using multiple API endpoints transparently as one), which, when combined with a local database, also provide offline capabilities.
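As an illustration of the kind of caching such client-side middleware can do, here is a minimal Python sketch of an object-level cache. It assumes every object in a response carries `__typename` and `id` fields; the `normalize` helper, the `Type:id` key format, and the sample data are hypothetical and not the API of any particular library.

```python
from typing import Any, Dict


def normalize(value: Any, cache: Dict[str, dict]) -> Any:
    """Store every identifiable object in `cache` and return a reference to it."""
    if isinstance(value, list):
        return [normalize(item, cache) for item in value]
    if isinstance(value, dict):
        fields = {key: normalize(item, cache) for key, item in value.items()}
        if "__typename" in fields and "id" in fields:
            key = f"{fields['__typename']}:{fields['id']}"
            # Merge with any fields cached by earlier queries.
            cache.setdefault(key, {}).update(fields)
            return {"__ref": key}
        return fields
    return value


cache: Dict[str, dict] = {}
response = {
    "people": [
        {
            "__typename": "Person",
            "id": "1",
            "name": "Jerry",
            "enemies": [{"__typename": "Person", "id": "2", "name": "Newman"}],
        }
    ]
}
root = normalize(response, cache)
```

Because each object is stored once under its own key, a later query that returns the same Person can update the cached copy in place, and a query whose fields are all cached could in principle be answered without a network call.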

How GraphQL Relates to Graph Databases

A simple, unidirectional property graph
Simplest GraphQL schema, with resulting unidirectional property graph shown.

Other than sharing concepts from graph theory, GraphQL has no direct relationship to graph databases. GraphQL is agnostic to the source of the data backing the API being exposed, so it can be an SQL database, a NoSQL database, a graph database, another service, or any mix of those.

A GraphQL schema can be said to describe a unidirectional property graph, where:

  • The nodes in the graph are the GraphQL types
  • The edges of the graph are the GraphQL type properties that refer to other types
  • The node properties of the graph are the GraphQL type properties that refer to scalar values such as String, ID, Int, Float, etc.

Using the example Seinfeld dataset from the Managing Complex Data with JSON in PostgreSQL, Part 3 blog post, we can model the schema like this:

And the results from a people query (the query is shown later in this post) could be visualized in graph form as follows:

A graph of Seinfeld related nodes and relationships

Of note is that this schema and diagram have the enemies list directly in Person, whereas in the database model enemies is part of the characteristics data structure. This is a typical situation where the GraphQL data model is designed for client usage rather than to mirror the database model, and the API handles the translation internally.
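That translation can be sketched in a few lines of Python. This is only an illustration: it assumes a database row whose `characteristics` column holds a JSON document with an `enemies` list, and the row layout, field names, and `to_api_person` helper are hypothetical stand-ins rather than the actual dataset or schema.

```python
import json


def to_api_person(row: dict) -> dict:
    """Reshape a stored row into the client-facing Person shape."""
    characteristics = json.loads(row["characteristics"])
    return {
        "id": str(row["id"]),
        "name": row["name"],
        # Lifted out of the nested storage structure for the client's benefit.
        "enemies": characteristics.get("enemies", []),
    }


# Hypothetical row as it might come back from the database.
row = {
    "id": 1,
    "name": "Jerry",
    "characteristics": json.dumps({"enemies": [{"who": 2}]}),
}
person = to_api_person(row)
```

The client never sees the `characteristics` wrapper, so the storage layout can change later without touching the API.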

GraphQL is REST Evolved

GraphQL API's architecture
A GraphQL API has a single URI endpoint, and resources are queried with an included query document. Multiple resources can be queried at once, and partial resources can be requested.

GraphQL can be thought of as the next evolutionary step from REST. Both REST and GraphQL are HTTP-based and provide application programming interfaces (APIs). They both share the same authentication and authorization mechanisms, such as JSON Web Tokens (JWT), cookies, etc.

REST API's architecture
A REST API has a URI endpoint per resource. To query multiple resources it requires multiple requests, and the full resource is returned with each request.

REST, or REpresentational State Transfer, is an architectural style that has a fair bit of convention and best practices developed around it. Beyond sharing a commonly-used underlying protocol (HTTP), the basic tenets of REST (as outlined in Roy Fielding’s original dissertation) are all honored by GraphQL as well:

  • Client-Server – what has by now become a common best practice of separating the concerns of the client and the server.
    • In practice, this makes the API the semipermeable membrane between the front-end and the back-end, where the data exposed to the client of the API should be in the most convenient form for the client, which is not necessarily the best form for the back-end or for storing in the database. This is a good practice no matter what type of API you’re making.
  • Stateless – “each request from the client to server must contain all of the information necessary to understand the request[.]”
    • In practice, the API can be session-less but the auth mechanism may be session-based. JWT would be an example of honoring this statelessness in an API and is used equally and interchangeably by both API systems.
  • Cache – “the data within a response to a request be implicitly or explicitly labeled as cacheable or non-cacheable[.]”
    • REST primarily relies on the underlying HTTP cache control mechanisms. Other caching mechanisms can be explicitly added as part of the underlying data structure and conventions, but REST doesn’t provide any of that.
    • GraphQL provides a query language and data structure definitions, making object-level caching practical, and many GraphQL libraries go so far as to extend that to provide offline functionality.
  • Uniform Interface – further broken down into four interface constraints:
    • Identification of resources
      • In REST a resource is identified directly by the request URI (literally, Uniform Resource Identifier). In practice, however, it’s more common for the resource being referenced to be a container of many smaller resources.
      • In GraphQL one or more resources are identified explicitly in the request query, where the URI represents the umbrella API service resource. This is an evolution of the concept of resource usage based on the realization that rarely do resources exist in isolation and that there’s efficiency in being able to manipulate multiple resources in concert.
    • Manipulation of resources through representations – this has mostly been provided by the conventional usage of JSON.
      • In REST, neither the request nor the response format is specified beyond the use of HTTP. By convention both are usually JSON or, increasingly rarely, XML.
      • In GraphQL, the request has a DSL to describe (in the form of a “query”) the shape of the data in the request and what’s expected in the response. Beyond that query, all data is provided as JSON by convention, however other forms of data can be used as well, just like in REST.
    • Self-descriptive messages
      • In REST the messages are the HTTP request, with the noun (subject of the message) being the URI, the verb (overall type of action taken upon that subject) being the HTTP method (GET, PUT, DELETE, etc.), with the remainder of the message interpreted based on that context. Much of that interpretation is convention-based. For example, a PUT would conventionally be used to update a resource, POST would be used to create a new one, and GET would be used to retrieve one.
      • In GraphQL the message is explicit in the query provided as part of the request and explains plainly how to interpret the data provided in the HTTP request. For this reason, the only HTTP methods used by GraphQL are GET and POST, with GET used only for queries that can be described fully in the query string portion of the URI. Commonly POST is used exclusively.
    • Hypermedia as the engine of application state
      • Originally written when clicking on a link in a browser implied a GET request, and clicking on a “Submit” button implied a POST request, this one has a much less clearly defined meaning now that most of the API interaction in a browser is handled by JavaScript, and indeed a browser is now only a small subset of potential clients of an API.
      • A more modern interpretation of this would be: application state is visible and manipulated by inspecting and manipulating the resources of the application directly.
  • Layered System – furthering the concept of a semipermeable membrane, with the API acting as a black-box, it’s possible for the API service to itself use other APIs. This enables practices such as micro-services.
    • GraphQL formalizes this due to the explicitly declared schema, allowing the federation of multiple GraphQL schemas into one exposed API.
    • GraphQL, being agnostic to the backing store, can also merge multiple REST APIs or a hybrid of REST APIs, GraphQL APIs, and databases into one unified exposed API.
    • To aid in this it’s best to model the schema based on how the clients use the data, regardless of how the data is stored or generated.

There’s an additional and optional constraint mentioned in the original dissertation: “Code-On-Demand.” For various reasons, most involving security, it’s fairly uncommon and considered bad practice to deliver interpretable code from a REST interface, and that applies to GraphQL as well.

How to Use a GraphQL API

GraphQL, like REST, uses HTTP to access and manipulate the data. Unlike REST, a single URI represents the entire API, and the only HTTP methods used are GET and POST. Queries that are small enough to fit in the query string portion of the URI can use GET; otherwise, POST is used. Many GraphQL libraries use POST exclusively, so we’ll focus on POST in this article.

A GraphQL API schema is introspectable, meaning that you can query the API for the schema with no prior knowledge of the schema.
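For example, the following Python sketch builds the request body for a small introspection query. The `__schema` meta-field is part of GraphQL's introspection system; the particular selection of fields here is just one reasonable choice, not the only one.

```python
import json

# Ask the API to describe itself: the root query type and every named type.
INTROSPECTION_QUERY = """
{
  __schema {
    queryType { name }
    types { name kind }
  }
}
"""

# The body is ordinary JSON with the query carried as a string.
body = json.dumps({"query": INTROSPECTION_QUERY})
```

Tools such as schema explorers and code generators typically start from a larger query of this kind.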

In the CRUDS (Create, Read, Update, Delete, Search/Subscription) style:

  • Read and Search are handled by Queries
  • Create, Update, and Delete are handled by Mutations
  • Subscriptions are their own thing

Since queries, mutations, and subscriptions share the same DSL (Domain-Specific Language) as the schema, and JSON is used to carry the query and the response, there is only one simple nomenclature to learn.

Here’s an example query, which is the one used to generate the graph diagram shown earlier:

Ignore for the moment that this should have pagination structures and predicates to filter the results set.

In order to send that query to an API, make an HTTP POST request with URI set to the API endpoint and with the typical HTTP headers, including any needed for authorization. The payload would be JSON with a query key containing the query itself as a string. The rest of the JSON payload consists of the optional key variables, which contains a map of variable names and values (not used in this example), similarly to how placeholders are used in SQL queries.
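The request described above can be sketched with nothing but Python's standard library. The endpoint URL, the bearer-token header, and the query text below are assumptions made for illustration (the query is a plausible shape for a people query, not necessarily the exact one used for the diagram).

```python
import json
import urllib.request
from typing import Optional

# Hypothetical endpoint and query; substitute the real API's values.
API_URL = "https://api.example.com/graphql"
QUERY = """
query {
  people {
    id
    name
    enemies {
      who {
        id
        name
      }
    }
  }
}
"""


def build_request(query: str, variables: Optional[dict] = None,
                  token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the POST request for a GraphQL query."""
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumes bearer-token auth; use whatever scheme the API requires.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_request(QUERY)
```

Sending it is then a matter of passing `request` to `urllib.request.urlopen` and reading the JSON body of the reply.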

One defining aspect of GraphQL queries is that they must define the shape of the requested data down to the scalar values. This means that it’s illegal to request an entire object as you would with an * in SQL. Instead, you must request exactly which values you want from an object, and if that object contains referenced objects you must request the values from those objects, etc. If you think of the data structure as a tree, then GraphQL requires that you only pick leaves; you cannot pick entire branches. This is intentional and allows the API to grow new branches and leaves without existing clients having trouble with the newly added data. It also gives the client the opportunity to only request the data it needs for any given request.
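To make the leaf-picking rule concrete, here is a toy Python model of what a server does with a selection. It is not how any real GraphQL runtime is implemented: the `select` helper, the nested-dict way of writing a selection, and the sample data are all hypothetical.

```python
from typing import Any


def select(value: Any, selection: dict) -> Any:
    """Return only the fields named in `selection`, recursively.

    A leaf is requested with `True`; a nested object with another dict.
    """
    if value is None:
        return None
    if isinstance(value, list):
        return [select(item, selection) for item in value]
    result = {}
    for field, sub in selection.items():
        if sub is True:
            result[field] = value[field]
        else:
            result[field] = select(value[field], sub)
    return result


person = {
    "id": "1",
    "name": "Jerry",
    "email": "jerry@example.com",  # exists on the server, never requested
    "enemies": [{"who": {"id": "2", "name": "Newman", "email": "n@example.com"}}],
}
selection = {"id": True, "enemies": {"who": {"id": True, "name": True}}}
result = select(person, selection)
```

Adding a new field such as `email` on the server changes nothing for this client, because the response only ever contains the leaves the selection names.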

In the above example, you see that the people → enemies → who field refers to another Person object, meaning it could eventually contain a circular reference. In this case, we don’t have to worry about that since we state explicitly how many levels of data we want back. We include the id in the results so we could query a Person to get more, assuming we added such a query to the schema.

The response from the API will be an HTTP 200 response with a JSON payload in the exact shape specified. If there was an error in the query, such as a syntax error or a reference to something not in the schema, then the result will still be an HTTP 200 response with a JSON payload, with the error description. You would only get a non-200 response in the case of a legitimate endpoint or server issue such as an authorization error or a load balancer failure.
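A client therefore has to check two things: the HTTP status, and whether the JSON body carries an `errors` entry. The sketch below shows one way to do that in Python. The top-level `data` and `errors` keys and the per-error `message` field follow the GraphQL response format; the `parse_response` helper and the `GraphQLError` class are hypothetical names, and treating any error as fatal is a simplification, since a response may carry partial data alongside errors.

```python
import json


class GraphQLError(Exception):
    """Raised when a response carries an `errors` entry."""


def parse_response(status: int, body: str) -> dict:
    """Return the `data` part of a GraphQL response, or raise."""
    if status != 200:
        # Transport-level problem: authorization failure, gateway error, etc.
        raise RuntimeError(f"HTTP {status}")
    payload = json.loads(body)
    if payload.get("errors"):
        messages = "; ".join(e.get("message", "") for e in payload["errors"])
        raise GraphQLError(messages)
    return payload["data"]
```

For instance, `parse_response(200, '{"data": {"people": []}}')` returns `{"people": []}`, while a 200 response whose body contains an `errors` list raises `GraphQLError`.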

Type Description Features

The type system of GraphQL is used to describe both the requested data and the data being sent. It is key to the whole system and matches other typing systems fairly well. GraphQL mostly matches the type system of TypeScript, which is more complete for a full scripting language, as it needs to be. Here’s the quick list:

  • Object types: Containers of fields that have zero or more arguments and a single return type. Similar conceptually to an Object in JavaScript, it’s a map-type keyed structure.
    • Not to be confused with Input Object types (described below).
    • Each field has zero or more arguments and a single return type, making them all functions. The arguments are always provided by name which also makes them unordered.
    • Arguments can be declared as optional and can declare default values to be used when omitted.
    • To provide an empty argument list in a declaration or usage, the parentheses are omitted, making it appear as a simple key/value pair (like in JSON) instead of a function call.
    • The arguments must be input types, which include: Scalars, Enums, Input Objects, as well as Lists of those three. Note that arguments/input types cannot contain Unions, directly or indirectly.
    • The return type of a field has to be an output type, which includes all types except Input Objects and Lists of Input Objects.
    • Objects can claim to implement (or conform to) zero or more Interfaces (described below).
    • Objects can be members of zero or more Unions (also described below).
  • Scalar types: Int, Float, String, Boolean, and ID. These are the leaf nodes of the system. It is possible to extend the list of scalar types, such as adding a DateTime, but that requires the client to understand how to interpret the new scalar value ahead of time.
  • Enum/enumeration types, which are a special type of scalar that has a limited set of options.
    • Unlike enumeration types in other languages, the enumerated options do not have a secondary value, such as the underlying integer value in C-like languages. In the JSON conveying the input or output of an enum, the value will be a string containing the name of the chosen option.
  • Input Object types, which are distinct from Object types in that they are exclusively for use in arguments.
    • Input Object fields cannot have arguments but must have types that are input types (Scalars, Enums, Input Objects, and Lists of those three).
    • One important restriction is that input types cannot contain Unions directly or indirectly. This should be considered when designing a schema since relying heavily on Unions for the outputs will force asymmetry between the interfaces of queries and mutations.
  • Union types, where a union value will hold one of the listed Object types.
    • Unions can only be composed of Object types. This means that Unions do not contain Scalars, Input Object types, other Unions, or Interfaces, among others.
    • To query a value from a Union you’ll need to use a fragment for the underlying types, even if all types in the union contain overlapping fields.
    • Note that in queries the special field __typename that’s on every Object type can be used to determine which type of a union is returned. As __typename is universal, it can be used outside of a fragment.
  • Interfaces, which are abstract output types that Object types can explicitly claim to conform to, or implement.
    • Interfaces allow for the overlapping fields of differing Object types to be treated as a common Object-like type in return values.
    • Object types that implement a specific Interface must still explicitly declare all of the fields of that Interface. In other words, implementing an Interface does not provide a shorthand for declaring the fields of the Interface in an Object.
    • Interfaces also act like Unions, where fields that are not part of the Interface can be queried from the specific implementing types with fragments. Of note is that Objects explicitly claim to be part of Interfaces at declaration, whereas Unions claim their member Objects and make no claims of common fields.
    • Interfaces are the closest thing to the object-oriented (OO) concept of inheritance that GraphQL provides. However, they are distinct from OO inheritance in that Objects cannot extend, implement, or inherit from other Objects.
  • Lists, which are simple containers around any of the types above, making a value capable of containing zero or more of the specified type.
    • A list is declared by putting square brackets [] around any type: [People] is a list of People.
  • Non-nullable, which is indicated with a bang (!) after any type name and means that the value must not be null.
    • Any field or argument that is not explicitly declared as non-nullable can be null.
    • Arrays can have:
      • The contained type marked as non-null ([People!]), making [null] (or any null value in the array) invalid but null (the whole array is null) is still valid
      • The array itself to be marked as non-null ([People]!), making [null] valid but null is invalid
      • Both ([People!]!), making both [null] and null invalid
    • Interestingly, there’s no way to declare that an empty array is invalid. In all cases above [] is valid.
  • Descriptions (e.g., "In quotes") which are documentation strings.
    • Multiline descriptions are in triple-quotes (e.g., """ … """).
  • Directives (e.g., @deprecated) are used as modifiers on the schema, for example, to mark a field or type as deprecated.
    • Directives in particular are used heavily by automation such as AWS Amplify to reduce boilerplate and set up infrastructure for interpretation and serving of the schema. In these cases, the schema as served by the API is modified to add or alter types, generally with the directives removed.
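The nullability rules for lists described above can be checked mechanically. The following Python sketch does so for type references written as strings, such as "[People!]!". It checks only nullability and list structure and does not resolve named types; the `is_valid` helper is a hypothetical illustration, not part of any GraphQL library.

```python
from typing import Any


def is_valid(value: Any, type_ref: str) -> bool:
    """Check `value` against a type reference such as "[People!]!"."""
    if type_ref.endswith("!"):
        # Non-null wrapper: reject null, then check the wrapped type.
        return value is not None and is_valid(value, type_ref[:-1])
    if value is None:
        return True  # every type is nullable unless marked with "!"
    if type_ref.startswith("[") and type_ref.endswith("]"):
        # List wrapper: the value must be a list and each item must match.
        return isinstance(value, list) and all(
            is_valid(item, type_ref[1:-1]) for item in value
        )
    return True  # a named type; any non-null value is accepted in this sketch


# None stands for null; a dict stands for a People object.
samples = [None, [None], [], [{"id": "1"}]]
table = {
    type_ref: [is_valid(sample, type_ref) for sample in samples]
    for type_ref in ("[People]", "[People!]", "[People]!", "[People!]!")
}
```

Each row of `table` records, for one type reference, whether null, [null], [], and a one-element list are accepted, which mirrors the three cases listed above plus the fully nullable form.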

Query Features

Queries (including Mutations and Subscriptions) have a few notable features on top of the type system described above:

  • Aliases, where you can specify in the query the name to use in the response.
    • The value of this becomes clear when you see that the response is in JSON, where it’s invalid to have the same key twice in the same object. With an alias, you can specify the key used to return the value of a specific portion of a query.
    • This also comes up when using fragments (described next), where two different fragments would return the same key but with different types. This is considered invalid in GraphQL, and so an alias must be used to ensure that a single key cannot have one of two different types.
  • Fragments, which provide a shorthand for common query fragments. Used to extract the values from Unions and the non-common values of Interface implementers, they specify a specific type to apply to and a subquery to use in that context.
    • Fragments are defined once, and used with the spread operator (...) in one or more query contexts. It’s an error to define a fragment and not use it.
    • Fragments are applied to a specific return type (Object type, Interface, or Union), and act as filters when the concrete type in the context it is used doesn’t match the type it’s applied to. It’s an error to use a fragment in a context that cannot possibly contain the applied type.
    • For example, a Fragment defined with fragment userFragment on User { /* etc. */ } when used in the return context of a Union that could contain a User or a Process will only be used when the actual concrete value is a User. A second Fragment would be used to retrieve the values of a Process. And a fragment on a Pet would be an error in that context, as the Union doesn’t contain that type.
  • Inline Fragments, which are the same as Fragments but are declared and used all at once (“inline”).
  • Variables provide input capability similar to placeholders in SQL, allowing the separation of the query parameters outside of the query itself.
    • Variables are declared with a type and optional default value inside the query and the values are provided alongside the query in the variables key of the same JSON, using matching key names in the child objects.
    • Values in the variables are type-checked before execution.
    • Using variables, much like placeholders in SQL, also prevents tainted user input from being treated as part of the query itself.
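Here is a short Python sketch of variables in use. The `person` field, its `name` argument, and the `FindPerson` operation name are assumptions made for the example; only the `query` and `variables` keys of the payload come from the description above.

```python
import json

# Hypothetical query: `person` and its `name` argument are assumed to exist.
QUERY = """
query FindPerson($name: String!) {
  person(name: $name) {
    id
    name
  }
}
"""


def build_payload(user_input: str) -> str:
    """Keep user input out of the query text by passing it as a variable."""
    return json.dumps({"query": QUERY, "variables": {"name": user_input}})


# Input that would alter the query if it were pasted into the query string.
hostile = 'Newman") { id } #'
payload = json.loads(build_payload(hostile))
```

The query text is identical for every request, so the hostile input arrives only as a string value to be type-checked against `String!` and is never parsed as part of the query.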

Summary

Given the structure and details of GraphQL, it stands out as not only the next evolution of APIs connecting the front-end UI to the back-end database and micro-services but also a system to coordinate the type system of the UI with the internal data usage and storage. The process of making the schema is an opportunity to formally structure not just the data but also how it’s interacted with. This also makes GraphQL great for machine-to-machine interaction, such as for micro-services to interact with each other.

When adding GraphQL to an existing service, you’ll find usage patterns and data structures, and gain a mechanism to evolve the vestigial appendages out. When creating a new project, designing the GraphQL schema early in the process will give you a design spec for all of the teams to use.

Categories: Blog

02 Jun, 2021