Designing a Concise and Consistent GraphQL API

The web world moves quickly; every developer in this ecosystem is painfully aware of that fact. The state-of-the-art stack you start building your startup with today is considered ancient technology within three months. Even so, there are a few timeless (at least at JavaScript speed) gems produced in this world. GraphQL has been around for about two years now, and it integrates well with the React/Relay stack to provide a more complete ecosystem for a web developer to work in. While Facebook and others have done a fine job discussing the merits of GraphQL and why you would use it (read: I'll let them convince you it's a good idea), we're going to discuss how to harness its power in a very reproducible way.

Difficulties with GraphQL

GraphQL is a great piece of technology. However, as a developer you're left with few guidelines for getting moving quickly. In and of itself, GraphQL is a very unopinionated interchange format. More specifically, it describes a graph-based structure for accessing and modifying data. While data access may seem like a straightforward task when previewing GraphQL, mutations are vague and very under-specified out of the box. While in theory (and in practice, once you've wrapped your head around it) this is actually a good thing for maximum flexibility, it leaves many developers scratching their heads wondering what to do.

GraphQL gains much of its power from this broad flexibility. At the same time, that flexibility creates a lot of cognitive overhead for developers. To date, most-- if not all-- GraphQL APIs I have seen are hand-written. This is a cumbersome task. As we move to client-side apps, backend servers are becoming more like secure datastores which enforce business constraints on data. This was likely always the case in traditional web development as well, but the backend was often merged with UI and UI-generation code; as a result, the distinction was blurred. The only other logic performed on these servers is now often private computation which should be concealed from client code. Moreover, with hand-written endpoints, it's difficult to guarantee both API consistency and complete functionality from one endpoint to the next.

Ultimately, this creates an API ecosystem where each endpoint is different. Namely, understanding how to operate one endpoint within my own GraphQL application doesn't necessarily mean I can properly operate others (much less other GraphQL apps in general). This requires copious documentation on each endpoint for any developer to get up and running effectively. It is more reminiscent of my hardware days-- scanning through datasheets of similar components-- than of developing highly generic, reusable software components.

Solving these GraphQL Difficulties

As luck would have it, we already have a solution for eliminating the boilerplate of hand-written endpoints. In 2015, we built Elide. Elide enables developers to model their data using JPA, communicate with arbitrary datastores, secure it using meaningful expressions, and write custom code where necessary. It's been a large-scale, multi-year effort, but it has proven very effective on our own products. In any case, this is a solution designed to solve all the problems that fall out from hand-written endpoints: API consistency, uniform functionality, proper security, developer boredom, etc. The only problem: it didn't support GraphQL (support is only now arriving in the 4.0 beta).

Initially, when we set out to build Elide, rather than reinventing the wheel we sought a standardized web API solution. GraphQL hadn't yet gained any traction (and, if I recall correctly, it hadn't even been fully released). As a result, we found JSON API and decided to build Elide around supporting this technology. It was opinionated, and generating the API was straightforward. While JSON API has a lot of strengths, we also found issues with some of the opinionated stances it took (more on that in a later post). Now you may be thinking to yourself, "Wait a minute, why is this important? I thought this post was about GraphQL." Well, it turns out that by working so intimately with JSON API, we used-- what we believe to be-- some of its best ideas to inform how we generate uniform, consistent, and automatable GraphQL APIs.

Now that we had discovered a solution for minimizing endpoint code, a new question loomed: how do we generate a consistent GraphQL API? That is, for any endpoint, all you need to know is the data model. From there, you have a standard set of tools available to you. Of course, you could extend any particular model with special logic as needed, but generally speaking, there would exist a fully supported, common toolset for all models.

Designing a GraphQL API

Our motivation for building out GraphQL support in Elide was to more easily adopt its client-side ecosystem. It will not replace JSON API, but instead live alongside it, letting users choose whichever solution is best for their project. However, this imposes an additional set of constraints; subtle implementation details such as pagination must be compatible with the existing tooling (i.e. Apollo or Relay). But one problem at a time: while GraphQL query schemes appear to be well worked out, we first needed to solve the problem of having a consistent means of object mutation.

Consistent GraphQL API Mutation

Our first goal was to find a consistent way of specifying GraphQL mutations; basically, any time you wish to insert or update data. After several ideas, we came back to an approach inspired by JSON API and REST (though it is certainly not REST). In summary, we turn each JPA relationship and Elide rootable type into an object which takes GraphQL arguments. The arguments are op, ids, and data. Without going into all the details (the current spec shows examples), this allows us to support the same operations in a predictable way across all exposed entities in our GraphQL API. A brief description of each parameter is below:

  • op. This parameter describes the operation to be performed. When unspecified, this defaults to a FETCH operation, but it can also take on values such as UPSERT, REMOVE, REPLACE, and DELETE.
  • ids. When provided, this list of id's is used to filter the collection on which the operation is being performed.
  • data. The data argument is used for UPSERT'ing and REPLACE'ing. It specifies the new input data from the user.

With these three arguments, a user can perform arbitrary data operations on their models.
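To make the scheme concrete, here is a sketch of what a mutation might look like under it. The model and field names are illustrative, not taken from the Elide spec itself:

```graphql
# Upsert a book: create it if it doesn't exist, update it if it does.
# The selection set after the call determines what the mutation returns.
mutation {
  book(op: UPSERT, data: {id: "1", title: "Moby Dick"}) {
    id
    title
  }
}
```

A fetch is the same call with op left unspecified, while operations like REMOVE and DELETE would take ids to select the affected objects instead of a data payload.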

An Apollo/Relay-Ready GraphQL API

Automating an API that is Apollo- and Relay-compatible adds a few more layers of complexity. In short, a model in our originally proposed scheme would look like:
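The following is a sketch of such a query, reconstructed from the description below (the book model and its fields are illustrative):

```graphql
# Fetch book 1, returning its title and its authors' names.
{
  book(ids: ["1"]) {
    id
    title
    authors {
      name
    }
  }
}
```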

The example above fetches the book object with an id of 1. It returns the book's title along with the names of all of its associated authors. Pretty straightforward, right?

Well, when accounting for important concepts like pagination, this will not work with Relay out of the box. As a result, we adopted Relay's Cursor Connections specification for maximum compatibility. While the scheme is still what we proposed, each model is now wrapped in two additional layers of indirection (namely, edges and node objects), which carry additional metadata such as pagination cursors. See below:
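A sketch of the same query in the connection-wrapped form might look like this (again, the model and fields are illustrative):

```graphql
# The same fetch, with each collection wrapped in edges/node
# layers per the Relay Cursor Connections specification.
{
  book(ids: ["1"]) {
    edges {
      node {
        id
        title
        authors {
          edges {
            node {
              name
            }
          }
        }
      }
    }
  }
}
```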

As you can see, the layers of indirection make the format a little bit uglier, but the same overall concepts apply.


GraphQL has many great ideas and an incredible amount of flexibility. However, it's difficult to avoid hand-writing many endpoints and to maintain consistency across all of them. Elide was invented to solve exactly these problems and has recently implemented a consistent method for generating GraphQL APIs. If you're looking for a quick, out-of-the-box solution, I recommend considering Elide for your next project (to get started, see the example standalone project). In any case, when you build your next GraphQL API, be sure to think through these problems. Even if you don't adopt our scheme directly, at least be aware of the problems it solves. Good luck out there!

Docker Compose: Creating a Dev Environment like Production

Integration Testing

Integration testing is an important part of any production system. Whether it is automated, performed by a QA team, or done by the developer him- or herself, it is essential that all the product bits are verified before revealing the result to the customer. Generally speaking, integration testing is simply running a set of tests against a production-like environment. That means you remove any testing mocks you may have in place and observe the actual interactions between the various services which make up your product.

Development Issues with Complex Systems

The larger your systems grow, the more difficult it is to test them all on the same box. This comes from increasing system complexity; as your user base grows, you utilize more resources, and your software architecture begins to span multiple machines. For instance, if you're building a lot of microservices, then each of these needs to be stood up and configured properly to work on your local box. Each developer then needs to duplicate this work for his or her own local setup.

Eventually, maintaining your local "full-integration" development environment becomes unwieldy and possibly even more difficult to manage than production (due to dependency issues, etc.). What this ultimately means is that developers kill this environment entirely. They write their code and unit tests and then just ship it off to the build pipeline. After waiting a period of time, their code should show up in a pre-production environment for a full-scale integration test. At this point, they identify whether there are any glaring bugs or other oversights in their code.

The problem is that this process is inefficient. Not only is it slow, but when you share an environment for testing, multiple changes are commonly deployed at once; as a result, it can become unclear which change is causing issues. Now let me be clear: you should have an environment that integrates all the latest changes before they go to production. This environment will protect you from a plethora of additional production issues (e.g. logically incompatible changesets); however, it is not the best place for developers to test their code. Developers should have tested their code thoroughly before pushing it to the environment that will certify it and send it to production.

Docker Compose

Well, this sure seems like a predicament. I just mentioned that for large services it often becomes incredibly difficult to maintain a local version of your complete system. While this is true, we can significantly reduce the burden with Docker Compose, a tool for managing multi-container Docker applications. In short, you can define an arbitrary system composed of many containers. This tool provides a perfect foundation for reproducing a small-scale version of production that can be run entirely locally.

Using Docker Compose should be trivial if you're already deploying your services as Docker containers. If not, you should first create Docker images for all of your services; while this is labor-intensive, you and your team members can reuse these images in the future.

Our Example

Now that we understand the problem and our tools for solving it, we will work through an example. Below is a diagram describing our scenario.

Network model. WordPress application server connects to Internet and intranet while MySQL DB only connects to intranet.

In summary, we have a basic WordPress setup with a few minor tweaks. Rather than hosting both MySQL and WordPress on the same box, we have separated the concerns. Our WordPress application server is accessible on the open Internet and on our internal network. Our MySQL server, on the other hand, lives on a separate box accessible only on our intranet, preventing anyone from making external requests directly to the database. This example illustrates how one might naturally expand their services, and you could generalize the concept to arbitrarily complex networks.

Assuming this is the network we want to model with Docker Compose, let's take a look at the configuration file below.
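The following is a minimal sketch of such a compose file, reconstructed from the description of the setup; the image tags, ports, and passwords are illustrative placeholders, while the external-net and backend network names match the discussion below:

```yaml
version: "3"

services:
  wordpress:
    image: wordpress                    # illustrative tag
    ports:
      - "8080:80"                       # reachable from the outside world
    environment:
      WORDPRESS_DB_HOST: db
      WORDPRESS_DB_PASSWORD: example    # placeholder; use a secret in practice
    networks:
      - external-net                    # "Internet" in the diagram
      - backend                         # "intranet" in the diagram

  db:
    image: mysql:5.7                    # illustrative tag
    environment:
      MYSQL_ROOT_PASSWORD: example      # placeholder; use a secret in practice
    networks:
      - backend                         # intranet only; not exposed externally

networks:
  external-net:
  backend:
    internal: true                      # containers here cannot reach the Internet
```

With a file like this in the repository, `docker-compose up` stands up the whole miniature environment on any developer's machine.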

Without delving too deeply into the configuration format, the most notable part of this file is the virtual networking. We have two different networks-- external-net and backend-- which correspond to the Internet and intranet in our diagram, respectively. These networks provide the separation of concerns we designed above. More important than the implementation details, however, is the concept this represents: we can specify the images, settings, and networking configuration for our Docker containers and reuse this file everywhere. Once the file has been written, it can be shared with the entire team, making local integration testing accessible again. With a little maintenance as new services are added, this file can remain a faithful representation of your production environment for developers.


We have briefly discussed a major impediment to local integration testing today: the growing complexity of our products under microservice architectures. Since this architecture has many benefits, we need to revisit how we enable developers to perform comprehensive testing before pushing their changes into the production build pipeline. We have demonstrated a simple use case of Docker Compose for this task. By creating a single, shareable representation of the production setup, we can keep developers moving forward and reduce the number of bugs merged into mainline code.