[The API Book] 25 Rules of Designing Program Interfaces [v3]

Sergey Konstantinov
33 min read · Sep 19, 2022

--

An important assertion at number 0:

0. Rules must not be applied unthinkingly

Rules are simply formulated generalizations of one’s experience. They are not to be applied unconditionally, and they don’t make thinking redundant. Every rule has a rational reason to exist. If your situation doesn’t justify following a rule — then you shouldn’t follow it.

For example, the requirement that a specification be consistent exists to help developers save time on reading docs. If you need developers to carefully read some entity’s docs, it is totally rational to make its signature deliberately inconsistent.

This idea applies to every concept listed below. If following the rules results in an unusable, bulky, unobvious API, that’s a reason to revise the rules (or the API).

It is important to understand that you can always introduce concepts of your own. For example, some frameworks willfully reject paired set_entity / get_entity methods in favor of a single entity() method with an optional argument. The crucial part is being systematic in applying the concept. Once it is adopted, you must apply it to every single API method, or at the very least elaborate a naming rule to distinguish such polymorphic methods from regular ones.
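
For illustration, here is a minimal TypeScript sketch of such a polymorphic accessor (the TextNode class and its text() method are purely illustrative, not taken from any specific framework):

class TextNode {
  private value = '';
  // with no argument the method acts as a getter,
  // with an argument — as a setter returning
  // the object itself for chaining
  text(): string;
  text(newValue: string): this;
  text(newValue?: string): string | this {
    if (newValue === undefined) {
      return this.value;
    }
    this.value = newValue;
    return this;
  }
}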

NB: this is the third revision of Chapter 11 of my free book on APIs. If you find it useful, I would appreciate it if you rated it on Amazon.

Ensuring readability and consistency

The most important task for the API vendor is to make code written by third-party developers atop the API easily readable and maintainable. Remember that the law of large numbers works against you: if some concept or signature can be misinterpreted, it inevitably will be misinterpreted by a number of partners, and this number will grow with the API’s popularity.

1. Explicit is always better than implicit

An entity’s name must explicitly tell what the entity does and what side effects to expect while using it.

Bad:

// Cancels an order
GET /orders/cancellation

It’s quite a surprise that accessing the cancellation resource (what is it?) with the non-modifying GET method actually cancels an order.

Better:

// Cancels an order
POST /orders/cancel

Bad:

// Returns aggregated statistics
// since the beginning of time
GET /orders/statistics

Even if the operation is non-modifying but computationally expensive, you should explicitly indicate that, especially if clients are charged for computational resource usage. Moreover, default values must not be set in a manner that leads to maximum resource consumption.

Better:

// Returns aggregated statistics
// for a specified period of time
POST /v1/orders/statistics/aggregate
{ "begin_date", "end_date" }

Try to design function signatures to be absolutely transparent about what the function does, what arguments it takes, and what the result is. While reading code that works with your API, it must be easy to understand what it does without reading the docs.

Two important implications:

1.1. If the operation is modifying, it must be obvious from the signature. In particular, there must be no modifying operations using the GET verb.

1.2. If your API’s nomenclature contains both synchronous and asynchronous operations, then (a)synchronicity must be apparent from signatures, or a naming convention must exist.
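
For example, a naming convention might look like this (a hedged TypeScript sketch; the interface and method names are illustrative and not taken from any real API):

type Order = { order_id: string; status: string };
type OperationId = string;

interface OrderApi {
  // 'synchronous' from the caller's point of view: the promise
  // resolves only when the order reaches its final state
  createOrder(params: object): Promise<Order>;
  // explicitly asynchronous: only schedules the operation and
  // returns a handle whose status must be polled separately
  scheduleOrderCreation(params: object): Promise<OperationId>;
}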

2. Specify which standards are used

Regretfully, humanity is unable to agree on the most trivial things, like which day the week starts with, to say nothing of more sophisticated standards.

So always specify exactly which standard is applied. Exceptions are possible if you’re 100% sure that only one standard for this entity exists in the world, and every person on Earth is totally aware of it.

Bad: "date": "11/12/2020" — there are tons of date formatting standards; you can’t even tell which number is the day and which is the month.

Better: "iso_date": "2020-11-12".

Bad: "duration": 5000 — five thousand of what?

Better:
"duration_ms": 5000
or
"duration": "5000ms"
or
"duration": {"unit": "ms", "value": 5000}.

One particular implication of this rule is that money sums must always be accompanied by a currency code.
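
A hedged illustration of what this might look like (the field names are ours, not prescribed by any standard):

// a money value never travels without its currency code
interface Money {
  // a string, to keep decimal precision intact (see rule 15)
  amount: string;
  // ISO 4217 currency code
  currency_code: string;
}

const deliveryFee: Money = {
  amount: "150.00",
  currency_code: "USD"
};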

It is also worth saying that in some areas the situation with standards is so spoiled that, whatever you do, someone will get upset. A ‘classical’ example is the order of geographical coordinates (latitude-longitude vs longitude-latitude). Alas, the only working method of fighting frustration there is the ‘Serenity Notepad’ to be discussed in Section II.

3. Entities must have concrete names

Avoid single amoeba-like words, such as ‘get’, ‘apply’, ‘make’, etc.

Bad: user.get() — hard to guess what is actually returned.

Better: user.get_id().

4. Don’t spare the letters

In the 21st century, there’s no need to shorten entities’ names.

Bad: order.time() — unclear what time is actually returned: order creation time, order preparation time, order waiting time?…

Better: order.get_estimated_delivery_time()

Bad:

// Returns a pointer to the first occurrence
// in str1 of any of the characters
// that are part of str2
strpbrk (str1, str2)

Possibly, the author of this API thought that the pbrk abbreviation would mean something to readers; they were clearly mistaken. Also, it’s hard to tell from the signature which string (str1 or str2) stands for a character set.

Better:

str_search_for_characters(
str,
lookup_character_set
)

— though it’s highly disputable whether this function should exist at all; a feature-rich search function would be much more convenient. Also, shortening string to str bears no practical sense, though it is regretfully routine in many subject areas.

NB: sometimes field names are shortened or even omitted (e.g., a heterogeneous array is passed instead of a set of named fields) to lessen the amount of traffic. In most cases, this is absolutely meaningless as the data is usually compressed at the protocol level.

5. Naming implies typing

A field named recipe must be of a Recipe type. A field named recipe_id must contain a recipe identifier that can be found within the Recipe entity.

The same goes for primitive types. Arrays must be named in a plural form or as collective nouns, e.g. objects, children. If that’s impossible, it’s better to add a prefix or a postfix to avoid doubt.

Bad: GET /news — unclear whether a specific news item is returned, or a list of them.

Better: GET /news-list.

Similarly, if a Boolean value is expected, entity naming must describe some qualitative state, e.g. is_ready, open_now.

Bad: "task.status": true
— statuses are not intrinsically binary; also, such an API isn’t extendable.

Better: "task.is_finished": true.

Specific platforms imply specific additions to this rule with regard to the first-class citizen types they provide. For example, JSON doesn’t have a Date object type, so dates are to be passed as numbers or strings. In this case, it’s convenient to mark dates somehow, for example, by adding _at or _date postfixes, e.g. created_at, occurred_at.

If an entity name is a polysemantic term itself, which could confuse developers, it’s better to add an extra prefix or postfix to avoid misunderstanding.

Bad:

// Returns a list of 
// coffee machine builtin functions
GET /coffee-machines/{id}/functions

The word ‘function’ is polysemantic. It could mean a built-in function, but also ‘a piece of code’, or a state (the machine is functioning).

Better: GET /v1/coffee-machines/{id}/builtin-functions-list

6. Matching entities must have matching names and behave alike

Bad: begin_transition / stop_transition
— the begin and stop terms don’t match; developers will have to dig into the docs.

Better: either begin_transition / end_transition or start_transition / stop_transition.

Bad:

// Find the position of the first occurrence
// of a substring in a string
strpos(haystack, needle)
// Replace all occurrences
// of the search string
// with the replacement string
str_replace(needle, replace, haystack)

Several rules are violated:

  • inconsistent underscore usage;
  • functionally close methods have different needle/haystack argument ordering;
  • the first function finds only the first occurrence while the second one finds them all, and there is no way to deduce that fact from the function signatures.

We’re leaving the exercise of making these signatures better to the reader.

7. Avoid double negations

Bad: "dont_call_me": false
— humans are bad at perceiving double negations and tend to make mistakes.

Better: "prohibit_calling": true or "avoid_calling": true
— it’s easier to read, though you shouldn’t deceive yourself. Avoid semantic double negations, even if you’ve found a ‘negative’ word without a ‘negative’ prefix.

It is also worth mentioning that making mistakes when applying de Morgan’s laws is even easier. For example, if you have two flags:

GET /coffee-machines/{id}/stocks

{
"has_beans": true,
"has_cup": true
}

‘Coffee might be prepared’ condition would look like has_beans && has_cup — both flags must be true. However, if you provide the negations of both flags:

{
"beans_absence": false,
"cup_absence": false
}

— then developers will have to evaluate the !beans_absence && !cup_absence condition, which is equivalent to !(beans_absence || cup_absence), and in this transformation people tend to make mistakes. Avoiding double negations helps little here, and regretfully only general advice can be given: avoid situations where developers have to evaluate such flags.
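
To make the trap more tangible, here is a small sketch assuming a stocks object shaped like the responses above:

const stocks = {
  has_beans: true,
  has_cup: true,
  beans_absence: false,
  cup_absence: false,
};

// positive flags read naturally:
const canMakeCoffee = stocks.has_beans && stocks.has_cup;

// negated flags force developers to apply de Morgan's laws:
const canMakeCoffeeToo =
  !stocks.beans_absence && !stocks.cup_absence;
// …which equals !(stocks.beans_absence || stocks.cup_absence);
// mixing up && and || in this transformation is an easy mistake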

8. Avoid implicit type conversion

This advice is opposite to the previous one, ironically. When developing APIs you frequently need to add a new optional field with a non-empty default value. For example:

POST /v1/orders
{}

{ "contactless_delivery": true }

This new contactless_delivery option isn’t required, but its default value is true. A question arises: how should developers discern the explicit intention to abolish the option (false) from simply not knowing it exists (the field isn’t set)? They have to write something like:

if (typeof order.contactless_delivery === 'boolean' &&
    order.contactless_delivery === false) {
  // the customer explicitly rejected
  // contactless delivery
}

This practice makes the code more complicated, and it’s quite easy to make mistakes that will effectively treat the field in the opposite manner. The same could happen if special values (e.g. null or -1) are used to denote value absence.

NB: this observation is not valid if both the platform and the protocol unambiguously support special tokens to reset a field to its default value with zero abstraction overhead. However, full and consistent support of this functionality rarely sees implementation. Arguably, the only example of such an API among those popular nowadays is SQL: the language has the NULL concept, default field values, and support for operations like UPDATE … SET field = DEFAULT (in most dialects). Though working with the protocol is still complicated (for example, in many dialects there is no simple way to get back the values reset by an UPDATE … DEFAULT query), SQL handles defaults conveniently enough to use this functionality as is.

If the protocol does not support resetting to default values as a first-class citizen, the universal rule is to make all new Boolean flags false by default.

Better:

POST /v1/orders
{}

{ "force_contact_delivery": false }

If a non-Boolean field whose absence is treated specially is to be introduced, then introduce a pair of fields.

Bad:

// Creates a user
POST /v1/users
{ … }

// Users are created with a monthly
// spending limit set by default
{
"spending_monthly_limit_usd": "100",

}
// To cancel the limit null value is used
PUT /v1/users/{id}
{
"spending_monthly_limit_usd": null,

}

Better:

POST /v1/users
{
// true — user explicitly cancels
// monthly spending limit
// false — limit isn't canceled
// (default value)
"abolish_spending_limit": false,
// Non-required field
// Only present if the previous flag
// is set to false
"spending_monthly_limit_usd": "100",

}

NB: the contradiction with the previous rule lies in the necessity of introducing ‘negative’ flags (the ‘no limit’ flag), which we had to rename to abolish_spending_limit. Though it's a decent name for a negative flag, its semantics is still unobvious, and developers will have to read the docs. That's the way.

9. No results is a result

If a server processed a request correctly and no exceptional situation occurred — there must be no error. Regretfully, a widespread antipattern is throwing errors when zero results are found.

Bad:

POST /v1/coffee-machines/search
{
"query": "lungo",
"location": <customer's location>
}
→ 404 Not Found
{
"localized_message":
"No one makes lungo nearby"
}

4xx statuses imply that a client made a mistake. But neither the customer nor the developer made a mistake here: the client cannot know beforehand whether lungo is served at this location.

Better:

POST /v1/coffee-machines/search
{
"query": "lungo",
"location": <customer's location>
}
→ 200 OK
{
"results": []
}

This rule might be reduced to: if an array is the result of the operation, then the emptiness of that array is not a mistake, but a correct response. (Of course, if an empty array is acceptable semantically; an empty array of coordinates is a mistake for sure.)

10. Errors must be informative

While writing code, developers face problems, many of them quite trivial, like invalid parameter types or boundary violations. The more convenient the error responses your API returns, the less time developers waste struggling with them, and the more comfortable working with the API is.

Bad:

POST /v1/coffee-machines/search
{
"recipes": ["lngo"],
"position": {
"latitude": 110,
"longitude": 55
}
}
→ 400 Bad Request
{}

— of course, the mistakes (typo in the "lngo", wrong coordinates) are obvious. But the handler checks them anyway, so why not return readable descriptions?

Better:

{
"reason": "wrong_parameter_value",
"localized_message":
"Something is wrong.⮠
Contact the developer of the app.",
"details": {
"checks_failed": [
{
"field": "recipe",
"error_type": "wrong_value",
"message":
"Unknown value: 'lngo'.⮠
Did you mean 'lungo'?"
},
{
"field": "position.latitude",
"error_type":
"constraint_violation",
"constraints": {
"min": -90,
"max": 90
},
"message":
"'position.latitude' value⮠
must fall within⮠
the [-90, 90] interval"
}
]
}
}

It is also a good practice to return all detectable errors at once to spare developers’ time.
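
A sketch of the client-side handling that such a format enables (the error shape follows the example above, while showFieldError is a hypothetical UI helper):

declare const error: {
  details: {
    checks_failed: Array<{ field: string; message: string }>;
  };
};
declare function showFieldError(field: string, message: string): void;

for (const check of error.details.checks_failed) {
  // every detected violation is surfaced at once,
  // instead of being fixed one by one over several requests
  showFieldError(check.field, check.message);
}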

11. Maintain a proper error sequence

First, always return unresolvable errors before the resolvable ones:

POST /v1/orders
{
"recipe": "lngo",
"offer"
}
→ 409 Conflict
{
"reason": "offer_expired"
}
// Request repeats
// with the renewed offer
POST /v1/orders
{
"recipe": "lngo",
"offer"
}
→ 400 Bad Request
{
"reason": "recipe_unknown"
}

— what was the point of renewing the offer if the order cannot be created anyway?

Second, maintain a sequence of unresolvable errors that leads to a minimal amount of customers’ and developers’ irritation. In particular, this means returning the most significant errors first — the ones whose resolution requires more effort.

Bad:

POST /v1/orders
{
"items": [{
"item_id": "123",
"price": "0.10"
}]
}

→ 409 Conflict
{
"reason": "price_changed",
"details": [{
"item_id": "123",
"actual_price": "0.20"
}]
}
// Request repeats
// with an actual price
POST /v1/orders
{
"items": [{
"item_id": "123",
"price": "0.20"
}]
}

→ 409 Conflict
{
"reason": "order_limit_exceeded",
"localized_message":
"Order limit exceeded"
}

— what was the point of showing the ‘price changed’ dialog if the user still can’t make an order even with the right price? When one of the concurrent orders is finished and the user is able to commit another one, prices, item availability, and other order parameters will likely need another correction.

Third, draw a map of which error resolutions might lead to the emergence of other errors. Otherwise, you risk returning the same error several times or, worse, creating a cycle of errors.

// Create an order
// with a paid delivery
POST /v1/orders
{
"items": 3,
"item_price": "3000.00",
"currency_code": "MNT",
"delivery_fee": "1000.00",
"total": "10000.00"
}
→ 409 Conflict
// Error: if the order sum
// is more than 9000 tögrögs,
// delivery must be free
{
"reason": "delivery_is_free"
}
// Create an order
// with a free delivery
POST /v1/orders
{
"items": 3,
"item_price": "3000.00",
"currency_code": "MNT",
"delivery_fee": "0.00",
"total": "9000.00"
}
→ 409 Conflict
// Error: minimal order sum
// is 10000 tögrögs
{
"reason": "below_minimal_sum",
"currency_code": "MNT",
"minimal_sum": "10000.00"
}

You may note that in this setup the error can’t be resolved in one step: this situation must be thought through in advance, and either the order calculation parameters must be changed (discounts should not be counted against the minimal order sum), or a special type of error must be introduced.

Developing machine-readable interfaces

In pursuit of API clarity for humans, we frequently forget that it’s not the developers themselves who interact with the endpoints, but the code they’ve written. Many concepts that work well for user interfaces are badly suited for programmatic ones: specifically, programs can’t make decisions based on textual information, and they can’t ‘refresh’ the state in case of a confusing situation.

12. The system state must be observable by clients

Sometimes, program systems provide interfaces that do not expose to clients all the data on what is being executed on the user’s behalf — specifically, which operations are running and what their statuses are.

Bad:

// Creates an order and returns its id
POST /v1/orders
{ … }

{ "order_id" }
// Returns an order by its id
GET /v1/orders/{id}
// The order isn't confirmed
// and awaits checking
→ 404 Not Found

— though the operation looks as if it was executed successfully, the client must store the order id and periodically check the GET /v1/orders/{id} state. This pattern is bad per se, but it gets even worse when we consider two cases:

  • clients might lose the id if a system failure happens between sending the request and getting the response, or if the app data storage is damaged or wiped;
  • customers can’t use another device; in fact, the knowledge that the order was created is bound to a specific user agent.

In both cases, customers might consider the order creation failed and make a duplicate order, with all the consequences being blamed on you.

Better:

// Creates an order and returns it
POST /v1/orders
{ <order parameters> }

{
"order_id",
// The order is created in explicit
// «checking» status
"status": "checking",

}
// Returns an order by its id
GET /v1/orders/{id}

{ "order_id", "status" … }
// Returns all customer's orders
// in all statuses
GET /v1/users/{id}/orders

This rule is applicable to errors as well, especially client ones. If the error can be corrected, the related data must be machine-readable.

Bad: { "error": "email malformed" } — the only thing developers might do with this error is to show the message to the end user.

Better:

{
// Machine-readable status
"status": "validation_failed",
// An array; if there are several
// errors, the user might correct
// them all at once
"failed_checks": [
{
"field": "email",
"error_type": "malformed",
// Localized
// human-readable message
"message": "email malformed"
}
]
}

13. Specify lifespans of resources and caching policies

In modern systems, clients usually have their own state and almost universally cache the results of requests — whether session-wise or long-term, every entity has some period of autonomous existence. So it’s highly desirable to make this explicit: it should be clear how the data is supposed to be cached, if not from the operation signatures, then at least from the documentation.

Let’s stress that we understand ‘cache’ in the extended sense: which variation of operation parameters (not just the request time, but other variables as well) should be considered close enough to some previous request to use the cached result?

Bad:

// Returns lungo price in cafes
// closest to the specified location
GET /price?recipe=lungo⮠
&longitude={longitude}⮠
&latitude={latitude}

{ "currency_code", "price" }

Two questions arise:

  • until when is the price valid?
  • in what vicinity of the location is the price valid?

Better: you may use standard protocol capabilities to denote cache options, like the Cache-Control header. If you need caching in both temporal and spatial dimensions, you should do something like this:

// Returns an offer: for what money sum
// our service commits to make a lungo
GET /price?recipe=lungo⮠
&longitude={longitude}⮠
&latitude={latitude}

{
"offer": {
"id",
"currency_code",
"price",
"conditions": {
// Until when the price is valid
"valid_until",
// What vicinity
// the price is valid within
// * city
// * geographical object
// * …
"valid_within"
}
}
}

14. Pagination, filtration, and cursors

Any endpoint returning a data collection must be paginated. No exceptions.

Any paginated endpoint must provide an interface to iterate over all the data.

Bad:

// Returns a limited number of records
// sorted by creation date
// starting with a record with an index
// equals to `offset`
GET /v1/records?limit=10&offset=100

At first glance, this is the most standard way of organizing pagination in APIs. But let’s ask ourselves some questions.

  1. How could clients learn about new records being added at the beginning of the list? Obviously, a client could only retry the initial request (offset=0) and compare identifiers to those it already knows. But what if the number of new records exceeds the limit? Imagine the situation:
  • the client processes records sequentially;
  • some problem occurred, and a batch of new records awaits processing;
  • the client requests new records (offset=0) but can't find any known records on the first page;
  • the client continues iterating over records page by page until it finds the last known identifier; all this time the order processing is idle;
  • the client might never start processing, being preoccupied with chaotic page requests to restore the records sequence.

2. What happens if some record is deleted from the head of the list?
Easy: the client will simply miss one record and will never learn about it.

3. What cache parameters should be set for this endpoint?
None: repeating the request with the same limit and offset parameters will produce a different record set each time.

Better: in such unidirectional lists, the pagination must use a key that implies the order. Like this:

// Returns a limited number of records
// sorted by creation date
// starting with a record with an identifier
// following the specified one
GET /v1/records⮠
?older_than={record_id}&limit=10
// Returns a limited number of records
// sorted by creation date
// starting with a record with an identifier
// preceding the specified one
GET /v1/records⮠
?newer_than={record_id}&limit=10

With the pagination organized like that, clients never need to care about records being added or removed in the already-processed part of the list: they continue to iterate over the records, either getting new ones (using newer_than) or older ones (using older_than). If there is no record removal operation, clients may easily cache responses — the URL will always return the same record set.
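
A minimal client-side iteration sketch over such a list might look as follows (api.getRecords and handleRecord are illustrative stand-ins, not real SDK calls):

declare const api: {
  getRecords(params: { older_than?: string; limit: number }):
    Promise<{ records: Array<{ id: string }> }>;
};
declare function handleRecord(record: { id: string }): void;

let oldestSeenId: string | undefined;
while (true) {
  const page = await api.getRecords({
    older_than: oldestSeenId,
    limit: 10,
  });
  if (page.records.length === 0) {
    break; // the whole list has been traversed
  }
  for (const record of page.records) {
    handleRecord(record);
  }
  oldestSeenId = page.records[page.records.length - 1].id;
}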

Another way to organize such lists is by returning a cursor to be used instead of the record_id, making interfaces more versatile.

// Initial data request
POST /v1/records/list
{
// Some additional filtering options
"filter": {
"category": "some_category",
"created_date": {
"older_than": "2020-12-07"
}
}
}

{ "cursor" }
// Follow-up requests
GET /v1/records?cursor=<cursor value>
{ "records", "cursor" }

One advantage of this approach is the possibility of keeping the initial request parameters (i.e. the filter in our example) embedded into the cursor itself, thus not copying them in follow-up requests. It might be especially relevant if the initial request prepares the full dataset, for example, moving it from ‘cold’ storage to a ‘hot’ one (then the cursor might simply contain the encoded dataset id and the offset).
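
A hedged server-side sketch of such an opaque cursor (the payload shape and helper names are illustrative):

import { Buffer } from 'node:buffer';

interface CursorPayload {
  dataset_id: string; // id of the prepared ('hot') dataset
  offset: number;     // position within that dataset
  filter: object;     // the original filtering options
}

function encodeCursor(payload: CursorPayload): string {
  // base64url keeps the cursor opaque and URL-safe
  return Buffer.from(JSON.stringify(payload)).toString('base64url');
}

function decodeCursor(cursor: string): CursorPayload {
  return JSON.parse(
    Buffer.from(cursor, 'base64url').toString('utf8')
  );
}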

There are several approaches to implementing cursors (for example, making a single endpoint for initial and follow-up requests and returning the first data portion in the first response). As usual, the crucial part is maintaining consistency across all such endpoints.

NB: some sources discourage this approach because in this case the user can’t see a list of all pages and can’t choose an arbitrary one. We should note here that:

  • such a case (pages list and page selection) exists if we deal with user interfaces; we could hardly imagine a program interface that needs to provide access to random data pages;
  • if we still talk about an API to some application, which has a ‘paging’ user control, then a proper approach would be to prepare ‘paging’ data on the server side, including generating links to pages;
  • cursor-based solutions don’t prohibit using the offset/limit parameters; nothing could prevent us from creating a dual interface, which might serve both GET /items?cursor=… and GET /items?offset=…&limit=… requests;
  • finally, if there is a need to provide access to arbitrary pages in the user interface, we should ask ourselves which problem is actually being solved that way; probably, users use this functionality to find something: a specific element on the list, or the position they stopped at the last time they worked with the list; probably, we should provide more convenient controls for those tasks than accessing data pages by their indexes.

Bad:

// Returns a limited number of records
// sorted by a specified field
// in a specified order
// starting with a record with an index
// equals to `offset`
GET /records?sort_by=date_modified⮠
&sort_order=desc&limit=10&offset=100

Sorting by the date of modification usually means that the data might be modified. In other words, some records might change after the first data chunk is returned, but before the next chunk is requested. Modified records will simply disappear from the listing because they move to the first page. Clients will never get the records that were changed during the iteration process, even if the cursor-based scheme is implemented, and they will never learn the sheer fact of such an omission. Also, this particular interface isn’t extendable as there is no way to add sorting by two or more fields.

Better: there is no general solution to this problem in this formulation. Listing records by modification time will always be unpredictably volatile, so we have to change the approach itself; we have two options.

Option one: fix the records’ order at the moment we receive the initial request, i.e. the server produces the entire list and stores it in an immutable form:

// Creates a view based on the parameters passed
POST /v1/record-views
{
"sort_by": [{
"field": "date_modified",
"order": "desc"
}]
}

{ "id", "cursor" }
// Returns a portion of the view
GET /v1/record-views/{id}⮠
?cursor={cursor}

Since the produced view is immutable, access to it might be organized in any form, including a limit-offset scheme, cursors, Range header, etc. However, there is a downside: records modified after the view was generated will be misplaced or outdated.

Option two: guarantee a strict records order, for example, by introducing a concept of record change events:

POST /v1/records/modified/list
{
// Optional
"cursor"
}

{
"modified": [
{ "date", "record_id" }
],
"cursor"
}

This scheme’s downsides are the necessity to create separate indexed event storage, and the multiplication of data items, since for a single record many events might exist.

Ensuring technical quality of APIs

A fine API must not only solve developers’ and end users’ problems but also ensure the quality of the solution, i.e. not contain logical and technical mistakes (and not provoke developers into making them), save computational resources, and in general implement the best practices applicable to the subject area.

15. Keep the precision of fractional numbers intact

If the protocol allows, fractional numbers with fixed precision (like money sums) must be represented as a specially designed type like Decimal or its equivalent.

If there is no Decimal type in the protocol (for instance, JSON doesn’t have one), you should either use integers (e.g. apply a fixed multiplier) or strings.

If converting to a float will certainly lead to precision loss (let’s say, if we translate ‘20 minutes’ into hours as a decimal fraction), it’s better to either stick to a fully precise format (e.g. opt for 00:20 instead of 0.33333…), or provide an SDK to work with this data, or, as a last resort, describe the rounding principles in the documentation.
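
A small illustration of both points (the field names in the loss-free representations are ours, not prescribed):

// the classic reason why floats are dangerous here:
console.log(0.1 + 0.2);         // 0.30000000000000004
console.log(0.1 + 0.2 === 0.3); // false

// '20 minutes' expressed in hours can't be stored
// as a float without losing precision either:
console.log(20 / 60); // 0.3333333333333333

// loss-free representations instead:
const duration = { value: 20, unit: "min" };
const price = { minor_units: 1050, currency_code: "USD" }; // $10.50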

16. All API operations must be idempotent

Let us remind the reader that idempotency is the following property: repeated calls to the same function with the same parameters won’t change the resource state. Since we’re discussing client-server interaction in the first place, repeating requests in case of network failure isn’t an exception, but a norm of life.

If the endpoint’s idempotency can’t be assured naturally, explicit idempotency parameters must be added, in a form of either a token or a resource version.

Bad:

// Creates an order
POST /orders

A second order will be produced if the request is repeated!

Better:

// Creates an order
POST /v1/orders
X-Idempotency-Token: <random string>

A client, on its side, must retain the X-Idempotency-Token for automated endpoint retries. A server, on its side, must check whether an order created with this token already exists.
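
A client-side retry sketch (MAX_RETRIES, the order shape, and the endpoint details are illustrative); the crucial point is that the token is generated once and reused for every retry attempt, so the server can deduplicate the requests:

const MAX_RETRIES = 3;

async function createOrderWithRetries(order: object) {
  const idempotencyToken = crypto.randomUUID();
  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    try {
      const response = await fetch('/v1/orders', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'X-Idempotency-Token': idempotencyToken,
        },
        body: JSON.stringify(order),
      });
      return await response.json();
    } catch (e) {
      // a network failure: repeat the request
      // with the very same idempotency token
    }
  }
  throw new Error('Order creation failed after retries');
}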

An alternative:

// Creates order draft
POST /v1/orders/drafts

{ "draft_id" }
// Confirms the draft
PUT /v1/orders/drafts/{draft_id}
{ "confirmed": true }

Creating order drafts is a non-binding operation since it doesn’t entail any consequences, so it’s fine to create drafts without the idempotency token.

Confirming drafts is a naturally idempotent operation, with the draft_id being its idempotency key.

It is also worth mentioning that adding idempotency tokens to naturally idempotent handlers isn’t meaningless either, since it allows distinguishing two situations:

  • a client didn’t get the response because of some network issues, and is now repeating the request;
  • a client made a mistake by posting conflicting requests.

Consider the following example: imagine there is a shared resource, characterized by a revision number, and a client tries updating it.

POST /resource/updates
{
"resource_revision": 123,
"updates"
}

The server retrieves the actual resource revision and finds it to be 124. How should it respond correctly? 409 Conflict might be returned, but then the client will be forced to understand the nature of the conflict and somehow resolve it, potentially confusing the user. It’s also unwise to fragment the conflict-resolving algorithm by letting each client implement it independently.

The server may compare request bodies, assuming that identical updates values mean retrying, but this assumption might be dangerously wrong (for example, if the resource is a counter of some kind, then repeated identical requests are routine).

Adding the idempotency token (either directly as a random string, or indirectly in a form of drafts) solves this problem.

POST /resource/updates
X-Idempotency-Token: <token>
{
"resource_revision": 123,
"updates"
}
→ 201 Created

— the server found out that the same token was used in creating revision 124, which means the client is retrying the request.

Or:

POST /resource/updates
X-Idempotency-Token: <token>
{
"resource_revision": 123,
"updates"
}
→ 409 Conflict

— the server found out that a different token was used in creating revision 124, which means an access conflict.

Furthermore, adding idempotency tokens not only resolves the issue but also makes advanced optimizations possible. If the server detects an access conflict, it could try to resolve it, ‘rebasing’ the update like modern version control systems do, and return a 200 OK instead of a 409 Conflict. This logic dramatically improves user experience, being fully backwards compatible, and helps to avoid conflict-resolving code fragmentation.

Also, be warned: clients are bad at implementing idempotency tokens. Two problems are common:

  • you can’t really expect that clients generate truly random tokens — they may share the same seed or simply use weak algorithms or entropy sources; therefore you must put constraints on token checking: the token must be unique to a specific user and resource, not globally;
  • clients tend to misunderstand the concept and either generate new tokens each time they repeat the request (which degrades the UX but is otherwise harmless) or, conversely, use one token in several requests (which is not harmless at all and could lead to catastrophic disasters; another reason to implement the suggestion in the previous clause); writing detailed docs and/or providing client libraries is highly recommended.

17. Avoid non-atomic operations

There is a common problem with implementing the changes-list approach: what to do if some changes were successfully applied while others were not? The rule is simple: if you can ensure atomicity (i.e. either apply all changes or none of them) — do it.

Bad:

// Returns a list of recipes
GET /v1/recipes

{
"recipes": [{
"id": "lungo",
"volume": "200ml"
}, {
"id": "latte",
"volume": "300ml"
}]
}
// Changes recipes' parameters
PATCH /v1/recipes
{
"changes": [{
"id": "lungo",
"volume": "300ml"
}, {
"id": "latte",
"volume": "-1ml"
}]
}
→ 400 Bad Request
// Re-reading the list
GET /v1/recipes

{
"recipes": [{
"id": "lungo",
// This value changed
"volume": "300ml"
}, {
"id": "latte",
// and this did not
"volume": "300ml"
}]
}

— there is no way for the client to learn that the failed operation was actually partially applied. Even if there is an indication of this fact in the response, the client still cannot tell whether the lungo volume changed because of its request or because some other client changed it.

If you can’t guarantee the atomicity of an operation, you should elaborate in detail on how to deal with it. There must be a separate status for each individual change.

Better:

PATCH /v1/recipes
{
"changes": [{
"recipe_id": "lungo",
"volume": "300ml"
}, {
"recipe_id": "latte",
"volume": "-1ml"
}]
}
// You may actually return
// a ‘partial success’ status
// if the protocol allows it
→ 200 OK
{
"changes": [{
"change_id",
"occurred_at",
"recipe_id": "lungo",
"status": "success"
}, {
"change_id",
"occurred_at",
"recipe_id": "latte",
"status": "fail",
"error"
}]
}

Here:

  • the change_id field is a unique identifier of each atomic change;
  • the occurred_at field is the moment when the change was actually applied;
  • the error field contains the error data related to the specific change.

Might be of use:

  • introducing sequence_id parameters in the request to guarantee the execution order and to align the item order in the response with the requested one;
  • exposing a separate /changes-history endpoint so clients can get the history of applied changes even if the app crashed while getting the partial success response or there was a network timeout.

Non-atomic changes are undesirable because they erode the idempotency concept. Let’s take a look at an example:

PATCH /v1/recipes
{
"idempotency_token",
"changes": [{
"recipe_id": "lungo",
"volume": "300ml"
}, {
"recipe_id": "latte",
"volume": "400ml"
}]
}
→ 200 OK
{
"changes": [{

"status": "success"
}, {

"status": "fail",
"error": {
"reason":
"too_many_requests"
}
}]
}

Imagine the client failed to get a response because of a network error, and it repeats the request:

PATCH /v1/recipes
{
"idempotency_token",
"changes": [{
"recipe_id": "lungo",
"volume": "300ml"
}, {
"recipe_id": "latte",
"volume": "400ml"
}]
}
→ 200 OK
{
"changes": [{

"status": "success"
}, {

"status": "success",
}]
}

To the client, everything looks normal: the changes were applied, and the last response received is always the actual one. But the resource state after the first request was inherently different from the resource state after the second one, which contradicts the very definition of ‘idempotency’.

It would be more correct if the server did nothing upon getting the second request with the same idempotency token and simply returned the same status breakdown. But this implies that storing these breakdowns must be implemented.
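
A hedged server-side sketch of such storage (resultStore, applyOneChange, and the types are illustrative): the per-token status breakdown is persisted so that a retried request returns exactly the same statuses without re-applying anything.

type Change = { recipe_id: string; volume: string };
type ChangeStatus = { recipe_id: string; status: 'success' | 'fail' };

declare const resultStore: {
  get(token: string): Promise<ChangeStatus[] | undefined>;
  set(token: string, value: ChangeStatus[]): Promise<void>;
};
declare function applyOneChange(change: Change): Promise<ChangeStatus>;

async function applyChanges(
  token: string,
  changes: Change[]
): Promise<ChangeStatus[]> {
  const stored = await resultStore.get(token);
  if (stored !== undefined) {
    // a retry: nothing is re-applied,
    // the original breakdown is returned as is
    return stored;
  }
  const breakdown: ChangeStatus[] = [];
  for (const change of changes) {
    breakdown.push(await applyOneChange(change));
  }
  await resultStore.set(token, breakdown);
  return breakdown;
}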

Just in case: nested operations must be idempotent themselves. If they are not, separate idempotency tokens must be generated for each nested operation.

18. Don’t invent security

If the author of this book were given a dollar every time he had to implement an additional security protocol invented by someone, he would have already retired. API developers’ passion for signing request parameters or introducing complex schemes of exchanging passwords for tokens is as obvious as it is meaningless.

First, almost all security-enhancing procedures for every kind of operation are already invented. There is no need to re-think them anew; just take the existing approach and implement it. No self-invented algorithm for request signature checking provides the same level of protection against a Man-in-the-Middle attack as a TLS connection with mutual certificate pinning.

Second, it’s quite presumptuous (and dangerous) to assume you’re an expert in security. New attack vectors appear every day, and being aware of all the current threats is a full-time job. If you do something else during your workday, the security system you design will contain vulnerabilities you have never even heard about — for example, your password-checking algorithm might be susceptible to a timing attack, and your web server, to a request-splitting attack.

19. Explicitly declare technical restrictions

Every field in your API comes with restrictions: the maximum allowed text length, the size of attached documents, the allowed ranges for numeric values, etc. Often, describing those limits is neglected by API developers — either because they consider it obvious, or because they simply don’t know the boundaries themselves. This is of course an antipattern: not knowing what the limits are automatically implies that partners’ code might stop working at any moment for reasons they don’t control.

Therefore, first, declare the boundaries for every field in the API without any exceptions, and, second, generate proper machine-readable errors describing which exact boundary was violated should such a violation occur.

The same reasoning applies to quotas as well: partners must have access to the statistics on which part of the quota they have already used, and the errors in the case of exceeding quotas must be informative.

20. Count the amount of traffic

Nowadays the amount of traffic is rarely taken into account — the Internet connection is considered unlimited almost universally. However, it’s still not entirely unlimited: with some degree of carelessness, it’s always possible to design a system generating an amount of traffic that is uncomfortable even for modern networks.

There are three obvious reasons for inflating network traffic:

  • no data pagination provided;
  • no limits on the data field sizes, or too large binary data (graphics, audio, video, etc.) being transmitted;
  • clients query the data too frequently or cache it too little.

While the first two problems are solved by purely technical measures (see the corresponding paragraphs), the third one is more of a logical kind: how to organize the client update stream so as to find a balance between the responsiveness of the system and the resources spent to ensure it. Here are several recommendations:

  • do not rely too heavily on asynchronous interfaces;
    — on one hand, they allow tackling many technical problems related to API performance, which, in turn, allows for maintaining backwards compatibility: if some method is asynchronous from the very beginning, the latencies and the data consistency models might be easily tuned if needed;
    — on the other hand, the number of requests clients generate becomes hardly predictable, as a client needs to make an unpredictable number of attempts to retrieve a result;
  • declare an explicit retry policy (for example, with the Retry-After header);
    — yes, some partners will ignore it because their developers will be too lazy to implement it, but others will not (especially if you provide SDKs as well);
  • if you expect a significant number of asynchronous operations in the API, allow developers to choose between the poll model (clients make repeated requests to an endpoint to check the asynchronous procedure status) and the push model (the server notifies clients of status changes, for example, via webhooks or server-push mechanics);
  • if some entity comprises both ‘lightweight’ data (let’s say, the name and the description of the recipe) and ‘heavy’ data (let’s say, the promo picture of the beverage which might easily be a hundred times larger than the text fields), it’s better to split endpoints and pass only a reference to the ‘heavy’ data (a link to the image, in our case) — this will allow at least setting different cache policies for different kinds of data.

As a useful exercise, try modeling the typical lifecycle of a partner’s app’s main functionality (for example, making a single order) to count the number of requests and the amount of traffic that it takes.

21. Avoid implicit partial updates

One of the most common API design antipatterns is trying to economize on detailed state change descriptions.

Bad:

// Creates an order comprising
// two items
POST /v1/orders/
{
"delivery_address",
"items": [{
"recipe": "lungo",
}, {
"recipe": "latte",
"milk_type": "oats"
}]
}

{ "order_id" }
// Partially rewrites the order,
// updates the volume
// of the second item
PATCH /v1/orders/{id}
{
"items": [null, {
"volume": "800ml"
}]
}

{ /* updates accepted */ }

This signature is bad per se as it’s unreadable. What does null as the first array element mean — is it a deletion of an element or an indication that no actions are needed towards it? What happens to the fields that are not stated in the update operation body (delivery_address, milk_type) — will they be reset to defaults, or stay unchanged?

The nastiest part is that whatever option you choose, the number of problems will only multiply further. Let’s say we agreed that the {"items":[null, {…}]} statement means that the first element of the array is left untouched, i.e. no changes are needed. Then how shall we encode its deletion? Invent one more ‘magical’ value meaning ‘remove it’? Similarly, if the fields that are not explicitly mentioned retain their values — how do we reset them to defaults?

The simple solution is always rewriting the data entirely, i.e. requiring the entire object to be passed, replacing the current state with it, and returning the full state as a result of the operation. This obvious solution is frequently rejected with the following reasoning:

  • increased request sizes and, therefore, the amount of traffic;
  • the necessity to detect which fields were changed (for instance, to generate proper state change events for subscribers);
  • the inability to organize cooperative editing when two clients are editing different object properties simultaneously.

However, if we take a deeper look, all these disadvantages are actually imaginary:

  • the reasons for increasing the amount of traffic were described in the previous paragraphs, and serving extra fields is not one of them (and if it is, it’s rather a rationale to decompose the endpoint);
  • the concept of sending only those fields that changed is in fact about shifting the responsibility of change detection to clients;
    — it doesn’t make the task any easier, and also introduces the problem of client code fragmentation as several independent implementations of the change detection algorithm will occur;
    — furthermore, the existence of the client algorithm for finding the fields that changed doesn’t mean that the server might skip implementing it as client developers might make mistakes or simply spare the effort and always send all the fields;
  • finally, this naïve approach to organizing collaborative editing works only with transitive changes (e.g. if the final result does not depend on the order in which the operations were executed), and in our case, it’s already not true: deletion of the first element and editing the second element are non-transitive;
    — often, in addition to sparing traffic on requests, the same concept is applied to responses as well, e.g. no data is returned for modifying operations; thus two clients making simultaneous edits do not see one another’s changes.

Better: split the functionality. This also correlates well with the decomposition principle we’ve discussed in the previous chapter.

// Creates an order comprising
// two items
POST /v1/orders/
{
"parameters": {
"delivery_address"
},
"items": [{
"recipe": "lungo",
}, {
"recipe": "latte",
"milk_type": "oats"
}]
}

{
"order_id",
"created_at",
"parameters": {
"delivery_address"
},
"items": [
{ "item_id", "status"},
{ "item_id", "status"}
]
}
// Changes the order parameters
// that affect all items
PUT /v1/orders/{id}/parameters
{ "delivery_address" }

{ "delivery_address" }
// Partially updates one item,
// sets the volume of one of
// the beverages
PUT /v1/orders/{id}/items/{item_id}
{
// All the fields are passed,
// even if only one has changed
"recipe", "volume", "milk_type"
}

{ "recipe", "volume", "milk_type" }
// Deletes one order item
DELETE /v1/orders/{id}/items/{item_id}

Now, to reset volume to its default value, it’s enough to omit it from the PUT /items/{item_id} request body. Also, the operations of deleting one item and simultaneously modifying another are now transitive.

This approach also allows for separating non-mutable and calculated fields (in our case, created_at and status) from editable ones without creating ambiguous situations (what should happen if a client tries to change the created_at field?)

It is also possible to return full order objects from PUT endpoints instead of just the sub-resource that was overwritten (though it requires some naming convention).

NB: while decomposing endpoints, the idea of splitting them into mutable and non-mutable data often looks tempting. It makes it possible to mark the latter as infinitely cacheable and never bother about pagination ordering and update format consistency. The plan looks solid on paper, but as the API expands, it frequently happens that immutable fields eventually cease being immutable, and the entire concept not only stops working properly but even starts looking like a design flaw. We would rather recommend designating data as immutable in one of two cases: (1) making them editable will really mean breaking backwards compatibility, or (2) the link to the resource (for example, an image) is served via the API as well, and you do possess the capability of making those links persistent (i.e. you might generate a new link to the image instead of rewriting the contents of the old one).

Even better: design a format for atomic changes.

POST /v1/order/changes
X-Idempotency-Token: <idempotency token>
{
"changes": [{
"type": "set",
"field": "delivery_address",
"value": <new value>
}, {
"type": "unset_item_field",
"item_id",
"field": "volume"
}],

}

This approach is much harder to implement, but it’s the only viable method of implementing collaborative editing since it explicitly reflects what the user was actually doing with the entity representation. With the data exposed in such a format, you might actually implement offline editing, where user changes are accumulated and then sent at once, while the server automatically resolves conflicts by ‘rebasing’ the changes.

Ensuring API product quality

Apart from the technological limitations, any real API will soon face the imperfection of the surrounding reality. Of course, any one of us would prefer living in the world of pink unicorns, free of piles of legacy code, evil-doers, national conflicts, and competitors’ scheming. Fortunately or not, we live in the real world, and API vendors have to mind all those while developing the API.

22. Use globally unique identifiers

It’s considered good form to use globally unique strings as entity identifiers, either semantic (e.g. “lungo” for beverage types) or random (e.g. UUID-4). It might turn out to be extremely useful if you ever need to merge data from several sources under a single identifier.

In general, we tend to advise using urn-like identifiers, e.g. urn:order:<uuid> (or just order:<uuid>). That helps a lot in dealing with legacy systems where different identifiers are attached to the same entity. Namespaces in urns help to quickly understand which kind of identifier is used and whether there is a usage mistake.
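
A hedged sketch of how namespaced identifiers catch usage mistakes early (the parseUrn helper and the namespaces are illustrative):

function parseUrn(id: string): { namespace: string; value: string } {
  const match = /^urn:([a-z_-]+):(.+)$/.exec(id);
  if (match === null) {
    throw new Error(`Malformed identifier: ${id}`);
  }
  return { namespace: match[1], value: match[2] };
}

// passing an order identifier where a user identifier is expected
// now fails loudly instead of silently querying the wrong entity
function getUser(userId: string) {
  const { namespace } = parseUrn(userId);
  if (namespace !== 'user') {
    throw new Error(`Expected a user id, got a '${namespace}' id`);
  }
  // …
}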

One important implication: never use increasing numbers as external identifiers. Apart from the reasons mentioned above, sequential identifiers allow counting how many entities of each type exist in the system. Your competitors will be able to calculate the precise number of orders you have each day, for example.

NB: in this book, we often use short identifiers like “123” in code examples; that’s for the convenience of reading the book on small screens. Do not replicate this practice in a real-world API.

23. Stipulate future restrictions

As the API popularity grows, it will inevitably become necessary to introduce technical means of preventing illicit API usage, such as displaying captchas, setting honeypots, raising ‘too many requests’ exceptions, installing anti-DDoS proxies, etc. None of this can be done if the corresponding errors and messages were not described in the docs from the very beginning.

You are not obliged to actually generate those exceptions, but you might stipulate this possibility in the terms of service. For example, you might describe the 429 Too Many Requests error or captcha redirect, but implement the functionality when it's actually needed.

It is extremely important to leave room for multi-factored authentication (such as TOTP, SMS, or 3D-secure-like technologies) if it’s possible to make payments through the API. In this case, it’s a must-have from the very beginning.

24. Don’t provide endpoints for mass downloading of sensitive data

If the API allows access to users’ personal data, bank card numbers, private messages, or any other kind of information whose exposure might seriously harm users, partners, and/or you — there must be no methods for bulk retrieval of this data, or at least there must be rate limiters, page size restrictions, and, ideally, multi-factor authentication in front of them.

Often, providing such bulk exports on an ad-hoc basis, i.e. bypassing the API, is a reasonable practice.

25. Localization and internationalization

All endpoints must accept language parameters (for example, in the form of the Accept-Language header), even if they are not being used currently.

It is important to understand that the user’s language and the user’s jurisdiction are different things. Your API working cycle must always store the user’s location. It might be stated either explicitly (requests contain geographical coordinates) or implicitly (the initial location-bound request initiates the creation of a session which stores the location), but no correct localization is possible in the absence of location data. In most cases, reducing the location to just a country code is enough.

The thing is that lots of parameters potentially affecting data formats depend not on language, but on a user’s location. To name a few: number formatting (integer and fractional part delimiter, digit groups delimiter), date formatting, the first day of the week, keyboard layout, measurement units system (which might be non-decimal!), etc. In some situations, you need to store two locations: user residence location and user ‘viewport’. For example, if a US citizen is planning a European trip, it’s convenient to show prices in local currency, but measure distances in miles and feet.

Sometimes explicit location passing is not enough since there are lots of territorial conflicts in the world. How the API should behave when user coordinates lie within disputed regions is a legal matter, regretfully. The author of this book once had to implement a ‘state A territory according to state B official position’ concept.

Important: mark the difference between localization for end users and localization for developers. Take a look at the example in rule #10: the localized_message is meant for the user; the app should show it if no specific handler for this error exists in the code. This message must be written in the user’s language and formatted according to the user’s location. But the details.checks_failed[].message is meant to be read by developers examining the problem. So it must be written and formatted in a manner that suits developers best. In the software development world, this usually means ‘in English’.

Worth mentioning is that the localized_ prefix in the example is used to differentiate messages to users from messages to developers. A concept like that must be, of course, explicitly stated in your API docs.

And one more thing: all strings must be UTF-8, no exceptions.
