I see articles on this topic popping up everywhere, and while one came close to hitting an important nail on the head, none really teaches the knowledge that you, as a developer, must have in order to build a proper REST API server.
Just like Model Features, this is something you won't find anywhere else. This is my recollection of REST knowledge that experience has granted me while architecting more than a few REST services so far.
Prologue
Yes, the architectural constraints you've seen are part of the REST definition and it is good to be aware of them. But all that theory is just that: Theory. Did you know that the definition of REST doesn't mention HTTP? REST could be developed as an RPC server if you wanted. It talks about hyperlinks and hypertext, but doesn't say "REST must be implemented using the HTTP communication protocol".
Recently I read an article that implied REST is a protocol just like SOAP. Nothing could be further from the truth. You could do REST with SOAP if you wanted. SOAP is always POST'ed, so you would identify the action you want to take inside the SOAP message. But I digress. The point here is that REST and SOAP are completely unrelated things.
REST is NOT a protocol, no matter how much you see this "protocol" word attached to it. REST is just architectural constraints (or requisites).
Of the REST requisites, the following are of more relevance to you, the developer:
Stateless
Cacheable
So let's start with what really matters about REST to developers.
Resources
Wait, what? That's not in the prologue. That's right. It's not. This is mentioned almost casually by several blogs/articles and nobody pays attention. As it turns out, resources are key to the developer.
A resource is an entity available in your REST server. Examples of resources are:
The defined application users
Country information
Documents
Sales records
Pretty much, resources are the data the server serves.
REST indicates that every resource must be uniquely identified by a URI (Uniform Resource Identifier). Feel free to read about URI's all you want. Learn about the scheme, the authority and whatnot. Bottom line, and long story short: URL's (Uniform Resource Locator) satisfy the definition of what a URI is.
I do not wish to look this up, but somewhere years ago I saw a purist REST article that stated that REST HTTP server URL's should be of the form http://api.example.com/?<URI>, but because URL's satisfy the URI definition, it is OK to do it as 99.999% of people do.
So one of the first tasks for you, the developer, is to define the resources and assign unique URI's to each one of them. Like this:
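As a rough illustration only, the resources listed earlier might be assigned URI's along these lines. The collection names for countries, documents and sales records are hypothetical choices made for this sketch; REST does not prescribe them.

```python
# Hypothetical URI assignment for the example resources.
BASE = "http://api.example.com"

# One collection URI per resource type (the path names are assumptions).
COLLECTIONS = {
    "users": f"{BASE}/users",
    "countries": f"{BASE}/countries",
    "documents": f"{BASE}/documents",
    "sales": f"{BASE}/sales",
}


def resource_uri(kind: str, resource_id: int) -> str:
    """Build the unique URI of a single resource inside a collection."""
    return f"{COLLECTIONS[kind]}/{resource_id}"
```

With this mapping, `resource_uri("users", 123)` yields `http://api.example.com/users/123`, and the collection itself lives at `http://api.example.com/users`.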
Now when you combine the URI's with the various HTTP verbs (POST, PUT, DELETE, GET, PATCH) you obtain a REST-compliant HTTP server.
Creating More Elaborate URI's
Ok, the examples above are rather simple. Truth be told, real-world scenarios require more versatility, especially in the realm of master-detail data, where one resource can have resources of other types associated with it in a one-to-many relationship, for example.
Using the User and Country resources as a subject of study, let's say you can have many users associated with a single country. REST purists will tell you that there should be one and only one URI per resource. If you were to follow this strict point of view, you would have to rely on the query string to find out which users belong to a given country: http://api.example.com/users?country=http%3A%2F%2Fapi.example.com%2Fcountries%2F123. That would even be using the full URI of the country of interest, URL-encoded of course. But that's too much. Let's use this one: http://api.example.com/users?country=123.
That's not too bad, I guess. In practice, however, it's awkward to implement. Imagine having to program NodeJS Express routers based on query strings, or .Net controllers that share the same route and are told apart only by their query strings. Madness.
So many people prefer to overthrow the one-and-only-one-URI-per-resource rule and create more meaningful alternate routes that implicitly provide the query.
To continue with the example, what do you think about this one? http://api.example.com/country/123/users. Personally, I love it. I do all my REST like this.
I must warn you, though, that not all the awkwardness goes away, but it is a better choice in practical terms, which is the aspect of REST I want to teach with this article: Practicality.
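To make the idea concrete, here is a minimal, framework-free sketch of how such an alternate route can carry the query implicitly. The in-memory `USERS` list and the function name are assumptions made for illustration; a real server would use its router and its database.

```python
import re
from typing import Optional

# Hypothetical in-memory data; a real server would query a database.
USERS = [
    {"id": 1, "name": "ana", "country": 123},
    {"id": 2, "name": "bob", "country": 456},
    {"id": 3, "name": "eva", "country": 123},
]

# The alternate route carries the country ID in the path itself.
NESTED_ROUTE = re.compile(r"^/country/(?P<country_id>\d+)/users$")


def users_by_country(path: str) -> Optional[list]:
    """Return the users for a /country/<id>/users path, or None if the path does not match."""
    match = NESTED_ROUTE.match(path)
    if match is None:
        return None
    country_id = int(match.group("country_id"))
    # Equivalent to the query-string form /users?country=<id>.
    return [user for user in USERS if user["country"] == country_id]
```

The route pattern does the job the query string would otherwise do, so the handler never has to inspect the query string to decide what it is being asked for.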
Stateless Constraint
Nowadays, it is rather simple to fulfill this one: By using a JSON Web Token, your server is largely relieved from any need for per-user state, such as sessions. Still, in practice, it is sometimes unavoidable to have some form of state. While I have no authority to say "don't worry about it" and all will be fine and forgiven, I do that: I don't worry about it. If I must, then I must.
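The reason a token removes the need for sessions is that the server can verify it without looking anything up. The sketch below signs and verifies an HS256 JSON Web Token using only the Python standard library. The secret, the claim names and the function names are assumptions for illustration; in a real project a vetted JWT library is the safer choice.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # assumption: a shared signing key; never hard-code a real one


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the padding that JWT strips, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str) -> str:
    return _b64url(hmac.new(SECRET, signing_input.encode(), hashlib.sha256).digest())


def issue_token(user_id: int, ttl_seconds: int = 3600) -> str:
    """Create a signed token; the server stores nothing about it."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": str(user_id), "exp": int(time.time()) + ttl_seconds}
    payload = _b64url(json.dumps(claims).encode())
    return f"{header}.{payload}.{_sign(header + '.' + payload)}"


def verify_token(token: str):
    """Return the claims if the signature and expiry check out, else None."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    header, payload, signature = parts
    expected = _sign(header + "." + payload)
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return None
    claims = json.loads(_b64url_decode(payload))
    if claims["exp"] < time.time():
        return None
    return claims
```

Every request carries the token, and `verify_token` tells the server who is calling without any per-user state on the server side.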
Cacheable Constraint
There isn't much to say about this. In this modern day and age, you may delegate this task to your reverse proxy and forget about it. If you, however, lack a reverse proxy or any other piece that may satisfy this for you, then you'll probably have to do this yourself using whatever tools are available to you and make sense for your system architecture.
Having said that: Remember that web browsers have their own private cache, and if an HTTP response says it is cacheable, the web browser will do its best to respond with the cached version. All you have to do is set some headers in your HTTP responses and you're golden.
The entire cache topic is rather large, and I have never personally had to implement it in any way, shape or form, except maybe setting headers such as the Vary header.
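For readers who want to see what "setting some headers" might look like, here is a small sketch. The header values (a 60-second `max-age`, the fields listed in `Vary`) and the function names are assumptions chosen for the example, not recommendations for any particular system.

```python
import hashlib


def cache_headers(body: bytes, max_age: int = 60) -> dict:
    """Headers that tell browsers and proxies the response may be cached."""
    return {
        "Cache-Control": f"public, max-age={max_age}",
        # A validator derived from the content, so clients can revalidate cheaply.
        "ETag": '"' + hashlib.sha256(body).hexdigest()[:16] + '"',
        # Cached copies must be keyed by these request headers too.
        "Vary": "Accept, Accept-Encoding",
    }


def respond(body: bytes, request_headers: dict):
    """Return (status, headers, body); 304 when the client's cached copy is still valid."""
    headers = cache_headers(body)
    if request_headers.get("If-None-Match") == headers["ETag"]:
        return 304, headers, b""
    return 200, headers, body
```

A client that sends back the `ETag` it received in an `If-None-Match` header gets a body-less 304 NOT MODIFIED and keeps using its cached copy.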
If you would like to read about caching, maybe you can start here.
HTTP Verbs, Response Codes And More Practical Wisdom
Ok, now comes the really interesting part for developers!
This is also something that isn't written in stone, and at the end of the day, you return what makes sense to the consumer of your resources. Still, it is not impossible to follow, so do your best.
HTTP GET
GET is used to retrieve resources. These are the HTTP responses that you commonly see in RESTful implementations for GET requests.
Example | HTTP Status Code | Notes |
api.example.com/users | 200 OK | At least one user is returned in the response. This queries for the entire collection. |
api.example.com/users | 204 NO CONTENT | The users collection is empty. |
api.example.com/users/123 | 200 OK | The user represented by the URI (http://api.example.com/users/123) exists and has been returned in the response. |
api.example.com/users/123 | 404 NOT FOUND | There is no user that matches the specified URI. |
api.example.com/users/123 | 410 GONE | A special case instead of 404. If the User resource can be soft-deleted, you may deny its GET operation with this status code. In practice, though, there's always a need for soft-deleted data somewhere, so think hard before using this. |
api.example.com/users/abc | 400 BAD REQUEST | Optional. Maybe users are only enumerated, so "abc" is invalid. But maybe users CAN be retrieved by username, so maybe a 400 doesn't apply? It all depends on the implementation. |
api.example.com/users/?active=true | 204 NO CONTENT | The search yielded no results (no active users found). Don't use 404 for this case because 404 is an error code, and there's nothing wrong with finding nothing every now and then. |
api.example.com/users/?active=true | 200 OK | At least one user satisfied the query condition and has been returned in the response. |
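The table can be read as a small decision procedure. The following sketch follows it row by row, assuming numeric IDs only (so "abc" earns a 400) and a hypothetical `deleted` flag for soft-deleted users; both are choices made for the example.

```python
from typing import Optional

# Hypothetical in-memory store; "deleted" marks a soft-deleted user.
USERS = {
    123: {"id": 123, "name": "webJose", "active": True, "deleted": False},
    124: {"id": 124, "name": "old", "active": False, "deleted": True},
}


def get_user(raw_id: str):
    """GET /users/<id>: returns (status, user or None)."""
    if not (raw_id.isascii() and raw_id.isdigit()):
        return 400, None  # this sketch only accepts numeric IDs
    user = USERS.get(int(raw_id))
    if user is None:
        return 404, None
    if user["deleted"]:
        return 410, None  # soft-deleted: deny the GET
    return 200, user


def get_users(active: Optional[bool] = None):
    """GET /users or /users/?active=...: returns (status, list or None)."""
    found = [
        user
        for user in USERS.values()
        if not user["deleted"] and (active is None or user["active"] == active)
    ]
    # Finding nothing is not an error, so answer 204 rather than 404.
    return (200, found) if found else (204, None)
```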
HTTP PUT
PUT is used to add a new resource or to replace the existing resource with a new one. Does this sound strange to you? Let's rephrase: PUT'ing a resource to a URL (URI) stores said resource as a new resource if the specified URL (URI) is not in use by any other resource. If it is in use, however, the PUT'ed resource will take its place.
In Practical Terms
It is an UPSERT operation. But wait a minute: How can a consumer of the REST HTTP server know which ID to use, since in practice, 99.9999% of the time the ID is given by an auto-incrementing numeric field in some relational database, and therefore cannot be known ahead of time? The plot thickens!
Ha! No, not really. This just means that a resource that lives in a REST server can only be UPSERT'ed if it contains an alternative key. An alternative key is any other piece of information about the resource that is unique amongst its peers. Examples would be a person's social security or DNI number, or a container's assigned serial number. Resources that can be identified by alternative keys can be PUT'ed.
This means that we will ignore once more the one-URI-per-resource rule and allow URI's for UPSERT'able resources using the alternative key: HTTP PUT http://api.example.com/users/webJose with the request body containing the details. Whether user webJose exists or not has no relevance to the outcome: A user whose username is webJose will exist from now onwards (assuming all data validation checks pass).
Back to the typical HTTP status codes returned by PUT.
Example | HTTP Status Code | Notes |
api.example.com/users/webJose | 200 OK | The resource existed and was updated. |
api.example.com/users/webJose | 201 CREATED | The resource did not exist and was created. The HTTP response will carry the Location header containing the new URI (http://api.example.com/users/23). |
api.example.com/users/webJose | 400 BAD REQUEST | Either the URI, the body payload or the request headers are incorrect. |
api.example.com/users/webJose | 409 CONFLICT | The data in the request conflicts or somehow contradicts what is expected. Typically used with timestamp verification (a. k. a. rowversion in SQL Server). |
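A minimal sketch of the upsert-by-alternative-key behaviour described above follows. The in-memory dictionary stands in for a database table, the counter stands in for an auto-incrementing column, and the validation rule (a username in the body must match the one in the URI) is an assumption made for the example.

```python
import itertools

USERS_BY_USERNAME = {}          # alternative key -> stored user
_next_id = itertools.count(23)  # stand-in for an auto-incrementing column


def put_user(username: str, body: dict):
    """PUT /users/<username>: returns (status, headers, stored resource)."""
    # Assumed validation rule: the body may not contradict the URI.
    if not username or body.get("username", username) != username:
        return 400, {}, None
    existing = USERS_BY_USERNAME.get(username)
    if existing is None:
        user = {**body, "username": username, "id": next(_next_id)}
        USERS_BY_USERNAME[username] = user
        location = f"http://api.example.com/users/{user['id']}"
        return 201, {"Location": location}, user
    # Replace the stored representation but keep the server-assigned ID.
    user = {**body, "username": username, "id": existing["id"]}
    USERS_BY_USERNAME[username] = user
    return 200, {}, user
```

The first PUT for a username creates the user and answers 201 with a Location header; every later PUT for the same username replaces it and answers 200.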
HTTP POST
POST inserts a new resource. The URI one specifies for this operation is the resource's parent URI (the collection URI).
Example | HTTP Status Code | Notes |
api.example.com/users | 201 CREATED | The resource was created. The HTTP response will carry the Location header containing the new URI (http://api.example.com/users/23). |
api.example.com/users | 400 BAD REQUEST | Either the URI, the body payload or the request headers are incorrect. |
POST'ing is traditionally used to post a single resource, but REST doesn't really impose any requirements around this. The problem for you to solve as a developer is: How do you respond to a bulk request? HTTP headers are limited in size. Thinking that you can fit thousands of URI's in the HTTP Location header is not realistic. It is therefore more than likely that you'll have to drop the 201 CREATED HTTP response status code, use the 200 OK status code instead, and transmit the new URI's in the response body.
Still, I don't like this because of something I personally do that has proven very helpful. I will talk about this at the end of the article.
HTTP PATCH
PATCH is used to update a resource, and is probably the second simplest HTTP verb to understand. If you need to make changes to a resource, you send an HTTP PATCH request using the resource's URL (URI) and the information that changes.
Practicality Of "that changes"
I hear you: There's always a catch, and the catch for PATCH is those two words: "that changes".
The theory states that patching does not require the full resource in the request. Patching should work by only receiving the pieces of the resource that change. This sounds nice but in practice is painful to implement.
For example, to implement this behavior in ASP.Net you will have to define the resource model twice: A model that represents the resource, and a model that is used to transmit patch information.
Call me crazy, or call me lazy, I don't care. I hate the idea of modeling a resource twice. Unless you are absolutely against the wall on this one, just require the entire resource, changes included and then validate the resource's data that is allowed to change. Then make sure your repository ignores values on fields that are not allowed to change.
For example, people forget that the resource's ID will be in two places: The URI and the body payload. What I do here is make sure the deserialized body payload ID matches the ID in the URI. If it is not the case, I return 400 BAD REQUEST, or I simply override and continue.
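The approach can be sketched as follows: accept the whole resource, check the body ID against the URI, and copy over only the fields that are allowed to change. The store, the field names and the `MUTABLE_FIELDS` set are hypothetical, and this version takes the 400 BAD REQUEST branch rather than overriding the ID.

```python
# Hypothetical store and field rules for the sketch.
USERS = {
    123: {"id": 123, "username": "webJose", "email": "old@example.com", "createdBy": "admin"},
}
MUTABLE_FIELDS = {"email"}  # everything else is ignored on update


def patch_user(uri_id: int, body: dict):
    """PATCH /users/<id> with the full resource in the body: returns (status, resource or None)."""
    if uri_id not in USERS:
        return 404, None
    # The ID lives in two places; they must agree.
    if body.get("id") != uri_id:
        return 400, None
    stored = USERS[uri_id]
    for field in MUTABLE_FIELDS:
        if field in body:
            stored[field] = body[field]
    return 200, stored
```

Values sent for fields outside `MUTABLE_FIELDS` are simply dropped, which is what lets a single model serve both reads and updates.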
NOTE: This is super simple to achieve with Dapper in ASP.Net. With Entity Framework you'll have to first query for the resource, then apply the changes in the returned object for the properties that are allowed to change, and then save the changes. One more reason to hate EF: It costs you a round trip to the database just for it to learn what you already knew.
Ok, let's move to the typical HTTP responses table.
Example | HTTP Status Code | Notes |
api.example.com/users | 405 METHOD NOT ALLOWED | Collections are not PATCH'able. |
api.example.com/users/123 | 200 OK | The resource has been updated. |
api.example.com/users/123 | 400 BAD REQUEST | Either the URI, the body payload or the request headers are incorrect. |
api.example.com/users/123 | 409 CONFLICT | The data in the request conflicts or somehow contradicts what is expected. Typically used with timestamp verification (a. k. a. rowversion in SQL Server). |
HTTP DELETE
DELETE is used to delete a resource. Shocking, I know. While this HTTP verb is capable of carrying a body, one is rarely needed. Right now I cannot remember a single instance where I needed to send body information during a deletion request. At most, you need to send the known timestamp (rowversion) and this can be transmitted easily by using the query string.
Example | HTTP Status Code | Notes |
api.example.com/users | 200 OK | Deletes the entire users collection. |
api.example.com/users/123 | 200 OK | Deletes the single user associated with the URL (URI). |
api.example.com/users/webJose | 200 OK | Deletes the single user associated with the URL (URI). |
api.example.com/users/123 | 405 METHOD NOT ALLOWED | A user cannot be deleted, and this is something you as a developer may enforce for resources where business rules forbid deletion. |
api.example.com/users/123 | 409 CONFLICT | The data in the request conflicts or somehow contradicts what is expected. Typically used with timestamp verification (a. k. a. rowversion in SQL Server). |
This HTTP verb may also be used for soft deletions. The consumer of your REST server doesn't have to know you soft-deleted. Neither REST nor the HTTP specification forces you to reveal this implementation aspect.
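Here is a small sketch that combines the two ideas: the known rowversion travels in the query string, and the deletion itself is a soft delete the caller never sees. The `rowversion` parameter name, the `deleted` flag and the choice to answer 400 when the rowversion is missing are assumptions made for the example; the sketch also assumes numeric IDs.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical store; "deleted" is the soft-delete flag.
USERS = {123: {"id": 123, "rowversion": 7, "deleted": False}}


def delete_user(url: str) -> int:
    """DELETE /users/<id>?rowversion=<n>: returns the HTTP status code."""
    parts = urlsplit(url)
    user_id = int(parts.path.rsplit("/", 1)[-1])  # assumes a numeric ID
    user = USERS.get(user_id)
    if user is None or user["deleted"]:
        return 404
    known = parse_qs(parts.query).get("rowversion")
    if known is None:
        return 400  # this sketch requires the caller's known rowversion
    if int(known[0]) != user["rowversion"]:
        return 409  # someone else changed the resource in the meantime
    # Soft delete: flag the row instead of removing it.
    user["deleted"] = True
    user["rowversion"] += 1
    return 200
```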
Some More HTTP Responses
On top of what has been specified so far, you may also make use of other HTTP status codes.
HTTP Status Code | Notes |
202 ACCEPTED | The HTTP request has been received and this response simply acknowledges this fact. Whether or not it succeeds is unknown. Typically used for fast-processing endpoints where the result is not immediately needed, such as a log-receiving microservice. The response may carry an identifier to later query for the status of the request. |
401 UNAUTHORIZED | The request did not carry any recognizable authentication information for the requested operation. An authentication method exists. |
403 FORBIDDEN | The request carries proper credentials but said credentials don't grant the necessary permissions to perform the requested operation, or a suitable authentication method does not exist. |
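418 I'M A TEAPOT | An April Fools' joke defined in RFC 2324 (the Hyper Text Coffee Pot Control Protocol) rather than in the HTTP standard itself, although many HTTP frameworks support it. Use it if you don't want to serve the request for whatever (superfluous or petty) reason. |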
429 TOO MANY REQUESTS | Usually provided by throttling middleware and makes sure your HTTP server is not overwhelmed with too many HTTP requests. Requests that exceed the threshold receive this error. |
500 INTERNAL SERVER ERROR | Return this if an unhandled exception occurs during the processing of a request. |
503 SERVICE UNAVAILABLE | Especially useful in Microservices where the queried microservice has emptied its data store in response to a data replay request. While the data is being replayed by the Record Of Origin, the microservice returns 503 for all received requests until the data replay is finished. Do send meaningful explanations to the caller about the nature of the unavailability. |
What Was the Thing I Personally Do That Has Proven Very Helpful?
So while explaining the possibility of allowing bulk operations I said there's only one practical way to respond successfully: 200 OK with the new resource URI's in the response body. I also said, however, that I don't like this.
I like to always return the updated version of a resource after a data-altering operation (POST, PUT, PATCH and DELETE). I do this because there's always some information that gets updated that the requestor doesn't know about. Such as? Some examples are:
The ID of a newly created resource
The last modified date
The new resource's timestamp (rowversion)
The last modified by field
This has proven useful because most likely there's a user interface behind the request that needs to refresh its view. Returning the updated resource saves one round trip to the API server.
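As a sketch of this habit, a creation handler might answer with the complete stored resource, server-generated fields included. The field names (`lastModified`, `lastModifiedBy`, `rowversion`) and the in-memory store are assumptions made for illustration.

```python
import itertools
from datetime import datetime, timezone

USERS = {}                  # hypothetical in-memory store
_ids = itertools.count(23)  # stand-in for an auto-incrementing column


def post_user(body: dict, requested_by: str):
    """POST /users: returns (status, headers, full stored resource)."""
    user = {
        **body,
        # Server-generated values the requestor could not have known.
        "id": next(_ids),
        "lastModified": datetime.now(timezone.utc).isoformat(),
        "lastModifiedBy": requested_by,
        "rowversion": 1,
    }
    USERS[user["id"]] = user
    headers = {"Location": f"http://api.example.com/users/{user['id']}"}
    return 201, headers, user
```

The caller can refresh its view straight from the response body instead of issuing a follow-up GET against the Location URI.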
Tips for Bulk Operations
There are two ways you can program bulk operations:
Synchronously, only returning a response after all resources have been processed.
Asynchronously, queueing the resource-altering tasks and responding with 202 ACCEPTED.
The first one has nothing special: Simply process and return 200 OK or whatever result is relevant.
For the second one, consider adding a unique operation identifier the requestor can later use to obtain the request's result.
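The asynchronous variant could be sketched like this. The list used as a queue, the `operationId` field and the status values are all hypothetical; a real system would use a proper message queue and a background worker.

```python
import uuid

OPERATIONS = {}  # operation ID -> {"status": ..., "result": ...}
QUEUE = []       # stand-in for a real message queue


def accept_bulk(resources: list):
    """POST of many resources: queue the work and answer 202 with an ID to poll."""
    operation_id = str(uuid.uuid4())
    OPERATIONS[operation_id] = {"status": "pending", "result": None}
    QUEUE.append((operation_id, resources))
    return 202, {"operationId": operation_id}


def process_queue() -> None:
    """Worker step; in a real system this runs in the background."""
    while QUEUE:
        operation_id, resources = QUEUE.pop(0)
        # Pretend each resource was stored and received a sequential ID.
        created = [
            f"http://api.example.com/users/{number}"
            for number, _ in enumerate(resources, start=1)
        ]
        OPERATIONS[operation_id] = {"status": "done", "result": created}


def get_operation(operation_id: str):
    """GET of the operation's status: returns (status code, operation or None)."""
    operation = OPERATIONS.get(operation_id)
    return (404, None) if operation is None else (200, operation)
```

The requestor keeps the `operationId` from the 202 response and queries it later; once the worker has run, the same lookup returns the URI's of the new resources.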
Conclusion
REST is a very general, even abstract concept that must not be confused with a protocol or thought of as being the same as HTTP (the P stands for "protocol" in any case). I think the best resource to start reading about what REST is, is this one.
I'll finish by quoting the above resource:
Roy Fielding (the author of REST), in his dissertation, has nowhere mentioned any implementation direction – including any protocol preference or even HTTP.
If you want to read Roy's dissertation, go here.
That's it for today, happy coding!