
Tuesday, December 16, 2008

Unable to RIP, my foundations are shaky..

"Oh No! Not REST again. Can't you post something else? Don't you see that you are blogging about something that is only a blip in the CS Continuum which will soon be rendered immaterial? For heaven's sake, there are more important things to write about!..."

That, my friends, was my split SOAP personality coming forth for a few brief moments ;-). Down boy, down! This ain't one of those bashing blogs and it's got nothing to do with you; it's more of a housekeeping blog. I need to get my thoughts in order. We shall have our little death match soon, I promise. By the way, this blog has more to do with HTTP than REST.

As I work with REST and HTTP, there are some fundamental concepts that I need to absorb and document for my own reference and for anyone else who might find them useful. For anyone getting into REST web services, one book is a must-read: RESTful Web Services by Richardson and Ruby. Most of this blog summarizes and discusses what is written in the book, plus my 2c.

Some Definitions:

Side Effects in Programming:
A side-effect-free operation is one which, when invoked, does not cause a "hidden" or "unexpected" change to some other state. Maybe a better definition can be obtained from Wikipedia. Broadly speaking, calling a method add(int a, int b) should not blow up my system ;-) or take money out of someone's bank account and credit it to mine :-)))

Idempotency:

An operation is said to be idempotent when invoking it multiple times, by the same or different consumers, always leaves the system in the same state as invoking it once. When working with Resources, quoting the Web Services book, "An operation on a resource is idempotent, if making one request is the same as making a series of identical requests. The second and subsequent requests leave the resource state in exactly the same state as the first request."
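To make the definition concrete for myself, here is a minimal Python sketch. The `put` and `append` functions and the dictionary "store" are my own illustration (they are not from the book or from any HTTP library); the point is only to compare the state after one call with the state after two identical calls.

```python
def put(store, uri, representation):
    """Idempotent: the state after N identical calls equals the state after one."""
    store[uri] = representation
    return store


def append(store, uri, item):
    """Not idempotent: every identical call adds one more item."""
    store.setdefault(uri, []).append(item)
    return store


order = {"item": "XBox"}

# One PUT versus two identical PUTs against a fresh store.
put_once = put({}, "/orders/order/23", order)
put_twice = put(put({}, "/orders/order/23", order), "/orders/order/23", order)

# One append versus two identical appends against a fresh store.
append_once = append({}, "/orders/order/23/notes", "call customer")
append_twice = append(
    append({}, "/orders/order/23/notes", "call customer"),
    "/orders/order/23/notes",
    "call customer",
)

print(put_once == put_twice)        # True: the repeated PUT changed nothing
print(append_once == append_twice)  # False: the repeated append changed state
```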

HTTP Methods that I am unclear about:

1. GET:

GET method is used to retrieve whatever information is available at a particular Request URI. The W3C documentation states "If the Request-URI refers to a data-producing process, it is the produced data which shall be returned as the entity in the response and not the source text of the process, unless that text happens to be the output of the process. "

A GET request is issued to obtain information. It is not meant to change any state on the server. "No change" seems to be the buzzword; safety is of the utmost importance. Whether one executes GET "/orders/order/23" zero times or many times, the state of the resource should be the same. The consumer should not have to worry about what the request does to the resource. Now, "same" does not mean that the result obtained is the same across all calls over time. For example, it is possible that when the 1st request was made, order number 23 did not exist; when the 2nd request was made, the order was returned (maybe a PUT occurred that created the resource); and when the 3rd request was made, the same order was returned but with different content (maybe due to a PUT that changed the resource).

What is the "idempotency" part here? Are we leaving the resource in the same state across multiple requests?

What about cases like "HIT counters"; should these resources be served by a GET operation? The Web Services book seems to say that GET requests CAN have side effects and states that hit counters are a candidate for a GET operation. If we go back to the W3C definition of GET, it states that if the GET operation invokes a URI that is a data-producing process, the produced data will be returned as part of the response. For example, on every GET request, we might be changing server state because some logging occurs. In the case of hit counters, or let's say a service that generates UUIDs via GET "/uuidgenerator", state is being changed from request to request. Side effects are occurring, and majorly so. So what about everything stated above regarding GETs not changing server state in a way that is visible and has a substantial impact? Isn't a call to the hit counter really an "increment and get the next value"?

The authors of the Web Services book state "A client should never make a GET or HEAD request just for the side effects, and the side effects should never be so big that the client might wish it hadn't made the request". When a client invokes a GET on a hit counter, if the client does not expect it to have the side effect of incrementing the counter, what is the point in making the request? I understand that if the call were a GET "/hitCounter/currentCount", the request would not be changing the count. However, an auto-incrementing GET is definitely a major side effect IMO. The same would apply to the UUID generator, which creates a new UUID on every call. This BLOG has an interesting discussion on GET and idempotency.

A document from the W3C attempts to address when it is appropriate to use GET. Again, I do not feel the document gives me my answer.

I am rather torn, as GET for a UUID seems so natural: "Get me a new UUID". I am also unsatisfied with the explanations from the Web Services book. However, in light of all the above, my take is that items like hit counters, UUID generators etc. are better handled via a "POST" operation. We are definitely changing the state of the resource, and with the "INTENT" of doing so from the client's perspective, so is it not better to use a POST and consume the response? Am I totally wrong here...?
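Here is how I picture that split, as a rough Python sketch. The `HitCounter` class and its `get`/`post` method names are purely hypothetical stand-ins for a resource's GET and POST handlers; this is my reading of the problem, not something prescribed by the book or the specification.

```python
class HitCounter:
    """Hypothetical hit-counter resource, e.g. one living at "/hitCounter"."""

    def __init__(self):
        self.count = 0

    def get(self):
        # Safe: reading the current count never changes it.
        return self.count

    def post(self):
        # Intentional state change: increment, then return the new value.
        self.count += 1
        return self.count


counter = HitCounter()

# Any number of GETs leaves the counter untouched.
reads = [counter.get() for _ in range(3)]

# Each POST is an explicit "increment and get the next value".
first = counter.post()
second = counter.post()

print(reads, first, second, counter.get())
```

With this shape, a client that only wants to look never pays for a side effect, and a client that wants the increment asks for it explicitly.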

2. PUT:

A PUT operation is typically used to create or update a resource. From the W3C, "The PUT method requests that the enclosed entity be stored under the supplied Request-URI. If the Request-URI refers to an already existing resource, the enclosed entity SHOULD be considered as a modified version of the one residing on the origin server. If the Request-URI does not point to an existing resource, and that URI is capable of being defined as a new resource by the requesting user agent, the origin server can create the resource with that URI"

What the above tells me is to use PUT to create a resource that can subsequently be queried with a GET request to the created resource's URI.

For example, a PUT to "/orders/order/23" when there is no resource at the URI results in the creation of an Order with Id = 23. Invoking PUT on "/orders/order/23" when an order already exists at that URI equates to updating the order. After either of the above PUT calls, a GET on "/orders/order/23" will obtain a resource, i.e., the Order with Id = 23.

From the above, when creating a new resource, I would only use PUT if I have all the information required to create/update the resource before the call is made, such that a subsequent GET can be executed using only the information I already possess. I do not expect the Resource code on the server to supply any additional data that I would then need in order to locate the resource. What I create with, I should be able to GET.

PUT indicates a call where the client is in control of exactly where and how the Resource will be identified. Furthermore, PUT is idempotent: multiple calls with the same information have the same result as a single call. So in the case of the order example, before creating the resource, I would ask a UUID service for a unique identifier and then use that identifier to uniquely identify the resource I create. Note that in the "Order" example, one is actually creating a "sub-resource" of the orders resource, so is PUT an acceptable operation for that? Yes; the book is a good read for details.
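A small Python sketch of the flow I have in mind: obtain an identifier first, then PUT to a URI built from it. The in-memory `orders` dictionary and the `put`/`get` helpers are assumptions for illustration only, and I am using a locally generated `uuid.uuid4()` as a stand-in for a call to a UUID service. The 201/200 status codes follow the usual "created" versus "updated" convention.

```python
import uuid

orders = {}  # hypothetical server-side storage, keyed by URI


def put(uri, representation):
    """Create or replace the resource at exactly this URI."""
    status = 200 if uri in orders else 201  # 201 Created, 200 OK on update
    orders[uri] = representation
    return status


def get(uri):
    """Return the stored representation, or None if nothing lives at the URI."""
    return orders.get(uri)


# The client picks the identifier before the resource exists.
order_id = uuid.uuid4()
uri = f"/orders/order/{order_id}"
order = {"item": "XBox", "quantity": 1}

first_status = put(uri, order)   # creates the order
second_status = put(uri, order)  # identical request: same end state

print(first_status, second_status, get(uri) == order, len(orders))
```

What I create with, I can GET: the client never needed anything back from the server to find the order again.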

3. POST:

POST, IMO, is the most flexible of the HTTP methods. The W3C documentation on POST states "The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line". Another interesting line from the W3C documentation is "The action performed by the POST method might not result in a resource that can be identified by a URI."

From the above, POST should be used when creating a "sub-resource" of a resource, for example a line item under a particular order, such as POST "/order/23/lineItems/". In other words, POST is used to create a Resource without knowing in advance exactly where the Resource will be available upon creation. After the call to POST, the newly created line item might be available at "/order/23/lineItems/lineItem/2", at "/order/23/lineItems/lineItem/3", or at some other identifier. The client that made the POST has no way of knowing, prior to the call, the URI from which the line item can be obtained.

POST, IMO, is a great candidate when creating a resource whose Id will be generated when the resource is created, for example, when creating a line item that is a sub resource of the order.
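A rough Python sketch of that idea follows. The `post` helper, the in-memory `line_items` dictionary and the simple counter that plays the role of a database-generated Id are all assumptions of mine. The server chooses the new URI and reports it back in a `Location` header alongside a 201 status.

```python
import itertools

line_items = {}                # hypothetical server-side storage, keyed by URI
_next_id = itertools.count(1)  # stands in for a database-generated Id


def post(collection_uri, representation):
    """Create a subordinate resource; the server, not the client, picks its URI."""
    new_uri = f"{collection_uri.rstrip('/')}/lineItem/{next(_next_id)}"
    line_items[new_uri] = representation
    return 201, {"Location": new_uri}


item = {"sku": "XBox", "quantity": 1}

status, headers = post("/order/23/lineItems/", item)
status_again, headers_again = post("/order/23/lineItems/", item)

print(status, headers["Location"])
print(status_again, headers_again["Location"])
```

Note that the two identical POSTs created two different line items, which is exactly why POST is not idempotent the way PUT is.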

What about using POST for NOT creating a Resource that can be identified by a URI? This leaves plenty of room for an RPC style of programming. A Resource can be used as an RPC-style processor by interrogating the request and performing different operations based on it. The Web Services book terms this Overloaded POST. Uses of POST such as "/myresource?method=save" or "/myresource?method=archive" are more RPC oriented.

Using POST for something like, "/calculateCost", IMO is a valid use of POST where a request is submitted and the response provides the result. As mentioned before, I am of the opinion that POST is the better candidate for HIT Counters, Id Generators etc.

One problem that plagues URIs is the allowed length. Although the HTTP standard does not define a limit on URI length, clients and servers do. I like the example from the Web Services book: a GET on "/numbers/11111..........", with a very long number, represents a problem. Performing a POST on a resource while specifying a "method" seems a reasonable way to overcome it, e.g., POST "/numbers?method=GET", where the very long number "11111...." is in the body of the POST.

In other words, is it OK to bend the rules of REST in cases where URI length is a concern? This seems to me an architectural and design compromise one needs to suffer when URI length breaks the underlying system. Overloaded POST needs to be used judiciously. If URI length is NOT a concern, I recommend not using overloaded POST at all.
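To see what such a compromise might look like on the server side, here is a small Python sketch. The `handle_post` function and the made-up response fields (`digits`, `even`) are hypothetical; I do not know what the real "/numbers" resource would return, so treat the payload as a placeholder. The only real point is that the handler reads the `method` query variable and takes the long number from the body instead of the URI.

```python
from urllib.parse import parse_qs, urlsplit


def handle_post(target, body):
    """Dispatch an overloaded POST on its "method" query variable."""
    parts = urlsplit(target)
    method = parse_qs(parts.query).get("method", [""])[0]

    if parts.path == "/numbers" and method.upper() == "GET":
        # The body carries what would otherwise be the last URI path segment.
        if not body.isdigit():
            return 400, {"error": "body must be a non-negative integer"}
        return 200, {"digits": len(body), "even": body[-1] in "02468"}

    return 400, {"error": "unsupported method"}


very_long_number = "1" * 5000  # far longer than many clients allow in a URI

status, result = handle_post("/numbers?method=GET", very_long_number)
bad_status, _ = handle_post("/numbers?method=archive", "1")

print(status, result)
print(bad_status)
```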

What about algorithms or singular methods available at a resource? Is POST the right HTTP method for the same?

When to use Query Variables?

One finds cases where representing a Resource via a fully qualified path feels rather verbose. Do I need to create a resource for every possible path? Scoping a Resource sometimes does not sound right; in other cases it is just painful when the path gets very deep :-).

The authors of the Web Services book prefer to avoid query variables where possible. Quoting the authors, "..including them (query vars) in the URI is a good way to make sure that URI gets ignored by tools like proxies, caches and web crawlers". These are great arguments, but creating Resource URIs for resources that will not necessarily be reused from one call to another is a pretty steep price. The authors do specifically acknowledge the value of query variables when they apply to searches or what they generalize as "algorithms".

I totally agree with the authors regarding the appropriate use of query variables when a search is involved. Rather than having an individual URI path for every possible criterion and sub-criterion, query variables are a better fit for the problem.

Consider the Yahoo API, http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=finances&format=pdf

Note the use of the "webSearch" at the end of the URI prior to the Query Variables that follow. It is my opinion that the above is a great example of how a URI for search should be developed.

So in the same light, "/orders/search?createDate=20081127&containsItem=XBox.." is a great use of Query Variables.
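As a quick illustration of how a search resource like this might consume its query variables, here is a Python sketch using the standard `urllib.parse` module. The sample `orders` data, the `search` function and the matching rules (exact date match, item membership) are all my own assumptions, not part of any real API.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical order data; the field names mirror the query variables above.
orders = [
    {"id": 23, "createDate": "20081127", "items": ["XBox", "Controller"]},
    {"id": 24, "createDate": "20081127", "items": ["Book"]},
    {"id": 25, "createDate": "20081201", "items": ["XBox"]},
]


def search(uri):
    """Return the ids of orders matching every criterion present in the query."""
    parts = urlsplit(uri)
    if parts.path != "/orders/search":
        return None  # not the search resource

    criteria = parse_qs(parts.query)  # maps each variable to a list of values
    results = orders
    if "createDate" in criteria:
        wanted = criteria["createDate"][0]
        results = [o for o in results if o["createDate"] == wanted]
    if "containsItem" in criteria:
        wanted = criteria["containsItem"][0]
        results = [o for o in results if wanted in o["items"]]
    return [o["id"] for o in results]


both = search("/orders/search?createDate=20081127&containsItem=XBox")
date_only = search("/orders/search?createDate=20081127")

print(both)       # [23]
print(date_only)  # [23, 24]
```

One path, any combination of criteria; the alternative would be a separate URI path for each combination.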

Conclusion:

I hope I have understood the "basics" properly. If not, as always, I would appreciate insight in the matter. I am back to re-reading the book. At best, I find the HTTP specifications rather nebulous. My take on the methods and query variables:

  • Use PUT when you know that you can GET the Resource based on the information you have when PUTting the Resource.
  • For algorithms such as HIT counters, use POST.
  • Use POST when creating a sub-resource, especially when working with a generated database identifier that will define your resource URI.
  • Query variables are a great fit when performing searches. In particular, let the resource where the search begins be denoted as such, i.e., ".../.../foo../bar../search?...". In other words, qualify the path as far as applicable before resorting to search.
  • Do not overload POST and make it an RPC call. SOAP personality, please relax; it's not personal :-)

Finally,

  • "/orders" - POST to create a new order seems correct
  • "/orders/123" - PUT to update order 123 seems correct
  • "/orders/123" - DELETE seems correct to delete order 123
  • "/orders" - GET seems correct to get all orders.
  • "/orders/123" - GET seems correct to get order 123
