Monday, March 2, 2009

An Example of Caching with REST using Jersey JAX-RS

One of the constraints and benefits of a RESTful architecture is the use of caches where possible. A REST architecture gains from caching by reducing network bandwidth and unnecessary I/O. In short, caching has a direct impact on the scalability of a RESTful architecture.

Server Side Cache:
When a request is made to retrieve a data set that does not change over a fixed duration (as determined by non-functional requirements), caching the data set on the server avoids, for example, database I/O on every client request. With a server-only cache, however, every request still consumes network bandwidth between client and server.

Client Side Cache:
HTTP has a very cool construct: through HTTP headers, a server can tell a client whether or not to cache the payload it provides. In addition, clients can use Conditional GETs to obtain the payload only if the data has changed on the server. With a Conditional GET one could also return only the changes that have occurred, and a client could apply the delta to its locally cached copy of the data.

I wonder how many organizations utilize these features provided by HTTP. The server side cache can easily be accommodated using Squid or some other tool.

On the client side, that takes a bit more discussion. As part of the HTTP response, a server can tell the client whether and for how long to cache the provided representation via the "Expires" HTTP header. As an example, when a ProductClient or web browser requests a list of Products from the server, and the server knows that the Product data will not be refreshed for some time, it can inform the client to cache the payload representing Products until expiration. What I like about this control is that the server, not the client, dictates the duration for which the data is valid and can be cached. Clients in a client-server environment that decide on their own to cache data based on non-functional requirements are a bad way to cache, IMO. The duration logic should IMO emanate from the service or server.

Using the "Expires" header, the server tells the client to cache the data till a particular date. A time duration can be provided instead through the max-age directive of the "Cache-Control" header. The former can be a problem if the client and server clocks are not in sync. For example, the server tells the client to cache a piece of data till Dec 20th, 2012. However, the client clock is 10 mins behind the server, so although the data has expired on the server, the client still considers its copy fresh. For this reason, expressing the expiration as a duration such as 10 mins keeps client and server in sync regarding expiration of the cache.
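
The clock-skew argument above can be sketched in plain Java. This is only an illustration, not code from the attached sample: the class `FreshnessCheck` and its method names are made up here, and the relative lifetime stands for what a server would normally send as `Cache-Control: max-age`.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper comparing two ways a client can decide whether a cached copy is fresh. */
final class FreshnessCheck {

    /** Absolute expiry: the server's date is compared against the client's own clock. */
    static boolean isFreshByExpires(Instant expires, Instant clientNow) {
        return clientNow.isBefore(expires);
    }

    /** Relative expiry: only the client's clock is involved, so skew between machines cancels out. */
    static boolean isFreshByMaxAge(Instant receivedAtClient, Duration maxAge, Instant clientNow) {
        return Duration.between(receivedAtClient, clientNow).compareTo(maxAge) < 0;
    }

    public static void main(String[] args) {
        Instant serverNow = Instant.parse("2012-12-20T10:00:00Z");
        // The client clock runs 10 minutes behind the server clock.
        Instant clientNow = serverNow.minus(Duration.ofMinutes(10));
        Duration lifetime = Duration.ofMinutes(10);
        // The absolute date the server would put into the Expires header.
        Instant expires = serverNow.plus(lifetime);

        // 15 real minutes pass; on the server the data expired 5 minutes ago.
        Instant clientLater = clientNow.plus(Duration.ofMinutes(15));

        System.out.println(isFreshByExpires(expires, clientLater));            // true (stale data still served)
        System.out.println(isFreshByMaxAge(clientNow, lifetime, clientLater)); // false
    }
}
```

With the absolute date the lagging client keeps serving data the server already considers expired, while the duration-based check expires it on time.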

What about the case where caching on the client is recommended but there is a certain amount of volatility in the data? For example, let's say we have an OrderClient that GETs information about a particular order from the server. The order could subsequently be updated by a user, for example by adding a new line item. In such a case one can use the Conditional GET feature of HTTP to obtain a new payload only if the data cached by the client is stale. The server determines whether the data has changed since the client last requested the payload, and either provides the entire data or responds with an HTTP status of 304 (Not Modified). On receiving a 304, the client can respond to its consumer with the data it has previously cached. This reduces the amount of data transferred between client and server and thus reduces network bandwidth utilization. Conditional GET can be used with either the ETag or the Last-Modified header.
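
To make the server's decision concrete, here is a minimal, framework-free sketch of the ETag comparison behind a Conditional GET. It is an assumption-laden illustration: `ConditionalGet` and `statusFor` are hypothetical names, and the naive comma split would break on tags that themselves contain commas. In the Jersey example this work is done by `request.evaluatePreconditions(etag)`.

```java
import java.util.Arrays;

/** Hypothetical sketch of the server-side decision for a GET carrying If-None-Match. */
final class ConditionalGet {

    static final int OK = 200;
    static final int NOT_MODIFIED = 304;

    /**
     * @param ifNoneMatch raw If-None-Match header value sent by the client, or null if absent
     * @param currentEtag quoted entity tag of the current representation, e.g. "\"v2\""
     * @return 304 if the client's copy is still current, otherwise 200
     */
    static int statusFor(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null || ifNoneMatch.trim().isEmpty()) {
            return OK; // unconditional request: send the full payload
        }
        if (ifNoneMatch.trim().equals("*")) {
            return NOT_MODIFIED; // "*" matches any existing representation
        }
        // If-None-Match uses weak comparison, so a leading W/ is ignored on both sides.
        String current = stripWeakPrefix(currentEtag);
        boolean match = Arrays.stream(ifNoneMatch.split(","))
                .map(String::trim)
                .map(ConditionalGet::stripWeakPrefix)
                .anyMatch(current::equals);
        return match ? NOT_MODIFIED : OK;
    }

    private static String stripWeakPrefix(String tag) {
        return tag.startsWith("W/") ? tag.substring(2) : tag;
    }

    public static void main(String[] args) {
        System.out.println(statusFor(null, "\"v1\""));     // 200: first request, nothing cached yet
        System.out.println(statusFor("\"v1\"", "\"v1\"")); // 304: cached copy is still current
        System.out.println(statusFor("\"v1\"", "\"v2\"")); // 200: order changed, send it again
    }
}
```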

As an example of the above, let us look at a Jersey JAX-RS example. In the example we have two clients: a ProductClient that obtains information about Products and an OrderClient used to manage the life cycle of an Order. The ProductClient caches the Products until they expire and must be re-fetched, while the OrderClient caches the payload obtained and issues a Conditional GET so that it only receives the payload again if the data has changed on the server since its last request.

The ProductsResource shown below, for the sake of demonstration, sets the Products to expire 3 seconds after its invocation:
@GET 
@Produces("application/json") 
public Response getProducts() {
   ...
   ProductListDto productListDto = new ProductListDto(productDtos);
   Response.ResponseBuilder response = Response.ok(productListDto).type(MediaType.APPLICATION_JSON);

   // Expires 3 seconds from now..this would ideally be based
   // on some pre-determined non-functional requirement.
   Date expirationDate = new Date(System.currentTimeMillis() + 3000);
   response.expires(expirationDate);

   return response.build();
}

The OrderResource, on the other hand, uses an ETag to determine whether the order has been modified since the client's last GET request, and returns either a status of 304 or the entire order body as shown below:

@GET
@Produces("application/xml")
public Response getOrder(@Context HttpHeaders hh, @Context Request request) throws OrderNotFoundException {
 // orderId is not declared in this snippet; it is presumably a field of the resource
 Order order = orderService.getOrder(orderId);

 LOG.debug("Checking if there an Etag and whether there is a change in the order...");

 EntityTag etag = computeEtagForOrder(order);
 Response.ResponseBuilder responseBuilder = request.evaluatePreconditions(etag);

 if (responseBuilder != null) {
     // Etag match
    LOG.debug("Order has not changed..returning unmodified response code");
    return responseBuilder.build();
 }
 
 LOG.debug("Returning full Order to the Client");
 OrderDto orderDto = (OrderDto) beanMapper.map(order, OrderDto.class);

 responseBuilder = Response.ok(orderDto).tag(etag);

 return responseBuilder.build();
}
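
The post does not show `computeEtagForOrder`. One plausible way to derive a tag, sketched here under the assumption that the order can be reduced to an id plus some string describing its current state (a version counter, a last-modified timestamp, or a serialized form), is to hash that state. The class `OrderEtags` and its method are hypothetical and not taken from the attached sample.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper deriving an opaque ETag value from the state of an order. */
final class OrderEtags {

    /**
     * Hashes the order id and a string describing its current state.
     * Equal state yields an equal tag; any change to the state yields a different tag.
     */
    static String etagValueFor(long orderId, String orderState) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] hash = digest.digest((orderId + ":" + orderState).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the digests every Java platform is required to provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String before = etagValueFor(42L, "items=[pen]");
        String after = etagValueFor(42L, "items=[pen, ink]");
        System.out.println(before);
        System.out.println(before.equals(after)); // false: adding a line item changes the tag
    }
}
```

In the resource, the returned string could then be wrapped as `new EntityTag(value)` before being handed to `evaluatePreconditions`.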



The ProductClient checks whether the cached data has expired before issuing a new request to the server, as shown below:

public ProductListDto getProducts() {
  // Key into the cache
  String path = resource.getURI().getPath();
  CacheEntry entry = CacheManager.get(path);
  ProductListDto productList = null;
  if (entry != null) {
    LOG.debug("Product Entry in cache is not null...checking expiration date..");

    Date cacheTillDate = entry.getCacheTillDate();
    Date now = new Date();

    if (now.before(cacheTillDate)) {
      LOG.debug("Product List is not stale..using cached value");

      productList =  (ProductListDto) entry.getObject();
    } 
    else {
      LOG.debug("Product List is stale..will request server for new Product List..");
    }
   }

   if (productList == null) {
     LOG.debug("Fetching Product List from Service...");
     ClientResponse response = resource.accept(MediaType.APPLICATION_JSON).get(ClientResponse.class);

     if (response.getResponseStatus().equals(Status.OK)) {
       productList = response.getEntity(ProductListDto.class);
       String cacheDate = response.getMetadata().getFirst("Expires");
       
       if (cacheDate != null) {
         Date ccDate;

         try {
           ccDate = DATE_FORMAT.parse(cacheDate);
           entry = new CacheEntry(productList, null, ccDate);
           CacheManager.cache(path, entry);
         }
         catch (ParseException e) {
           LOG.error("Error Parsing returned cache date..no caching will occur", e);
         }
       }
     } 
     else {
       throw new RuntimeException("Error Getting Products....");
     }
  }
  return productList;
}
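
`DATE_FORMAT` is not shown in the snippet above. HTTP dates such as the value of `Expires` use the RFC 1123 format in GMT, so on Java 8 or later (newer than the JDK 5 the sample targets) a parser could look like the sketch below. `ExpiresParser` is a hypothetical name. Unlike a shared `SimpleDateFormat`, `DateTimeFormatter` is thread-safe.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

/** Hypothetical helper that turns an Expires header value into an Instant. */
final class ExpiresParser {

    /**
     * Returns the parsed expiry, or empty if the header is missing or malformed.
     * HTTP asks caches to treat invalid values such as "0" as already expired,
     * so an empty result should simply mean "do not cache".
     */
    static Optional<Instant> parse(String headerValue) {
        if (headerValue == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(
                    ZonedDateTime.parse(headerValue.trim(), DateTimeFormatter.RFC_1123_DATE_TIME).toInstant());
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("Sun, 06 Nov 1994 08:49:37 GMT")); // Optional[1994-11-06T08:49:37Z]
        System.out.println(parse("0"));                             // Optional.empty
    }
}
```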

The OrderClient, on the other hand, sends the ETag as part of every request to the server, as shown below:

public OrderDto getOrder(Long orderId) throws OrderNotFoundException, IOException {
  try {
    String path = resource.path(orderId.toString()).getURI().getPath();

    CacheEntry entry = CacheManager.get(path);
    Builder wr = resource.path(orderId.toString()).accept(MediaType.APPLICATION_XML);

    if (entry != null && entry.getEtag() != null) {
      // Send the cached ETag; EntityTag.toString() yields the quoted form used in HTTP headers
      wr.header("If-None-Match", entry.getEtag().toString());
    }

    ClientResponse response = wr.get(ClientResponse.class);

    if (response.getResponseStatus().equals(Status.NOT_MODIFIED)) {
      LOG.debug("Order has not been modified..returning Cached Order...");
      return (OrderDto) entry.getObject();
    }
    else if (response.getResponseStatus().equals(Status.OK)) {
      LOG.debug("Obtained full Order from Service...Caching it..");
      OrderDto dto = response.getEntity(OrderDto.class);
      CacheManager.cache(path, new CacheEntry(dto, response.getEntityTag(), null));

      return dto;
    }
   else {
     LOG.debug("Order not found on server...removing from cache");
     CacheManager.remove(path);
     throw new UniformInterfaceException(response);
   }
  }
  catch (UniformInterfaceException e) {
    if (e.getResponse().getStatus() == Status.NOT_FOUND.getStatusCode()) {
      throw new OrderNotFoundException(e.getResponse().getEntity(String.class));
    }
    throw new RuntimeException(e);
  }
}
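
Both clients lean on a `CacheManager` and a `CacheEntry` that the post does not list. A minimal stand-in, inferred only from the calls used above (`get`, `cache`, `remove`, and a three-argument entry constructor), might look like this. The `Sketch` prefix marks these classes as assumptions, and the ETag is held as a plain `String` instead of a JAX-RS `EntityTag` to keep the sketch dependency-free.

```java
import java.util.Date;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical cache entry: the payload plus whichever validator the server supplied. */
final class SketchCacheEntry {
    private final Object object;
    private final String etag;        // null when the entry is governed by an expiration date
    private final Date cacheTillDate; // null when the entry is revalidated by ETag

    SketchCacheEntry(Object object, String etag, Date cacheTillDate) {
        this.object = object;
        this.etag = etag;
        // Date is mutable, so keep a private copy.
        this.cacheTillDate = cacheTillDate == null ? null : new Date(cacheTillDate.getTime());
    }

    Object getObject() { return object; }
    String getEtag() { return etag; }
    Date getCacheTillDate() { return cacheTillDate == null ? null : new Date(cacheTillDate.getTime()); }
}

/** Hypothetical process-wide cache keyed by resource path. */
final class SketchCacheManager {
    private static final Map<String, SketchCacheEntry> CACHE = new ConcurrentHashMap<>();

    private SketchCacheManager() { }

    static SketchCacheEntry get(String key) { return CACHE.get(key); }
    static void cache(String key, SketchCacheEntry entry) { CACHE.put(key, entry); }
    static void remove(String key) { CACHE.remove(key); }

    public static void main(String[] args) {
        cache("/orders/1", new SketchCacheEntry("order payload", "\"v1\"", null));
        System.out.println(get("/orders/1").getEtag());
        remove("/orders/1");
        System.out.println(get("/orders/1")); // null once the entry is removed
    }
}
```

Nothing here evicts entries, so a long-running client would want a size bound or a real caching library.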

Seeing the above in action for the Products: the first call obtains the Products, which should result in their being cached; a second request executed immediately after the first should obtain the cached Products. Sleeping for some time allows the data to become stale, and a subsequent request should re-fetch it. The logs when the tests are run look like:

1. Request, cache Products:
20:37:33 DEBUG - com.welflex.client.CacheManager.cache(14) | Caching Object with key [/IntegrationTest/products]

2. Request, Product Cache still good:
20:37:33 DEBUG - com.welflex.client.CacheManager.get(19) | Getting Object from Cache for Key:/IntegrationTest/products
20:37:33 DEBUG - com.welflex.client.ProductClientImpl.getProducts(49) | Product Entry in cache is not null...checking cache till date
20:37:33 DEBUG - com.welflex.client.ProductClientImpl.getProducts(54) | Product List is not stale..using cached value

3. Request, Products have expired:
20:37:43 DEBUG - com.welflex.client.CacheManager.get(19) | Getting Object from Cache for Key:/IntegrationTest/products
20:37:43 DEBUG - com.welflex.client.CacheManager.get(21) | Object in Cache for Key [/IntegrationTest/products] is :com.welflex.client.CacheEntry@1bf3d87
20:37:43 DEBUG - com.welflex.client.ProductClientImpl.getProducts(49) | Product Entry in cache is not null...checking cache till date
20:37:43 DEBUG - com.welflex.client.ProductClientImpl.getProducts(57) | Product List is stale..will request server for new Product List..
20:37:43 DEBUG - com.welflex.client.ProductClientImpl.getProducts(62) | Fetching Product List from Service...


From the OrderClient's perspective, the first request to obtain the order results in the Order being cached along with its ETag. When a subsequent request is sent, the server responds with only a status of 304, i.e., not modified, and the OrderClient returns the cached copy. After this second request the order is updated, so the ETag is no longer valid; a subsequent GET of the order therefore results in the full order being fetched and re-cached, as shown below:

1. First time Order is retrieved, Order is cached:
Retrieving the order...
22:33:13 DEBUG - com.welflex.client.CacheManager.get(19) | Getting Object from Cache for Key:/IntegrationTest/orders/3443274629940897628
22:33:13 DEBUG - com.welflex.client.OrderClientImpl.getOrder(68) | Obtained full Order from Service...Caching it..

2. Second Request, Order not changed on Server, 304 returned to Client:
22:33:13 DEBUG - com.welflex.order.rest.resource.OrderResource.getOrder(68) | Order Resource
22:33:13 DEBUG - com.welflex.order.rest.resource.OrderResource.getOrder(79) | Order has not changed..returning unmodified response code
22:33:13 DEBUG - com.welflex.client.OrderClientImpl.getOrder(64) | Order has not been modified..returning Cached Order...

3. Issue a PUT to update the Order, thus changing it:
Updating the order...
22:33:13 DEBUG - com.welflex.order.rest.resource.OrderResource.updateOrder(106) | Enter Update Order, Id=3443274629940897628

4. Retrieve the Order; the etag should no longer be valid:
Retrieving the order..should not obtained cached copy...
22:33:13 DEBUG - com.welflex.order.rest.resource.OrderResource.getOrder(73) | Checking if there an Etag and whether there is a change in the order...
22:33:13 DEBUG - com.welflex.order.rest.resource.OrderResource.getOrder(83) | Returning full Order to the Client
22:33:13 DEBUG - com.welflex.client.OrderClientImpl.getOrder(68) | Obtained full Order from Service...Caching it..


Attached HERE is the sample Maven Jersey JAX-RS project that will allow you to witness the above. The caching implemented is primitive at best and the attached code is only an EXAMPLE. One could easily delegate the caching to a caching framework such as ehcache, oscache, jcs etc. One could even get more exotic and think of Aspects that intercept calls to GET and transparently provide the caching.
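
As a rough illustration of the "transparent" idea, a JDK dynamic proxy can stand in for an aspect: it intercepts calls on a client interface and answers repeated getters from a map. Everything below is hypothetical (`CatalogClient`, `CachingProxy`), and it ignores expiration and ETags entirely; it only shows where such logic could be hooked in without touching the client code.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical client interface standing in for something like the ProductClient. */
interface CatalogClient {
    String getProducts();
}

/** Hypothetical interceptor that caches the results of getter-style methods. */
final class CachingProxy {

    /** Wraps target so that results of methods named get* are cached per method name and arguments. */
    static <T> T wrap(Class<T> type, T target) {
        Map<String, Object> cache = new ConcurrentHashMap<>();
        InvocationHandler handler = (proxy, method, args) -> {
            if (!method.getName().startsWith("get")) {
                return call(target, method, args); // non-getters pass straight through
            }
            String key = method.getName() + Arrays.deepToString(args);
            Object cached = cache.get(key);
            if (cached == null) {
                cached = call(target, method, args);
                if (cached != null) {
                    cache.put(key, cached); // null results are never cached
                }
            }
            return cached;
        };
        return type.cast(Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler));
    }

    private static Object call(Object target, Method method, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow what the real client threw
        }
    }

    public static void main(String[] args) {
        AtomicInteger remoteCalls = new AtomicInteger();
        CatalogClient real = () -> "products#" + remoteCalls.incrementAndGet();
        CatalogClient cached = wrap(CatalogClient.class, real);
        System.out.println(cached.getProducts()); // products#1
        System.out.println(cached.getProducts()); // products#1, served from the cache
        System.out.println(remoteCalls.get());    // 1
    }
}
```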

To execute the example, run "mvn install" from the command line at the root of the project. Note that one needs JDK 5 or later in order to execute the code.

Caching is a very critical feature of REST. Not using it is like saying one is doing RES without the T. As always, if a reader of this blog has any comments, I'd appreciate them. If I am wrong, I would like to hear about it, as that will help me improve..or else forever hold your breath :-) If you cannot run the example, ping me...

4 comments:

Domenico said...

Nice post, like the others you have written on the same topic. I've heard about cache capability as one of the pluses of REST services, so I was looking for an example of REST caching but hadn't found anything till now. I found it very instructive. I would like to investigate further the possible interactions between ETag and the @version JPA annotation for optimistic locking. Actually it looks like using a timestamp with @version could provide a viable solution for cache management too. But what happens when the field used with @version is a long or an integer that is incremented at each revision by the entity manager? Can ETag be used with incremental version info?
Regards
domenico

Sanjay Acharya said...

It would be interesting to know how you would manage the optimistic locking using @version with cache expirations. One could use time stamp or just a version number when using optimistic locking. Looking forward to your findings...

Domenico said...

...Exactly.
Actually I'm using @version to annotate a Long property on my domain objects. In REST services I use a helper method that builds the ETag from entity id and version in this way:


/**
 * Create an EntityTag based on entity version and request Accept header. The media type parameter is here to
 * avoid media type mismatch between the request and the cached value.
 *
 * @param mediaType media type of the requested representation
 * @param converter wrapper holding the DTOs the tag is derived from
 * @return EntityTag
 */
private static <D extends BaseDTO> EntityTag createEtag(MediaType mediaType, JAXBConverter<D> converter) {
    StringBuilder sb = new StringBuilder(converter.getDTOs().size() * 40);
    for (BaseDTO dto : converter.getDTOs()) {
        sb.append(dto.getVersion()).append(";")
          .append(dto.getId());
        // remove version from dto
        dto.setVersion(null);
    }
    EntityTag result = new EntityTag(sha1(String.format("%s;%s", sb.toString(), mediaType.toString())));
    return result;
}


where sha1 is a method to compute a message digest and JAXBConverter<D> is an interface for a generic DTO wrapper that handles URI resource resolution and JAXB based xml/json marshalling/unmarshalling.

Anonymous said...

Nice article. But will it work in a clustered environment? Not sure how the ETag will work in a cluster. Do you have any thoughts on this?
Thanks & Regards
Kannan