Tuesday, December 16, 2008

Unable to RIP, my foundations are shaky..

"Oh No! Not REST again. Can't you post something else? Don't you see that you are blogging about something that is only a blip in the CS Continuum which will soon be rendered immaterial? For heaven's sake, there are more important things to write about!..."

That, my friends, was my split SOAP personality coming forth for a few brief moments ;-).. Down boy, down! This ain't one of those bashing blogs, it's got nothing to do with you, it's more of a housekeeping blog. I need to get my thoughts in order. We shall have our little death match soon, I promise.. By the way, this blog has more to do with HTTP than REST.

As I work with REST and HTTP, there are some fundamental concepts that I need to absorb and document for my reference or for anyone else who might find them useful. For anyone getting into REST web services, one book is a must read: RESTful Web Services by Richardson and Ruby. Most of this blog summarizes and discusses what is written in the book, in addition to my 2c.

Some Definitions:

Side Effects in Programming:
A side-effect-free operation is one which, when invoked, does not result in a "hidden" or "unexpected" change of some other state. A better definition can probably be obtained from Wikipedia. Broadly speaking, calling a method add(int a, int b) should not blow up my system ;-) or take money out of someone's bank account and credit it to mine :-)))

Idempotency:

An operation is said to be idempotent when multiple invocations of the same operation, by the same or different consumers, always leave the target in the same state as a single invocation would. When working with Resources, quoting the Web Services book, "An operation on a resource is idempotent, if making one request is the same as making a series of identical requests. The second and subsequent requests leave the resource state in exactly the same state as the first request."
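To make the definition concrete for myself, here is a minimal Java sketch. The class and method names (IdempotencyDemo, setQuantity, addQuantity) are made up for illustration and have nothing to do with the book; the "resource" is just an in-memory map of order id to quantity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory "resource": order id -> quantity.
final class IdempotencyDemo {
    private final Map<Integer, Integer> quantities = new HashMap<>();

    // Idempotent: repeating the call leaves the same state as calling it once.
    void setQuantity(int orderId, int quantity) {
        quantities.put(orderId, quantity);
    }

    // Not idempotent: every repeated call changes the state again.
    void addQuantity(int orderId, int delta) {
        quantities.merge(orderId, delta, Integer::sum);
    }

    int quantityOf(int orderId) {
        return quantities.getOrDefault(orderId, 0);
    }

    public static void main(String[] args) {
        IdempotencyDemo demo = new IdempotencyDemo();
        demo.setQuantity(23, 5);
        demo.setQuantity(23, 5);
        System.out.println(demo.quantityOf(23)); // 5, same as after one call
        demo.addQuantity(23, 5);
        demo.addQuantity(23, 5);
        System.out.println(demo.quantityOf(23)); // 15, each call moved the state
    }
}
```

Setting a value behaves like the idempotent case from the book's definition, while adding to it does not.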

Http Methods that I am unclear about:

1. GET:

GET method is used to retrieve whatever information is available at a particular Request URI. The W3C documentation states "If the Request-URI refers to a data-producing process, it is the produced data which shall be returned as the entity in the response and not the source text of the process, unless that text happens to be the output of the process. "

A GET request is issued to obtain information. It is not meant to change any state on the server. "No Change" seems to be the buzzword, and safety is of the utmost importance. Whether one executes GET "/orders/order/23" zero times, once, or many times, the state of the resource should be the same. The consumer should not have to worry that executing the request changed the state of the resource. Now, "same" does not mean that the result obtained is the "same" across all the calls over time. For example, it is possible that when the 1st request was made, the order with number 23 was non-existent; when the 2nd request was made, the order was returned (maybe a PUT occurred that created the resource); and when the 3rd request was made, the same order was returned but with different content (maybe due to a PUT that changed the resource).

What is the "idempotency" part here? Are we leaving the resource in the same state across multiple requests?

What about cases like "HIT-Counters", should these resources be served by a GET operation? The Web Services book seems to say that GET requests CAN have side effects and states that HIT counters are a candidate for a GET operation. If we go back to the W3C definition of GET, it states that if the GET operation invokes a URI that is a data-producing process, the data will be returned as part of the response. For example, on every GET request, we might be changing server state due to some logging occurring. In the case of HIT counters, or let's say a service that generates UUIDs via GET "/uuidgenerator", state is being changed from request to request. Side effects are occurring, and majorly so. So what about everything stated above regarding GETs not changing server state in a way that is visible and has a substantial impact? Isn't a call to increment the HIT counter really an "Increment and get the next value"?

The authors of the Web Services book state "A client should never make a GET or HEAD request just for the side effects, and the side effects should never be so big that the client might wish it hadn't made the request". When a client invokes a GET on a hit counter, if the client does not expect it to have the side effect of incrementing the counter, what is the point in making the request? I understand that if the call was a GET "/hitCounter/currentCount", then the request is not changing the count. However, an auto-incrementing GET is definitely a major side effect IMO. The same would apply to the UUID generator, which creates a new UUID on every call. This BLOG has an interesting discussion on GET and idempotency.

A document from the W3C attempts to address when it is appropriate to use GET. Again, I do not feel the document gives me my answer.

I am rather torn, as GET for a UUID seems so natural: "Get me a new UUID". I am also unsatisfied with the explanations in the Web Services book. However, in light of all the above, my take is that items like hit counters, UUID generators etc. are better handled via a "POST" operation. We are definitely changing the state of the resource, and with the "INTENT" of doing so from the Client's perspective, so is it not better to use a POST and consume the response? Am I totally wrong here...?
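Here is how I picture that split, as a rough Java sketch. The class HitCounterResource and its method names are my own invention, and the mapping of methods to HTTP verbs in the comments is just the opinion argued above, not something prescribed by the book or the HTTP specification.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical hit-counter resource, kept in memory for illustration.
final class HitCounterResource {
    private final AtomicLong hits = new AtomicLong();

    // Would back GET "/hitCounter/currentCount": safe, it only reads state.
    long currentCount() {
        return hits.get();
    }

    // Would back POST "/hitCounter": the client intends to change state
    // and consumes the new value from the response.
    long incrementAndGet() {
        return hits.incrementAndGet();
    }

    public static void main(String[] args) {
        HitCounterResource counter = new HitCounterResource();
        System.out.println(counter.incrementAndGet()); // 1
        System.out.println(counter.currentCount());    // 1
        System.out.println(counter.currentCount());    // still 1, reads are safe
        System.out.println(counter.incrementAndGet()); // 2
    }
}
```

The read can be repeated any number of times without anyone noticing, which is what I expect from a GET; the increment cannot.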

2. PUT:

A PUT operation is typically used to create or update a resource. From the W3C, "The PUT method requests that the enclosed entity be stored under the supplied Request-URI. If the Request-URI refers to an already existing resource, the enclosed entity SHOULD be considered as a modified version of the one residing on the origin server. If the Request-URI does not point to an existing resource, and that URI is capable of being defined as a new resource by the requesting user agent, the origin server can create the resource with that URI"

What the above tells me is that PUT is used to create a resource that can subsequently be queried with a GET request to the created resource's URI.

For example, a PUT to "/orders/order/23" when there is no resource at the URI results in the creation of an Order with Id = 23. Invoking PUT on "/orders/order/23" when an order already exists at that URI equates to updating the order. After either of the above PUT calls, a GET on "/orders/order/23" will obtain a resource, i.e., the Order with Id = 23.

From the above, if creating a new resource, I would only use PUT if I have all the information required to create/update the resource before the call is made, such that a subsequent GET can be executed using only the information I already possessed. I do not expect the Resource code on the server to supply any additional data that I would then need in order to locate the resource. What I create with, I should be able to GET.

PUT indicates a call where the client is in control of exactly where and how the Resource will be identified. Furthermore, PUT is idempotent: multiple calls with the same information have the same result as a single call. So in the case of the order example, before creating the resource, I would ask a UUID service for a unique identifier and then use that identifier to uniquely identify the resource I create. Note that in the "Order" example, one is actually creating a "sub-resource" of the orders resource, so is PUT an acceptable operation for this? Yes; the book is a good read for details.
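A small Java sketch of the "client picks the URI" idea, under my own assumptions: the PutDemo class is hypothetical, the store is an in-memory map standing in for the server, and java.util.UUID stands in for the UUID service. The returned numbers mimic the 201 (created) and 200 (replaced) status codes a server would typically answer with.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory stand-in for a server that accepts PUT.
final class PutDemo {
    private final Map<String, String> resources = new HashMap<>();

    // Stores the representation at the URI chosen by the client.
    // Returns 201 if the resource was created, 200 if it was replaced.
    int put(String uri, String representation) {
        boolean existed = resources.containsKey(uri);
        resources.put(uri, representation);
        return existed ? 200 : 201;
    }

    // Returns the stored representation, or null if nothing is there.
    String get(String uri) {
        return resources.get(uri);
    }

    public static void main(String[] args) {
        PutDemo server = new PutDemo();
        // The client decides the identifier before making the call.
        String uri = "/orders/order/" + UUID.randomUUID();
        System.out.println(server.put(uri, "<order/>")); // 201
        System.out.println(server.put(uri, "<order/>")); // 200, same final state
        System.out.println(server.get(uri));             // <order/>
    }
}
```

What I PUT with is exactly what I GET with, and repeating the PUT leaves the stored order unchanged.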

3. POST:

POST, IMO, is the most flexible of the HTTP methods. The W3C documentation on POST states "The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line". Another interesting line from the W3C documentation is "The action performed by the POST method might not result in a resource that can be identified by a URI."

From the above, POST should be used when creating a "Sub-Resource" of a resource. For example, a line item under a particular order, such as POST "/order/23/lineItems/". In other words, POST is used to create a Resource without knowing beforehand exactly where the Resource will be available upon creation. After the call to POST, the newly created line item resource might be available at "/order/23/lineItems/lineItem/2" or at "/order/23/lineItems/lineItem/3" or at some other identifier. The client that made the call to POST the data has no way of knowing, prior to performing the call, the URI from which the Line Item can be obtained.

POST, IMO, is a great candidate when creating a resource whose Id will be generated when the resource is created, for example, when creating a line item that is a sub resource of the order.
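Sketching this in Java helps me see the contrast with PUT. Everything here is assumed for illustration: the LineItemsResource class, the in-memory map, and the counter that plays the role of a generated database identifier. The returned String stands in for the URI a server would hand back to the client after creating the item.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical collection resource, e.g. "/order/23/lineItems".
final class LineItemsResource {
    private final String collectionUri;
    private final AtomicInteger nextId = new AtomicInteger(1);
    private final Map<String, String> items = new LinkedHashMap<>();

    LineItemsResource(String collectionUri) {
        this.collectionUri = collectionUri;
    }

    // POST: the server generates the identifier and tells the client
    // where the new sub-resource lives.
    String post(String representation) {
        String uri = collectionUri + "/lineItem/" + nextId.getAndIncrement();
        items.put(uri, representation);
        return uri;
    }

    int size() {
        return items.size();
    }

    public static void main(String[] args) {
        LineItemsResource lineItems = new LineItemsResource("/order/23/lineItems");
        System.out.println(lineItems.post("<lineItem/>")); // /order/23/lineItems/lineItem/1
        System.out.println(lineItems.post("<lineItem/>")); // /order/23/lineItems/lineItem/2
        System.out.println(lineItems.size());              // 2, POST is not idempotent
    }
}
```

Two identical POSTs leave two line items behind, which is the behavior I would want for a generated id but not for a PUT.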

What about using POST for NOT creating a Resource that can be identified by a URI? This leaves plenty of room for an RPC style of programming. A Resource can be used as an RPC-style processor by interrogating the request and performing different operations based on it. The Web Services book terms this Overloaded POST. Uses of POST such as "/myresource?method=save" or "/myresource?method=archive" are more RPC oriented.

Using POST for something like, "/calculateCost", IMO is a valid use of POST where a request is submitted and the response provides the result. As mentioned before, I am of the opinion that POST is the better candidate for HIT Counters, Id Generators etc.

One problem that plagues URIs is the allowed length. Although the HTTP standard does not define a limit on URI length, clients and servers do. I like the example from the Web Services book: a GET "/numbers/11111.........." represents a problem. However, performing a POST on a resource and specifying a "method" seems a reasonable way to overcome this problem, e.g. POST "/numbers?method=GET", where the number "11111....", a very long number, is in the body of the POST.

In other words, is it OK to bend the rules of REST in cases where URI length is a concern? This seems to me an architectural and design compromise one needs to suffer when URI length breaks the underlying system. Overloaded POST needs to be used judiciously. If URI length is NOT a concern, I recommend not having to use overloaded POSTs at all.

What about algorithms or singular methods available at a resource? Is POST the right HTTP method for the same?

When to use Query Variables?

One finds cases where representing a Resource via a fully qualified path feels rather verbose. Do I need to create a resource for every possible path? Scoping a Resource sometimes does not sound right; in other cases, it is just painful when the path is very deep :-).

The authors of the Web Services book prefer to avoid query variables where possible. Quoting the authors, "..including them (query vars) in the URI is a good way to make sure that URI gets ignored by tools like proxies, caches and web crawlers". These are great arguments, but creating Resource URIs for resources that will not necessarily be used from one call to another is a pretty steep price. The authors do specifically acknowledge the value of Query variables when they apply to searches, or what they generalize as "algorithms".

I totally agree with the authors regarding the appropriate use of Query variables when a search is involved. Rather than having an individual URI path for every possible criterion and sub-criterion, query variables are a better fit for the problem.

Consider the Yahoo API, http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=finances&format=pdf

Note the use of the "webSearch" at the end of the URI prior to the Query Variables that follow. It is my opinion that the above is a great example of how a URI for search should be developed.

So in the same light, "/orders/search?createDate=20081127&containsItem=XBox.." is a great use of Query Variables.
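For completeness, a minimal Java sketch of assembling such a search URI with properly encoded query variables. The SearchUri class is hypothetical, and the value "XBox 360" is just an assumed example to show the encoding of a space; only URLEncoder is a real JDK API here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that appends query variables to a search resource path.
final class SearchUri {
    static String build(String path, Map<String, String> params) {
        StringBuilder uri = new StringBuilder(path);
        char separator = '?';
        for (Map.Entry<String, String> param : params.entrySet()) {
            uri.append(separator)
               .append(encode(param.getKey()))
               .append('=')
               .append(encode(param.getValue()));
            separator = '&';
        }
        return uri.toString();
    }

    // Form-style encoding; a space becomes '+'.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> criteria = new LinkedHashMap<>();
        criteria.put("createDate", "20081127");
        criteria.put("containsItem", "XBox 360");
        // Prints /orders/search?createDate=20081127&containsItem=XBox+360
        System.out.println(build("/orders/search", criteria));
    }
}
```

The path is qualified up to the "search" resource and everything that varies per query goes into the variables.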

Conclusion:

I hope I have understood the "basics" properly. If not, as always, I would appreciate insight in the matter. I am back to re-reading the book. At the very best, I find that HTTP specifications are rather nebulous. My take on the methods and query variables:

  • Use PUT when you know that you can GET the Resource based on the information you have at the time you PUT the Resource.
  • For algorithms such as HIT counters, use POST.
  • Use POST when creating a sub-resource, especially when working with a generated database identifier that will define your resource URI.
  • Query Variables are a great fit when performing searches. In particular, let the resource where the search begins be denoted as such, i.e., ".../.../foo../bar../search?...". In other words, qualify the path as far as applicable before resorting to search.
  • Do not overload POST and make it an RPC call. SOAP personality, please relax.. it's not personal :-)

Finally,

  • "/orders" - POST to create a new order seems correct
  • "/orders/123" - PUT to update order 123 seems correct
  • "/orders/123" - DELETE seems correct to delete order 123
  • "/orders" - GET seems correct to get all orders.
  • "/orders/123" - GET seems correct to get order 123

Tuesday, December 9, 2008

jvmti, jni - Absolute Power

It's a snowy day.. I am sick. For those who jump to say "I've known that for years!", chill! :-) I am definitely under the weather and, to add to that, I have a snowy landscape to view. I am also suffering from really erratic sleep patterns. Sometimes I am unable to sleep until 5:45 a.m. and other times I wake up at 4:45 a.m. For one reason or another, I always land in a state that has snow. Maybe it's destiny. Maybe one day when I move back to Bangalore, it will snow there as well, due to global warming or a nuclear winter, what have you. Immediate concern: why could I not be in Florida? All ye Florida recruiters, ping me ;-) I apologize for the brief frustration release.

Anyway, that is enough for introductions. As always, I do not really know what to do with my precious time and instead choose to waste it. I have been interested in Java class instrumentation and the esoteric world of Java under the covers ever since I attended a presentation by Mr. Ted Neward. In particular, I have been wanting to play with JVMTI. Doing so meant I would need to enter the world of C programming, something I seem to have forgotten since working with Java. So what am I trying to do? I am wishing: what if, while developing unit tests, I could:
  • Force a Garbage collection of the JVM
  • Check to see if the objects I allocated are cleaned out
  • Determine what objects are on the heap
  • Determine what the state of the different threads on the VM are...
  • Determine what objects are reachable
  • And More...
Sure you can, just use a profiler like JProbe, YourKit or whatever. But how do they do it? JVMTI is the answer.

What is JVMTI? "The JVM tool interface (JVM TI) is a standard native API that allows for native libraries to capture events and control a Java Virtual Machine (JVM) for the Java platform"... That is the official statement, mine is "Power Baby, Power!". Read more about JVMTI, JVMPI and how agents work in this fantastic article by Kelly O'Hair and Janice J. Heiss.

In particular, the folder JDK_HOME/demo/jvmti of your JDK has multiple demonstrations of JVMTI features. I spent quite some time running them and would recommend that my fellow enthusiasts take a look at the demos.

So what am I looking for? What I would like to do is load a library using JNI and use JVMTI to print debug information regarding my application state. In particular, I am looking to see whether or not my code cleans up after itself.

I have a class Foo that is rather plain and does the following. Note that the same could easily be replaced by a JUnit test:



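public class Foo {
    // Instance-owned Bar; should become garbage along with its Foo
    Bar b;

    // Bar held by a static reference; should survive every GC
    private static Bar BAR = new Bar();

    public Foo() {
        b = new Bar();
    }

    public static class Bar {}

    public String sayHello() {
        // Heap snapshot while this Foo and its Bar are still alive
        ProgramMonitor.dumpHeap();
        return "Hello World";
    }

    public static void createFoo() {
        ProgramMonitor.forceGC();
        // Heap snapshot before any Foo has been created
        ProgramMonitor.dumpHeap();
        new Foo().sayHello();
    }

    public static void main(String args[]) {
        Foo.createFoo();
        // The Foo created above is unreachable now; collect and look again
        ProgramMonitor.forceGC();
        ProgramMonitor.dumpHeap();
    }
}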



My code for the ProgramMonitor class is rather simple and uses JNI as shown below:



public class ProgramMonitor {
    public static native int getNumberOfLoadedClasses();

    public static native void dumpHeap();
    public static native void forceGC();

    static {
        System.load(System.getProperty("jvmtilib"));
    }
}


So what am I trying to accomplish? When an object of the "Foo" class is created, it results in the creation of a "Bar" object as well. In addition, when Foo.class is loaded, it creates a second Bar that is held by a static reference. When the program is done with the "Foo" object that was instantiated, the Bar object it owned should be gone, i.e., GCed. However, the Bar held by the static reference in the Foo class should still be around.

What if I could watch this happen and assert it?

As shown above, the ProgramMonitor.java invokes native methods. One can create a C header file from the class definition by executing the following command:

>javah -jni -classpath . ProgramMonitor

The above call results in the creation of a C header file called ProgramMonitor.h that looks like:

....
/*
 * Class:     ProgramMonitor
 * Method:    getNumberOfLoadedClasses
 * Signature: ()I
 */
JNIEXPORT jint JNICALL Java_ProgramMonitor_getNumberOfLoadedClasses
  (JNIEnv *, jclass);

/*
 * Class:     ProgramMonitor
 * Method:    dumpHeap
 * Signature: ()V
 */
JNIEXPORT void JNICALL Java_ProgramMonitor_dumpHeap
  (JNIEnv *, jclass);

/*
 * Class:     ProgramMonitor
 * Method:    forceGC
 * Signature: ()V
 */
JNIEXPORT void JNICALL Java_ProgramMonitor_forceGC
  (JNIEnv *, jclass);
.....



The generated header file above declares the JNI functions that one needs to implement. A C file that implements these functions is then created, i.e., ProgramMonitor.c. Shown below are only some parts of ProgramMonitor.c:



...
#include <stdio.h>
#include "jni.h"
#include "jvmti.h"
#include "ProgramMonitor.h"

/* Check for JVMTI error */
#define CHECK_JVMTI_ERROR(err) \
    checkJvmtiError(err, __FILE__, __LINE__)

static jvmtiEnv *jvmti;

.....
JNIEXPORT jint JNICALL JNI_OnLoad(JavaVM *vm, void *reserved) {
    jint rc;
    jvmtiError err;
    jvmtiCapabilities capabilities;
    jvmtiEventCallbacks callbacks;

    /* Get JVMTI environment */
    jvmti = NULL;
    rc = (*vm)->GetEnv(vm, (void **)&jvmti, JVMTI_VERSION);
    if (rc != JNI_OK) {
        fprintf(stderr, "ERROR: Unable to create jvmtiEnv, GetEnv failed, error=%d\n", rc);
        return JNI_ERR;
    }
    CHECK_FOR_NULL(jvmti);

    /* Get/Add JVMTI capabilities */
    .....
    /* Create the raw monitor */
    err = (*jvmti)->CreateRawMonitor(jvmti, "agent lock", &(gdata->lock));
    CHECK_JVMTI_ERROR(err);

    /* Set callbacks and enable event notifications */
    ....
    return JNI_VERSION_1_2;
}

/* Static native methods receive the declaring class as the second argument */
JNIEXPORT jint JNICALL Java_ProgramMonitor_getNumberOfLoadedClasses
  (JNIEnv *env, jclass cls) {
    jclass *classes = NULL;
    jint count = 0;
    jvmtiError err;

    err = (*jvmti)->GetLoadedClasses(jvmti, &count, &classes);
    CHECK_JVMTI_ERROR(err);

    /* GetLoadedClasses allocates the array; release it once we have the count */
    err = (*jvmti)->Deallocate(jvmti, (unsigned char *)classes);
    CHECK_JVMTI_ERROR(err);

    return count;
}

void dump() {
    /* Dump information.... */
    .....
}

JNIEXPORT void JNICALL Java_ProgramMonitor_dumpHeap
  (JNIEnv *env, jclass cls) {
    dump();
}

JNIEXPORT void JNICALL Java_ProgramMonitor_forceGC
  (JNIEnv *env, jclass cls) {
    jvmtiError err;

    printf("Forcing GC...\n");
    err = (*jvmti)->ForceGarbageCollection(jvmti);
    CHECK_JVMTI_ERROR(err);
    printf("Finished Forcing GC...\n");
}




The points to note from the above are that when the JNI_OnLoad function is called, a reference to the JVMTI environment is obtained and the JVMTI capabilities of interest are requested. Note the forceGC call.

Now that we have the implementation of the library, we can build it. The resulting library is called libProgramMonitor.so. So what we have now is a C library that obtains a handle to JVMTI and provides functions to force garbage collection and to report on the heap at any given time.

We are now ready to execute our Foo class and witness the output.



>java -Djvmtilib=/home/sacharya/jvmti-examples/libProgramMonitor.so -classpath . Foo
Forcing GC...
Finished Forcing GC...
Number of loaded classes 353
Heap View, Total of 35688 objects found.

Space Count Class Signature
---------- ---------- ----------------------
8 1 LFoo$Bar;
---------- ---------- ----------------------

Number of loaded classes 353
Heap View, Total of 35690 objects found.

Space Count Class Signature
---------- ---------- ----------------------
16 2 LFoo$Bar;
16 1 LFoo;
---------- ---------- ----------------------

Forcing GC...
Finished Forcing GC...
Number of loaded classes 353
Heap View, Total of 35679 objects found.

Space Count Class Signature
---------- ---------- ----------------------
8 1 LFoo$Bar;
---------- ---------- ----------------------



From the above, notice that the instance of Bar that was transiently created was reclaimed. The static reference to Bar however lingered as expected.
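As an aside, a rough approximation of the same check can be written in plain Java without JVMTI, by swapping in java.lang.ref.WeakReference and System.gc(). This is a different technique from the one in this post, and System.gc() is only a request, so treat it as a sketch: the GcCheck class, the retry count and the sleep interval are all my own assumptions.

```java
import java.lang.ref.WeakReference;

// Hypothetical pure-Java approximation; not the JVMTI approach shown above.
final class GcCheck {
    static class Bar {}

    // Strongly reachable for the life of the class, like the static BAR in Foo.
    static final Bar STATIC_BAR = new Bar();

    // Requests a GC a few times and reports whether the referent was cleared.
    static boolean isCollected(WeakReference<?> ref) {
        for (int attempt = 0; attempt < 10 && ref.get() != null; attempt++) {
            System.gc(); // only a hint to the VM, not a guarantee
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        WeakReference<Bar> transientBar = new WeakReference<>(new Bar());
        WeakReference<Bar> staticBar = new WeakReference<>(STATIC_BAR);
        System.out.println("transient Bar collected: " + isCollected(transientBar));
        System.out.println("static Bar collected: " + isCollected(staticBar));
    }
}
```

The statically referenced Bar can never be reported as collected, while the transient one usually is on the VMs I know of; unlike the heap dump above, this only tells me about objects I explicitly wrapped in a reference.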

Conclusion:
We can easily add more methods to the ProgramMonitor class to provide information such as References, Threads etc. The ProgramMonitor library is not displaying all the loaded classes and is filtering out the ones that begin with "java" or "sun".

Using JVMTI can be so valuable in validating code and ensuring it behaves as expected at the unit test level. I am aware that there is commercial software that does the same :-)... You can't blame me for playing ;-). JVMTI is powerful stuff and I am only feeling the temperature of the water here. I don't want to enter the "C" though ;-)! If Linux is for geeks, then so is C. I have reached the conclusion that Java is the equivalent of the Windows OS for the C programmer.

I am not quite sure whether "ForceGarbageCollection" is indeed a guarantee of garbage collection. I am also curious about the promotion of objects across the different GC spaces and how the ratio affects the code.

Source:
The code shown above was developed on JDK 1.6.X and run on a Linux OS. It can easily be made compatible with other platforms though, by looking at the examples in the standard JDK demo. In addition, the majority of the code is based on the heapViewer demo code. This example is only an "example".

As always, my source can be obtained from HERE

Running the Example:
Ensure you have JDK 1.6 installed and that you are on a Linux OS. Export JDK_HOME to point to your JDK home. Run javac to compile your sources. Run the makefile by typing "make" to build "libProgramMonitor.so". Finally run ">java -Djvmtilib=/home/sacharya/jvmti-examples/libProgramMonitor.so -classpath . Foo", replacing the jvmtilib value with the location of the libProgramMonitor.so file in your file system. I know I should have made a maven project, and I should also have had the .java files compiled from the makefile. Oh well! In addition, why couldn't I have used System.loadLibrary() to load the JNI library file? I couldn't, as it didn't work even though I have LD_LIBRARY_PATH defined correctly, and I am too lazy to figure out why ;-) Also, a better test would have been a busier case where the CPU is really occupied, to view the GC at work.
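If I ever get around to chasing the System.loadLibrary() problem, the first thing I would print is what the JVM is actually looking for. The sketch below only uses System.mapLibraryName and the java.library.path system property, both standard JDK features; the LibraryLookup class is made up, and this does not claim to explain why the load failed on my machine.

```java
// Hypothetical diagnostic helper; it loads nothing, it only reports.
final class LibraryLookup {
    // Platform-specific file name that System.loadLibrary(name) would search for,
    // e.g. "ProgramMonitor" maps to "libProgramMonitor.so" on Linux.
    static String fileNameFor(String libraryName) {
        return System.mapLibraryName(libraryName);
    }

    public static void main(String[] args) {
        System.out.println("loadLibrary(\"ProgramMonitor\") looks for: "
                + fileNameFor("ProgramMonitor"));
        // Directories searched by System.loadLibrary
        System.out.println("java.library.path = "
                + System.getProperty("java.library.path"));
    }
}
```

Note that System.loadLibrary takes the bare name ("ProgramMonitor"), whereas System.load, which I used, takes the full path to the file.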

Ping me if you cannot run this example. I have tried the same on Suse 11 and Mandriva Spring.

Resources:
JVM Tool Interface (JVMTI): How VM Agents Work
Java Forum inspiration
Garbage Collection Forcing documentation
Heap Analyzer Tool - Worth checking out
Creating and Debugging a Profiling Agent with JVMTI