Search This Blog

Tuesday, March 16, 2010

Sanjay in Memory Land - Java Memory Leaks and Tools for Detection

Finally, something to write about apart from REST and JMS :-). Tim Burton's Alice in Wonderland has been released and I am anxious to take my kids to see it; till then, this BLOG can serve to curb my fantasies :-)

Java has garbage collection, right? So can a program developed in Java have a memory leak? Isn't this one of the favorite questions of interviewers? In this BLOG I would like to capture information about JVM leaks, their diagnosis and how to fix/prevent the same. As always, if this helps someone, great; else it will help me at some point in the future as reference material ;-)

Life is wonderful, the application we have been working on for months has been deployed, deadlines have been met, cheers have been exchanged at the local hangout. All of a sudden, a call on your phone, a person from the network and operations team bearing bad news: "Your app just crashed with an OutOfMemory error." Shxx! Think of this more as preparation for the inevitable; everyone needs to meet their NOC person while they are pissed at some time or another :-).

So what could have happened? Before getting into the same, some insight on JVM memory and the types of References is a must. I could get into the same but it would be rather redundant, as a very nice article explaining it by Mirko Novakovic is available on the codecentric blog. It is a must read prior to continuing with this BLOG. Please familiarize yourselves with the different generations and memory model. The following figure will serve to help with the different areas of memory available:
In the above shown figure:
  • EDEN is the place where all objects are created
  • SS0 and SS1 are survivor spaces where objects are promoted to
  • OLD represents Old Gen for long living objects that have survived many GC cycles
  • PERM is the Perm Space where Class and Static information lies
One topic that requires a brief mention for completeness is the different Reference Types in Java. I would definitely recommend a read of Understanding Weak References by Ethan Nicholas.

Reference Types:
1. Strong Reference:

A Strong Reference is your regular reference that you use all the time. For example, Integer number = new Integer(230); here, number is a strong reference to an object on the heap. During development we are typically dealing with strong references.

2. Weak Reference:

A Weak Reference is not as strong as the "Strong Reference". Duh, obviously! A Weak Reference to an object does not prevent it from being garbage collected. In other words, if the only reference to an object is a Weak Reference, then the same is eligible for garbage collection.

For example, in the figure shown below, Object A is not eligible for GC as there is a strong reference to it:
However, if you look at the following figure you will notice that Object A only has a WeakReference to it thus making Object A eligible for garbage collection:
One use of Weak References in java.util is the WeakHashMap. In the case of the WeakHashMap, the keys are weak and the values are not. If a key of the map loses its object to garbage collection, then the entry is considered reclaimed as well. WeakHashMaps tend to work well when using the Listener or Observer pattern; many a time we add a listener and never deregister the same. One important thing to keep in mind is that WeakReferences are NOT Serializable, as it is possible that the objects contained could vanish due to garbage collection occurring.
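To make the weak-reachability behavior concrete, here is a minimal sketch (the class name is mine, not from any library) showing that a WeakReference does not keep its referent alive:

```java
import java.lang.ref.WeakReference;

public class WeakRefDemo {
    public static void main(String[] args) {
        Object referent = new Object();
        WeakReference<Object> weak = new WeakReference<Object>(referent);

        // While a strong reference exists, the referent stays reachable.
        System.out.println(weak.get() != null);  // prints true

        referent = null;   // drop the only strong reference
        System.gc();       // only a hint, but HotSpot typically clears weak refs here

        // The spec does not guarantee exactly when the reference is cleared,
        // but after a collection this will typically print null.
        System.out.println(weak.get());
    }
}
```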

3. Soft Reference:

A Soft Reference is very much like the Weak Reference; however, the garbage collector is less likely to throw the object referenced by a Soft Reference away. Weakly referenced objects will be collected by the GC, but as long as memory is available, softly referenced objects continue to exist. The same makes them excellent for caches.
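As a sketch of that caching idea (SoftCache and its methods are illustrative names, not a JDK class), values are held only softly, so the collector may reclaim them under memory pressure and a lookup then simply misses:

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// A minimal cache whose values can be reclaimed by the GC under memory pressure.
public class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<K, SoftReference<V>>();

    public void put(K key, V value) {
        map.put(key, new SoftReference<V>(value));
    }

    // Returns null both for a missing key and for a value the GC reclaimed.
    public V get(K key) {
        SoftReference<V> ref = map.get(key);
        return ref == null ? null : ref.get();
    }
}
```

Note that a real implementation would also purge stale map entries whose referents have been collected; this sketch leaves them in place.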

4. Phantom Reference:

Unlike the Weak and Soft References, which exhibit similar behavior separated only by timing and need, the Phantom Reference is a totally different beast. Note that the javadoc on the PhantomReference states that the get() method always returns null. The primary use of such a reference is only to determine when it gets enqueued into a reference queue. One rarely finds the need to use PhantomReferences and therefore I am not going to discuss them further.

Now that we have looked at Reference Types in Java, let's move on to a popular question asked by interviewers: "Object A has a reference to Object B, Object B has a reference to Object C, which has a reference to Object A. Due to the circularity, do we have a memory leak?"
Well, I do not think that cyclic references are necessarily great but just because a circular reference exists, it does not mean we have a leak present. Consider the following figure:
In the scenario mentioned above, Objects A, B and C cannot be garbage collected as long as there is at least one ROOT reference to one of the objects. Object A and Object B have ROOT references to them. I like to think of ROOT references as springs or wires that hold the objects in memory for the JVM. If all of the springs attached to the objects are dropped, then the object and its connected graph have no ability to survive. It does not matter how many interconnections they possess; if all root references to the objects are gone, the graph is eligible for collection as shown below:

When using memory analyzer tools, you will hear the term GC Root. One of the popular analyzer tools, YourKit, has this definition of the GC Root and the types of GC Roots:
"The so-called GC (Garbage Collector) roots are objects special for garbage collector. Garbage collector collects these objects that are not GC roots and are not accessible by references from GC roots. There are several kinds of GC roots. One object can belong to more than one kind of root. " - Yourkit Site

Some other terms I would like to familiarize you with are the Dominator and the Dominator Tree. Both sound powerful; I think of bosses when I hear this, whether at home or work ;-). An object is considered a dominator of another object if, and only if, it holds the sole reference to the dominated object. Consider the figure shown below:
In the figure shown above, object B is the dominator of object D. However, B and C cannot be considered the dominators of E, as they both have references to E. Object C, on the other hand, is the dominator of F and G. Dominators are important in memory analysis because, if all references to a dominator are removed, then the entire dominator tree becomes available for garbage collection. For example, if the reference to C is removed, then the graph of objects C, F and G are all eligible for garbage collection.

My 2c on how Garbage Collection works:

Garbage collection involves walking the heap from the GC Roots. A Mark-Sweep algorithm marks the objects that are reachable from the roots, with the Sweep part collecting the unmarked ones that are eligible for collection.

GC's are typically classified as minor and major. A minor GC is typically invoked when the Eden space fills up; the collector sweeps through Eden and the survivor spaces, promoting objects that have survived some GC cycles to older generations. Minor GC's are rapid and hardly interfere with the functioning of the VM.

Major GC's are invoked when Old Gen starts accumulating objects and growing. All spaces are collected when a major GC runs. Note that Perm Gen will also be collected during a major GC. When a major GC occurs, other activities of the JVM get suspended; in other words, it's a stop-the-world event of sorts.

One can get information on Garbage collection using the following flags to the JVM on startup.
-XX:+PrintGC Print messages at garbage collection. 
-XX:+PrintGCDetails Print more details at garbage collection. 
-XX:+PrintGCTimeStamps Print timestamps at garbage collection.
An example output of the above is:
3.480: [Full GC [PSYoungGen: 44096K->25531K(88704K)] [PSOldGen: 139579K->158080K(220608K)] 183675K->183611K(309312K) [PSPermGen: 1700K->1700K(16384K)], 0.8611160 secs] 
[Times: user=0.76 sys=0.08, real=0.86 secs] 
4.504: [Full GC [PSYoungGen: 44608K->0K(88704K)] [PSOldGen: 190848K->218908K(287680K)] 235456K->218908K(376384K) [PSPermGen: 1700K->1700K(16384K)], 1.3089360 secs] 
[Times: user=1.19 sys=0.12, real=1.31 secs] 
5.862: [GC [PSYoungGen: 44608K->44672K(100992K)] 263516K->263580K(388672K), 0.2592090 secs] [Times: user=0.50 sys=0.00, real=0.26 secs] 
6.169: [GC [PSYoungGen: 89280K->56384K(112896K)] 308188K->308364K(400576K), 0.5695990 secs] [Times: user=0.95 sys=0.07, real=0.57 secs] 
6.739: [Full GC [PSYoungGen: 56384K->20609K(112896K)] [PSOldGen: 251980K->287680K(338624K)] 308364K->308289K(451520K) [PSPermGen: 1700K->1700K(16384K)], 2.3075900 secs] 
[Times: user=1.52 sys=0.18, real=2.31 secs] 
With the above information, let us look at some of the usual suspects of memory leaks:

The Usual Leak Suspects:

1. The HTTP Session when Working with Web Applications:

The HTTP Session object is one place that is often abused as a storage area for user information. It is often used as a dumpster for information that lives beyond the scope of a single request, i.e., conversational state. Another common scenario is when the HttpSession is used as a cache of sorts, where data once read for a user session is stored for the lifetime of the user session. Sometimes these sessions are specified to be kept alive for a very long period of time. In the cases mentioned above, the data in the sessions lingers and holds references to object graphs for extended periods of time. If there are a number of sessions involved, it's simple enough to do the math for the memory footprint.

2. Class Static References:

Classes can have static variables, such as maps and lists, that are often populated with information but never cleaned up. Class and static data live in Perm Gen and will last as long as the class loader that loaded them is referenced. Sometimes, as in the case of the HTTP Session, objects are placed into these static collections and never removed, thus growing the heap:
public class MyStatic {
   private static final Map<String, Object> data = new HashMap<String, Object>();
   private MyStatic() {}
   public static void add(String key, Object val) { data.put(key, val); }
   public static Object get(String key) { return data.get(key); }
}

3. Thread Local:

The ThreadLocal pattern has some benefits: data is placed in a ThreadLocal by one method to be accessible to other methods on the thread. If this data is not cleared out upon completing the operation, and the thread itself is pooled, leaks start appearing.
This is a common scenario one would find when dealing with a Web Application where ThreadLocal storage is used for one reason or another. The threads in the container are pooled, so after the execution of one request the thread is returned to the pool. If the data placed in the ThreadLocal is not cleared after the execution of the request, it will continue to linger. As a best practice, I would recommend that you clear the same.
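A minimal sketch of that best practice (the class and method names here are illustrative, not from any framework): clear the ThreadLocal in a finally block so a pooled thread goes back clean:

```java
public class RequestContext {
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<String>();

    // Wraps one unit of request work; downstream code can call currentUser().
    public static void handleRequest(String user, Runnable work) {
        CURRENT_USER.set(user);
        try {
            work.run();
        } finally {
            // Without this, the value lingers on the pooled thread after the
            // request completes - the leak described above.
            CURRENT_USER.remove();
        }
    }

    public static String currentUser() {
        return CURRENT_USER.get();
    }
}
```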

4. Object Pooling:

Object Pooling is one pattern that needs to be handled with care. In the earlier days of the JVM, people tended to pool objects to benefit from performance gains. The performance gains of Object Pooling are highly questionable with current JVM optimizations. Objects that are pooled tend to make it to Old Gen by surviving many garbage collection cycles, and they linger there as there are strong references to the same. Note that this can even affect GC times when there are references between these pooled Old Gen objects and Young Gen objects. Pooling really expensive resources such as database connections or JMS artifacts might be a good idea. In general, however, pooling regular objects could prove detrimental rather than beneficial. I would recommend being wary of Object Pooling.

5. Not closing expensive objects:

Objects like connections tend to hold onto other expensive artifacts like Sockets and Streams. These objects usually have a close() method to release the resources allocated. If not released, and the top-level object is in Old Gen for example, then these resources tend to linger until a full GC occurs. Some resources keep file handles open, often leading to exhaustion of the same. Maybe some of you have witnessed the "Too many open file handles" error?

For example, I recently ran into a case where we did not run out of JVM memory, but as resources were not being released until GC kicked in, file system handles related to Sockets were held on to, resulting in exhaustion of the same from the perspective of the file system. As a tip, if one runs out of file handles in a *Nix environment, consider using lsof and netstat if the symptoms point to network-based file handles to determine what is happening. The case mentioned might be one of system resources being exhausted and not JVM memory; however, IMO it is indicative of a significant destructive leak.
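A sketch of the close() discipline (the helper class here is hypothetical, and since this is Java 6 era code it uses try/finally rather than the later try-with-resources): release the resource deterministically instead of waiting for GC and finalization to free the handle:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ResourceDiscipline {
    // Close the reader in finally so the underlying OS file handle is freed
    // immediately, not whenever the GC happens to finalize the stream.
    public static String readFirstLine(String path) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(path));
        try {
            return reader.readLine();
        } finally {
            reader.close();
        }
    }
}
```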

6. Interned String in the Perm Gen:

One long-standing pattern is to call intern() on Strings; read more about the same at mindprod. Interning a String results in it making its way into Perm Gen. The java.lang.String class maintains a pool of strings. When the intern() method is invoked, the method checks the pool to see if an equal string is already in the pool. If there is, then intern() returns it; otherwise it adds the string to the pool.

Some XML parsers in particular tend to intern() strings in an attempt to optimize. A large number of interned Strings, however, will be detected in a memory dump where you can view the Perm Gen. Note that interned Strings will be garbage collected; however, they do add to the Perm Gen footprint.
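A small demonstration of the pool behavior described above:

```java
public class InternDemo {
    public static void main(String[] args) {
        String a = new String("hello");   // a fresh object on the heap
        String b = "hello";               // a literal, already in the pool

        System.out.println(a == b);          // false: distinct objects
        System.out.println(a.equals(b));     // true: equal contents
        System.out.println(a.intern() == b); // true: intern() returns the pooled copy
    }
}
```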

7. Badly referenced Anonymous Classes:

Anonymous classes are often used to exchange blocks of code between objects. One truth about an anonymous class is that its instance holds a reference to the instance that created it, thus ensuring that as long as the anonymous instance has a root reference to it, the parent object cannot be garbage collected. A common case of this is the registering of listeners on an object that are never de-registered when no longer required:
public class A {
  public void someMethod(B b) {
    // The anonymous Runnable holds an implicit reference to this A instance.
    b.register(new Runnable() {
      public void run() { /* do some work */ }
    });
  }
}

public class B {
  private final List<Runnable> runnables = new ArrayList<Runnable>();
  public void register(Runnable r) { runnables.add(r); }
}

8. Proxy Classes in Perm Gen:

Frameworks such as cglib, javassist etc. are used to generate proxy classes. These proxy classes find themselves in Perm Gen, and too many of them being generated leads to Perm Gen out of memory errors. Perm Gen too small? Let's just bump it up and the problem will go away. Sure, that works sometimes, especially when your code has changed to load a lot more classes, but there are times when it will only come back to haunt you. For example, we recently encountered a case where dynamic classes were being generated with the instantiation of a particular object every time. This was an incorrect use of the API, and the result was increased use of Perm Gen due to the proliferation of proxy classes. Increasing the Perm Gen might have cured the problem, but only temporarily, until further instantiation of the object resulted in more classes being put into Perm Gen and a JVM fatality. As an example, with the bad method (uses javassist) shown below, where classes are constantly being created in Perm Gen, repeated invocation will cause the Perm Gen to grow:
public SomeClass createInstanceOfSomeClass() throws Exception {
  ProxyFactory f = new ProxyFactory();
  f.setSuperclass(SomeClass.class);
  MethodHandler mi = new MethodHandler() {
     public Object invoke(Object self, Method m, Method proceed,
      Object[] args) throws Throwable {
        return proceed.invoke(self, args);  // execute the original method
     }
  };
  f.setFilter(new MethodFilter() {
    public boolean isHandled(Method m) {
     return !m.getName().equals("finalize");
    }
  });

  // A brand new proxy class is generated into Perm Gen on every call
  Class<?> c = f.createClass();
  SomeClass p = (SomeClass) c.newInstance();
  ((ProxyObject) p).setHandler(mi);
  return p;
}
As an example of how the Perm Gen could grow, see the graph below, with the end result being running out of Perm Gen and receiving the nasty message about the same:

9. Freak Load or Query:

During your testing you have followed the standard path with regards to load, be it loading data from a database or parsing an XML file into a DOM tree, what have you. Every so often, there is a freak scenario that you have not expected that spikes the memory so badly that you run out of it. Some of these cases are when a badly designed query leads to the loading of a large amount of data into the heap, or a file is uploaded for which you are building a gigantic XML DOM tree. These are examples of large dominator trees that will definitely consume your heap. Always consider the extreme cases during your design and development. In particular, ask yourself questions such as: "What if someone uploads a bad file?" or "What will happen to this query if it is not bounded or paged and a large result set is loaded?"

10. Other Cases:
There are definitely more cases than the above mentioned. For example, J2EE containers that do not unload classes correctly, and hot deploys increasing Perm Gen as a result. Your cases will be unique and I would love to hear about the same.

Anyway, as a developer or architect, the following are what I feel one can benefit from. Please note these are not rules or musts, but some personal recommendations based on my experiences.

Memory Leak Detection and Analysis:

1. Defensive Development:

When developing code, always keep thinking of how an object you create is going to be used. When working with large data sets, think of accidental or one-off cases which could load considerable data into memory. Consider always limiting your loaded data to definite bounds with paging. Pick only the data you need. When working with XML, consider whether you need to load an entire DOM tree or can use StAX. Consider tools like FindBugs to detect unnecessary circularities in your code.

I would recommend always questioning a variable that is added to an object and its purpose. Does the class really need to be stateful in the context, or can it be stateless? Will this class be used by multiple threads at the same time? I am not advocating that classes that have state are bad, that would be quite against my O-O beliefs :-). I am only asking you to question how the same would be used at runtime, by your library or by others using it. In addition, a pattern that every developer knows, and a favorite answer to the interview question "What patterns have you used?", is the Singleton. Singletons often serve as a store or cache of data; they should be questioned and their proliferation controlled, as they live for the lifetime of the VM unless explicitly unloaded and can achieve dumpster status. Singletons are NOT necessarily EVIL, I use them all the time at work. It is however possible for them to be misused; for example, a singleton that has a Map of key-value pairs whose information keeps growing over time clearly gives you something to think about. The same applies if you are using a dependency injection container like Spring and design objects to have singleton scope rather than prototype scope.
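One concrete way to keep such a singleton-held Map honest is to give it a hard bound. The sketch below (class name and the eviction policy are my choices, not a prescription) uses LinkedHashMap's removeEldestEntry hook to evict the least recently used entry once a limit is reached:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A Map that can never grow past maxEntries: the least recently used
// entry is evicted on insertion once the bound is exceeded.
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedCache(int maxEntries) {
        super(16, 0.75f, true);  // access-order, so eldest = least recently used
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```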

Also watch for the HTTP Session and what goes in there.

2. Active Monitoring:

Clearly, detecting early is the way to go. Monitor your application once deployed; err, actually even before. One has many tools at one's disposal to obtain information as to how your application behaves with memory.

Waiting till production only to find a memory leak is not desirable by any means. Baking in memory profiling as part of your releases would be great.

Set up monitoring at all possible places in your application. For HttpSession issues, consider having an HttpSessionAttributeListener that monitors the objects added to the HttpSession and displays the same. One can easily use a decorator such as Sitemesh in dev, with a flag turned on to display the same. In fact, a former colleague of mine had implemented exactly that to actively detect memory issues. In the different test environments and in production, make sure you have alert thresholds set up to notify you regarding memory problems. If in a test environment, have your testers run VisualVM and keep an eye on the graph as they perform their tests. The testers do not need to be skilled in memory management, though it wouldn't hurt if they are :-), but only trained in the trends indicative of abnormal memory patterns.

3. Post Mortem or Post Traumatic Evaluation:

So you have all the bases covered, tested the stuff the best you can but find an OOM in production. What should you do? Roll back to the previous version that did not have the error? Sure if you can, but it would be better if you can determine the problem prior to doing the same.

One question to ask: did you have the flag "-XX:+HeapDumpOnOutOfMemoryError" enabled? When the flag is enabled, upon a heap OOM one automatically gets a dump. The flag does not affect your runtime performance, so it is benign.

A first reaction might be just to bump up the memory on the VM and re-start. However, note that the same might work in certain cases while only delaying the inevitable in others.

It is very important to determine the type of error you find and use memory analyzer tools to determine the offending problem. If you have the option of getting heap dump snapshots, do the same and compare them to detect dominator trees, what's in Perm Gen, etc. Sampling, sampling, sampling! When getting any data, consider getting the same at different intervals for sampling. The same can really help in detecting changes in the memory over time.

Free Tools to help with your memory analysis:

1. Visual VM:
Visual VM is a visual interface for viewing, troubleshooting and profiling applications running on a JVM. Much of the functionality of previous stand-alone tools like JConsole, jstat, jstack and jmap is part of the tool. In addition, the tool is built to support plugins for extensions. I have found Visual VM a great tool to view memory, take snapshots and compare the same. Visual VM is now also part of the JDK. The following, for example, represents a graph from Visual VM that demonstrates an ever-increasing Old Gen that leads to an eventual Out of Memory.

2. MAT:
Eclipse MAT is a memory analyzer tool from the Eclipse community. A stand-alone version and a plugin for Eclipse are available. This is a really great tool for analyzing heap dumps. It is extremely fast in parsing a heap dump and providing valuable reports; in particular, some of these reports help diagnose leak suspects for you as well. For example, consider the following bad program where a map is constantly being added to and grows over time. Actually, this is the same code that generated the Old Gen accumulation graph in the previous Visual VM figure:
public class StaticHolder {
  private static final StaticHolder instance = new StaticHolder();
  private final Map<Integer, Integer> map = new HashMap<Integer, Integer>();
  private StaticHolder() {}
  public static StaticHolder instance() {
    return instance;
  }
  public void add(Integer key, Integer value) {
    map.put(key, value);
  }

  public static void main(String args[]) throws InterruptedException {
    for (int i = 0; i < 10000000; i++) {
      StaticHolder.instance().add(i, i);
    }
  }
}
If you get a heap dump of the running program and open the same using MAT, you can run a leak suspects report to find it pointing to the location of the leak, as shown below:

There is a very nice article on the MAT site on how to find memory leaks that someone using this tool ought to read.

3. BTrace:

BTrace is a tracing tool for the running JVM, similar to DTrace for Solaris. It involves instrumenting the classes of the target application and introducing tracing code. It looks very promising, but there appear to be certain known issues that could cause JVM crashes during instrumentation. That looks scary, but hey, at least the same is documented on their WIKI; what about other tools that you either buy or get for free, do they guarantee no crashes? That said, it is pretty easy to use. One develops Java classes using the BTrace API; these classes are then compiled and run using the btrace agent to inspect the JVM.

The BTrace API itself has a lot of options and stock profiling code samples available to the developer at their WIKI which I found quite informative. I have personally not used the tool on any production JVMs but I definitely can see the potential in being able to apply probes to gain understanding of the VM. I ran some of their traces on sample code and was quite pleased with the things I could do. This is a project to keep an eye out for.

4. Command Line Tools at your disposal:

The JDK bundles many independent tools that can be run on the command line to aid in debugging memory.
a. jps - This tool can be used to determine the process IDs of running JVMs.
It was introduced as of JDK 1.5.

# list pid and short java main class name
$jps
2008 Jps
2020 Bootstrap

# list pid and fully-qualified java main class
$jps -l
5457 org.netbeans.Main

# pid, full main class name, and application arguments
$jps -lm
5955 -lm
5457 org.netbeans.Main --userdir /home/sacharya/.visualvm/1.2.2 --branding visualvm

# pid and JVM options
$jps -v
5984 Jps -Dapplication.home=/usr/local/jdk/jdk1.6.0_18 -Xms8m
5457 Main -Djdk.home=/usr/local/jdk/jdk1.6.0_18 -Dnetbeans.system_http_proxy=DIRECT -Dnetbeans.system_http_non_proxy_hosts= -Dnetbeans.dirs=./bin/../tools/visualvm_122/bin/..//visualvm:./bin/../tools/visualvm_122/bin/..//profiler3: -Dnetbeans.home=/home/sacharya/tools/visualvm_122/platform11 -Xms24m -Xmx192m -Dsun.jvmstat.perdata.syncWaitMs=10000 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/sacharya/.visualvm/1.2.2/var/log/heapdump.hprof

b. jstat:
jstat is a monitoring tool that allows you to obtain sample data periodically
from a local or remote VM.

c. jmap
jmap is one of the tools that anyone who is working with memory issues should be
familiar with.

jmap is used to print heap memory details of a given process. With jmap you
can get a view of the heap of live and unreachable objects. For example,
executing the following will give you all the live objects in the JVM:
$jmap -histo:live <pid>
whereas the following will give you unreachable objects as well:
$jmap -histo <pid>

One can take a heap dump using jmap using the following command after which the same can be analyzed in MAT or Visual VM:
$jmap -dump:live,file=heap.hprof,format=b <pid>

I have not mentioned any of the commercial tools available for memory analysis such as JProbe or Yourkit in detail. I have used JProbe at a previous work place to great effect but never played with Yourkit. One feature I liked about JProbe was the ability to show object graphs and work with the same in determining dominators and roots. These tools definitely serve a market and I need to investigate the same.

Optimizing or Tuning Memory:

One can definitely tune the memory used by your JVM to specific needs. There are so many options that one can provide to the JVM that it is mind boggling. I must admit that I have absolutely zero experience as far as memory tuning goes, apart from increasing max heap and Perm Gen :-(. It would be easy to say "tune it on a case by case basis", as a smart architect would. I instead choose to point you to the different VM options that you have at your disposal should the need arise. I would at some point like to investigate how different ratios affect the performance of a VM, but that is beyond the scope of my rant.


This BLOG is primarily driven by my experiences during my career. There are specialists out there who are extremely skilled in the area of memory debugging and tuning. One such person that I have been fortunate to cross paths with is Ken Sipe; another is my current boss. Ken has had considerable experience in debugging mission critical applications to find and fix their memory issues. His knowledge of the Java memory model runs deep. Ken recently spoke at the last JavaOne conference on debugging your production VM, and the talk was extremely well received. If there was one presentation I would have liked to attend, it would have been his. That said, the slides of the same are available at Slide Share. If any of my understanding is inaccurate, I would appreciate input as always. "No Silent Disagreements!" is my motto after reading about Kayak, a BLOG to follow.

In conclusion, I would say, if you are at a pub after your deploy sipping your favorite drink and you get a call from the NOC, simply do not answer and ruin the moment! Kidding of course ;-)

Friday, March 5, 2010

RESTful Representation with Google Protocol Buffers and Jersey

When working with a RESTful system, one has the option of consuming different types of representations of the same resource. That is one of the beauties of a RESTful system IMO.

In many cases, especially in the SOAP-based services stack, XML has been the representation type of choice for information interchange. XML was never really intended to be a high-performance representation, and when working with Java and XML, one often sees undesirable performance penalties in marshalling/un-marshalling the payload and in the size of the transfer.

There have been other formats that have gained popularity, such as JSON, which works really well with JavaScript and Ajax. For those desiring a comparison between selecting JSON or XML, it's only a Google search away.

That said, neither of the above-mentioned formats is binary. There are many binary format options available to users. One of these is Google Protocol Buffers, which is the focus of this BLOG.

Why am I blogging about Protocol Buffers now? Well, recently I saw a presentation on how Orbitz shifted from using JINI to RESTful services with Google Protocol Buffers as their representation type, with great success. Protocol Buffers allowed them to meet their performance needs, versioning needs and language/platform compatibility needs really well. Since then, I had to try the same :-)

Ted Neward has a very nice article about XML and his take on Protocol Buffers where he digs into binary formats, pros and cons, etc. I recommend a read of his posting.

Regarding performance metrics of using Java Externalizable, Google Protocol Buffers, XML, JSON, Thrift, Avro etc., look at the thrift-protobuf comparison Google Code page for more details. In addition, Wondering Around by Eishay Smith is a great read on the comparisons.

So all I have here is a working example of using Protocol Buffers with Jersey and Spring. Like in my previous examples, I am using the same orders/products model.

I start with definitions of the contract. One defines the messages exchanged in .proto files. Wow, YAIDL (Yet Another Interface Definition Language) here. True, but contracts need to be exchanged, and there has to be a way to do the same, especially when dealing with platform/language neutrality and B2B exchanges. I must say that I found the IDL rather easy to understand and use with my limited understanding so far. One of the beauties of protocol buffers is their consideration for backward compatibility of contracts. There is an Eclipse plugin available for editing .proto files as well. With that said, I have two .proto files, one for the Order definition and a second for the Products, as shown below:
package orders;

option java_package = "com.welflex.orders.proto.dto";
option java_outer_classname= "OrderProtos";
option optimize_for = SPEED;

message LineItem {
 optional int64 id = 1;
 required int64 itemId = 2;
 required string itemName = 3;
 required int32 quantity = 4;
}

message Order {
   optional int64 id = 1;
   repeated LineItem lineItems = 2;
}
package products;

option java_package = "com.welflex.products.proto.dto";
option java_outer_classname = "ProductProtos";
option optimize_for = SPEED;

message Product {
   required int32 id = 1;
   required string name = 2;
   required string description = 3;
}

message ProductList {
   repeated Product productDto = 1;
}
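As an aside on what contract compatibility looks like in practice: a deployed .proto contract can evolve without breaking older clients as long as existing tag numbers are never reused or changed, and new fields are added as optional. A hypothetical later revision of the Product message (the sku field here is purely illustrative, not part of this example's code) might look like:

```
message Product {
   required int32 id = 1;
   required string name = 2;
   required string description = 3;
   optional string sku = 4;  // new optional field; old clients simply skip tag 4
}
```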
I have defined the above files in the common module at src/main/protobuf. When the Maven build runs, the plugin generates equivalent Java code from the .proto files, which can then be used to create messages from consuming Java code. The plugin is basically executing the "protoc" compiler to do the same. One can choose to generate equivalent C++ or Python code, etc., if required; however, the same is beyond the scope of this BLOG. With the above definition, OrderProtos.java is generated during the Maven build at target/generated-sources/protoc/com/welflex/orders/proto/dto/. In the file, you will find the individual messages, which extend com.google.protobuf.GeneratedMessage. These objects are shared by the client and service code.

The generated java code uses the Builder Pattern with method chaining to make it really easy to set the necessary properties and build the protocol buffer message. For example, the Order Message can be built as shown below:
OrderProtos.Order order = OrderProtos.Order.newBuilder().setId(12313L)
    .addLineItems(OrderProtos.LineItem.newBuilder()
        .setItemId(123L).setItemName("Foo Bar").setQuantity(21).build())
    .build();
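For readers unfamiliar with the idiom, the fluent style above is just the classic Builder pattern with method chaining. Here is a minimal hand-rolled sketch of it, mimicking what protoc generates; this is illustrative code only, not the actual generated output, and the field set is trimmed down:

```java
// Hand-rolled sketch of the builder idiom used by protoc-generated classes.
public class Order {
    private final long id;
    private final String itemName;
    private final int quantity;

    private Order(Builder b) {
        this.id = b.id;
        this.itemName = b.itemName;
        this.quantity = b.quantity;
    }

    public static Builder newBuilder() { return new Builder(); }

    public static class Builder {
        private long id;
        private String itemName;
        private int quantity;

        // Each setter returns 'this' so calls can be chained fluently.
        public Builder setId(long id) { this.id = id; return this; }
        public Builder setItemName(String n) { this.itemName = n; return this; }
        public Builder setQuantity(int q) { this.quantity = q; return this; }

        // build() produces the immutable message instance.
        public Order build() { return new Order(this); }
    }

    public static void main(String[] args) {
        Order order = Order.newBuilder()
            .setId(12313L).setItemName("Foo Bar").setQuantity(21).build();
        System.out.println(order.id + " " + order.itemName + " " + order.quantity);
    }
}
```

The immutability of the built message is one reason the generated classes are safe to share between client and service code.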

For getting the Web Service to work with Jersey, based off another blog I mention later, I defined a custom Provider for marshalling/un-marshalling the Message. What amazes me is the ease of providing custom providers in JAX-RS. Big fan here :-). The MessageBodyReader and MessageBodyWriter classes that assist with the marshalling are shown below:
@Provider
public class ProtobufMessageReader implements MessageBodyReader<Message> {
  public boolean isReadable(Class<?> type, Type genericType, Annotation[] annotations,
    MediaType mediaType) {
    return Message.class.isAssignableFrom(type);
  }

  public Message readFrom(Class<Message> type, Type genericType, Annotation[] annotations,
    MediaType mediaType, MultivaluedMap<String, String> httpHeaders, InputStream entityStream) throws IOException,
    WebApplicationException {
    try {
      // Generated message classes expose a static newBuilder() method; invoke it reflectively
      Method newBuilder = type.getMethod("newBuilder");
      GeneratedMessage.Builder<?> builder = (GeneratedMessage.Builder<?>) newBuilder.invoke(type);
      return builder.mergeFrom(entityStream).build();
    } catch (Exception e) {
      throw new WebApplicationException(e);
    }
  }
}
@Provider
public class ProtobufMessageWriter implements MessageBodyWriter<Message> {
  public boolean isWriteable(Class<?> type, Type genericType, Annotation[] annotations,
    MediaType mediaType) {
    return Message.class.isAssignableFrom(type);
  }

  public long getSize(Message m, Class<?> type, Type genericType, Annotation[] annotations,
    MediaType mediaType) {
    return m.getSerializedSize();
  }

  public void writeTo(Message m, Class<?> type, Type genericType, Annotation[] annotations,
    MediaType mediaType, MultivaluedMap<String, Object> httpHeaders, OutputStream entityStream) throws IOException,
    WebApplicationException {
    // Serialize the protocol buffer message directly to the response stream
    m.writeTo(entityStream);
  }
}
With the above complete, the rest of the code is pretty similar to what I have done in previous BLOGS and therefore am not mentioning the same again. We now have the necessary artifacts to exchange the Protocol Buffer messages over HTTP.
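Stripped of the Jersey machinery, the reader/writer pair above boils down to streaming bytes: writeTo serializes the message to the response OutputStream, and readFrom rebuilds it from the request InputStream. The following self-contained sketch illustrates that round trip using a DataOutputStream in place of protobuf's wire format; the writeTo/readFrom names here are toy stand-ins, not the protobuf API:

```java
import java.io.*;

public class RoundTripSketch {
    // Toy stand-in for Message.writeTo(OutputStream): length-prefixed string plus an int.
    static void writeTo(String itemName, int quantity, OutputStream out) throws IOException {
        DataOutputStream dos = new DataOutputStream(out);
        dos.writeUTF(itemName);
        dos.writeInt(quantity);
        dos.flush();
    }

    // Toy stand-in for Builder.mergeFrom(InputStream): read the fields back in order.
    static String readFrom(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        String itemName = dis.readUTF();
        int quantity = dis.readInt();
        return itemName + ":" + quantity;
    }

    public static void main(String[] args) throws IOException {
        // The byte array buffer stands in for the HTTP entity stream.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeTo("Foo Bar", 21, buf);
        System.out.println(readFrom(new ByteArrayInputStream(buf.toByteArray())));
    }
}
```

The real wire format is far more compact and tag-based, which is what makes field evolution possible, but the streaming shape of the provider code is the same.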

The steps to get this example working are as follows:
1. Download the code from HERE
2. Install the maven plugin and thus Protocol Buffers:
>svn co
>cd maven-plugin
>wget -O pom.xml ''
>mvn install

If the above does not work, you might want to try looking at the Stack Overflow Posting where I got this from.
3. Execute "mvn install" from the root level of the project to see an integration test that runs the life cycle from client to server using Protocol Buffers and not XML or JSON :-)

This example is highly inspired by the fantastic Maven example by Sam Pullara at Java Rants on integrating Jersey, Protocol Buffers and Maven. My example is of course tailored to any readers visiting this site ;-) The Products resource returns JSON, XML and Protocol Buffer representations for those interested in trying the same out.

Clearly, one has multiple choices for a representation type; the beauty of REST lies in the fact that one does not have to choose one over the other but can allow them to coexist.  There are many factors to consider when choosing the format of a representation; some of the things I can think of, in no particular order of importance:
  • Performance - Marshalling/Un-Marshalling and transport footprint
  • Integration with different platforms/different languages
  • Testability and visibility - binary formats make payloads harder to inspect
  • Versioning of services to ensure backward compatibility
  • B2B integration
I wonder whether there will be a future effort to support annotating Java objects so that they may be transformed into .proto files, à la JAXB annotations. So where next? Simple: need to look at Avro and Thrift ;-)