Monday, November 8, 2010

Apache Cassandra with Hector - An Example

Recently I attended the Strange Loop Conference in Saint Louis. While there I indulged primarily in two things: booze with old buddies, and NoSQL at the conference.

In particular, I found a lot of mention of Apache Cassandra. Why would one care about Cassandra? How about a 150 TB cluster spanning over 150 machines at Facebook? Cassandra is used by organizations such as Digg and Twitter that deal with large amounts of data. I could attempt to write more on Cassandra, but there is a great presentation by Eric Evans on the subject: http://www.parleys.com/#id=1866&sl=40&st=5.

If not talking about Cassandra, what am I talking about? Well, I wanted to use Cassandra to get a grasp of how Columns and Super Columns work. Yeah, I hear it: WTF are Super Columns? I found myself asking the same question at the conference, but luckily for me I found a nice blog by Arin Sarkissian, aptly titled "WTF is a SuperColumn?", explaining exactly that. I wanted to translate his example schema for a blog application into a working example that uses Cassandra, and provide a playground for someone like me wanting to try Cassandra.

So I am only going to use Java; sorry, no Ruby or Scala for me right now. There is a Thrift Java client for Cassandra, but it is limited in functionality, so I proceeded to use Hector.

The model created was based on Arin's schema with a few enhancements. I have updated the Author schema to contain a user name and password, with the user name being the "row key" for the Column Family Authors.
<!--
    ColumnFamily: Authors
    We'll store all the author data here.

    Row Key = Author user name
    Column Name: an attribute for the entry (title, body, etc)
    Column Value: value of the associated attribute

    Access: get author by userName (aka grab all columns from a specific Row)

    Authors : { // CF
        sacharya : { // row key
            // and the columns as "profile" attributes
            password:#$%#%#$%#%
            name:Sanjay Acharya
            twitterId: sterling23,
            email: sacharya@example.com,
            biography: "bla bla bla"
        },
        // and the other authors
        dduck: { // row key
            ...
        }
    }
-->
<ColumnFamily CompareWith="BytesType" Name="Authors"/>
The above Column Family translates to a simple Author POJO as shown below:
public class Author {
  private String userName;
  private String password;
  private String name;
  private String twitterId;
  private String biography;
  ..// Getters and Setters
}
Using Hector directly, a DAO to create an author might look like:
public void create(Author author) {
    Mutator<String> mutator = HFactory.createMutator(keySpace, StringSerializer.get());
    
    String userName = author.getUserName();
    
    mutator.addInsertion(userName, COLUMN_FAMILY_NAME,
        HFactory.createColumn("password", author.getPassword(), StringSerializer.get(),
          StringSerializer.get()))
          .addInsertion(userName, COLUMN_FAMILY_NAME,
            HFactory.createColumn("name", author.getName(), StringSerializer.get(),
              StringSerializer.get()))
          .addInsertion(userName, COLUMN_FAMILY_NAME,
            HFactory.createColumn("biography", author.getBiography(), StringSerializer.get(),
              StringSerializer.get()))
          .addInsertion(userName, COLUMN_FAMILY_NAME,
            HFactory.createColumn("twitterId", author.getTwitterId(), StringSerializer.get(),
              StringSerializer.get()));

    // The insertions are batched up; execute() sends them to Cassandra
    mutator.execute();
}
The above code felt rather verbose, so with a small compromise (column names must match the attribute names of the POJO, and the POJO must have a default constructor) I present an AbstractColumnFamilyDao that an AuthorDao, for example, would extend:
public abstract class AbstractColumnFamilyDao<KeyType, T> {
  private final Class<T> persistentClass;
  private final Class<KeyType> keyTypeClass;
  protected final Keyspace keySpace;
  private final String columnFamilyName;
  private final String[] allColumnNames;

  public AbstractColumnFamilyDao(Keyspace keyspace, Class<KeyType> keyTypeClass, Class<T> persistentClass,
      String columnFamilyName) {
    this.keySpace = keyspace;
    this.keyTypeClass = keyTypeClass;
    this.persistentClass = persistentClass;
    this.columnFamilyName = columnFamilyName;
    this.allColumnNames = DaoHelper.getAllColumnNames(persistentClass);
  }

  public void save(KeyType key, T model) {
  
    Mutator<Object> mutator = HFactory.createMutator(keySpace, SerializerTypeInferer.getSerializer(keyTypeClass));
    for (HColumn<?, ?> column : DaoHelper.getColumns(model)) {
      mutator.addInsertion(key, columnFamilyName, column);
    }

    mutator.execute();
  }

  public T find(KeyType key) {
    SliceQuery<Object, String, byte[]> query = HFactory.createSliceQuery(keySpace,
      SerializerTypeInferer.getSerializer(keyTypeClass), StringSerializer.get(), BytesSerializer.get());

    QueryResult<ColumnSlice<String, byte[]>> result = query.setColumnFamily(columnFamilyName)
        .setKey(key).setColumnNames(allColumnNames).execute();

    if (result.get().getColumns().size() == 0) {
      return null;
    }

    try {
      T t = persistentClass.newInstance();
      DaoHelper.populateEntity(t, result);
      return t;
    }
    catch (Exception e) {
      throw new RuntimeException("Error creating persistent class", e);
    }
  }

  public void delete(KeyType key) {
    Mutator<Object> mutator = HFactory.createMutator(keySpace, SerializerTypeInferer.getSerializer(keyTypeClass));
    mutator.delete(key, columnFamilyName, null, SerializerTypeInferer.getSerializer(keyTypeClass));
  }
}
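With the above in place, a concrete DAO such as an AuthorDao reduces to little more than binding the key and entity types. A minimal sketch (the constructor wiring is an assumption based on the earlier examples):
public class AuthorDao extends AbstractColumnFamilyDao<String, Author> {
  private static final String COLUMN_FAMILY_NAME = "Authors";

  public AuthorDao(Keyspace keySpace) {
    super(keySpace, String.class, Author.class, COLUMN_FAMILY_NAME);
  }
  // save(userName, author), find(userName) and delete(userName) are inherited
}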
One might ask, why not just annotate the POJO with JPA annotations and thus handle the persistence? I did head down that route but found a project that was already doing the same, i.e., Kundera. For this reason, I kept the example focused on Hector. Also, I am a bit wary of whether the JPA specs will be a good fit for a sparse column store like Cassandra.

With the above-mentioned DAO, I modeled the rest of my code on Arin's example schema. The sample code provided contains a blog simulation, a multi-threaded test that simulates the workings of the blog application: authors being created, blog entries being created, authors commenting on blog entries, finding all blog entries created, getting blog entries by a tag, getting comments for a blog entry, etc.

The example can be DOWNLOADED HERE. You will not need to install a Cassandra server, as the example uses an embedded server. The code, however, does not demonstrate any failover or consistency strategies. Enjoy!

Friday, October 1, 2010

MongoDB with Morphia - An example

MongoDB is an open source, highly scalable, performant, document-oriented database, and I wanted to play with it. The database itself is feature rich, offering features such as sharding, in-place updates and map-reduce.

The database is written in C++ and uses JSON-style documents for mapping objects. Mongo's Java API provides the concept of an object that takes name-value pairs depicting the data and then stores it. There is a lack of type safety with this approach, as well as the effort of converting regular Java POJOs into the downstream Mongo object.
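As a minimal sketch of what that raw driver usage looks like (the database, collection and field names here are illustrative assumptions, not from the sample code):
// Classes below come from the com.mongodb driver package
Mongo m = new Mongo("localhost", 27017);
DB db = m.getDB("mydb");
DBCollection orders = db.getCollection("orders");

BasicDBObject doc = new BasicDBObject();
doc.put("customer", "sacharya");
doc.put("quantity", 10); // nothing stops a String landing here on the next call
orders.insert(doc);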

The type safety issue is addressed by the Morphia project, which allows for easy mapping of objects to and from MongoDB while also providing a querying interface. The API itself makes use of annotations, thus not requiring any configuration files. Think of it as Hibernate/JPA with annotations, for Mongo.

The API itself provides for access to Mongo directly if required. In this BLOG, I am trying out a simple example of using Morphia. I developed the project in the following steps:

1. Connection to mongo
2. POJO or Model
3. DAO

I have used a simple data model of an Order and its ancillary objects.

1. Connection to Mongo:
The Mongo object is itself a connection pool, so one does not need to create an additional one. Take a look at the documentation for details.

I define a simple connection manager, a singleton that handles the initialization of a Morphia Datastore instance, as shown below:
public final class MongoConnectionManager {
  private static final MongoConnectionManager INSTANCE = new MongoConnectionManager();

  private final Datastore db;
  public static final String DB_NAME = "mydb";
  
  private MongoConnectionManager() {
    try {
      Mongo m = new Mongo("localhost", 27017);
      db = new Morphia().map(Order.class).map(LineItem.class).map(Customer.class).createDatastore(
        m, DB_NAME);
      db.ensureIndexes();
    }
    catch (Exception e) {
      throw new RuntimeException("Error initializing mongo db", e);
    }
  }

  public static MongoConnectionManager instance() {
    return INSTANCE;
  }
  ...
}

Also note that in the above code there is a call to db.ensureIndexes(). This method synchronously creates any missing indices; if they are already present, it continues on seamlessly.
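Client code can then obtain the Datastore from the singleton. Assuming a simple getDb() accessor on the manager (as used by the DAOs further below):
Datastore ds = MongoConnectionManager.instance().getDb();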

2. Model:
I then defined my Order Model as shown below:
@Entity(value="orders", noClassNameStored = true)
public class Order {
  @Id
  private ObjectId id;
  @Reference
  private Customer customer;
  @Embedded
  private List<LineItem> lines;
    
  private Date creationDate;
  private Date lastUpdateDate;
  ...
  // Getters and setters
  ..
  
  @PrePersist
  public void prePersist() {
    this.creationDate = (creationDate == null) ? new Date() : creationDate;
    this.lastUpdateDate = (lastUpdateDate == null) ? creationDate : new Date();
  }
}
The Morphia annotations are not JPA, but they are very similar. As shown above, we map the Order class to the Mongo collection "orders" and use the @Id annotation to indicate the order identifier. As the type of the id is ObjectId, Mongo will generate the id automatically; if one uses any other type, the id must be set explicitly. Also note the explicit setting of noClassNameStored, as the class name would otherwise be stored by default. Storing the class name becomes useful when working with inheritance hierarchies, where the correct subclass needs to be instantiated.

Note that the Customer object has been defined with a @Reference annotation, indicating that the Customer object can exist independently of the Order object and that an existing one must be provided before an order can be persisted.

The @Embedded tag on the line items indicates that they lie in the scope of an order and will not exist independently of one.

The create and update dates have not been annotated but will be included in the MongoDB collection automatically. One could alternatively add the @Property tag to them and provide a specific name under which the property would reside.

If there is a field that one does not want persisted, marking it with @Transient will prevent that.
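Putting these mapping options together, a hypothetical Customer model (the field names here are illustrative assumptions, not from the sample code) might look like:
@Entity(value="customers", noClassNameStored = true)
public class Customer {
  @Id
  private String id; // a non-ObjectId id must be set explicitly before saving

  @Indexed(value=IndexDirection.DESC, name="lastNameIndex", dropDups=false)
  private String lastName;

  @Property("fname") // persisted under the name "fname"
  private String firstName;

  @Transient // never persisted
  private int failedLoginCount;

  // Getters and setters
}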

Also note the segment where the prePersist method, tagged with the @PrePersist annotation, is used to set the dates on the order prior to it being saved. Equivalent annotations exist for @PostPersist, @PreLoad and @PostLoad. The framework also supports the concept of EntityListeners for life cycle phases, so one can create an external class that responds to different life cycle events like so:
@EntityListeners(OrderListener.class)
public class Order {
}

public class OrderListener {
  @PrePersist
  public void preCreate(Order order) {
    ...
  }
}

Viewing the order using the MongoDB console would display an order as such:
db.orders.find().forEach(printjson);
{
 "_id" : ObjectId("4ca4e5dbb95a4d64192cf119"),
 "customer" : {
  "$ref" : "customers",
  "$id" : ObjectId("4ca4e5dbb95a4d64172cf119")
 },
 "lines" : [
  {
   "lineNumber" : 1,
   "quantity" : 10,
   "product" : {
    "$ref" : "products",
    "$id" : "xbox"
   }
  }
 ],
 "creationDate" : "Thu Sep 30 2010 19:32:43",
 "lastUpdateDate" : "Thu Sep 30 2010 19:32:43"
}

3. DAO:
The Morphia framework provides a DAO class. Nice. The DAO class provides type-safe DAO behavior. So if one has a Customer object, as an example:

DAO<Customer, String> customerDao = new DAO<Customer, String>(Customer.class, datastore);

// save
customerDao.save(new Customer());
// Read
Customer sanjay = customerDao.createQuery().field("lastName").equal("acharya").get();

The DAO class, however, provides many operations that might not be desired, such as dropCollection(). Decorating or extending it is an easy enough way to limit access to some of these methods while providing additional ones.

public class OrderDaoImpl extends DAO<Order, ObjectId> implements OrderDao {
  public OrderDaoImpl() {
    super(Order.class, MongoConnectionManager.instance().getDb());
  }
  ....

  @Override
  public List<Order> findOrdersByCustomer(Customer customer) {
   return createQuery().field("customer").equal(customer).asList();
  }

  @Override
  public List<Order> findOrdersWithProduct(Product product) {
    return createQuery().field("lines.product").equal(product).asList();
  }
}

From the above code, you can see that Morphia's query API provides an abstraction over the raw JSON query of Mongo. Type safety is present, although, through no fault of theirs, field naming is not type safe. The "." notation is used to access nested fields.

Another example of a query involving multiple fields is shown below:
public List<Product> getProductsWithCategory(CategoryType type, double maxPrice) {
  return createQuery().field("price").lessThanOrEq(maxPrice).field("categoryType").equal(type).asList();
}

Indexes can be applied to fields via the @Indexed annotation. For example, in the code shown below, the lastName property of a Customer is to be indexed in descending order.
@Indexed(value=IndexDirection.DESC, name="lastNameIndex", dropDups=false) 
  private String lastName;

I found Morphia pretty easy to use with Mongo. It would be nice if they supported the JPA annotations. There are still things I want to try, such as sharding and maybe Lucene integration for full-text search.

The example provided demonstrates Morphia in action, with tests that use the model objects and DAOs. To get the tests running, download MongoDB, install it, and have it running on the default port. Run "mvn test" in the project provided to see the tests, or import the project for a code review :-)

The example can be downloaded HERE.

Some links:
1. A presentation on Mongo and Java
2. 10 things about No-sql databases

Tuesday, September 14, 2010

XML Injection

It has been quite some time since I posted something. I have been looking into XML vulnerabilities and figured I'd share. The contents of this BLOG in no way reference any employer I have been involved with; this is solely my interest in XML injection.

When developing Web services, one would typically like to keep them secure by protecting against denial of service and security attacks.

So what is XML Injection? If a malicious user alters the contents of an XML document by injecting XML tags, then when an XML parser tries to parse the document, security exploits can be achieved. For the scope of this BLOG, I am not creating a Web Service but explaining with plain vanilla XML and SAX how the exploits can occur. The same concepts of course apply to Web Services dealing with XML.

Tag Injection:
Consider the following XML document that represents an item submitted for purchase.
<item> 
    <description>Widget</description>
    <price>500.0</price>
    <quantity>1</quantity>
</item>
The above XML is represented as a JAXB object, to which it would be unmarshalled, as shown below:
@XmlRootElement
public class Item {
  private String description;
  private Double price;
  private int quantity;
 
 // Setters and getters
  ....
}
When the above XML is parsed by a SAX Parser, the resulting Item object is correctly matched up with the corresponding attributes.
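The unmarshalling itself is plain JAXB; a minimal sketch, assuming the document is held in a String named xml:
JAXBContext context = JAXBContext.newInstance(Item.class);
Unmarshaller unmarshaller = context.createUnmarshaller();
Item item = (Item) unmarshaller.unmarshal(new StringReader(xml));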

Consider the XML fragment altered by a malicious user who was aware of the structure:
<item> 
    <description>Widget</description>
    <price>500.0</price>
    <quantity>1</quantity>
    <!-- Additional Rows below for price and quantity -->
    <price>1.0</price>
    <quantity>1</quantity>
</item>
When the above document is parsed by the SAX parser, it interprets the second element as overriding the first, and thus the price reflects 1.00 instead of 500.00. One only needs to think of the ramifications of a successful injection. Always a fan of the dollar store :-).

So how can one prevent this from happening? Validating the received XML against an XSD will catch the fallacy in the structure. The following represents a simple XSD for the document:
<xs:schema
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns="http://www.welflex.com/item"
 elementFormDefault="qualified">
 <xs:element name="item">
  <xs:complexType>
   <xs:sequence>
    <xs:element name="description" type="xs:string"></xs:element>
    <xs:element name="price" type="xs:decimal"></xs:element>
    <xs:element name="quantity" type="xs:integer"></xs:element>
   </xs:sequence>
  </xs:complexType>
 </xs:element>
</xs:schema>
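Wiring the schema into the unmarshalling is straightforward; a sketch, assuming the XSD above is saved as item.xsd:
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(new File("item.xsd"));

Unmarshaller unmarshaller = JAXBContext.newInstance(Item.class).createUnmarshaller();
unmarshaller.setSchema(schema); // injected elements now fail validation
Item item = (Item) unmarshaller.unmarshal(new StringReader(xml));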
Now when the SAX parser validates the malicious XML against the provided schema, an exception is raised to the effect of:
javax.xml.bind.UnmarshalException
 - with linked exception:
[org.xml.sax.SAXParseException: cvc-complex-type.2.4.d: Invalid content was found starting with element 'price'. No child element is expected at this point.]
 at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:315)
 at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.createUnmarshalException(UnmarshallerImpl.java:503)
....

As to why an attacker would not simply submit a whole new document with malicious data rather than injecting it, I do not have a concrete answer; I can only suppose it might have to do with source-system validation by the target system.

XXE or Xml EXternal Entity Attack:
External entity references in XML allow data from outside the main document to be embedded into the XML document. This "feature" allows a malicious user to gain access to sensitive information and/or create a denial of service attack.

Consider the following benign XML fragment:
<Person>
  <FirstName>Sanjay</FirstName>
  <LastName>Acharya</LastName>
</Person>

Now, consider the same XML shown above with small modifications made by our friendly neighborhood attacker:
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<Person>
  <FirstName>Sanjay</FirstName>
  <LastName>Acharya&xxe;</LastName>
</Person>

When an XML parser such as a SAX parser reads the XML in, running for example on a *NIX system, the contents of the /etc/passwd file are loaded into the resulting parsed document. If that document is returned to the person invoking the attack, well, you can imagine their glee at accessing this sensitive data.

The above XML read into a Person object would look like:
Person:Person [firstName=Sanjay, lastName=Acharyaroot:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
.........
couchdb:x:106:113:CouchDB Administrator,,,:/var/lib/couchdb:/bin/bash
haldaemon:x:107:114:Hardware abstraction layer,,,:/var/run/hald:/bin/false
speech-dispatcher:x:108:29:Speech Dispatcher,,,:/var/run/speech-dispatcher:/bin/sh
kernoops:x:109:65534:Kernel Oops Tracking Daemon,,,:/:/bin/false
saned:x:110:116::/home/saned:/bin/false
pulse:x:111:117:PulseAudio daemon,,,:/var/run/pulse:/bin/false
gdm:x:112:119:Gnome Display Manager:/var/lib/gdm:/bin/false
johndoe:x:1000:1000: John Doe,,,:/home/johndoe:/bin/bash

If the server program parsing the XML were running as root, the attacker could also access the /etc/shadow file. Using external entity injection, the possibility of retrieving sensitive information or creating a recursive failure, and thus denial of service, is definitely enticing for an attacker.

Clearly, the way to restrict this from happening is either to scan requests at the network level or to strictly enforce which entities can be resolved. A strategy to combat this is explained at Secure Coding.

Another option to consider is to provide a custom SAXParserFactory that employs the EntityResolver mentioned on the Secure Coding site but is made available either for the entire VM or for a particular module. One can register a custom SAXParserFactory class via a jaxp.properties file in the jre/lib directory, or via META-INF/services of an individual module.

An example of a Filtering SAX Parser Factory that could be employed using one of the above mentioned strategies is shown below. The factory delegates to the default factory to create a parser but then adds an EntityResolver to the parser before providing it back to the caller.
public class FilteringSaxParserFactory extends SAXParserFactory {
  // Delegate to this parser
  private SAXParserFactory delegate;
  
  // Delegate class
  private static final String DELEGATE_CLASS = "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl";
  
  // Allowed Entity Paths
  private Set<String> allowedEntityPaths;
  
  public FilteringSaxParserFactory() {
    delegate = SAXParserFactory.newInstance(DELEGATE_CLASS, Thread.currentThread().getContextClassLoader());
    delegate.setNamespaceAware(true);
    delegate.setValidating(true);
    allowedEntityPaths = new HashSet<String>();
    allowedEntityPaths.add("/usr/local/entity/somefile");
  }
  
  @Override
  public boolean getFeature(String name) throws ParserConfigurationException,
    SAXNotRecognizedException,
    SAXNotSupportedException {
    return delegate.getFeature(name);
  }

  @Override
  public SAXParser newSAXParser() throws ParserConfigurationException, SAXException {
    SAXParser parser = delegate.newSAXParser();
    XMLReader xmlReader = parser.getXMLReader();
    xmlReader.setEntityResolver(new EntityResolver() {
      
      @Override
      public InputSource resolveEntity(String publicId, String systemId) throws SAXException,
        IOException {
        if (allowedEntityPaths.contains(systemId)) {
          return new InputSource(systemId);
        }
        
        // Return blank path to prevent untrusted entities
        return new InputSource();
      }
    });
    
    return parser;
  }
  ....
}

XML Bomb Attack:
Another form of XML attack is what's known as an XML bomb. The bomb is a small XML fragment that makes the data provided grow exponentially during the parsing of the document, leading to extensive memory consumption and thus room for a denial of service attack.

Consider the following XML Bomb:
<!DOCTYPE item["
       <!ENTITY item "item">
       <!ENTITY item1 "&item;&item;&item;&item;&item;&item;">
       <!ENTITY item2 "&item1;&item1;&item1;&item1;&item1;&item1;&item1;&item1;&item1;">
       <!ENTITY item3 "&item2;&item2;&item2;&item2;&item2;&item2;&item2;&item2;&item2;">
       <!ENTITY item4 "&item3;&item3;&item3;&item3;&item3;&item3;&item3;&item3;&item3;">
       <!ENTITY item5 "&item4;&item4;&item4;&item4;&item4;&item4;&item4;&item4;&item4;">
       <!ENTITY item6 "&item5;&item5;&item5;&item5;&item5;&item5;&item5;&item5;&item5;">
       <!ENTITY item7 "&item6;&item6;&item6;&item6;&item6;&item6;&item6;&item6;&item6;">
       <!ENTITY item8 "&item7;&item7;&item7;&item7;&item7;&item7;&item7;&item7;&item7;">
      ]>
      <item>
        <description>&item8;</description>
        <price>500.0</price>
        <quantity>1</quantity>
       </item>

When attempting to parse the above fragment, the SAX parser will stop (a feature introduced in JDK 1.4.2) with the following error:
javax.xml.bind.UnmarshalException
 - with linked exception:
[org.xml.sax.SAXParseException: The parser has encountered more than "64,000" entity expansions in this document; this is the limit imposed by the application.]
 at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:315)
 ... 26 more
Note that the parser complains about finding more than 64,000 entity expansions in the document. The entity expansion limit is a property that can be controlled via "-DentityExpansionLimit".

A lot of the above-mentioned scenarios can be mitigated by ensuring XSD validation and not using DTDs. DTDs can be prevented entirely by setting the feature "http://apache.org/xml/features/disallow-doctype-decl" to true. If set, any XML being parsed that has a DOCTYPE declaration will cause a fatal parsing error.
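A sketch of setting that feature, assuming a Xerces-based parser such as the JDK default:
SAXParserFactory factory = SAXParserFactory.newInstance();
// Any document carrying a DOCTYPE declaration now fails fast
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
SAXParser parser = factory.newSAXParser();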

An example demonstrating the XML exploits can be downloaded HERE. Note that to witness the XXE injection, one would need to run it on a *NIX system. The example provided does not upload private information and is only for demonstration purposes.

Oh well, keeping this BLOG limited in content. I have not even looked into XPath XML injection. Quite interesting; I only wonder how many Web Services are out there on which tag injection exploits can be used.

Links I found of value:
1. Preventing External Entity Attacks
2. Testing for XML Injection

Tuesday, April 6, 2010

BatchMessageListenerContainer using Spring and MessageProxies

A few things I wanted to share on this BLOG. First is a way to batch-consume messages using Spring's listener containers, and second is an experiment in using dynamic proxies to send messages, along with an application of the same.

1. Batch Message Listener:
When using the Spring listener container, one typically consumes a single message at a time. This works in most scenarios. There might, however, be scenarios where consuming multiple messages and doing a task in a batch is more efficient. Clearly, one way to ensure batching is to have the message itself be an aggregation of the multiple payloads that need to be processed as a batch. From a consumer perspective, one can process this single message as an atomic unit and either consume or reject it. This works if the producer of the messages can and will group the messages together; it also means that the producer needs to understand the consumer's needs regarding batch efficiency.

What about another way of handling this scenario, where a batch message listener is used? In this proposal, one receives X messages together as a unit and either processes them all or rolls them all back. A consumer can now choose to capitalize on performance gains by batching the messages. There is of course a cost as far as the messaging system is concerned, as it needs to hold the messages in the batch as unprocessed for as long as the JMS transaction is active.

I found that the spring-batch project at one time had a BatchMessageListenerContainer that did some batching. I could not find it in later versions of the framework, so I created one that does the same. It is based on the DefaultMessageListenerContainer and has the following requirements:
  1. Receive at most X messages and process them as a unit of work.
  2. If X messages cannot be received before a JMS session timeout, and only X-dX messages have been received when the timeout occurs, then process the X-dX received messages even though the batch size was not met.
  3. Commit or rollback applies to the entire set of messages. What this means is that if there is a bad message in the bunch, then all messages that are part of the batch are rolled back.
  4. Only JMS transacted sessions are supported; user transactions are not supported at this point. That should be easy to add, though.

As the SessionAwareMessageListener from Spring does not have a signature that supports multiple messages, I created one called SessionAwareBatchMessageListener:
public interface SessionAwareBatchMessageListener<M extends Message>{
  /**
   * Perform a batch action with the provided list of {@code messages}.
   * 
   * @param session JMS {@code Session} that received the messages
   * @param messages List of messages to be processed as a unit of work
   * @throws JMSException JMSException thrown if there is an error performing the operation.
   */
  public void onMessages(Session session, List<M> messages) throws JMSException;
}

The BatchMessageListenerContainer that I am demonstrating is an extension of the DefaultMessageListenerContainer that allows for the newly created listener. Note that all this is possible due to the beauty of the design of the Spring code, which allows for such extensions.

The container will receive messages until either the batch size is hit or a JMS session timeout occurs, and then dispatches them to the listener to complete the operation. The code is a bit verbose but is shown in its totality below:
public class BatchMessageListenerContainer extends DefaultMessageListenerContainer {
  public static final int DEFAULT_BATCH_SIZE = 100;
  private int batchSize = DEFAULT_BATCH_SIZE;

  public void setBatchSize(int batchSize) { this.batchSize = batchSize; }
  
  /**
   * The doReceiveAndExecute() method has to be overridden to support multiple-message receives.
   */
  @Override
  protected boolean doReceiveAndExecute(Object invoker, Session session, MessageConsumer consumer,
    TransactionStatus status) throws JMSException {
    Connection conToClose = null;
    MessageConsumer consumerToClose = null;
    Session sessionToClose = null;

    try {
      Session sessionToUse = session;
      MessageConsumer consumerToUse = consumer;
  
      if (sessionToUse == null) {
        Connection conToUse = null;
        if (sharedConnectionEnabled()) {
          conToUse = getSharedConnection();
        }
        else {
          conToUse = createConnection();
          conToClose = conToUse;
          conToUse.start();
        }
        sessionToUse = createSession(conToUse);
        sessionToClose = sessionToUse;
      }
      
      if (consumerToUse == null) {
        consumerToUse = createListenerConsumer(sessionToUse);
        consumerToClose = consumerToUse;
      }
      
      List<Message> messages = new ArrayList<Message>();

      int count = 0;
      Message message = null;
      // Attempt to receive messages with the consumer
      do {
        message = receiveMessage(consumerToUse);
        if (message != null) {
          messages.add(message);
        }
      }
      // Exit loop if no message was received in the time out specified, or
      // if the max batch size was met
      while ((message != null) && (++count < batchSize));

      if (messages.size() > 0) {
        // Only if messages were collected, notify the listener to consume the same.
        try {
          doExecuteListener(sessionToUse, messages);
          sessionToUse.commit();
        }
        catch (Throwable ex) {
          handleListenerException(ex);
          if (ex instanceof JMSException) {
            throw (JMSException) ex;
          }
        }
        return true;
      }

      // No message was received for the period of the timeout, return false.
      noMessageReceived(invoker, sessionToUse);
      return false;
    }
    finally {
      JmsUtils.closeMessageConsumer(consumerToClose);
      JmsUtils.closeSession(sessionToClose);
      ConnectionFactoryUtils.releaseConnection(conToClose, getConnectionFactory(), true);
    }
  }

  protected void doExecuteListener(Session session, List<Message> messages) throws JMSException {
    if (!isAcceptMessagesWhileStopping() && !isRunning()) {
      if (logger.isWarnEnabled()) {
        logger.warn("Rejecting received messages because of the listener container "
          + "having been stopped in the meantime: " + messages);
      }
      rollbackIfNecessary(session);
      throw new JMSException("Rejecting received messages as listener container is stopping");
    }

    @SuppressWarnings("unchecked")
    SessionAwareBatchMessageListener<Message> lsnr = (SessionAwareBatchMessageListener<Message>) getMessageListener();

    try {
      lsnr.onMessages(session, messages);
    }
    catch (JMSException ex) {
      rollbackOnExceptionIfNecessary(session, ex);
      throw ex;
    }
    catch (RuntimeException ex) {
      rollbackOnExceptionIfNecessary(session, ex);
      throw ex;
    }
    catch (Error err) {
      rollbackOnExceptionIfNecessary(session, err);
      throw err;
    }
  }
  
  @Override
  protected void checkMessageListener(Object messageListener) {
    if (!(messageListener instanceof SessionAwareBatchMessageListener)) {
      throw new IllegalArgumentException("Message listener needs to be of type ["
        + SessionAwareBatchMessageListener.class.getName() + "]");
    }
  }
 
  @Override
  protected void validateConfiguration() {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("Property batchSize must be a value greater than 0");
    }
  }
}
There is a demonstration example in the attached code which shows how messages can be received in batch and processed on the consumer.
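As a sketch of how the container might be wired up programmatically (the connection factory, queue name and listener body here are assumptions, not from the sample project):
BatchMessageListenerContainer container = new BatchMessageListenerContainer();
container.setConnectionFactory(connectionFactory); // e.g., an ActiveMQConnectionFactory
container.setDestinationName("com.welflex.person.queue");
container.setSessionTransacted(true); // only JMS transacted sessions are supported
container.setBatchSize(200);
container.setMessageListener(new SessionAwareBatchMessageListener<Message>() {
  public void onMessages(Session session, List<Message> messages) throws JMSException {
    // Process the batch as one unit of work; a thrown exception rolls back all of it
  }
});
container.afterPropertiesSet();
container.start();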

2. Proxy Pattern for JMS Message Sending:
Recently I have been seeing quite a few REST clients that support a proxy pattern, and I wanted to experiment to see whether the same could be applied to JMS as well.

The RESTful system has some similarities with JMS:
  1.  Resource location can be compared to JMS Destination
  2.  Mime type can be compared to JMS Message Type
So if we defined a couple of annotations, one for the destination JNDI name and a second for the message type, a proxy interface could look like the following:
public interface MessageSender {
    @DestinationJndi("com.welflex.barqueue")
    @MessageType(ObjectMessage.class)
    public void sendObjectMessage(String s);

    @DestinationJndi("com.welflex.baz.queue")
    @MessageType(TextMessage.class)
    public void sendStringMessage(String s);

    @DestinationJndi("com.welflex.boo.queue")
    @MessageType(MapMessage.class)
    public void sendMapMessage(Map<String, Object> map);
  }
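The two annotations themselves are trivial. A minimal sketch of what they might look like (the actual definitions live in the downloadable example):
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface DestinationJndi {
  String value();
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface MessageType {
  Class<? extends Message> value(); // e.g., TextMessage.class
}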
We could then create a proxy based on the above interface and send messages as shown below:
public void testMethod() {
    SendExecutor executor = new ActiveMQSendExecutor("vm://localhost", "foo", "bar");
    MessageSender s = ProxyFactory.create(MessageSender.class, executor);
    s.sendObjectMessage("Foo");
    s.sendStringMessage("Bar");
    
    Map<String, Object> m = new HashMap<String, Object>();

    m.put("name", "Sanjay");
    m.put("age", 23);
    m.put("salary", 12313131.213F);    

    s.sendMapMessage(m);  
}  
I am not explaining the full details of how the proxy is created, as they are available in the downloadable example, but a rough sketch of the idea follows. One can easily swap out the ActiveMQSendExecutor with a WebLogicSendExecutor, OpenJMSSendExecutor or any other JMS provider implementation.
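At its heart, such a ProxyFactory is just a JDK dynamic proxy; a minimal sketch (the SendExecutor.send() signature below is an assumption):
public final class ProxyFactory {
  @SuppressWarnings("unchecked")
  public static <T> T create(Class<T> iface, final SendExecutor executor) {
    return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface},
      new InvocationHandler() {
        public Object invoke(Object proxy, Method method, Object[] args) {
          // Read the destination and message type off the annotated method
          String destination = method.getAnnotation(DestinationJndi.class).value();
          Class<? extends Message> type = method.getAnnotation(MessageType.class).value();
          executor.send(destination, type, args[0]); // send() signature assumed
          return null;
        }
      });
  }
}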

So there are many questions that arise with the above:
  1. What about controlling the message properties? Well, the same applies to header properties that one would need to handle with REST proxy clients. One can define an execution interceptor to provide the same :-)
  2. JMS is so simple; is another layer of abstraction really worth it? Well, I mostly agree, but if all a client needs to do is send a particular type of message and we can abstract away the boilerplate, why not? Sure, there is some initial development in creating the send executors; once that is done, it is as easy as annotating an interface and dispatching :-)
  3. What about BytesMessage and other types? I have not handled those.
Clearly there are more questions here. The above is just one of my Frankenstein experiments that I thought I'd share :-)

3. Examples:
Examples that utilize the above code are available in the provided download. A sample batch consumer is demonstrated that persists Person entries to an HSQL database using Hibernate. The batch size of messages consumed is 200 at a time, and on every hundred records Hibernate flushes the session.

Examples of the proxy class usage are also demonstrated, sending different types of messages.

Download a Maven Example from HERE.

Tuesday, March 16, 2010

Sanjay in Memory Land - Java Memory Leaks and Tools for Detection

Finally, something to write about apart from REST and JMS :-). Tim Burton's Alice in Wonderland has been released and I am anxious to take my kids to it; till then, this BLOG can serve to curb my fantasies :-)

Java has garbage collection, right? So can a program developed in Java have a memory leak? Isn't this one of the favorite questions of interviewers? In this BLOG I would like to capture information about JVM leaks, their diagnosis, and how to fix/prevent them. As always, if this helps someone, great; else it will help me at some point in the future as reference material ;-)

Life is wonderful: the application we have been working on for months has been deployed, deadlines have been met, cheers have been exchanged at the local hangout. All of a sudden, a call on your phone, a person from the network and operations team with bad news to bear: "Your app just crashed with an OutOfMemory error." Shxx! Think of this as preparation for the inevitable; everyone needs to meet their NOC person while they are pissed at some time or another :-).

So what could have happened? Before getting into that, some insight into JVM memory and the types of references is a must. I could get into it here, but it would be rather redundant, as a very nice article explaining it by Mirko Novakovic is available on the codecentric blog. It is a must-read prior to continuing with this BLOG. Please familiarize yourselves with the different generations and the memory model. The following figure will serve to illustrate the different areas of memory available:
In the above shown figure:
  • EDEN is the place where all objects are created
  • SS0 and SS1 are survivor spaces where objects are promoted to
  • OLD represents Old Gen for long living objects that have survived many GC cycles
  • PERM  is PERM Space where Class and Static information lies
One topic that requires a brief mention for completeness is the different reference types in Java. I would definitely recommend a read of Understanding Weak References by Ethan Nicholas.

Reference Types:
1. Strong Reference:

A strong reference is your regular reference that you use all the time. For example, in Integer number = new Integer(230); number is a strong reference to an object on the heap. During development we are typically dealing with strong references.

2. Weak Reference:

A weak reference is not as strong as the "strong reference". Duh, obviously! A weak reference to an object does not prevent it from being garbage collected. In other words, if the only references to an object are weak references, then the object is eligible for garbage collection.

For example, in the figure shown below, Object A is not eligible for GC as there is a strong reference to it:
However, if you look at the following figure you will notice that Object A only has a WeakReference to it thus making Object A eligible for garbage collection:
One implementation of a WeakReference in java.util is the WeakHashMap, in which the keys are weak and the values are not. If a key of the map loses its object to garbage collection, the entry is considered reclaimed as well. WeakHashMaps tend to work well when using the Listener or Observer pattern; many a time we add a listener and never deregister it. One important thing to keep in mind is that WeakReferences are NOT serializable, as it is possible for the objects they contain to vanish when garbage collection occurs.
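A tiny sketch of the key behavior (System.gc() is only a hint, so the reclaim is not guaranteed to be immediate):
Map<Object, String> registry = new WeakHashMap<Object, String>();
Object key = new Object();
registry.put(key, "listener metadata");

key = null;  // drop the only strong reference to the key
System.gc(); // after a collection, the entry is eligible to vanish
// registry.size() may now report 0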

3. Soft Reference:

A soft reference is very much like the weak reference; however, the garbage collector is less likely to throw away the object it references. Weakly referenced objects are collected by the GC, whereas soft references continue to exist as long as memory is available. This makes them excellent for caches.
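A minimal sketch of a soft-reference cache (loadExpensiveData() is a hypothetical helper):
SoftReference<byte[]> cached = new SoftReference<byte[]>(loadExpensiveData());
byte[] data = cached.get();
if (data == null) {
  // The collector reclaimed the data under memory pressure; reload and re-cache
  data = loadExpensiveData();
  cached = new SoftReference<byte[]>(data);
}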

4. Phantom Reference:

Unlike the weak and soft references, which exhibit similar behavior separated only by timing and need, the phantom reference is a totally different beast. Note that the javadoc on PhantomReference states that its get() method always returns null. The primary use of such a reference is to determine when it gets enqueued into a reference queue. One rarely finds the need to use phantom references, and therefore I am not going to discuss them further.

Now that we have looked at reference types in Java, let's move on to a popular question asked by interviewers: "Object A has a reference to Object B, Object B has a reference to Object C, and Object C has a reference back to Object A. Due to the circularity, do we have a memory leak?"
Well, I do not think cyclic references are necessarily great, but just because a circular reference exists does not mean we have a leak. Consider the following figure:
In the scenario mentioned above, Objects A, B and C cannot be garbage collected as long as there is at least one ROOT reference to one of the objects. Object A and Object B have ROOT references to them. I like to think of ROOT references as springs or wires by which the JVM holds the objects in memory. If all of the springs attached to the objects are dropped, then the object and its connected graph have no ability to survive. It does not matter how many interconnections they possess; if all root references to the graph are gone, it is eligible for collection, as shown below:

When using memory analyzer tools, you will hear the term GC root. One of the popular analyzer tools, YourKit, has this definition of GC roots and their types:
"The so-called GC (Garbage Collector) roots are objects special for garbage collector. Garbage collector collects these objects that are not GC roots and are not accessible by references from GC roots. There are several kinds of GC roots. One object can belong to more than one kind of root. " - Yourkit Site

Some other terms I would like to familiarize you with are the dominator and the dominator tree. Both sound powerful; I think of bosses when I hear them, whether at home or at work ;-). An object is considered a dominator of another object if and only if it has the sole reference to the dominated object. Consider the figure shown below:
In the figure shown above, object B is the dominator of object D. However, B and C cannot be considered dominators of E, as they both have references to E. Object C, on the other hand, is the dominator of F and G. Dominators are important in memory analysis because if all references to a dominator are removed, the entire dominator tree becomes available for garbage collection. For example, if the reference to C is removed, then the graph of objects C, F and G all become eligible for garbage collection.

My 2c on how Garbage Collection works:

Garbage collection involves walking the heap from the GC roots. A mark-sweep algorithm marks the objects that are eligible for collection, with the sweep phase picking them up.

GCs are typically classified as minor and major. A minor GC is typically invoked when Eden space fills up; the collector sweeps through Eden and the survivor spaces while promoting objects that have survived some number of GC cycles to older generations. Minor GCs are rapid and hardly interfere with the functioning of the VM.

Major GCs are invoked when Old Gen starts accumulating objects and growing. All spaces are collected when a major GC runs; note that Perm Gen is also collected during a major GC. When a major GC occurs, other activities of the JVM are suspended. In other words, it is a stop-the-world event of sorts.

One can get information on Garbage collection using the following flags to the JVM on startup.
-XX:+PrintGC Print messages at garbage collection. 
-XX:+PrintGCDetails Print more details at garbage collection. 
-XX:+PrintGCTimeStamps Print timestamps at garbage collection.
An example output of the above is:
3.480: [Full GC [PSYoungGen: 44096K->25531K(88704K)] [PSOldGen: 139579K->158080K(220608K)] 183675K->183611K(309312K) [PSPermGen: 1700K->1700K(16384K)], 0.8611160 secs] 
[Times: user=0.76 sys=0.08, real=0.86 secs] 
4.504: [Full GC [PSYoungGen: 44608K->0K(88704K)] [PSOldGen: 190848K->218908K(287680K)] 235456K->218908K(376384K) [PSPermGen: 1700K->1700K(16384K)], 1.3089360 secs] 
[Times: user=1.19 sys=0.12, real=1.31 secs] 
5.862: [GC [PSYoungGen: 44608K->44672K(100992K)] 263516K->263580K(388672K), 0.2592090 secs] [Times: user=0.50 sys=0.00, real=0.26 secs] 
6.169: [GC [PSYoungGen: 89280K->56384K(112896K)] 308188K->308364K(400576K), 0.5695990 secs] [Times: user=0.95 sys=0.07, real=0.57 secs] 
6.739: [Full GC [PSYoungGen: 56384K->20609K(112896K)] [PSOldGen: 251980K->287680K(338624K)] 308364K->308289K(451520K) [PSPermGen: 1700K->1700K(16384K)], 2.3075900 secs] 
[Times: user=1.52 sys=0.18, real=2.31 secs] 
With the above information, let us look at some of the usual suspects of memory leaks:

The Usual Leak Suspects:

1. The HTTP Session when Working with Web Applications:

The HTTP session object is one place that is often abused as a storage area for user information. It is often used as a dumpster for information that lives beyond the scope of a single request, i.e., conversational state that survives more than a single call. Another common scenario is when the HttpSession is used as a cache of sorts, where data once read for a user session is stored for the lifetime of the session. Sometimes these sessions are configured to be kept alive for a very long period of time. In the cases mentioned above, the data in the sessions lingers and holds references to object graphs for extended periods. If there are a number of sessions involved, it is simple enough to do the math for the memory footprint.

2. Class Static References:

Classes that have static variables, such as maps and lists, are often populated with information but never cleaned up. Classes find themselves placed in Perm Gen and last as long as their class loader references them. Sometimes, as in the case of the HTTP session, objects are placed in these static collections and never removed, thus growing the heap.
public class MyStatic {
   private static final Map<String, Object> data = new HashMap<String, Object>();
   private MyStatic() {}
   public static void add(String key, Object val) { data.put(key, val); }
   public static Object get(String key) { return data.get(key); }
}

3. Thread Local:

The ThreadLocal pattern has some benefits: data is placed in a ThreadLocal by one method so as to be accessible to other methods on the same thread. If this data is not cleared out upon completing the operation, and the thread itself is pooled, leaks start appearing.
This is a common scenario when dealing with a web application where ThreadLocal storage is used for one reason or another. The threads in the container are pooled, so after the execution of one request the thread is returned to the pool. If the data placed in the ThreadLocal is not cleared after the execution of the request, it will continue to linger. As a best practice, I recommend that you always clear it, as sketched below.
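A minimal sketch of the clean-up discipline (UserContext, Request and process() are hypothetical names):
private static final ThreadLocal<UserContext> CONTEXT = new ThreadLocal<UserContext>();

public void handleRequest(Request request) {
  CONTEXT.set(new UserContext(request));
  try {
    process(request); // anything on this thread can call CONTEXT.get()
  }
  finally {
    CONTEXT.remove(); // critical when the thread goes back to a pool
  }
}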

4. Object Pooling:

Object pooling is one pattern that needs to be handled with care. In the earlier days of the JVM, people tended to pool objects to benefit from performance gains; those gains are highly questionable with current JVM optimizations. Objects that are pooled tend to make it to Old Gen by surviving many garbage collection cycles and linger there, as there are strong references to them. Note that this even affects GC times if one has Young Gen objects holding references to these Old Gen objects. Pooling really expensive resources such as database connections and JMS artifacts might be a good idea; in general, however, pooling regular objects can prove detrimental rather than beneficial. I recommend being wary of object pooling.

5. Not closing expensive objects:

Objects like connections tend to hold onto other expensive artifacts like sockets and streams. These objects usually have a close() method to release the resources allocated. If they are not released, and the top-level object is in Old Gen for example, then these resources tend to linger until a full GC occurs. Some resources keep file handles open, often leading to their exhaustion. Maybe some of you have witnessed the "Too many open file handles" error?

For example, I recently ran into a case where we did not run out of JVM memory, but as resources were not being released until GC kicked in, file system handles related to sockets were held, resulting in their exhaustion from the perspective of the file system. As a tip, if one runs out of file handles in a *NIX environment, consider using lsof and netstat if the symptoms point to network-based file handles, to determine what is happening. The case mentioned was one of system resources being exhausted and not JVM memory; however, IMO it is indicative of a significant, destructive leak.

6. Interned String in the Perm Gen:

One pattern in use since long ago is to call intern() on Strings; read more about it at mindprod. Interning a String results in it making its way into Perm Gen. The java.lang.String class maintains a pool of strings. When the intern() method is invoked, it checks the pool to see if an equal string is already present. If there is, the intern method returns it; otherwise it adds the string to the pool.

Some XML parsers in particular tend to intern() strings in an attempt to optimize. A large number of interned strings will, however, show up in a memory dump where you can view the Perm Gen. Note that interned strings are garbage collected; they do, however, add to the Perm Gen footprint.

7. Badly referenced Anonymous Classes:

Anonymous classes are often used to exchange blocks of code between objects. One truth about an anonymous class is that it holds a reference to the instance that created it, thus ensuring that as long as the anonymous class has a root reference, the parent object cannot be garbage collected. A common case of this is the registering of listeners on an object that are never de-registered when no longer required.
public class A {
  public void someMethod(B b) {
    b.register(new Runnable() {
      public void run() {
        ...
      }
    });
  }
}
public class B {
  private final List<Runnable> runnables = new ArrayList<Runnable>();
  public void register(Runnable r) {
    runnables.add(r);
  }
}

8. Proxy Classes in Perm Gen:

Frameworks such as cglib and javassist are used to generate proxy classes. These proxy classes find themselves in Perm Gen, and too many of them being generated leads to Perm Gen out-of-memory errors. "Perm Gen? Let's just bump it up and the problem will go away." Sure, that works sometimes, especially when your code has changed to load a lot more classes, but there are times when it will only come back to haunt you. For example, we recently encountered a case where dynamic classes were being generated with every instantiation of a particular object. This was an incorrect use of the API, and the result was increased use of Perm Gen due to the proliferation of proxy classes. Increasing the Perm Gen might have cured the problem, but only temporarily, until further instantiations of the object resulted in more classes being put into Perm Gen and a JVM fatality. As an example, with the bad method (using javassist) shown below, where classes are constantly being created in Perm Gen, repeated invocation will cause the Perm Gen to grow:
public SomeClass createInstanceOfSomeClass() {
  ProxyFactory f = new ProxyFactory();
  f.setSuperclass(SomeClass.class);
  MethodHandler mi = new MethodHandler() {
     public Object invoke(Object self, Method m, Method proceed,
      Object[] args) throws Throwable {
        return proceed.invoke(self, args);  // execute the original method.
     }
  };
  f.setFilter(new MethodFilter() {
    public boolean isHandled(Method m) {
     return !m.getName().equals("finalize");
    }
  });

  Class<?> c = f.createClass();
  SomeClass p = (SomeClass) c.newInstance();
 ((ProxyObject) p).setHandler(mi);
  return p;
}
As an example of how the Perm Gen can grow, see the graph below, with the end result being running out of Perm Gen and receiving the nasty message about the same:

9. Freak Load or Query:

During your testing you have followed the standard path with regard to load, be it loading data from a database or parsing an XML file into a DOM tree, what have you. Ever so often, there is a freak scenario you have not expected that spikes the memory so badly that you run out of it. Some of these cases are when a badly designed query loads a large amount of data into the heap, or a file is uploaded for which you are building a gigantic XML DOM tree. These are examples of large dominator trees that will definitely consume your heap. Always consider the extreme cases during your design and development. In particular, ask yourself questions such as: "What if someone uploads a bad file?" or "What will happen with this query, which is not bounded or paged, if a large result set is loaded?"

10. Other Cases:
There are definitely more cases than the ones mentioned above. For example, J2EE containers that do not unload classes correctly, and hot deploys increasing Perm Gen as a result. Your cases will be unique, and I would love to hear about them.

Anyway, as a developer or architect, the following are practices I feel one can benefit from. Please note these are not rules or musts, but personal recommendations based on my experiences.

Memory Leak Detection and Analysis:

1. Defensive Development:

When developing code, always keep thinking of how an object you create is going to be used. When working with large data sets, think of accidental or one-off cases which could load considerable data into memory. Always consider limiting loaded data to definite bounds with paging, and pick only the data you need. When working with XML, consider whether you need to load an entire DOM tree or can use StAX. Consider tools like FindBugs to detect unnecessary circularities in your code.

I would recommend always questioning a variable that is added to an object, and its purpose. Does the class really need to be stateful in the context, or can it be stateless? Will this class be used by multiple threads at the same time? I am not advocating that classes that have state are bad; that would be quite against my O-O beliefs :-). I am only asking you to question how they will be used at runtime, by your library or by others using it. In addition, a pattern that every developer knows, and a favorite answer to the interview question "What patterns have you used?", is the singleton. Singletons often serving as a store or cache of data should be questioned and their proliferation controlled, as they live for the lifetime of the VM unless explicitly unloaded and can achieve dumpster status. Singletons are NOT necessarily EVIL; I use them all the time at work. It is, however, possible for them to be misused: for example, a singleton that has a map of key-value pairs whose contents keep growing over time clearly gives you something to think about. The same applies if you are using a dependency injection container like Spring and design objects with singleton scope rather than prototype scope.

Also watch for the HTTP Session and what goes in there.

2. Active Monitoring:

Clearly, detecting early is the way to go. Monitor your application once deployed; err, actually, even before. One has many tools at one's disposal to obtain information on how the application behaves with memory.

Waiting till production only to find a memory leak is not desirable by any means. Baking memory profiling into your releases would be great.

Set up monitoring at all possible places in your application. For HttpSession issues, consider having an HttpSessionAttributeListener that monitors the objects added to the HttpSession and displays them. One can easily use a decorator such as sitemesh in dev, with a flag turned on, to display the same. In fact, a former colleague of mine implemented exactly that to actively detect memory issues. In the various test environments and in production, make sure you have alert thresholds set up to notify you of memory problems. In a test environment, have your testers run VisualVM and keep an eye on the graph as they perform their tests. The testers do not need to be skilled in memory management, though it wouldn't hurt if they are :-), only trained in the trends indicative of abnormal memory patterns.
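A minimal sketch of such a listener (the class name and logging are illustrative; it would be registered in web.xml):
public class SessionAttributeMonitor implements HttpSessionAttributeListener {
  public void attributeAdded(HttpSessionBindingEvent event) {
    // Log what is being stashed in the session, and of what type
    System.out.println("Added: " + event.getName() + " of type "
        + event.getValue().getClass().getName());
  }

  public void attributeReplaced(HttpSessionBindingEvent event) {
    System.out.println("Replaced: " + event.getName());
  }

  public void attributeRemoved(HttpSessionBindingEvent event) {
    System.out.println("Removed: " + event.getName());
  }
}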

3. Post Mortem or Post Traumatic Evaluation:

So you have all the bases covered, tested the best you can, but find an OOM in production. What should you do? Roll back to the previous version that did not have the error? Sure, if you can, but it would be better to determine the problem before doing so.

One question to ask: did you have the flag "-XX:+HeapDumpOnOutOfMemoryError" set? When the flag is enabled, a heap OOM automatically produces a dump. The flag does not affect runtime performance, so it is benign.
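When enabled, the dump lands wherever -XX:HeapDumpPath points. For example (the path here is illustrative):

java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/myapp/heapdump.hprof -jar myapp.jar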

A first reaction might be to just bump up the memory on the VM and restart. Note however that while this might work in certain cases, in others it only delays the inevitable.

It is very important to determine the type of error you are seeing and use memory analyzer tools to find the offending problem. If you have the option of getting heap dump snapshots, do so, then compare them to detect dominator trees, what's in the perm gen, etc. Sampling, sampling, sampling! When collecting any data, consider capturing it at different intervals; this really helps in detecting changes in memory over time.

Free Tools to help with your memory analysis:

1. VisualVM:
VisualVM is a visual interface for viewing, troubleshooting and profiling applications running on a JVM. Much of the functionality of previously standalone tools like JConsole, jstat, jstack and jmap is part of the tool. In addition, the tool supports plugins for extension. I have found VisualVM great for viewing memory, taking snapshots and comparing them. VisualVM is now also part of the JDK. The following graph from VisualVM, for example, demonstrates an ever increasing Old Gen that leads to an eventual Out of Memory.

2. MAT:
Eclipse MAT is a memory analyzer tool from the Eclipse community. A standalone version and an Eclipse plugin are available. This is a really great tool for analyzing heap dumps. It is extremely fast at parsing a heap dump and producing valuable reports; some of these reports help diagnose leak suspects for you as well. For example, consider the following bad program where a map is constantly added to and grows over time. This is in fact the same code that generated the Old Gen accumulation graph in the VisualVM figure above:
import java.util.HashMap;
import java.util.Map;

public class StaticHolder {
  private static final StaticHolder instance = new StaticHolder();

  // Grows for the lifetime of the VM; the singleton is reachable from a
  // static field (a GC root), so nothing in this map is ever collected.
  private final Map<Integer, Integer> map = new HashMap<Integer, Integer>();

  private StaticHolder() {}

  public static StaticHolder instance() {
    return instance;
  }

  public void add(Integer key, Integer value) {
    map.put(key, value);
  }

  public static void main(String args[]) throws InterruptedException {
    for (int i = 0; i < 10000000; i++) {
      StaticHolder.instance().add(i, i);
      Thread.sleep(5);
    }
  }
}
If you take a heap dump of the running program and open it in MAT, you can run a leak suspects report which points to the location of the leak, as shown below:

There is a very nice article on the MAT site on how to find memory leaks that someone using this tool ought to read.

3. BTrace:

BTrace is a dynamic tracing tool for a running JVM, similar in spirit to DTrace on Solaris. It works by instrumenting the classes of the target application to introduce tracing code. It looks very promising, but there appear to be certain known issues that could cause JVM crashes during instrumentation. That sounds scary, but at least it is documented on their wiki; how many other tools, bought or free, guarantee no crashes? That said, it is pretty easy to use: one develops Java classes using the BTrace API, and these are then compiled and run via the btrace agent to inspect the JVM.

The BTrace API itself has a lot of options, and stock profiling samples are available on their wiki, which I found quite informative. I have not personally used the tool on any production JVMs, but I can definitely see the potential in applying probes to gain an understanding of the VM. I ran some of their traces on sample code and was quite pleased with the things I could do. This is a project to keep an eye on.
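As a hedged illustration, a trace watching the StaticHolder example from earlier might look like the sketch below, written against the com.sun.btrace packages of the 1.x releases; it prints a line each time add() is invoked in the target VM:

import com.sun.btrace.annotations.BTrace;
import com.sun.btrace.annotations.OnMethod;
import static com.sun.btrace.BTraceUtils.println;

@BTrace
public class MapGrowthTrace {
  // Fires on every entry into StaticHolder.add in the target VM
  @OnMethod(clazz = "StaticHolder", method = "add")
  public static void onAdd() {
    println("StaticHolder.add invoked");
  }
}

The script would then be attached with something like "btrace <pid> MapGrowthTrace.java".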

4. Command Line Tools at your disposal:

Bundled with the JDK are many independent command line tools that aid in debugging memory.
a. jps - This tool can be used to determine the process IDs of running JVMs. It was introduced in JDK 1.5.

# list pid and short java main class name
C:\>jps
2008 Jps
2020 Bootstrap

# list pid and fully-qualified java main class
$jps -l
5972 sun.tools.jps.Jps
5457 org.netbeans.Main

# pid, full main class name, and application arguments
$jps -lm
5955 sun.tools.jps.Jps -lm
5457 org.netbeans.Main --userdir /home/sacharya/.visualvm/1.2.2 --branding visualvm

# pid and JVM options
$jps -v
5984 Jps -Dapplication.home=/usr/local/jdk/jdk1.6.0_18 -Xms8m
5457 Main -Djdk.home=/usr/local/jdk/jdk1.6.0_18 -Dnetbeans.system_http_proxy=DIRECT -Dnetbeans.system_http_non_proxy_hosts= -Dnetbeans.dirs=./bin/../tools/visualvm_122/bin/..//visualvm:./bin/../tools/visualvm_122/bin/..//profiler3: -Dnetbeans.home=/home/sacharya/tools/visualvm_122/platform11 -Xms24m -Xmx192m -Dnetbeans.accept_license_class=com.sun.tools.visualvm.modules.startup.AcceptLicense -Dsun.jvmstat.perdata.syncWaitMs=10000 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/sacharya/.visualvm/1.2.2/var/log/heapdump.hprof

b. jstat:
jstat is a monitoring tool that allows you to obtain sample data periodically from a local or remote VM.
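For example, the following samples the garbage collection utilization of the given process every second, ten times; the columns report the occupancy of the survivor, eden, old and perm spaces along with GC counts and times:

$jstat -gcutil <pid> 1000 10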

c. jmap:
jmap is one of the tools that anyone working on memory issues should be familiar with.

jmap is used to print heap memory details of a given process. With jmap you can get a view of the live and unreachable objects on the heap. For example, executing the following will give you a histogram of only the live objects in the JVM:
$jmap -histo:live <pid>
whereas the following will include unreachable objects as well:
$jmap -histo <pid>

One can take a heap dump with jmap using the following command, after which it can be analyzed in MAT or VisualVM:
$jmap -dump:live,file=heap.hprof,format=b <pid>

I have not discussed the commercial tools available for memory analysis, such as JProbe or YourKit, in detail. I used JProbe at a previous workplace to great effect but have never played with YourKit. One feature I liked about JProbe was the ability to show object graphs and work with them to determine dominators and roots. These tools definitely serve a market and I need to investigate them further.

Optimizing or Tuning Memory:

One can definitely tune the memory used by your JVM for specific needs. There are so many options one can provide to the JVM that it is mind boggling. I must admit that I have absolutely zero experience with memory tuning apart from increasing the max heap and perm gen :-(. It would be easy to say "tune it on a case by case basis", as a smart architect would. I will instead point you to the different VM options at your disposal should the need arise: http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp. I would at some point like to investigate how the different ratios affect the performance of a VM, but that is beyond the scope of my rant.
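Purely as a flavor of what is available, a hand-tuned command line might look like the following; the values are illustrative only and should not be copied blindly:

java -Xms512m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=256m -XX:NewRatio=3 -verbose:gc -XX:+PrintGCDetails -jar myapp.jar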

Concluding:

This blog is primarily driven by my experiences during my career. There are specialists out there who are extremely skilled in the area of memory debugging and tuning. One such person I have been fortunate to cross paths with is Ken Sipe; another is my current boss. Ken has considerable experience debugging mission critical applications to find and fix their memory issues. His knowledge of the Java memory model is esoteric. Ken recently spoke at the last JavaOne conference on debugging your production VM, and the talk was extremely well received. If there was one presentation I would have liked to attend, it would have been his. That said, the slides are available at SlideShare. If any of my understanding is inaccurate, I would appreciate input as always. "No Silent Disagreements!" is my motto after reading about Kayak, a blog to follow.

In conclusion, I would say, if you are at a pub after your deploy sipping your favorite drink and you get a call from the NOC, simply do not answer and ruin the moment! Kidding of course ;-)

Friday, March 5, 2010

RESTful Representation with Google Protocol Buffers and Jersey

When working with a RESTful system, one has the option of consuming different types of representations of the same resource. That is one of the beauties of a RESTful system IMO.

In many cases, especially in the SOAP-based services stack, XML has been the representation of choice for information interchange. XML was never really intended to be a high performance representation, and when working with Java and XML one often sees undesirable performance penalties in marshalling/un-marshalling the payload and in the size of the transfer.

There have been other formats that have gained popularity, such as JSON, which works really well with JavaScript and Ajax. For those desiring a comparison between JSON and XML, it's only a Google search away.

That said, both of the above mentioned formats are not binary formats. There are many binary format options available, one of which is Google Protocol Buffers, the focus of this blog.

Why am I blogging about Protocol Buffers now? Well, I recently saw a presentation on how Orbitz successfully shifted from JINI to RESTful services with Google Protocol Buffers as their representation type. Protocol Buffers allowed them to meet their performance, versioning and language/platform compatibility needs really well. Since then, I had to try the same :-)

Ted Neward has a very nice article about XML and his take on Protocol Buffers, where he digs into binary formats, pros and cons, etc. I recommend reading his post.

For performance metrics comparing Java Externalizable, Google Protocol Buffers, XML, JSON, Thrift, Avro, etc., look at the thrift-protobuf comparison Google Code page for more details. In addition, Wondering Around by Eishay Smith is a great read on the comparisons.

So all I have here is a working example of using Protocol Buffers with Jersey and Spring. Like in my previous examples, I am using the same orders/products model.

I start with definitions of the contract. One defines the messages exchanged in .proto files. Wow, YAIDL (Yet Another Interface Definition Language)? True, but contracts need to be exchanged, and there has to be a way to do so, especially when dealing with platform/language neutrality and B2B exchanges. I must say that I found the IDL rather easy to understand and use, with my limited understanding so far. One of the beauties of Protocol Buffers is the consideration for backward compatibility of contracts. There is an Eclipse plugin available for editing .proto files as well, at http://code.google.com/p/protoclipse. With that said, I have two .proto files, one for the Order definition and a second for the Products, as shown below:
Orders.proto
package orders;

option java_package = "com.welflex.orders.proto.dto";
option java_outer_classname= "OrderProtos";
option optimize_for = SPEED;

message LineItem {
   optional int64 id = 1;
   required int64 itemId = 2;
   required string itemName = 3;
   required int32 quantity = 4;
}

message Order {
   optional int64 id = 1;
   repeated LineItem lineItems = 2;   
}
Products.proto
package products;

option java_package = "com.welflex.products.proto.dto";
option java_outer_classname= "ProductProtos";
option optimize_for = SPEED;

message Product {
   required int32 id = 1;
   required string name = 2;
   required string description = 3;   
}

message ProductList {
   repeated Product productDto = 1;   
}
I have defined the above files in the common module at src/main/protobuf; when the Maven build runs, it generates equivalent Java code based on the .proto files, which consuming Java code can then use to create messages. The plugin essentially executes the "protoc" compiler to do this. One could instead generate equivalent C++ or Python code, etc., but that is beyond the scope of this blog. With the above definition, OrderProtos.java is generated during the Maven build at target/generated-sources/protoc/com/welflex/orders/proto/dto/OrderProtos.java. In the file you will find the individual messages, which extend com.google.protobuf.GeneratedMessage. These objects are shared by the client and service code.

The generated Java code uses the Builder pattern with method chaining to make it really easy to set the necessary properties and build the protocol buffer message. For example, the Order message can be built as shown below:
OrderProtos.Order order = OrderProtos.Order.newBuilder().setId(12313L)
     .addLineItem(OrderProtos.LineItem.newBuilder().setId(8913L)
                    .setItemId(123).setItemName("Foo Bar").setQuantity(21).build()).build();
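The wire format round trip is equally simple: toByteArray() serializes to the compact binary form, and parseFrom() rebuilds the message (throwing InvalidProtocolBufferException on malformed input):

// Serialize to the binary wire format and parse it back
byte[] bytes = order.toByteArray();
OrderProtos.Order roundTripped = OrderProtos.Order.parseFrom(bytes);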

To get the web service working with Jersey, based on another blog I mention later, I defined a custom Provider for marshalling/un-marshalling the Message. What amazes me is the ease of providing custom providers in JAX-RS. Big fan here :-). The MessageBodyReader and MessageBodyWriter implementations that handle the marshalling are shown below:
@Provider
@Component
@Consumes(AlternateMediaType.APPLICATION_XPROTOBUF)
public class ProtobufMessageReader implements MessageBodyReader<Message> {
  public boolean isReadable(Class<?> type, Type genericType, Annotation[] annotations,
    MediaType mediaType) {
    return Message.class.isAssignableFrom(type);
  }

  public Message readFrom(Class<Message> type, Type genericType, Annotation[] annotations,
    MediaType mediaType, MultivaluedMap<String, String> httpHeaders, InputStream entityStream) throws IOException,
    WebApplicationException {
    try {
      Method newBuilder = type.getMethod("newBuilder");
      GeneratedMessage.Builder<?> builder = (GeneratedMessage.Builder<?>) newBuilder.invoke(type);
      return builder.mergeFrom(entityStream).build();
    }
    catch (Exception e) {
      throw new WebApplicationException(e);
    }
  }
}
@Provider
@Component
@Produces(AlternateMediaType.APPLICATION_XPROTOBUF)
public class ProtobufMessageWriter implements MessageBodyWriter<Message> {
  public boolean isWriteable(Class<?> type, Type genericType, Annotation[] annotations,
    MediaType mediaType) {
    return Message.class.isAssignableFrom(type);
  }

  public long getSize(Message m, Class<?> type, Type genericType, Annotation[] annotations,
    MediaType mediaType) {
    return m.getSerializedSize();
  }

  public void writeTo(Message m, Class<?> type, Type genericType, Annotation[] annotations,
    MediaType mediaType, MultivaluedMap<String, Object> httpHeaders, OutputStream entityStream) throws IOException,
    WebApplicationException {
    entityStream.write(m.toByteArray());
  }
}
With the above complete, the rest of the code is pretty similar to what I have done in previous blogs and therefore is not repeated here. We now have the necessary artifacts to exchange Protocol Buffer messages over HTTP. A resource method only needs to return the generated Message type, as in the sketch below.
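As a hedged sketch of what such a resource method might look like (the OrderService collaborator here is hypothetical; AlternateMediaType and OrderProtos are from the example above):

@Path("/orders")
@Component
public class OrderResource {
  @Autowired
  private OrderService orderService; // hypothetical service collaborator

  @GET
  @Path("/{id}")
  @Produces(AlternateMediaType.APPLICATION_XPROTOBUF)
  public OrderProtos.Order getOrder(@PathParam("id") Long id) {
    // Return the generated Message; the ProtobufMessageWriter shown
    // earlier serializes it to the response stream.
    return orderService.getOrder(id);
  }
}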

The steps to get this example working are as follows:
1. Download the code from HERE
2. Install the maven plugin and thus Protocol Buffers:
>svn co http://protobuf.googlecode.com/svn/branches/maven-plugin/tools/maven-plugin
>cd maven-plugin
>wget -O pom.xml 'http://protobuf.googlecode.com/issues/attachment?aid=8860476605163151855&name=pom.xml'
>mvn install

If the above does not work, you might want to try looking at the Stack Overflow Posting where I got this from.
3. Execute "mvn install" from the root level of the project to see an integration test that will run the life cycle from client to server using Protocol buffers and not XML or JSON :-)

This example is highly inspired by the fantastic Maven example by Sam Pullara at Java Rants on integrating Jersey, Protocol Buffers and Maven. My example is of course tailored to readers visiting this site ;-) The Products resource returns JSON, XML and Protocol Buffer formats for those interested in trying them out.

Clearly, one has multiple choices for representation types; the beauty of REST lies in the fact that one does not have to choose one over the other, but can allow for coexistence. There are many factors to consider when choosing the format of a representation. Some of the things I can think of, in no particular order of importance:
  • Performance - Marshalling/Un-Marshalling and transport footprint
  • Integration with different platforms/different languages
  • Testability and visibility - binary formats hide this
  • Versioning of services to ensure backward compatibility
  • B2B integration
I wonder whether there will be a future effort to support annotations on Java objects so that they may be transformed into .proto files, à la JAXB annotations. So where next? Simple: I need to look at Avro and Thrift ;-)

Saturday, February 27, 2010

REST Client Frameworks - Your options

In previous blogs, I have discussed how the different frameworks for REST work, on both the client and server side. In particular, I have investigated different JAX-RS vendors such as Jersey, RESTEasy and Restlet JAX-RS, as well as an implementation that is not JAX-RS, i.e., core Restlet, whose API pre-dates the JAX-RS specification. I am currently looking at different JAX-RS vendor implementations for client side support. The JAX-RS specification does not include a client side API, and the different vendors have proceeded to create their own APIs for the same. The frameworks selected for the client and service ends do not have to match; for example, one could have a RESTful service running on Restlet while a client developed with RESTEasy consumes it. In most cases, one will typically be satisfied using the same client framework as on the service side, to be consistent and to potentially re-use artifacts in both areas if the framework permits.
That said, as my needs require, I have been looking at the following vendors for client side API and implementation:
  • Jersey - A JAX RS reference implementation from Oracle/Sun.
  • RESTEasy - A JAX RS implementation by JBoss.
  • Restlet - A pioneer of sorts that has support for Client Side REST calls
  • CXF - An apache project that is a merger of XFire and Celtix. Provides a JAX RS implementation and JAX WS as well.
  • Apache Wink - A JAX RS implementation that has a client and server side api that has had a 1.0 release just recently
  • Spring 3.0 - Spring provides a RestTemplate, quite like the JmsTemplate and HibernateTemplate in its support of REST clients.
  • Apache HTTPClient or Http Components - Direct use of HTTP API - Down and dirty.
As an evaluator, I would look for the following things in a client framework:
  • User base - Documentation, support, community, release cycles, age of the solution
  • Ease of API for RESTful operations - URL parameter substitution, HTTP verbs, proxy support
  • Dryness of the API - Certain frameworks allow components such as contracts developed for the service to be re-used in the client
  • HTTP connection control - Ability to do connection pooling, connection reaping and to control lower level HTTP connection settings such as connection timeout if required
  • Safety of the API - It is possible to "leak" connections; this occurs when connections are not returned properly to the pool
  • Support for different provider types - JAXB, JSON, Atom, YAML and others, while providing hooks to introduce new provider types
  • Interception of requests - For security and the ability to add additional information, for example to HTTP headers
  • Deployment footprint - The number of dependencies the library brings in
  • Error response handling - Ease of handling alternate response types apart from standard expected responses, for example how the framework supports exception throwing
  • Performance - Efficiency of provided converters such as JAXB and JSON, memory footprint, etc.
The above list represents things one would look for while evaluating a solution. What I aim to do with this blog is provide a Maven project that exercises these client frameworks, allowing evaluators to investigate the different solutions. I will put forth my opinions as well.
When dealing with connections, it is often desirable to keep HTTP connections alive. Apache HttpClient has for some time provided a library that facilitates connection keep-alive, connection reaping and safe release of connections. It is almost the de facto HTTP client library for Java; the standard HTTP support in the core JDK is limited in its offering. Apache HttpClient recently underwent an overhaul of its architecture to clean out parts of the codebase while providing performance enhancements. Using HttpClient directly has some limitations though: one would need to build converters for standard media types like JAXB, JSON, Atom, etc. RESTful client libraries like RESTEasy, Jersey, Apache Wink and Restlet layer a RESTful API on top of Apache HttpClient that eases the development of RESTful clients. Several of the clients below obtain their HttpClient instance from a small helper; a hedged sketch of what such a helper might look like follows.
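The actual HttpClientFourHelper is in the downloadable project; a minimal sketch against the HttpClient 4.0 API, with illustrative pool sizes and timeouts, might look like this:

import org.apache.http.client.HttpClient;
import org.apache.http.conn.params.ConnManagerParams;
import org.apache.http.conn.scheme.PlainSocketFactory;
import org.apache.http.conn.scheme.Scheme;
import org.apache.http.conn.scheme.SchemeRegistry;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager;
import org.apache.http.params.BasicHttpParams;
import org.apache.http.params.HttpConnectionParams;
import org.apache.http.params.HttpParams;

public class HttpClientFourHelper {
  public HttpClient getHttpClient() {
    HttpParams params = new BasicHttpParams();
    // Bound the pool and the connect/read timeouts (illustrative values)
    ConnManagerParams.setMaxTotalConnections(params, 20);
    HttpConnectionParams.setConnectionTimeout(params, 5000);
    HttpConnectionParams.setSoTimeout(params, 10000);

    SchemeRegistry registry = new SchemeRegistry();
    registry.register(new Scheme("http", PlainSocketFactory.getSocketFactory(), 80));

    // Thread-safe, pooling connection manager so connections are re-used
    return new DefaultHttpClient(new ThreadSafeClientConnManager(params, registry), params);
  }
}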
I am working with a Maven project that has a service using Jersey, with clients from the different frameworks consuming it. The code is NOT doing any benchmarking of any sort; it could however be used for that if required. What the code does instead is demonstrate the different clients and their APIs. It also demonstrates the use of proxies where applicable.
The service developed is very similar to the ones I have used in previous blogs, where a client can obtain product information and then perform CRUD operations on orders. From the client's perspective there are therefore two clients: order clients and product clients. Some of the client frameworks discussed have the concept of annotation-driven proxies that allow for easy development; for those frameworks, I have provided proxy clients in the examples as well.


All the clients of the example, implement one of the following two interfaces,
OrdersClientIF

public interface OrdersClientIF {
  public static final String ORDERS_RESOURCE = "/orders";

  /**
   * Create an order.
   */
  public OrderDto create(OrderDto dto) throws OrderValidationException, OrderException;

  /**
   * Update an existing order.
   */
  public void update(OrderDto dto) throws OrderNotFoundException, OrderValidationException,
    OrderException;

  /**
   * Retrieve an order.
   */
  public OrderDto get(Long orderId) throws OrderNotFoundException, OrderException;

  /**
   * Deletes an order.
   */
  public void delete(Long orderId) throws OrderException;
}

ProductsClientIF

public interface ProductsClientIF {
  public static final String PRODUCTS_RESOURCE = "/products";

  /**
   * @return A set of Products
   */
  public Set<ProductDto> getProducts();
}

1. Apache CXF:
Apache CXF is a full fledged JAX-RS implementation with a client side API as well. It is a framework for both JAX-RS and JAX-WS. The client API is provided in three forms: proxy based, HTTP-centric and XML-centric. Apache CXF, however, does not use HttpComponents or Apache HttpClient. More information on the client API can be viewed at http://cxf.apache.org/docs/jax-rs.html#JAX-RS-ClientAPI

a. Proxy Based:

public class ApacheCxfProxiedOrdersClient implements OrdersClientIF {
  /**
   * Proxy Definition
   */
  private static interface OrderCxfIF {
    @GET
    @Consumes(MediaType.APPLICATION_XML)
    @Path(ORDERS_RESOURCE + "/{id}")
    public OrderDto get(@PathParam("id") String id);

    @POST
    @Produces(MediaType.APPLICATION_XML)
    @Consumes(MediaType.APPLICATION_XML)
    @Path(ORDERS_RESOURCE)
    public OrderDto create(OrderDto dto);

    @PUT
    @Produces(MediaType.APPLICATION_XML)
    @Path(ORDERS_RESOURCE + "/{id}")
    public void update(@PathParam("id") String id, OrderDto dto);

    @DELETE
    @Path(ORDERS_RESOURCE + "/{id}")
    public void delete(@PathParam("id") String id);
}
....

public OrderDto create(OrderDto dto) throws OrderValidationException, OrderException {
  try {
    return JAXRSClientFactory.create(baseUri, OrderCxfIF.class).create(dto);
  }
  catch (WebApplicationException e) {
    if (e.getResponse().getStatus() == Status.BAD_REQUEST.getStatusCode()) {
      throw new OrderValidationException(e.getMessage());
    }
    throw new OrderException(e.getMessage());
  }
}
...
@Override
public OrderDto get(Long orderId) throws OrderNotFoundException, OrderException {
  try {
    return JAXRSClientFactory.create(baseUri, OrderCxfIF.class).get(String.valueOf(orderId));
  }
  catch (WebApplicationException e) {
    if (e.getResponse().getStatus() == Status.NOT_FOUND.getStatusCode()) {
      throw new OrderNotFoundException(e.getMessage());  
    }
    throw new OrderException(e.getMessage());
  }
}
..
}
With the proxy based API, one can re-use server side artifacts on the client side as well. The API looks pretty straightforward to use, and if more control is required one can also utilize the WebClient for detailed operations such as setting headers or the content type. For handling exceptions, the CXF site suggests using a ResponseExceptionMapper. I however could not get it registered and working; the documentation on it appeared sparse. If one does not define a ResponseExceptionMapper, then when a failure occurs a WebApplicationException is thrown. One can utilize it to re-throw appropriate exceptions and consume alternate return types.

b. HTTP-Centric:

public class ApacheCxfOrdersClient implements OrdersClientIF {
....
  public OrderDto create(OrderDto dto) throws OrderValidationException, OrderException {
    try {
       return WebClient.create(baseUri).path(ORDERS_RESOURCE).accept(MediaType.APPLICATION_XML)
         .invoke(HttpMethod.POST, dto, OrderDto.class);
    }
    catch (WebApplicationException e) {
      if (e.getResponse().getStatus() == Status.BAD_REQUEST.getStatusCode()) {
        throw new OrderValidationException(e.getMessage());
      } 
      throw new OrderException(e.getMessage());
    }
  }
....
  public OrderDto get(Long orderId) throws OrderNotFoundException, OrderException {
    try {
      return   WebClient.create(baseUri).path(ORDERS_RESOURCE).path(String.valueOf(orderId)).accept(
         MediaType.APPLICATION_XML).invoke(HttpMethod.GET, null, OrderDto.class); 
    }
    catch (WebApplicationException e) {
      if (e.getResponse().getStatus() == Status.NOT_FOUND.getStatusCode()) {
        throw new OrderNotFoundException(e.getMessage());
      }
      throw new OrderException(e.getMessage());
    }
  }
}

With the HTTP-centric client, one uses WebClient instances. As in the proxy case, one can catch the WebApplicationException on failure responses for exception management.

2. RESTEasy:

RESTEasy is a JBoss project that is a fully certified JAX-RS implementation. The client side framework supports the proxy style model and an HTTP-centric model as well. It utilizes Apache HttpClient 3.X, with support for 4.X as well, so one can easily control connection parameters and pooling. The last time I read about the framework, both HttpClient versions were supported, but HttpClient 4 had not been validated as thoroughly as version 3; that may have changed since. It is very easy to choose one or the other though. Another nice feature is that an interface can be shared between the client and server. Requests can also be nicely intercepted via an implementation of org.jboss.resteasy.spi.interception.ClientExecutionInterceptor, to add additional information to the headers, etc. More information on the client API can be viewed at http://www.jboss.org/file-access/default/members/resteasy/freezone/docs/1.2.GA/userguide/html/RESTEasy_Client_Framework.html

a. Proxy Based:

public class ResteasyProxiedOrdersClient implements OrdersClientIF {
  private static interface RestEasyIF {
    @GET
    @Consumes(MediaType.APPLICATION_XML)
    @Path(ORDERS_RESOURCE + "/{id}")
    public OrderDto get(@PathParam("id") Long id);

    @POST
    @Produces(MediaType.APPLICATION_XML)
    @Consumes(MediaType.APPLICATION_XML)
    @Path(ORDERS_RESOURCE)
    public OrderDto create(OrderDto order);

    @PUT
    @Produces(MediaType.APPLICATION_XML)
    @Consumes(MediaType.APPLICATION_XML)
    @Path(ORDERS_RESOURCE + "/{id}")
    public void update(@PathParam("id") Long id, OrderDto dto);

    @DELETE
    @Path(ORDERS_RESOURCE + "/{id}")
    public void delete(@PathParam("id") Long orderId);
  }

  static {
    RegisterBuiltin.register(ResteasyProviderFactory.getInstance());
    // Execution interceptor registration
    ResteasyProviderFactory.getInstance().registerProvider(ExecutionInterceptor.class);
  }

  private final ClientExecutor clientExecutor;
  private final RestEasyIF delegate;

  public ResteasyProxiedOrdersClient(String baseUri) {
  ...
    clientExecutor = new ApacheHttpClient4Executor(helper.getHttpClient());
    delegate = ProxyFactory.create(RestEasyIF.class, baseUri, clientExecutor);
  }

  public OrderDto create(OrderDto dto) throws OrderValidationException, OrderException {
    try {
      return delegate.create(dto);
    }
    catch (ClientResponseFailure failure) {
      if (failure.getResponse().getStatus() == Status.BAD_REQUEST.getStatusCode()) {
        throw new OrderValidationException(failure.getMessage());
      }
      throw new OrderException(failure.getMessage());
    }
  }

  public OrderDto get(Long orderId) throws OrderNotFoundException, OrderException {
    try {
      return delegate.get(orderId);
    }
    catch (ClientResponseFailure e) {
      if (e.getResponse().getStatus() == Status.NOT_FOUND.getStatusCode()) {
        throw new OrderNotFoundException("Order Not found");
    }
    throw new OrderException(e.getMessage());
  }
 }
}

One of the gripes I had while working with this version of the RESTEasy proxy is that I could not understand why I had to provide an @Consumes annotation on the interface for an update operation that returns void and is a PUT HTTP method. The CXF client did not have that requirement, and I feel it is redundant to have to specify it.
Upon failure of an invocation, a ClientResponseFailure is thrown, which can then be interrogated to throw any custom exception you desire. I must admit one thing I have not tested: upon receiving a ClientResponseFailure, if the response body is not read, will the underlying HttpClient connection be safely released by the RESTEasy proxy code for re-use, or is there a potential to leak a connection? This is worth investigating if you are looking at RESTEasy.

b. HTTP Centric or Manual Client Request API:

public class ResteasyOrdersClient implements OrdersClientIF {
  ...
  private final ClientExecutor clientExecutor;

  static {
    RegisterBuiltin.register(ResteasyProviderFactory.getInstance());
    ResteasyProviderFactory.getInstance().registerProvider(ExecutionInterceptor.class);
  }

  public ResteasyOrdersClient(String baseUri) {
    ..
    helper = new HttpClientFourHelper();
    clientExecutor = new ApacheHttpClient4Executor(helper.getHttpClient());
  }

  public OrderDto create(OrderDto dto) throws OrderValidationException, OrderException {
    ClientResponse response = null;
    try {
      ClientRequest request = new ClientRequest(ORDERS_URI, clientExecutor);
      response = request.body(MediaType.APPLICATION_XML, dto).post();

      if (response.getStatus() 
            == javax.ws.rs.core.Response.Status.BAD_REQUEST.getStatusCode())  {
        throw new OrderValidationException(response.getEntity(String.class));
      }
      return response.getEntity(OrderDto.class);
    }
    catch (OrderValidationException e) {
      throw e;
    }
    catch (Exception e) {
      throw new OrderException(e.getMessage());
    }
    finally {
      // Safe release of connection/stream
     if (response != null) {
       response.releaseConnection();
     }
    }
  }
  ...
  public OrderDto get(Long orderId) throws OrderNotFoundException, OrderException {
    ClientRequest request = new ClientRequest(ORDERS_URI + "/{id}",
     clientExecutor).pathParameter("id", orderId);
    ClientResponse response = null;
    try {
      response = request.accept(MediaType.APPLICATION_XML).get();
      if (response.getStatus() 
           == javax.ws.rs.core.Response.Status.NOT_FOUND.getStatusCode()) {
        throw new OrderNotFoundException("Order Not found");
      }
      return response.getEntity(OrderDto.class);
    }
    catch (OrderNotFoundException e) {
      throw e;
    }
    catch (Exception e) {
      throw new OrderException(e.getMessage());
    }
    finally {
      if (response != null) {
        response.releaseConnection();
      }
    }
 }
....
}

With the manual API, one has considerable control over the request object, such as setting headers. One difference to note is that, unlike the proxy client, in the event of a failure response a ClientResponseFailure exception is not thrown; one needs to explicitly inspect the ClientResponse object and throw any custom exceptions desired. With the manual client, one also has the ability to explicitly release the downstream connection to prevent leaks.

3. Restlet:

Restlet is one of the earliest frameworks for RESTful services ever developed, if not the earliest. It has a very mature API and implementation and provides multiple ways of working with RESTful services. There is the core API, which pre-dates the JAX-RS specification, and a JAX-RS implementation as well. They have a mature client API that, with the upcoming 2.0 release, will utilize HttpClient 4.0. Control over the HTTP client is a bit hard to get to, as the API tends to hide it; however, the API does allow for most of the HTTP client control one would imagine. Again, there are two ways to work with the Restlet client framework: the proxy client or the direct HTTP-centric API. Restlet documentation can be viewed at http://www.restlet.org/documentation/2.0/tutorial

a. Proxy Based:

public class RestletProxiedOrdersClient implements OrdersClientIF {
  // Proxy interface
  public static interface OrdersResource {
    @Get
    public OrderDto getOrder();

    @Post
    public OrderDto create(OrderDto dto);

    @Put
    public void update(OrderDto dto);

    @Delete
    public void delete();
  }
  ....

  public OrderDto create(OrderDto orderDto) 
    throws OrderValidationException, OrderException {  
    try {
      ClientResource cr = new ClientResource(ORDERS_URI);
      OrdersResource res = cr.wrap(OrdersResource.class);

      OrderDto result = res.create(orderDto);
      return result;
    }
    catch (ResourceException e) {
      if (e.getStatus().equals(Status.CLIENT_ERROR_BAD_REQUEST)) {
        throw new OrderValidationException(e.getMessage());
      }
      throw new OrderException("Unexpected Error:" + e.getStatus());
    }
  }
  ...
  public OrderDto get(Long orderId) throws OrderNotFoundException, OrderException {
    ClientResource cr = new ClientResource(ORDERS_URI + "/" + orderId);
    OrdersResource res = cr.wrap(OrdersResource.class);
    try {
      return res.getOrder();
    }
    catch (ResourceException e) {
      if (e.getStatus().equals(Status.CLIENT_ERROR_NOT_FOUND)) {
        throw new OrderNotFoundException(e.getMessage());
      }
      throw new OrderException("Unexpected Error:" + e.getStatus());
   }
 }
..
}

Exception management is handled by catching a ResourceException and throwing any custom exception desired. One area that requires further investigation is how to safely release an HTTP connection when using HttpClient as the underlying transport with ClientResource and the proxy. Again, this is an area one would need to dig into if control of HttpClient parameters is required along with safe release of pooled connections. For more information, look at http://n2.nabble.com/Client-Timeout-on-ClientResource-post-method-td3690842.html.

b. HTTP Centric or Manual Client Request API:

public class RestletOrdersClient implements OrdersClientIF {
  private final Client client;
  ....
  public RestletOrdersClient(String baseUri) {
    client = new Client(new Context(), Protocol.HTTP);
    ..
  }

  public OrderDto create(OrderDto orderDto) throws OrderValidationException, OrderException {
    Response response = null;

    try {
      response = client.post(ORDERS_URI, new JaxbRepresentation<OrderDto>(orderDto));

      if (response.getStatus().isSuccess()) {
        return new JaxbRepresentation<OrderDto>(response.getEntity(), OrderDto.class).getObject();
      }
      else if (response.getStatus().equals(Status.CLIENT_ERROR_BAD_REQUEST)) {
        throw new OrderValidationException("Error validating order");
      }
      else {
        throw new OrderException("Error processing order:" + response.getStatus());
      }
    }
    catch (OrderValidationException e) {
      throw e;
    }
    catch (IOException e) {
      throw new OrderException("Unexpected:" + e);
    }
    finally {
      // Explicit safe release of response
      if (response != null) {
        response.release();
      }
    }
  }

  public OrderDto get(Long orderId) throws OrderNotFoundException, OrderException {
    Response response = null;

    try {
      response = client.get(ORDERS_URI + "/" + orderId);
      if (response.getStatus().isSuccess()) {
        return new JaxbRepresentation<OrderDto>(response.getEntity(), OrderDto.class).getObject();
      }
      else if (response.getStatus().equals(Status.CLIENT_ERROR_NOT_FOUND)) {
        throw new OrderNotFoundException("Order Not Found");
      }
        throw new OrderException("Unknown error processing order:" + response.getStatus());
    }
    catch (IOException e) {
      throw new OrderException(e.getMessage());
    }
    finally {
      if (response != null) {
        response.release();
      }
    }
  }
}

The direct client API is very straightforward as well. With safe release of HTTP connections accounted for and control over the HttpClient parameters, the Restlet client is a very powerful, proven client side implementation.

4. Apache Wink:

Apache Wink is a complete implementation of the JAX-RS specification that also provides a client side API for communicating with RESTful services. The framework is relatively new, with a 1.0-incubating version available at the time of this blog. Apache Wink allows you to work with HttpClient and thus control all the lower level operations easily. Unlike the above mentioned frameworks, there is currently no proxy based client support in Apache Wink. That said, the client API flows very well and is easy to understand and use. An implementation of their ClientHandler interface allows one to easily intercept requests for custom headers or security, while also providing an avenue to throw custom exceptions based on alternate failure responses. Documentation on the Apache Wink client can be viewed at http://incubator.apache.org/wink/1.0/html/6%20Apache%20Wink%20Client.html

public class ApacheWinkOrdersClient implements OrdersClientIF {
  private final RestClient restClient;
  ...
  public ApacheWinkOrdersClient(String baseUri) {
    ClientConfig config = new ApacheHttpClientConfig(helper.getHttpClient());
    // Exception handler can also be used as an intercepting filter
    config.handlers(new ExceptionHandler());
    restClient = new RestClient(config);   
  }

  public OrderDto create(OrderDto dto) throws OrderValidationException, OrderException {
    try {
      return restClient.resource(UriBuilder.fromUri(baseUri)
            .path(ORDERS_RESOURCE).build()).contentType(MediaType.APPLICATION_XML)
            .accept(MediaType.APPLICATION_XML).post(OrderDto.class, dto);
    }
    catch (ClientRuntimeException e) {
      if (e.getCause() instanceof OrderValidationException) {
         throw ((OrderValidationException) e.getCause());
      }
      else if (e.getCause() instanceof OrderException) {
         throw ((OrderException) e.getCause());
      }
      throw e;
    }   
  }
  ...
  public OrderDto get(Long orderId) throws OrderNotFoundException, OrderException {
    try {
      return restClient.resource(UriBuilder.fromUri(baseUri)
        .path(ORDERS_RESOURCE).path("{id}").build(orderId)).accept(MediaType.APPLICATION_XML)
        .get(OrderDto.class);
    }
    catch (ClientRuntimeException e) {
      if (e.getCause() instanceof OrderNotFoundException) {
        throw ((OrderNotFoundException) e.getCause());
      }
     else if (e.getCause() instanceof OrderException) {
        throw ((OrderException) e.getCause());
     }
     throw e;
   }   
  }
  ....
  private static final class ExceptionHandler implements ClientHandler {
    public ClientResponse handle(ClientRequest request, HandlerContext context) throws Exception {
      // Filter for example for standard headers
      request.getHeaders().add("foo", "bar");

      ClientResponse cr = context.doChain(request);
      if (cr.getStatusCode() == Status.NOT_FOUND.getStatusCode()) {
        throw new OrderNotFoundException(cr.getMessage());
      }
      else if (cr.getStatusCode() == Status.BAD_REQUEST.getStatusCode()) {
        throw new OrderValidationException(cr.getMessage());
      }
      else if (cr.getStatusCode() == Status.SERVICE_UNAVAILABLE.getStatusCode()) {
        throw new OrderException(cr.getMessage());
      }
      return cr;
   }
 }
}

5. Jersey:

Jersey is Sun's reference implementation of the JAX-RS specification. Like the other frameworks, Jersey provides a client side framework as well. Jersey supports Apache HttpClient via a separate implementation and artifact. Currently the support is for HttpClient 3.X; whether 4.X will be incorporated is up in the air. One can write custom code if HttpClient 4.X is the direction one wishes to take. Jersey does not currently have the concept of proxy clients; however, the API flows very well with the standard client. The use of filters on the client side enables easy interception of requests for customization while providing safe release of any connections used. Information on Jersey and the client API can be found at https://jersey.dev.java.net/
public class JerseyOrdersClient implements OrdersClientIF {
  ....
  public JerseyOrdersClient(String baseUri) {
    ..
    ApacheHttpClientHandler handler = new ApacheHttpClientHandler(helper.getHttpClient());
    client = new ApacheHttpClient(handler);
    // Filter allows for intercepting request
    client.addFilter(new RequestFilter());
  }
  ...
  public OrderDto create(OrderDto dto) throws OrderValidationException, OrderException {
    ClientResponse response = null;
    try {
      response = client.resource(baseUri).path(ORDERS_RESOURCE).entity(dto,
        MediaType.APPLICATION_XML).post(ClientResponse.class);
      throwExceptionIfNecessary(response);
      return response.getEntity(OrderDto.class);
    }
    finally {
      if (response != null) {
        response.close();
      }
    }
  }
  ...
  public OrderDto get(Long orderId) throws OrderNotFoundException, OrderException {
    ClientResponse response = null;
    try {
      response = client.resource(baseUri).path(ORDERS_RESOURCE)
       .path(String.valueOf(orderId)).accept(MediaType.APPLICATION_XML)
       .get(ClientResponse.class);

      if (response.getStatus() == Status.OK.getStatusCode()) {
        return response.getEntity(OrderDto.class);
      }
      else if (response.getStatus() == Status.NOT_FOUND.getStatusCode()) {
        throw new OrderNotFoundException(response.getEntity(String.class));
      }
      else if (response.getStatus() == Status.SERVICE_UNAVAILABLE.getStatusCode()) {
        throw new OrderException(response.getEntity(String.class));
      }
      throw new OrderException("Unexpected");
   }
   finally {
     if (response != null) {
       response.close();
     }
   }
 }
 ....
 private static final class RequestFilter extends ClientFilter {
   public ClientResponse handle(ClientRequest cr) throws ClientHandlerException {
      MultivaluedMap<String, Object> map = cr.getHeaders();
      map.add("foo", "bar");
      return getNext().handle(cr);
   }   
 }
}

6. Spring RestTemplate:

Aah, what do I say? My favorite framework in the whole wide world now supports REST with its 3.X release. Like the popular HibernateTemplate and JdbcTemplate, we now have the RestTemplate in Spring. Spring supports RESTful services on the server side through Spring MVC (not JAX-RS) and provides a client API to consume services with the RestTemplate. The RestTemplate can be configured to work with HttpClient 3.X, giving control over lower level HTTP parameters and pooling pretty easily. Like other thoughtful Spring implementations, the RestTemplate is based on callbacks, where safe release of resources is important. It has simple methods for commonly used HTTP operations while providing the callback mechanism when more control is desired. I must however mention, with great restraint, that going the callback route is not an easy task and requires considerable customization; this becomes important if you wish to customize header properties, etc. Error handling is easily accomplished with an extension of ResponseErrorHandler. Further information on the RestTemplate can be viewed at http://static.springsource.org/spring/docs/3.0.x/spring-framework-reference/html/remoting.html#rest-client-access

public class SpringOrdersClient implements OrdersClientIF {
  private final RestTemplate template;
  ....
  public SpringOrdersClient(String baseUri) {
    ...
    ClientHttpRequestFactory requestFactory
       = new CommonsClientHttpRequestFactory(helper.getHttpClient());
    template = new RestTemplate(requestFactory);
    // Set Error handler
    template.setErrorHandler(new ErrorHandler());
  }

  public OrderDto create(OrderDto orderDto) throws OrderValidationException, OrderException {
    return template.postForObject(ORDERS_URI, orderDto, OrderDto.class);      
  }

  public OrderDto get(Long orderId) throws OrderNotFoundException, OrderException {
    return template.getForObject(ORDERS_URI + "/{id}", OrderDto.class,
        Collections.singletonMap("id", String.valueOf(orderId)));
  }
  ....
  private static final class ErrorHandler implements ResponseErrorHandler {
    @Override
    public boolean hasError(ClientHttpResponse response) throws IOException {
      if (response.getStatusCode().series().compareTo(Series.SUCCESSFUL) == 0) {
        return false;
      }
      return true;
    }

    public void handleError(ClientHttpResponse response) throws IOException {
       if (response.getStatusCode().equals(HttpStatus.NOT_FOUND)) {
          throw new OrderNotFoundException("Order Not Found");
       } else if (response.getStatusCode().equals(HttpStatus.BAD_REQUEST)) {
          throw new OrderValidationException("Error validating order");
       } else {
          throw new OrderException("Unexpected error:" + response.getStatusText());
       }
    }  
  }
}
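To illustrate the callback route mentioned above, a hedged sketch of fetching an order while stamping a custom header might look like the following; the "foo" header and the extractor wiring are illustrative only:

OrderDto dto = template.execute(ORDERS_URI + "/{id}", HttpMethod.GET,
    new RequestCallback() {
      public void doWithRequest(ClientHttpRequest request) throws IOException {
        // Stamp a custom header before the request is executed
        request.getHeaders().add("foo", "bar");
      }
    },
    new HttpMessageConverterExtractor<OrderDto>(OrderDto.class, template.getMessageConverters()),
    String.valueOf(orderId));

The template still takes care of safely releasing the underlying connection, which is the appeal of the callback approach despite the extra wiring.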

Running the Examples:

The example is available as a JDK 6.X project that uses Maven. An integration test is provided that uses each of the mentioned clients to run the very same life cycle test of CRUD operations against the RESTful service. Once you have a Maven environment, execute "mvn install" from the root level of the project to see the entire project build and execute the integration test where each of the clients is demonstrated. For those interested, you could use the clients to determine benchmarks, test memory footprint, what have you.
Download the source from HERE.

Parting Thoughts:
I hope the above examples are of use to someone evaluating REST frameworks for their project who would like to witness each of them in action. I have been a REST fanatic ever since being introduced to the architectural style, and I simply enjoy playing with the frameworks and tools that present themselves. It would have been nice if the JSR 311 expert group had defined a client side API as well; I believe a future JSR is expected for a client API for JAX-RS.
One can find details about each of the above mentioned projects, and metrics such as committers, activity and maturity, at Ohloh. That said, I would like to share my 2c on the above frameworks. Note that these are my own 2c that I am pulling out of my b..t and are not in any way my employer's views or thoughts on the matter.

Apache CXF:
The one word that comes to mind when I view this framework is BLOAT. It seems to bring in so many dependencies that are of no use if one's goal is simply to work with REST. Their documentation on the REST client API was not the best, with broken links. It is my understanding that they do not support Apache HttpClient and there is no drive to do so. If a team is supporting both JAX-WS and JAX-RS, maybe their framework works in synergy. You will notice that among all the frameworks mentioned above, I never bothered to support an example of CXF; the primary reason is my disillusionment with their documentation and wiki, where I encountered broken links and partial information. I also ran into an issue where I needed HttpServletRequest as a dependency on the client.

RESTEasy:
With RESTEasy, you have me as a fan. I really like the effort put into the framework by its owner Bill Burke and his team. Their proxy solution is very enticing and will serve most RESTful consumers. I might be pipe dreaming here, but nevertheless I recall reading somewhere that RESTEasy's client library will serve as the foundation of the client side equivalent of JSR 311. Active development, support for async HTTP and good documentation are what they bring to the table. RESTEasy is an especially notable consideration if re-use of "contract" artifacts across service and client is desired. One additional point of note is the philosophy RESTEasy employs regarding backward compatibility: they seem very conscious about preserving it as they change.

Restlet:
Restlet is an established and proven framework. One of the things I particularly like about Restlet is that their community is very active in offering assistance to those in need. If you post a question on their mailing list, you can almost be guaranteed a response, as long as the question is within answerable parameters. They are very quality conscious as well. One gripe I do have with Restlet is that they chose to break compatibility between their 1.1.X series of releases and the 2.0.X series without a transitional phase.
The selling points of Restlet as a client API are its transparent API and simplicity, coupled with a helpful community.

Apache Wink:
YAJAXRS (Yet Another JAX-RS implementation) was my initial reaction when thinking of Apache Wink. However, looking further, although comparatively immature in this space, they have a solid offering in Wink. If you are looking for a lightweight JAX-RS implementation, look no further. Their client API utilizes HttpClient 4.0 to good effect, and in my tests with the API I found it really fast and performant. Their API is simple, transparent and effective. Their documentation is however sparse, and I wonder about their longevity when compared with big hitters such as Jersey, Restlet and RESTEasy.

Jersey:
Jersey, as I said in a previous blog, home town of the boss, rocks. What appeals to me is the simplicity of the API, the adoption, the community and decent documentation. Their support for HttpClient 3.X is present, and I am certain they will support 4.X soon.

Spring:
It's hard to say anything about Spring without seeming biased in its favor. Spring's RestTemplate is as solid as can be expected. Their standard callback based approach for safe release of resources works well for the RestTemplate too. If one is using Spring MVC and its REST support, very few reasons would drive me to consider an alternative framework for the client; among them are definitely finer grained control and HttpClient 4.X support. Documentation is sparse on the RestTemplate as well, but one has the community to fall back on. There may be a bit of up front customization, standard callbacks, etc. that an adopter might create, but once done I feel it would be easy to work with the RestTemplate.
Clearly, one has many choices in selecting a client side framework for RESTful HTTP. In most cases it probably makes sense to use the same framework on the service and client ends. Then again, if you are only a consumer of a service, you have multiple choices among those shown above, as well as the option of using Apache HttpClient or HttpComponents directly and bypassing the higher level frameworks. For some, integration with the Spring framework is important, and all the above frameworks have integration points, on both the service and client sides. Support for client proxies is something one might want to consider, as they tend to simplify the programming model. Further, if resource definitions can be shared between client and server, that can be quite useful for being DRY (Don't Repeat Yourself) and for providing a means of contract definition. For those interested in performance and tuning of HTTP connections, using a framework that allows you to manage connection pooling and other parameters is definitely the way to go. One should also look at maturity, user base, support and backward compatibility when making a selection. Are there other options apart from those mentioned above? Any recommendations based on personal experience with the above client frameworks are always welcome.