
Tuesday, July 8, 2008

Java - equals(), hashCode(), Object Identity, A HACK solution....

Introduction:
Time to write a note to myself again. I use this blog to log things that act as notes for me. I hate searching for stuff on the Internet. There are times when I am re-learning something, and this is one of those times :-).

Wise men of Java say, "When you override the equals() method in Java, you must also override hashCode()". Why is that so? I found a nice blog/article on the subject that explains it better than I ever can.

In short, whenever an object is added to a HashSet, its hashCode() method is invoked. The value obtained is used in a computation to determine which bucket the object is placed in. Many objects can hash to the same bucket. An object is found in the HashSet by first hashing to its bucket and then running through the elements in that bucket, comparing by identity or equals(). I hope I am right on this ;-)
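
To make those two steps concrete, here is a minimal sketch of a toy hash set. It is not how java.util.HashSet works internally; the class name ToyHashSet, the fixed bucket count and the use of Math.floorMod are assumptions made purely for illustration. It only mirrors the "hash to a bucket, then compare within the bucket" idea described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy hash set (hypothetical): a fixed number of buckets, each a plain list. */
class ToyHashSet<E> {
    private final List<List<E>> buckets = new ArrayList<>();

    ToyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    /** Step 1: hashCode() decides which bucket an object belongs to. */
    int bucketIndex(Object o) {
        int hash = (o == null) ? 0 : o.hashCode();
        return Math.floorMod(hash, buckets.size());
    }

    /** Step 2: inside that bucket, identity or equals() decides a match. */
    boolean contains(Object o) {
        for (E candidate : buckets.get(bucketIndex(o))) {
            if (candidate == o || (candidate != null && candidate.equals(o))) {
                return true;
            }
        }
        return false;
    }

    boolean add(E e) {
        if (contains(e)) {
            return false; // set semantics: no duplicates
        }
        buckets.get(bucketIndex(e)).add(e);
        return true;
    }
}

class ToyHashSetDemo {
    public static void main(String[] args) {
        ToyHashSet<String> set = new ToyHashSet<>(16);
        set.add("Foo");
        System.out.println(set.contains("Foo"));             // true
        System.out.println(set.contains(new String("Foo"))); // true: same hashCode, equals() matches
        System.out.println(set.contains("Bar"));             // false
    }
}
```

Note that contains() never looks outside the one bucket chosen by hashCode(). That is why two equal objects with different hash codes can miss each other.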

What I am hoping to explore is equals and hashCode from a persistence perspective, i.e., when using persistent/persistable entities.

Exploration of Problem:

Let us consider a problem domain for the sake of discussion. Our problem domain involves Organizations and the Applications therein. Every Organization has one or more Applications. In our domain, an Application is unique by virtue of its name. The underlying data model will not allow two applications with the same name. Additionally, as an Organization, we choose to use surrogate ids and not natural ids for primary keys. In other words, the name attribute of an Application object will not be the primary key but an alternate key. Finally, the name of a persisted application can be changed over time.




public class Application {
    private Integer id;
    private String name;

    public Application() {}

    public Integer getId() { return id; }

    public void setId(Integer id) { this.id = id; }

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }
}


The above implementation does not override equals() or hashCode(). The following are some unit-tests against the above class.




@Test public void testIdentity() throws Exception {
    HashSet<Application> appSet = new HashSet<Application>();

    Application app = new Application();
    Application appOther = new Application();

    appSet.add(app);

    assertTrue("Set must contain inserted application", appSet.contains(app));
    assertFalse("Set must not contain other app as it should be using identity for equals",
        appSet.contains(appOther));
    assertFalse("Should not be equal", app.equals(appOther));
    assertFalse("Should not have the same hashcodes as different objects",
        app.hashCode() == appOther.hashCode());

    // Setting the same id on both objects changes nothing: equals/hashCode are inherited from Object
    app.setId(Integer.valueOf(2));
    appOther.setId(Integer.valueOf(2));

    assertTrue("Object should still be obtainable from the set due to identity", appSet.contains(app));
    assertFalse("Set must not contain other app as it should be using identity for equals",
        appSet.contains(appOther));
}


From the above example, the two objects are neither equal nor share the same hashCode, even though they appear to be the same, i.e., neither has any properties set. Even when their properties are set to the same values, they remain unequal, because the inherited equals() and hashCode() are based on object identity.

Let's now look at a similar class where equals() has been overridden using the object's id property but the hashCode() method has not been overridden.




@Override public boolean equals(Object otherApp) {
    // instanceof also rejects null
    if (!(otherApp instanceof AppWithEqualsImpl)) { return false; }

    if (this == otherApp) { return true; }

    if (this.getClass() != otherApp.getClass()) { return false; }

    AppWithEqualsImpl other = (AppWithEqualsImpl) otherApp;

    // Two null ids are treated as equal
    if (id == null) {
        if (other.id != null)
            return false;
    }
    else if (!id.equals(other.id))
        return false;

    return true;
}


Some tests with this class:



@Test public void testOnlyEqualsImpl() throws Exception {
    HashSet<AppWithEqualsImpl> appSet = new HashSet<AppWithEqualsImpl>();

    AppWithEqualsImpl app = new AppWithEqualsImpl();
    AppWithEqualsImpl appOther = new AppWithEqualsImpl();

    appSet.add(app);

    assertTrue("App and other App are equal", app.equals(appOther));
    assertTrue("App and other App don't have same hashcode", app.hashCode() != appOther.hashCode());
    assertTrue("Set must contain inserted application", appSet.contains(app));
    assertFalse(
        "Set will not contain other instance as although equal, they have different hashCodes",
        appSet.contains(appOther));

    app.setId(Integer.valueOf(10));
    assertTrue("Set should contain app as equals has changed but not hashcode",
        appSet.contains(app));

    appOther.setId(Integer.valueOf(10));
    assertFalse(
        "Set will not contain other instance as although equal in Id, they still have different hashCodes",
        appSet.contains(appOther));
}


The above example demonstrates that although both objects are equal, the "appOther" object fails the contains() test on the HashSet because the two have different hashCodes. The object added to the set is located because it matches on both identity and hashCode. After changing the Id field of the "app" object, it can still be found in the HashSet, as its hashCode has not changed and it still hashes to the same bucket. However, the problem to note here is that although "appOther" is equal to the app in the Set, the Set considers "appOther" a totally different object, thus breaking Set semantics (i.e., allowing duplicates) if "appOther" is inserted.

So we need to implement hashCode so that both "app" and "appOther" hash to the same bucket. Let's take a look at a variant that does exactly that:




@Override public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((id == null) ? 0 : id.hashCode());
    return result;
}

@Override public boolean equals(Object obj) {
    if (!(obj instanceof AppWithEqualsHashCodeImpl)) { return false; }

    if (this == obj) { return true; }

    if (this.getClass() != obj.getClass()) { return false; }

    final AppWithEqualsHashCodeImpl other = (AppWithEqualsHashCodeImpl) obj;
    if (id == null) {
        if (other.id != null)
            return false;
    }
    else if (!id.equals(other.id))
        return false;

    return true;
}


A few tests based on the above class:




@Test public void testEqualsAndHashCodeImpl() {
    HashSet<AppWithEqualsHashCodeImpl> appSet = new HashSet<AppWithEqualsHashCodeImpl>();

    AppWithEqualsHashCodeImpl app = new AppWithEqualsHashCodeImpl();
    AppWithEqualsHashCodeImpl appOther = new AppWithEqualsHashCodeImpl();

    appSet.add(app);

    assertTrue("Set must contain inserted application", appSet.contains(app));
    assertTrue("Set must return a match for contains of appOther as equals/hashCode are same now",
        appSet.contains(appOther));

    app.setId(Integer.valueOf(10));

    assertEquals(app, appSet.iterator().next());
    assertTrue(app.hashCode() == appSet.iterator().next().hashCode());

    // Note the below
    assertFalse(
        "Set contains() should return false when checked for inserted app as hashcode has now changed. "
            + "Contains will check against a new bucket based on the new hashcode.",
        appSet.contains(app)); // This means adding app back to the collection will result in two elements.

    appSet.add(app);
    assertEquals(2, appSet.size());
}


In the above tests, as the object overrides equals() and hashCode(), "appOther" is considered the same object by the Set, thus preserving Set semantics. All good. However, look at the second half of the test, from the setId() call to the final size check, where we change a property that participates in the hashCode. When the originally added object is checked against the collection to see if the collection contains it, the collection reports false. Whatever is happening???

The problem is that, because we changed a property of the object that participates in the hashCode calculation, we have effectively changed the hashCode of the object. When contains() is invoked, it tries to locate the object using the new hashCode. However, the object is stored in the Set based on the old hashCode and therefore cannot be located. What a pain! The first question we ask is: can't we say appSet.rehash(), so that the inserted object's hashCode is re-computed and the object is placed in the correct bucket? The API does not support re-hashing. And probably rightfully so.
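
The numbers behind this are easy to check. The sketch below uses a hypothetical MutableKey class (not one of the classes from this post) with the same 31-based hashCode formula as above, and prints the hash code before and after the mutation.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical key whose mutable id participates in hashCode. */
class MutableKey {
    Integer id;

    @Override public int hashCode() {
        // Same formula as in the post: 31 * 1 + (id == null ? 0 : id.hashCode())
        return 31 + (id == null ? 0 : id.hashCode());
    }

    @Override public boolean equals(Object o) {
        return o instanceof MutableKey && Objects.equals(id, ((MutableKey) o).id);
    }
}

class LostInBucketDemo {
    public static void main(String[] args) {
        Set<MutableKey> set = new HashSet<>();
        MutableKey key = new MutableKey();
        set.add(key);                                      // stored under hash 31 (id is null)

        System.out.println(key.hashCode());                // 31
        key.id = 10;                                       // mutate a hashCode participant
        System.out.println(key.hashCode());                // 41

        System.out.println(set.contains(key));             // false: lookup uses hash 41
        System.out.println(set.iterator().next() == key);  // true: the object is still in there
    }
}
```

The set still physically holds the object, as the iterator shows, but a lookup with the new hash code never reaches the place where it was stored.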

Is there any way we can overcome this problem? There are multiple solutions. One common path is to not include mutable properties in the hashCode() implementation and instead use an alternate key or business key of the object. In our domain, we know that an Application is uniquely identified by its name, so we can use that for the hashCode computation as shown below:




public class AppWithBizKeyEquals {
    private Integer id;

    // This is a part of the business key
    private final String name;

    public AppWithBizKeyEquals(String name) {
        this.name = name;
    }
    // No setter for name. However, there are setters for id
    .....
    .....
    @Override public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((name == null) ? 0 : name.hashCode());
        return result;
    }

    @Override public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null) return false;
        if (getClass() != obj.getClass()) return false;
        final AppWithBizKeyEquals other = (AppWithBizKeyEquals) obj;
        if (name == null) {
            if (other.name != null)
                return false;
        }
        else if (!name.equals(other.name))
            return false;
        return true;
    }
}


In the above class, the name field constitutes the business key and is immutable. One cannot change the value of the name after object creation. We are still able to alter the "id" property of the object after creation. Let's take a look at some tests:




@Test public void testBusinessKeyEqualHashCode() {
    HashSet<AppWithBizKeyEquals> appSet = new HashSet<AppWithBizKeyEquals>();

    AppWithBizKeyEquals app = new AppWithBizKeyEquals("Foo");
    AppWithBizKeyEquals appOther = new AppWithBizKeyEquals("Foo");

    appSet.add(app);

    assertTrue("Both instances must be equal", app.equals(appOther));
    assertTrue("Both instances must have same hashcode", app.hashCode() == appOther.hashCode());

    assertTrue("Set must contain inserted application", appSet.contains(app));
    assertTrue("Set must return a match for contains of appOther as equals/hashCode are same now",
        appSet.contains(appOther));

    app.setId(Integer.valueOf(10));

    assertTrue("Both instances are still equal as Id is not part of biz equality",
        app.equals(appOther));
    assertTrue(
        "Both instances must still have same hashcode as Id should not have changed hashCode",
        app.hashCode() == appOther.hashCode());

    assertTrue("Set must contain inserted application", appSet.contains(app));
    assertTrue(
        "Set must return a match for contains of appOther as equals/hashCode are unaffected by id change",
        appSet.contains(appOther));
}


In the above example, notice that both "app" and "appOther" are constructed by providing the name of the application. Both their hashCode() and equals() match. Changing the "id" property of the "app" object has no effect on locating the object in the Set as the "id" property does not participate in the hashCode() calculation.

This works great. However, it has some deficiencies:

a. We need to find a business key always. Sometimes, it might be the entire object.
b. Business key becomes immutable

In our example, we could identify the name of the Application as the business key. However, as per our requirement, the name of an Application can be changed. This works against the immutability constraint that we have imposed on our object. There are workarounds for this, but clearly it is not ideal.

So, can we overcome this problem where changing of a property that participates in the hashCode calculation of an object still allows us to locate the object by querying the HashSet?

One direction we could have hoped to take is that before the "id" of the object is set, we remove the object from the Set and after the Id is set we re-insert it back. That would work. However, what if the object was part of another object like an Organization and saving the Organization object would implicitly save the children Application objects thus providing "id" for them? How can we intercept the same?




Organization org = new Organization("Foo Org");
Application app = new Application();
app.setName("Foo");
org.getApplications().add(app);
persister.save(app);

assertTrue(org.getApplications().contains(app));


Clearly the above code would fail the assertion, for the reasons shown in the previous examples.
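
For contrast, when the caller does control the mutation, the manual "remove, change, re-insert" idea mentioned earlier can be sketched as below. The class IdKeyedApp and the helper changeId are hypothetical names made up for this sketch; the important detail is only the order of the three steps.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical entity whose equals/hashCode depend only on a mutable id. */
class IdKeyedApp {
    private Integer id;

    Integer getId() { return id; }
    void setId(Integer id) { this.id = id; }

    @Override public int hashCode() { return Objects.hashCode(id); }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return Objects.equals(id, ((IdKeyedApp) o).id);
    }
}

class ManualRehash {
    /** Takes app out of the set, changes its id, and puts it back if it was a member. */
    static void changeId(Set<IdKeyedApp> set, IdKeyedApp app, Integer newId) {
        boolean wasMember = set.remove(app); // must run while the old hashCode still applies
        app.setId(newId);
        if (wasMember) {
            set.add(app);                    // stored again, now under the new hashCode
        }
    }

    public static void main(String[] args) {
        Set<IdKeyedApp> apps = new HashSet<>();
        IdKeyedApp app = new IdKeyedApp();
        apps.add(app);

        changeId(apps, app, 10);

        System.out.println(apps.contains(app)); // true
        System.out.println(apps.size());        // 1
    }
}
```

This only works when every id change goes through such a helper and the helper knows every set that holds the object. That is exactly what breaks down when a persistence layer assigns the id behind our back.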

So is there no way we can accomplish this???

Well, there may be other ways. I, however, thought of one really bad hack that would make this work. It requires some deviations but achieves the result. The solution is neither performant nor safe nor anything else, and I am not patenting it here!!! There go the $$$$ :-))))

So what is the hack? What if we could achieve the removal and re-addition of the object into the Set implicitly, thereby simulating a rehash of the object?

To assist with the hack, enter the design patterns such as Observer (Listener).

The Hack:
To aid with the hack, we start with an annotation. The annotation, HashcodeParticipator, will be applied to any field in a model object that participates in the calculation of the hashCode().



@Retention(RetentionPolicy.RUNTIME) public @interface HashcodeParticipator {}

We will be utilizing the PropertyChange support of Java Beans. To aid with our solution, we define an event class that is an extension of java.beans.PropertyChangeEvent. This event will be fired when a Property that participates in the hashCode() is:

a. About to be changed
b. Changed




public class HashCodeElementChangeEvent extends PropertyChangeEvent {
    public static final int HASH_PARTICIPANT_WILL_CHANGE = 0;
    public static final int HASH_PARTICIPANT_CHANGED = 1;

    private final int eventType;

    public HashCodeElementChangeEvent(Object source, String propertyName, int eventType) {
        // Old and new values are not carried; only the event type matters to listeners
        super(source, propertyName, null, null);
        if (eventType != HASH_PARTICIPANT_CHANGED && eventType != HASH_PARTICIPANT_WILL_CHANGE) {
            throw new IllegalArgumentException("Invalid Event Type");
        }
        this.eventType = eventType;
    }

    public boolean isChangingNotificationEvent() {
        return eventType == HASH_PARTICIPANT_WILL_CHANGE;
    }
}


We also define an interface called PropertyChangeNotifier, that implementing objects will utilize to notify listeners when a property that participates in the hashCode() computation is undergoing change.




public interface PropertyChangeNotifier {
    public void addHashCodePropertyChangeListener(PropertyChangeListener propertyChangeListener);
    public void removeHashCodePropertyChangeListener(PropertyChangeListener propertyChangeListener);
}


OK, so we now have a notifier. Show me the listener. We define an extension of HashSet that is a listener of these events. The extended HashSet will receive two types of events: the first when a property that participates in the hashCode of the source object is about to be mutated, and the second after the mutation. Upon receipt of the pre-mutation event, the Set will remove the source object from its collection, and upon receiving the post-mutation event, it will re-insert the object into its collection. Let's take a look at the listener Set:




public class PropertyListenerHashSet<E extends PropertyChangeNotifier> extends HashSet<E>
        implements PropertyChangeListener {

    public PropertyListenerHashSet() {
        super();
    }
    // Other constructors..
    ....

    /**
     * Addition to the set involves registering the Set as a Listener
     * of the Object being added.
     *
     * @param o Object to add to the Set.
     */
    @Override public boolean add(E o) {
        boolean retVal = super.add(o);
        // Add Listener
        o.addHashCodePropertyChangeListener(this);

        return retVal;
    }

    /**
     * Removes specified object from the set.
     * Part of the removal operation involves de-registering the set
     * as a listener of the object.
     */
    @Override public boolean remove(Object o) {
        removeListener(o);
        return super.remove(o);
    }

    /**
     * On a PropertyChangeEvent, if the event is a notification that
     * a HashCode participating attribute is altering, then the object
     * is removed from the collection. A subsequent property change event
     * will ensure the object is added back to the collection.
     */
    @SuppressWarnings("unchecked")
    public void propertyChange(PropertyChangeEvent evt) {
        if (!(evt instanceof HashCodeElementChangeEvent)) {
            return;
        }

        HashCodeElementChangeEvent event = (HashCodeElementChangeEvent) evt;

        if (event.isChangingNotificationEvent()) {
            // The old hashCode still applies here, so the object can be located and removed
            super.remove(event.getSource());
        } else {
            // Note that the object already has this Set as a listener
            super.add((E) evt.getSource());
        }
    }

    @Override
    public void clear() {
        for (E item : this) {
            item.removeHashCodePropertyChangeListener(this);
        }
        super.clear();
    }

    // De-registers this Set from the stored element that equals obj, if any
    private void removeListener(Object obj) {
        for (Iterator<E> i = iterator(); i.hasNext();) {
            E item = i.next();
            if (item.equals(obj)) {
                item.removeHashCodePropertyChangeListener(this);
                break;
            }
        }
    }
}


The items to note in the above class are that whenever an element is added, the Set registers as a listener of the object, and when the element is removed, the Set de-registers. Also note the behavior when the PropertyChangeEvent is received, i.e., the removal and re-addition of the source object.

Now, let's look at an implementation of the PropertyChangeNotifier interface. For the sake of ease, we define a base class called BO (Business Object for short) as shown below:




public class BO implements PropertyChangeNotifier {
    protected transient PropertyChangeSupport propertyChangeSupport = new PropertyChangeSupport(this);
    private final Set<String> hashCodeProperties = new TreeSet<String>();

    public BO() {
        // Collect the annotated fields up front. Doing so, listeners will not have
        // to listen for all property changes.
        Class<?> child = this.getClass();
        Field[] fields = child.getDeclaredFields();

        for (Field field : fields) {
            // If field is participating in hashCode by virtue of annotation
            if (field.getAnnotation(HashcodeParticipator.class) != null) {
                hashCodeProperties.add(field.getName());
            }
        }
    }
    ....
    ....
    // Register only to be notified on properties that are hash code participants
    public void addHashCodePropertyChangeListener(PropertyChangeListener propertyChangeListener) {
        for (String hashCodeParticipant : hashCodeProperties) {
            propertyChangeSupport.addPropertyChangeListener(hashCodeParticipant, propertyChangeListener);
        }
    }
    .....
    .....

    // Fire an event when a hash code participant property is about to change
    public void fireHashCodeParticipantChangingEvent(String hashCodeParticipant) {
        propertyChangeSupport.firePropertyChange(new HashCodeElementChangeEvent(this, hashCodeParticipant,
            HashCodeElementChangeEvent.HASH_PARTICIPANT_WILL_CHANGE));
    }

    // Fire an event when the hash code participant property has been mutated.
    public void fireHashCodeParticipantChangedEvent(String hashCodeParticipant) {
        propertyChangeSupport.firePropertyChange(new HashCodeElementChangeEvent(this, hashCodeParticipant,
            HashCodeElementChangeEvent.HASH_PARTICIPANT_CHANGED));
    }
}


Let's look at an implementation of our SmartApplication object that extends the BO. The object uses only the "id" property for its equals() and hashCode() implementations. I am not showing the entire class, only the methods of interest:




/**
 * Note this should probably be aspect handled. 1. beforeMethod - If method contains
 * HashcodeParticipator annotation fireGoingtoChangeEvent 2. executeMethod 3. afterMethod - If
 * method contains HashcodeParticipator fireChangedEvent
 *
 * @param id
 */
public void setId(Integer id) {
    fireHashCodeParticipantChangingEvent("id");
    this.id = id;
    fireHashCodeParticipantChangedEvent("id");
}

public String getName() {
    return name;
}

public void setName(String name) {
    String origName = this.name;
    this.name = name;
    propertyChangeSupport.firePropertyChange("name", origName, name);
}


In the above methods of the SmartApplication class, the setId() method first fires an event telling listeners that a hashCode property is about to change and, after changing the property, fires another event stating that the property has been mutated. Although not shown, the "id" field carries the @HashcodeParticipator annotation. Also note that when the "name" property is set, a simple property change event is fired.

Let's take a look at some unit tests that utilize the hacked framework:




@Test public void testSmartApplication() throws Exception {
    HashSet<SmartApplication> appSet = new PropertyListenerHashSet<SmartApplication>();

    SmartApplication app = new SmartApplication();

    appSet.add(app);
    assertEquals("Should have only one listener, i.e., the Set", 1, app.getListenerCount());
    assertTrue("Set must contain inserted application", appSet.contains(app));

    // Mutate a hashCode participating property
    app.setId(Integer.valueOf(10));

    // The set now returns true for the contains due to hacked re-hash
    assertTrue(appSet.contains(app));
    assertEquals(1, app.getListenerCount());

    // Remove from Set
    assertTrue(appSet.remove(app));
    assertEquals(0, app.getListenerCount());

    assertFalse(appSet.contains(app));
    assertEquals(0, appSet.size());

    appSet.add(app);
    assertEquals(1, appSet.size());
    assertEquals(1, app.getListenerCount());

    // Should not notify the Set, which listens only for hashCode participants
    app.setName("Foo");
    appSet.clear();
    assertEquals(0, appSet.size());
    assertEquals(0, app.getListenerCount());
}


In the above example, even after the "id" property is changed, i.e., a property that participates in the hashCode() computation, the original object can still be located in the HashSet :-))) We managed to implement a transparent re-hash hack!
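
Since the classes above are shown only in fragments, here is a compact, self-contained sketch of the same remove-before / re-insert-after idea. To keep it short it swaps java.beans.PropertyChangeSupport and the annotation scanning for a tiny hand-rolled listener interface, so HashChangeListener, TrackedApp and SelfRehashingSet are all hypothetical names and not the classes from the attached project.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-in for the PropertyChangeListener plumbing. */
interface HashChangeListener {
    void hashWillChange(Object source);
    void hashChanged(Object source);
}

/** Hypothetical entity: id is the only hashCode/equals participant. */
class TrackedApp {
    private final List<HashChangeListener> listeners = new ArrayList<>();
    private Integer id;

    void addListener(HashChangeListener l) { listeners.add(l); }
    void removeListener(HashChangeListener l) { listeners.remove(l); }
    int listenerCount() { return listeners.size(); }

    void setId(Integer newId) {
        // Notify before and after, so holders can take the object out and put it back.
        for (HashChangeListener l : new ArrayList<>(listeners)) l.hashWillChange(this);
        this.id = newId;
        for (HashChangeListener l : new ArrayList<>(listeners)) l.hashChanged(this);
    }

    @Override public int hashCode() { return Objects.hashCode(id); }

    @Override public boolean equals(Object o) {
        return o instanceof TrackedApp && Objects.equals(id, ((TrackedApp) o).id);
    }
}

/** A HashSet that listens to its elements and simulates a rehash on id changes. */
class SelfRehashingSet extends HashSet<TrackedApp> implements HashChangeListener {
    @Override public boolean add(TrackedApp app) {
        boolean added = super.add(app);
        if (added) app.addListener(this);
        return added;
    }

    @Override public boolean remove(Object o) {
        // Find the stored instance so the listener is removed from the right object.
        for (TrackedApp stored : this) {
            if (stored.equals(o)) {
                stored.removeListener(this);
                return super.remove(stored);
            }
        }
        return false;
    }

    // Old hashCode still valid here, so the element can be located and removed.
    @Override public void hashWillChange(Object source) { super.remove(source); }

    // New hashCode in effect; re-insert without registering the listener twice.
    @Override public void hashChanged(Object source) { super.add((TrackedApp) source); }
}

class SelfRehashingDemo {
    public static void main(String[] args) {
        SelfRehashingSet set = new SelfRehashingSet();
        TrackedApp app = new TrackedApp();
        set.add(app);

        app.setId(10);                            // triggers remove + re-add behind the scenes

        System.out.println(set.contains(app));    // true
        System.out.println(set.size());           // 1
        set.remove(app);
        System.out.println(app.listenerCount());  // 0
    }
}
```

The sketch inherits the weaknesses of the hack: it is not thread-safe, removal is a linear scan, and elements removed through the set's iterator keep their listener.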

So what next???

Will this work with Hibernate?
The hibernate FAQ on equals/hashCode states the following:

"Will this work?
HashSet set = new HashSet();
User u = new User();
set.add(u);
session.save(u);
assert(set.contains(u));"

The answer is no, it will not work, for all the reasons discussed above. However, if we utilize the hack-framework mentioned above, it will.

To illustrate the solution, we introduce the Organization object. The Organization object contains an instance of our extended HashSet of Application objects like so:




@OneToMany(mappedBy="organization", cascade=CascadeType.ALL, fetch=FetchType.EAGER)
@JoinColumn(name="ORG_ID")
@Fetch(value=FetchMode.JOIN)
private Set<SmartApplication> applications = new PropertyListenerHashSet<SmartApplication>();


One important thing to note is that Hibernate assigns the properties using reflection by default, writing directly to the fields. If our SmartApplication object's "id" field is assigned this way, the setter is bypassed and we cannot fire a property change event. Therefore we need to ensure that the "id" property is set through the setter method. This is accomplished using the annotation @AccessType(value="property") as shown below:




@AccessType(value="property")
@HashcodeParticipator
private Integer id;


Now the unit test:




@Test
public void testPersistence() {
    Session session = HibernateUtil.getSessionFactory().getCurrentSession();
    Transaction tx = null;
    try {
        tx = session.beginTransaction();
        Organization org = new Organization();
        org.setName("Foo Company");

        SmartApplication app = new SmartApplication();
        app.setName("Auditing");

        org.addApplication(app);

        session.save(org);

        // This is the acid test....
        assertTrue(org.getApplications().contains(app));
        tx.commit();
    } catch (RuntimeException e) {
        // Roll back if a transaction was started, then rethrow so the test fails visibly
        if (tx != null) {
            tx.rollback();
        }
        throw e;
    }
}


The above test passes :-))))))))))))))))))))))

Conclusion:

The above-mentioned solution is a hack!!! Please do not implement it. I was only curious as to how one could overcome the problem and let my insane mind play. The code/article has not had any review or endorsement. There are issues such as performance, synchronization, Hibernate reads, etc., still to be investigated.

I am a brooder by nature where someone's comment on a topic sends me over-analysing the same. My thinking does not necessarily equate to productivity or a good solution. The blog was meant more as an education to myself and a singular place to understand/refer to equals/hashCode. I only hope the above helps someone treading the same path as myself understand the esoteric world of Hashing better :-))). If I am in error, I would appreciate direction.

I have attached herewith a Maven project with the different scenarios. As a parting note, when implementing equals/hashCode:

1. Attempt to find a business key for equals/hashCode when applicable.
2. Ensure that you check for immediate object identity via this == other.
3. Check for class equality. This is especially important when a subtype fails to override equals/hashCode and a super-type instance evaluates as being the same as the sub-type instance, which does not make sense. For example, not every Car is a Porsche ;-)
4. Use a mutable property for hashCode judiciously. Watch out for ids especially.
5. And finally, override equals and hashCode if the object in question will ever be used as a key of a HashMap or as an item in a HashSet.
6. GUID - This is a different blog.
7. Properties that are part of equals do not have to be part of hashCode.
8. Attempt to contact more prominent brains such as Gosling, Gavin King or Rod Johnson for assistance....;-)..kidding.
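
Point 3 in the list above can be illustrated with a small sketch. The Car and Porsche classes are hypothetical, borrowed from the joke in that item. The first pair uses an instanceof-only equals(), the second pair adds the getClass() comparison.

```java
/** instanceof-based equals: a subtype that does not override equals still counts as "a Car". */
class Car {
    final String plate;
    Car(String plate) { this.plate = plate; }

    @Override public boolean equals(Object o) {
        return o instanceof Car && plate.equals(((Car) o).plate);
    }
    @Override public int hashCode() { return plate.hashCode(); }
}

class Porsche extends Car {
    Porsche(String plate) { super(plate); }
}

/** getClass()-based equals: instances of different classes are never equal. */
class StrictCar {
    final String plate;
    StrictCar(String plate) { this.plate = plate; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return plate.equals(((StrictCar) o).plate);
    }
    @Override public int hashCode() { return plate.hashCode(); }
}

class StrictPorsche extends StrictCar {
    StrictPorsche(String plate) { super(plate); }
}

class ClassEqualityDemo {
    public static void main(String[] args) {
        // Without the class check, a plain Car and a Porsche with the same plate are "equal"
        System.out.println(new Car("X-1").equals(new Porsche("X-1")));             // true
        // With the class check they are not
        System.out.println(new StrictCar("X-1").equals(new StrictPorsche("X-1"))); // false
        // Same class and same plate still compare equal
        System.out.println(new StrictCar("X-1").equals(new StrictCar("X-1")));     // true
    }
}
```

Whether the stricter behaviour is what you want depends on the domain. The sketch only shows what the class check changes.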

G.N...what a Rant!!!! This must take the cake :-)))))
