Wednesday, October 29, 2008

When you have eliminated the impossible.....

A mystery is afoot:
In my illustrious (please don't snigger! :-) career as a software developer, I have come across some very interesting mysteries, some that Mr. Sherlock Holmes would have been intrigued by, for a few fleeting moments at least :-). I have been able to solve many of them primarily thanks to the assistance of my most esteemed colleagues. Software development brings forth mysteries from time to time, and I live for them: they are the best highs, and just being a part of the experience is enough, regardless of who solves them...

Anyway, I recently came across another such mystery, and you might want to play along:

Facts:
There is a software library; let us call it library X. Library X has undergone some enhancements at my hands and is now library X.X. There is a consumer of this library, a web application. The web application has been using library version X and has now upgraded to version X.X. There is a load test environment available where the web application is going to be deployed to test the sanity of the build.

Ensuing Events:
  1. The Mystery: The consumer has upgraded to use version X.X. When deployed to the load test environment, all hell breaks loose and the victim (the consumer) is found dead. The immediate suspect is version X.X of the library, as it was the only change. Clearly, the way to isolate the problem to version X.X is to downgrade the library to the older version, i.e., version X, and run the load test again. If the failure still occurs, X.X cannot be blamed; if it does not, X.X is the culprit. The load test is run again after the downgrade, only to find the consumer dead again. So the conclusion is drawn that the upgrade to version X.X is not the cause of death and that there is some inherent problem. Version X.X of the library has been temporarily vindicated while the investigation proceeds....
  2. An Investigation: After considerable investigation, it is proven that there is a subtle bug that manifests itself depending on the timing of garbage collection cycles. The bug would surface regardless of whether version X or X.X is used.
  3. A Break at last: A fix to the problem is devised, hereby referred to as the WEBAPP_FIX which involves a change to the consumer (web app) in the way the library is being used.
  4. Testing the Theory: Since version X.X has been vindicated of the crime, a BUILD composed of library X.X with the WEBAPP_FIX should not result in any fatalities if the WEBAPP_FIX is correct. Load tests are run with the new build of X.X + WEBAPP_FIX. Oh no! We have a fatality again, and the post mortem proves that the symptoms are the same as before. So the logical reasoning is either "WEBAPP_FIX did not fix the problem" or "There is something else in version X.X causing the problem". Let's try the load test again with a new BUILD that consists of the old library, i.e., library X + WEBAPP_FIX. Voilà, we do not have a fatality....it's repeatable, we have survival...
  5. Immediate conclusion: As the WEBAPP_FIX was good in a BUILD composed of library X + WEBAPP_FIX, there is something else lurking in, or introduced by, version X.X of the library that brings the problem back.

Interrogation of Version X.X with the WEBAPP_FIX:

What have we to go on at this point?
  • BUILD 1: WEBAPP + LIBRARY X + WEBAPP_FIX = SUCCESS
  • BUILD 2: WEBAPP + (LIBRARY X + DELTA X) + WEBAPP_FIX = FAILURE

Note that LIBRARY X.X = LIBRARY X + DELTA X

In other words, removing the common element of WEBAPP_FIX and LIBRARY X from the above simultaneous equations of Build 1 and Build 2, we can surmise that DELTA X is the cause for failure.
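The elimination above can be sketched as simple set arithmetic. This is only an illustration: the component labels and the `suspects` helper are hypothetical names, and the sketch takes both component lists at face value.

```python
# Hypothetical component labels mirroring the two build equations above.
BUILD_1 = frozenset({"WEBAPP", "LIBRARY_X", "WEBAPP_FIX"})             # SUCCESS
BUILD_2 = frozenset({"WEBAPP", "LIBRARY_X", "DELTA_X", "WEBAPP_FIX"})  # FAILURE


def suspects(passing, failing):
    """Return the components that differ between a passing and a failing build."""
    return passing ^ failing  # symmetric difference: everything not common to both


print(sorted(suspects(BUILD_1, BUILD_2)))
```

Everything common to both builds cancels out, leaving only what differs between them as a suspect.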

I spend hours trying to re-create the problem with DELTA X. Cannot! I feel like a failure. Why is it that I cannot create a failure with WEBAPP + X.X + WEBAPP_FIX when the load test reported exactly that? I slowly eliminate factors...I cannot get the suspect to commit the murder!!!! I am reminded of Inspector Lestrade of Scotland Yard, who is often chasing a wrong lead. So the question remains: "Why did the load test with the build of WEBAPP + X.X + WEBAPP_FIX fail?" The words of Mr. Sherlock Holmes are echoing in my brain: "When you have eliminated the impossible, whatever remains, however improbable, must be the truth." Have I truly eliminated the impossible? Looks like it; everything else is the same, so what is the improbable????

Finding the improbable and thus a conclusion:

There was nothing wrong with version X.X. There was a problem with the BUILD that was load tested with version X.X: it did NOT have the WEBAPP_FIX. Bad build!!!! This was my improbable. How could the build not be correct? That the build was correct was a basic assumption of mine. I simply took it as a fact that:

BUILD2 = WEBAPP + (LIBRARY X + DELTA X) + WEBAPP_FIX

when in fact it really was:

BUILD2 = WEBAPP + (LIBRARY X + DELTA X)

In other words the build created for the load test missed the WEBAPP_FIX!

A basic mistake, one of the simplest possible causes of the failure, sent me off researching more complicated ideas and theories. I am ashamed of myself for accepting some things as facts at face value. If only I had verified my FACTS before heading off on my quest, I would have solved the mystery a long time ago! One of my lowest moments, I must admit. I will, however, not let it get me down...one cannot have total victories all the time.
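A check along these lines might have exposed the bad build before the load test ever ran. It is a minimal sketch, not what was actually used: the `missing_components` helper and the idea of reading the real contents from a build manifest are assumptions made for illustration.

```python
def missing_components(expected, actual):
    """Return components the build was supposed to contain but does not."""
    return sorted(set(expected) - set(actual))


# What BUILD2 was assumed to contain.
expected_build_2 = ["WEBAPP", "LIBRARY_X", "DELTA_X", "WEBAPP_FIX"]
# What it really contained, e.g. as read from a (hypothetical) build manifest.
actual_build_2 = ["WEBAPP", "LIBRARY_X", "DELTA_X"]

missing = missing_components(expected_build_2, actual_build_2)
if missing:
    print("Bad build, missing:", ", ".join(missing))
```

The point is simply to compare what a build is believed to contain against what it actually contains before drawing conclusions from its test results.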

So here I am with my famous cap on, rain drizzling over 221B Baker Street, my blood shot with cocaine to stimulate my brain while I await the next mystery....All of a sudden, I hear Mrs. Hudson say, "Mr. Holmes, there is someone here to see you.."...:-) My eyes light up, the violin is thrown on the dusty sofa, I put on my coat to hide my crumpled shirt, take a seat, light up my pipe and await the entrance of my new client...

1 comment:

Sai said...

A very nice write-up, Sanjay, and I know exactly what you are talking about ;)