Intention Hidden In Implementation And Misty Edge of Validity

The code is the documentation” – except that from poorly written code you cannot tell what is an intrinsic part of the solution and what is an accidental implementation detail. And a piece of code can rarely handle any possible data – yet, without a good documentation or precondition checks, you can’t even guess at what are valid or unexpected inputs. Today we will explore a piece of JavaScript code that both hides the intention in the implementation (the “why” is not clear from the “what” the code does) and operates correctly only under particular but unstated conditions.

The function in question prepares data series for a plotting library so that differences (such as +10%) between the current and the previous unit of a time period can be shown (f.ex. when displaying daily totals for April, we want to see their differences from the corresponding days in March). (You have now got a little advantage of knowledge that a maintainer of the code would have to figure out herself.) The code follows; don’t be discouraged when you will feel you don’t really graps what the code does and why, I too have struggled.

A few things make the code harder to read and maintain:

  1. Unclear intention – no explanation of why this is done – why do we need all the current and previous series have the same number of elements and why we use an empty array for the elements that we add? Why do we replace a not-a-number (and what is it anyway, a missing data element?) with 0 and not something else?
  2. Unclear preconditions – no expression of the assumptions the code operates under, i.e. within which limits it is valid w.r.t. inputs and w.r.t. the use of the fake data added to the series
  3. Unclear inputs – lack of information about the structure of the inputs and their valid variations, which makes it hard to understand the code
  4. Duplication – several times we check for an array element and add it if missing; this could be generalized and the code thus shortened and simplified
  5. Lying comments – the initial comment at line 1 reads “add the delta to the data” – but that is not true, we are not adding the difference (e.g. +10%), we are adding the whole previous value (seriesData.previous[..] = previousValue); that’s one of disadvantges of comments – they tend to get out of sync, if they ever were correct

These are not some theoretical shortcomings. Due to them, I wasted over a day debugging our failing graphs and I still struggle to understand why the code is as is. I assume that it once worked (only rarely do people purposefully write broken code) but I can only guess at for which data and which graph types it actually worked correctly. A better structure and good comments would help me understand it faster/better and a bit of defensive programming would likely help me discover the cause hours earlier.

Here is an improved version (still far from perfect; it would be best to throw all of it away and write it properly – but that rarely works; only evolution in small steps is safe enough):

Improvements:

  1. The documentation now explains what and why we do (we may only hope that it is general enough not to get out of sync with the code anytime soon)
  2. We now explicitely check our assumptions about the data (assertMatchingDaysInCurrentAndPrevious) and will fail quickly if they do not hold
  3. Unimportant details have been moved to helper functions with helpful names so that the function’s body is shorter and much easier to read and understand
  4. Duplication has been removed (createDataPointIfMissing, replaceNaNWithDefaultOf)
  5. I’ve raised the level of abstraction in the for loop, stating clearly the intent (createDataPointIfMissing, replaceNaNWithDefaultOf, copyPreviousValueIfDefined), unobscured by the implementation details
  6. Bonus: As a side-effect, I have fixed a bug where we did not replace NaN in the current series, i.e. when the current series had less elements (e.g. days) than in the previous period, we added an [] which then later lead to erros; we should have likely added [0, 0]

Summary

  1. Write your code so that the intent and implementation are separated and the intent is clear (preferably using functions with meaningful names, since comments are hard to maintain)
  2. Express your expectations about the inputs, either in a comment or, preferably, by actually checking them (-> defensive programming)
  3. Create functions on different levels of abstraction, from high-level ones stating the intent and expressing the overall flow of logic to low-level implementation details ones (and do not mix functions of different level in the same code block)
  4. Test your code; tests help you make sure your code works as intended and can be a great documentation of how to use it, what are valid and invalid inputs

If you do this, the maintainer of your code will love you. If you won’t, s/he will waste hours/days trying to understand what is an important part of the code and what is only an implementation detail that can be changed and trying to extract meaning and intent out of the spaghetti monster.

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s