Intention Hidden In Implementation And Misty Edge of Validity

The code is the documentation” – except that from poorly written code you cannot tell what is an intrinsic part of the solution and what is an accidental implementation detail. And a piece of code can rarely handle any possible data – yet, without a good documentation or precondition checks, you can’t even guess at what are valid or unexpected inputs. Today we will explore a piece of JavaScript code that both hides the intention in the implementation (the “why” is not clear from the “what” the code does) and operates correctly only under particular but unstated conditions.

The function in question prepares data series for a plotting library so that differences (such as +10%) between the current and the previous unit of a time period can be shown (f.ex. when displaying daily totals for April, we want to see their differences from the corresponding days in March). (You have now got a little advantage of knowledge that a maintainer of the code would have to figure out herself.) The code follows; don’t be discouraged when you will feel you don’t really graps what the code does and why, I too have struggled.


// compare two series and add the delta to the data for the current period
augmentSeriesWithPreviousValues:function (seriesData, previousSeriesData) {
var series = seriesData.series;
seriesData.previous = [];
var previousSeries = previousSeriesData.series;
// checks, computing maxRows, maxColumns, previousCategories
// that maps category name to series index (id)
for (var seriesId = 0; seriesId < series.length; seriesId++) {
var category = seriesData.category[seriesId + 1];
var previousSeriesId = previousCategories[category];
for (var rowId = 0; rowId < maxRows; rowId++) {
for (var columnId = 1; columnId < maxColumns; columnId++) {
var value = 0;
var previousValue = 0;
if (previousSeries[previousSeriesId] && previousSeries[previousSeriesId][rowId]) {
previousValue = previousSeries[previousSeriesId][rowId][columnId];
if (isNaN(previousValue)) {
previousValue = 0;
}
}
if (!series[seriesId]) {
series[seriesId] = [];
}
if (!series[seriesId][rowId]) {
series[seriesId][rowId] = [];
}
value = series[seriesId][rowId][columnId];
if (isNaN(value)) {
value = 0; // <- bug: this value is never used!
}
if (!seriesData.previous[seriesId]) {
seriesData.previous[seriesId] = [];
}
if (!seriesData.previous[seriesId][rowId]) {
seriesData.previous[seriesId][rowId] = [];
}
seriesData.previous[seriesId][rowId][columnId] = previousValue;
}
}
}
}

A few things make the code harder to read and maintain:

  1. Unclear intention – no explanation of why this is done – why do we need all the current and previous series have the same number of elements and why we use an empty array for the elements that we add? Why do we replace a not-a-number (and what is it anyway, a missing data element?) with 0 and not something else?
  2. Unclear preconditions – no expression of the assumptions the code operates under, i.e. within which limits it is valid w.r.t. inputs and w.r.t. the use of the fake data added to the series
  3. Unclear inputs – lack of information about the structure of the inputs and their valid variations, which makes it hard to understand the code
  4. Duplication – several times we check for an array element and add it if missing; this could be generalized and the code thus shortened and simplified
  5. Lying comments – the initial comment at line 1 reads “add the delta to the data” – but that is not true, we are not adding the difference (e.g. +10%), we are adding the whole previous value (seriesData.previous[..] = previousValue); that’s one of disadvantges of comments – they tend to get out of sync, if they ever were correct

These are not some theoretical shortcomings. Due to them, I wasted over a day debugging our failing graphs and I still struggle to understand why the code is as is. I assume that it once worked (only rarely do people purposefully write broken code) but I can only guess at for which data and which graph types it actually worked correctly. A better structure and good comments would help me understand it faster/better and a bit of defensive programming would likely help me discover the cause hours earlier.

Here is an improved version (still far from perfect; it would be best to throw all of it away and write it properly – but that rarely works; only evolution in small steps is safe enough):


// Set seriesData.previous to hold data series of the previous period
// so that we may compute deltas (e.g. +10%).
// We also ensure that all the series, previous and current, have the same
// number of data points, adding fake ones if necessary, so that the delta-computation
// code <tbd:add reference> doesn't blow up. There may be some data missing
// at the end of a period, for example because months have different numbers of days.
// The fake elements and also any other undefined values will default to [0,0], which
// should be OK with all (line,area,..) graphs and tables.
// E.x.: seriesData.series[0].data = [ [1,"12%"],[2,"22%"],..,[31,"18%"] ],
// the previous month had only 30 days so we add a fake element for day 31;
// See the class Series for more information.
augmentSeriesWithPreviousValues:function (seriesData, previousSeriesData) {
var series = seriesData.series;
seriesData.previous = [];
var previousSeries = previousSeriesData.series;
// checks, computing maxRows, maxColumns, previousCategories
// that maps category name to series index (id)
var matcher = new SeriesMatcher(seriesData, previousCategories); // a helper
// The current and previous series must have data for the same unit (e.g. day 15 if period=month),
// only at the end of the period there may be missing entries (shorter month etc.)
assertMatchingDaysInCurrentAndPrevious(series, previousSeries);
for (var seriesId = 0; seriesId < series.length; seriesId++) {
var previousSeriesId = matcher.previousSeriesIdFor(seriesId)
for (var rowId = 0; rowId < maxRows; rowId++) {
for (var columnId = 1; columnId < maxColumns; columnId++) {
createDataPointIfMissing(series, seriesId, rowId);
replaceNaNWithDefaultOf(0, series, seriesId, rowId, columnId);
createDataPointIfMissing(seriesData.previous, seriesId, rowId);
copyPreviousValueIfDefined(seriesData.previous[seriesId], previousSeries[previousSeriesId], rowId, columnId);
replaceNaNWithDefaultOf(0, seriesData.previous, seriesId, rowId, columnId);
}
}
}
}
// Make sure the 3D array series has the element [seriesId][rowId];
// it's assumed that [seriesId-1][rowId-1] exist, unless 0.
function createDataPointIfMissing(series, seriesId, rowId) {
var array = series;
[seriesId, rowId].forEach(function(idx){
if (!array[idx]) {
array[idx] = [];
}
array = array[idx];
});
return series;
}
// If x or y coordinate of the data point series[seriesId][rowId] is undefined,
// set it to the given default value
function replaceNaNWithDefaultOf(default, series, seriesId, rowId, columnId) {
var value = series[seriesId][rowId][columnId];
if (isNaN(value)) {
series[seriesId][rowId][columnId] = default;
}
}
// Copy the given data point's coordinate from source (if defined) to target
function copyPreviousValueIfDefined(target, source, rowId, columnId) {
if (source && source[rowId] && source[rowId][columnId]) {
target[rowId][columnId] = source[rowId][columnId];
}
}
... // implementation of other helper functions, such as assertNoMissing, not shown

Improvements:

  1. The documentation now explains what and why we do (we may only hope that it is general enough not to get out of sync with the code anytime soon)
  2. We now explicitely check our assumptions about the data (assertMatchingDaysInCurrentAndPrevious) and will fail quickly if they do not hold
  3. Unimportant details have been moved to helper functions with helpful names so that the function’s body is shorter and much easier to read and understand
  4. Duplication has been removed (createDataPointIfMissing, replaceNaNWithDefaultOf)
  5. I’ve raised the level of abstraction in the for loop, stating clearly the intent (createDataPointIfMissing, replaceNaNWithDefaultOf, copyPreviousValueIfDefined), unobscured by the implementation details
  6. Bonus: As a side-effect, I have fixed a bug where we did not replace NaN in the current series, i.e. when the current series had less elements (e.g. days) than in the previous period, we added an [] which then later lead to erros; we should have likely added [0, 0]

Summary

  1. Write your code so that the intent and implementation are separated and the intent is clear (preferably using functions with meaningful names, since comments are hard to maintain)
  2. Express your expectations about the inputs, either in a comment or, preferably, by actually checking them (-> defensive programming)
  3. Create functions on different levels of abstraction, from high-level ones stating the intent and expressing the overall flow of logic to low-level implementation details ones (and do not mix functions of different level in the same code block)
  4. Test your code; tests help you make sure your code works as intended and can be a great documentation of how to use it, what are valid and invalid inputs

If you do this, the maintainer of your code will love you. If you won’t, s/he will waste hours/days trying to understand what is an important part of the code and what is only an implementation detail that can be changed and trying to extract meaning and intent out of the spaghetti monster.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s