What is better: a bunch of mutable boolean fields and methods operating on them, or an explicit expression of the individual states and the transitions between them? Let's study an example from a simulation of the progression of a multi-stage infection.
1. Design hidden in primitive mutable fields and methods
The following class, with a number of interrelated mutable (public) fields, which I should have completed with methods for transitions between their states, made me really uneasy (a var is a mutable field, val is immutable):
The fields keep the state of the progress of an infection. They depend on each other: for example, when sick, the person must also be infected, while when dead, she must not be immune.
The infection progresses as follows: healthy -> infected and infectious with a 40% chance (but not yet visibly sick) -> visibly sick on day 6 -> dies on day 14 with a 25% chance -> if still alive, becomes immune on day 16 (no longer visibly sick but still infectious) -> healthy again on day 18.
The problem I have with keeping the state in a bunch of fields is that there is no explicit expression of these rules, and that it invites defects such as setting sick to true while forgetting to also set infected. It is also hard to understand. You could learn the rules by studying the methods that alter the fields, but that requires a lot of effort, and it isn’t easy to distinguish an incidental implementation detail from intentional design. Nor does this code prevent mistakes like the one just described.
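To make that failure mode concrete, here is a minimal sketch of such a class. Only the four flag names (infected, sick, immune, dead) come from the text above; the `Person` class, the `becomeSick` method, and the `FlagsDemo` helper are hypothetical and exist only for illustration.

```scala
// Hypothetical sketch: only the four flag names are taken from the text.
class Person {
  var infected = false // infected and infectious, possibly without symptoms
  var sick = false     // visibly sick; only meaningful while infected is also true
  var immune = false   // survived the illness; must never be true together with dead
  var dead = false

  // A well-behaved transition has to remember every related flag.
  def becomeSick(): Unit = {
    infected = true
    sick = true
  }
}

object FlagsDemo {
  // Nothing stops a caller from bypassing the method and touching one flag only.
  def carelessUpdate(): Person = {
    val p = new Person
    p.sick = true // compiles and runs, yet the person is now "sick but not infected"
    p
  }
}
```

The compiler accepts both paths equally, so the rule "sick implies infected" lives only in the heads of the people who wrote the transition methods.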
Pros:
- Easy to design
Cons:
- The rules of the infection progression are not clear from the code (and would not be even if the methods were actually shown :)), i.e. it communicates the domain concepts and rules poorly
- The values of multiple fields must be kept in sync; failing to do so leads to defects
- Leads to spaghetti code
2. Explicit and defect-preventing expression of the states and transitions
What I would prefer is to make the rules the central piece of the design rather than basing it on the implementation details of how we keep the state (i.e. a set of booleans to manage). In other words, I want to surface the design hidden in this. I want to have an explicit notion of the states (Healthy, Incubating, Sick, Dead, Immune) with clear transitions between them, and I want these states to explicitly carry the secondary information about whether the person is infectious and whether this is visible.
One way to express this design explicitly is this:
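A minimal sketch of what such a design could look like, under stated assumptions: the state names and the day and probability figures come from the text above, while the `Health` base class, the `Infection.next` function, its `day` parameter (days since infection) and its `roll` parameter (a caller-supplied random draw in [0, 1)) are hypothetical. Treating Dead as not infectious is also an assumption, since the text does not say.

```scala
// Each state fixes its own flags, so "infectious" and "visibly infectious"
// cannot drift out of sync with the state itself.
sealed abstract class Health(val infectious: Boolean, val visiblyInfectious: Boolean)

case object Healthy    extends Health(infectious = false, visiblyInfectious = false)
case object Incubating extends Health(infectious = true,  visiblyInfectious = false)
case object Sick       extends Health(infectious = true,  visiblyInfectious = true)
case object Immune     extends Health(infectious = true,  visiblyInfectious = false)
// Assumption: the text does not say whether a dead person is still infectious.
case object Dead       extends Health(infectious = false, visiblyInfectious = false)

object Infection {
  val InfectionChance = 0.40 // healthy -> incubating
  val DeathChance     = 0.25 // checked once, on day 14

  // All transition rules in one place. `day` counts days since infection and
  // `roll` is a random draw in [0, 1) passed in by the caller, which keeps the
  // function deterministic and easy to test.
  def next(current: Health, day: Int, roll: Double): Health = current match {
    case Healthy                => if (roll < InfectionChance) Incubating else Healthy
    case Incubating if day >= 6 => Sick
    case Sick if day == 14      => if (roll < DeathChance) Dead else Sick
    case Sick if day >= 16      => Immune
    case Immune if day >= 18    => Healthy
    case unchanged              => unchanged // no rule applies today (Dead stays Dead)
  }
}
```

Because the flags are constructor arguments of each state, a combination such as "visibly sick but not infectious" can only appear if someone writes it into a state definition, where it is easy to spot in review.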
Pros:
- The rules and stages of the infection are now explicit, first-class members of the code; Domain-Driven Design in practice
- The transitions between the states are clear and explicit, and we cannot get the person into an invalid state (provided we defined the transitions correctly)
- We no longer need to synchronize the state of multiple variables
Cons:
- The code is likely longer than a bunch of boolean fields and methods for transitioning between their states
- It may seem complicated because, instead of one class and a few methods, we suddenly have a class hierarchy; but it actually prevents the complexity of the original spaghetti code, so, though not “easy” to comprehend, it is “simple” as per R. Hickey
I often encounter code like this, especially in older legacy applications that have evolved with changing business needs, with the primary focus on “features” and without the underlying design being updated or refactored accordingly. Frankly, it is hell. Low-level code working on a number of interdependent fields (in a class that likely also has a number of unrelated fields, or fields that depend on these only in certain use cases) is a haven for defects to hide and multiply in. And it is hard to understand, since the design (the rules, concepts, and intentions) has not been made explicit (yet). And hard to understand means hard to change.
Therefore it is important to survey your code regularly and surface the design hidden in it, hiding away low-level implementation details and making the key concepts, states, transitions, rules etc. first-class members of the code base, so that reading the code feels like communicating and learning rather than archaeology.
Jerrinot’s Enum and Java based code, included here for easier readability:
Update 2: Short Enum-like version and the full “primitive” version
Jerrinot’s version in Scala:
The “primitive” version with nearly full code, excluding the information what happens when:
Update 3: Other criteria to consider when evaluating the code
My mentor has reminded me that there are criteria worth considering other than a personal sense of beauty, and that when I have had a painful experience with something (such as interrelated mutable primitive fields), I tend to go too far in trying to avoid it, stepping into another problem without realising it (code verbosity and reduced understandability).
So some of the criteria to consider are:
- beauty? <=> maintainability and likelihood of defects
- testability and how quickly/early we can start with testing
The code with explicit states is much, much longer than the one with four booleans. Which one to write, and when, is the important question.
Another factor to consider is succession: which code can you start testing and using soonest? Is one style more conducive to being used while it is incomplete? Let’s say I use the four-booleans model and make them public fields so I don’t have to write or invoke getters and setters. Can I start writing my simulation, then notice that there are patterns to how the booleans are getting set, and move smoothly to the explicit finite state machine?
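One possible intermediate step in such a migration, sketched under assumptions: the class keeps the four flag names that existing simulation code reads, but derives them from a single explicit state. The `Stage` type and the `Person` class are hypothetical, and so is the decision to report `infected` as true only while incubating or sick (the text does not say what the flag means for an immune person).

```scala
// Hypothetical migration step: one explicit state, with the old flags
// kept as read-only views so existing readers of the flags still compile.
sealed trait Stage
case object Healthy    extends Stage
case object Incubating extends Stage
case object Sick       extends Stage
case object Immune     extends Stage
case object Dead       extends Stage

class Person {
  var stage: Stage = Healthy // the single source of truth

  // Derived flags: they can no longer contradict each other.
  // Assumption: "infected" covers the incubating and sick stages only.
  def infected: Boolean = stage == Incubating || stage == Sick
  def sick: Boolean     = stage == Sick
  def immune: Boolean   = stage == Immune
  def dead: Boolean     = stage == Dead
}
```

Code that only reads the flags keeps working unchanged; code that used to write them has to be rewritten as a change of `stage`, which is exactly where the hidden transition rules would surface.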
I also have to acknowledge that the full primitive code is actually much shorter and simpler than I feared. It is still uglier and more open to defects, but perhaps not enough to justify the effort put into inventing a model with explicit states.