It is not always clear which code is better or worse as it might depend on the needs and the team in question. Let’s have a look at two different implementations of a database-backed Set, one that is straightforward and easy to understand and another one that has more structure and less duplication at the expense of understandability. Which one would you choose?
Tip: When designing a readable solution, start by writing implementations of the public methods in terms of hypothetical lower-level methods that you would like to have, without being concerned about their implementation details. Only then try to find out if there is a way to implement these lower-abstraction-level methods that would fit your needs. Thus your implementation will be driven by the public interface and by “what” the class does rather then “how,” resulting in better abstractions and readability. (Kent Beck’s & others’ “thinking from outside in.”)
General solutions are easier to reason about than specific ones.
The goal of the PersistentDatafileSet is to keep information about processed data files in a database so that none will be processed repeatedly with respect to a particular target data storage, identified by a table name and storage type. We will have a dedicated instance of PersistentDatafileSet for each storage. The data files themselves are identified by a filename and timestamp.
Side note: I really like the idea of using a Set backed by a database table for this purpose. The Set contract is familiar to all Java developers and the fact that it is backed by a database is a well-encapsulated “implementation detail.” Reusing known abstractions is a good practice, decreasing the cognitive load of our programs.
1. A straightforward though repetitive implementation
- Straightforward, explicit, easy to understand
- Connection and statement creation is repeated four times together with error handling, in each of the Set methods. The knowledge of the structure of the underlying table is explicitly and implicitly required at multiple places
- The class has too many responsibilities – implementing Set, creating SQL strings, managing DB connections, performing DB operations
It could be argued that the repetition and mixing of concerns are on such a small scale that they are acceptable and are outweighted by the easiness of understanding.
2. More involved implementation with separated concerns and minimized duplication
- Simplification/Abstraction All the public (Set) methods are now one- or two-liners (aside of iterator) and it’s clear how they are implemented.
- Concern separation, de-duplication The details of statement creation and execution have been moved to a generic StatementExecutor independent on the “business logic” and most of the repetition has been removed, the remaining duplication is at least isolated and co-located
- More context in error messages (SQL, parameters) simplifies troubleshooting
- A positive side-effect is that the code could be made unit-testable by hiding StatementExecutor behind an interface and using a fake implementation during tests, though the benefit of a non-integration test for a class highly dependent on a database is questionable.
- Considerably harder to read and understand (?) – The creation of ResultSet processing code looks complex and is unnecessary verbose due to Java’s lack of closures (lambdas); we have three classes instead of one
- The knowledge about the target table’s structure is still repeated at several places, mainly the SQL strings and the insert/select methods are implicitly coupled to the SQL strings defined elsewhere by the need to pass the right parameters in the right order (=> bad locality of change)
- There is still some duplication between executeUpdateStatementOrFail and executeSelectStatement; the cost of its removal (in readability and code complexity) would be higher than the benefit (though I could have extracted & reused the setting of statement parameters)
- The idea of a ResultExtractor isn’t mine, it is a copy of SpringJDBC‘s ResultSetExtractor. Using SpringJDBC would actually really simplify the code and if the choice was mine, I would nearly always prefer SpringJDBC over plain JDBC (it can be well used on its own, without anything else of Spring)
- You might have noticed that I have changed the detection of duplicates in add to rely on the database’s integration constraints and SQLIntegrityConstraintViolationException rather then performing a costly select prior to an insert (though it might be regarded as less straitghforward than an explicit check and if the performance doesn’t matter then it might be prefered)
- I have also switched to PreparedStatement since it made the implementation simpler and also protects again (even inadvertent) SQL injection
- I would normally comment the classes and methods but have left that out for the sake of space; do not take that as a good example!
We have seen two implementations: The former one has straightforward, easy to understand code but is repetitive and has multiple mixed responsibilities. The latter one has factored out low-level database access methods (execute*Statement, createConnection) and implements the high-level Set methods in terms of these primitives, while also mostly removing the repetitive connection creation and exception management – at the expense of understandability due to the use of higher-order functions and Java’s cumbersome syntax for that.
I have discussed the code with a couple of people and the opinions differ, as you might have guessed. Some, especially those with background in Smalltalk, prefer the more structured code. Some would only agree to the introduction of the abstraction if it (the StatementExecutor) actually was used at least 2 or 3 times somewhere, since abstractions make it harder to read – you need to invest time to learn them. People don’t like to navigate too much, going through multiple levels, to find out what the code really does (while someone prefers details abstracted away). It is a lot a matter of personal experiences (have you been bitten more by duplication or by debugging onion-like code?), expectations (Smalltalkers are used to many small, focused objects and find it easier to work with them thanks to the widespread conventions while PHPers are driven crazy by that approach), and trust (do you trust the author that StatementExecutor.executeUpdateStatement(sql, params…) does what its signature indicates or do you need to drill into it to find it out for yourself?).
The main principles guiding the refactoring were Don’t Repeat Yourself (DRY) and Single Responsibility Principle. Other concerns such as performance were not considered (we might want to use connection pooling, caching etc., based on our actual performance needs and tests).