Straightforward vs. Structured, Non-repetitive Code: Which Would You Choose? (DB-Backed Set)

It is not always clear which code is better or worse as it might depend on the needs and the team in question. Let’s have a look at two different implementations of a database-backed Set, one that is straightforward and easy to understand and another one that has more structure and less duplication at the expense of understandability. Which one would you choose?

Tip: When designing a readable solution, start by writing implementations of the public methods in terms of hypothetical lower-level methods that you would like to have, without being concerned about their implementation details. Only then try to find out if there is a way to implement these lower-abstraction-level methods that would fit your needs. Thus your implementation will be driven by the public interface and by “what” the class does rather than “how,” resulting in better abstractions and readability. (Kent Beck’s & others’ “thinking from outside in.”)
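To make the tip concrete, here is a minimal sketch of outside-in design. All names (SeenFiles, countMatching, countAll, insert, InMemorySeenFiles) are hypothetical and not part of the code discussed in this post. The public methods are written first against wished-for primitives; only afterwards is one possible implementation of those primitives supplied, here a plain in-memory list.

```java
import java.util.ArrayList;
import java.util.List;

/** Step 1: public methods written purely in terms of wished-for primitives. */
abstract class SeenFiles {
    public boolean contains(String filename) {
        return countMatching(filename) > 0;
    }

    public boolean add(String filename) {
        if (contains(filename)) {
            return false; // already recorded, nothing to do
        }
        insert(filename);
        return true;
    }

    public int size() {
        return countAll();
    }

    // Step 2: the primitives we wished for; how they work is decided later.
    protected abstract int countMatching(String filename);

    protected abstract int countAll();

    protected abstract void insert(String filename);
}

/** One possible implementation of the primitives, backed by a list (hypothetical). */
class InMemorySeenFiles extends SeenFiles {
    private final List<String> rows = new ArrayList<>();

    @Override
    protected int countMatching(String filename) {
        int count = 0;
        for (String row : rows) {
            if (row.equals(filename)) {
                count++;
            }
        }
        return count;
    }

    @Override
    protected int countAll() {
        return rows.size();
    }

    @Override
    protected void insert(String filename) {
        rows.add(filename);
    }
}
```

The public methods read as a description of what the class does; swapping the list for a database would only touch the three primitives.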

General solutions are easier to reason about than specific ones.

Background

The goal of the PersistentDatafileSet is to keep information about processed data files in a database so that none will be processed repeatedly with respect to a particular target data storage, identified by a table name and storage type. We will have a dedicated instance of PersistentDatafileSet for each storage. The data files themselves are identified by a filename and timestamp.
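The Datafile class itself is not shown in this post; the code below only uses its two-argument constructor and the getFilename and getTimestamp getters. As a sketch under that assumption, a value class along the following lines would fit. The equals, hashCode, and toString methods are my assumption about how identity "by filename and timestamp" could be expressed, not the original class.

```java
import java.util.Objects;

/** Hypothetical value class; only the constructor and getters appear in the post. */
final class Datafile {
    private final String filename;
    private final long timestamp;

    Datafile(String filename, long timestamp) {
        this.filename = Objects.requireNonNull(filename, "filename");
        this.timestamp = timestamp;
    }

    String getFilename() {
        return filename;
    }

    long getTimestamp() {
        return timestamp;
    }

    // Two data files are the same set element when both identifying fields match.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Datafile)) {
            return false;
        }
        Datafile that = (Datafile) other;
        return timestamp == that.timestamp && filename.equals(that.filename);
    }

    @Override
    public int hashCode() {
        return Objects.hash(filename, timestamp);
    }

    @Override
    public String toString() {
        return filename + "@" + timestamp;
    }
}
```

With equality defined this way, an ordinary in-memory Set treats a repeated (filename, timestamp) pair as a duplicate, which is the same behavior the database-backed set is meant to provide.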

Side note: I really like the idea of using a Set backed by a database table for this purpose. The Set contract is familiar to all Java developers and the fact that it is backed by a database is a well-encapsulated “implementation detail.” Reusing known abstractions is a good practice, decreasing the cognitive load of our programs.

1. A straightforward though repetitive implementation

	/** A set backed by a database table */
	public class PersistentDatafileSetStraightforward extends AbstractSet<Datafile> {

	public enum StorageType {Hive, JDBC}

	public static final String JDBC_URL = "jdbc.url";
	public static final String JDBC_USERNAME = "jdbc.username";
	public static final String JDBC_PASSWORD = "jdbc.password";

	private final Configuration conf;
	private final String tableName;
	private final StorageType storageType;

	public PersistentDatafileSetStraightforward(Configuration conf,
	String tableName,
	StorageType storageType) {
	this.tableName = tableName;
	this.storageType = storageType;
	this.conf = conf;
	}

	@Override
	public boolean add(Datafile datafile) {
	String filename = datafile.getFilename();
	long published = datafile.getTimestamp();

	try (Connection connection = createConnection();
	Statement statement = connection.createStatement()) {

	connection.setAutoCommit(false);

	if (countRows(statement, filename, published) > 0) {
	return false;
	}

	String insertStatement = makeInsertStatement(filename, published);
	statement.execute(insertStatement);
	connection.commit();
	return true;
	} catch (SQLException | IOException e) {
	throw new RuntimeException(e);
	}
	}

	@Override
	public boolean contains(Object o) {
	Datafile datafile = (Datafile) o;

	String filename = datafile.getFilename();
	long timestamp = datafile.getTimestamp();

	try (Connection connection = createConnection();
	Statement statement = connection.createStatement()) {
	return countRows(statement, filename, timestamp) > 0;
	} catch (SQLException | IOException e) {
	throw new RuntimeException("Error reading database", e);
	}
	}

	@Override
	public int size() {
	try (Connection connection = createConnection();
	Statement statement = connection.createStatement()) {
	return countRows(statement);
	} catch (SQLException | IOException e) {
	throw new RuntimeException("Error reading database", e);
	}
	}

	@Override
	public Iterator<Datafile> iterator() {
	try (Connection connection = createConnection();
	Statement statement = connection.createStatement()) {

	String getAllStatement = makeGetAllStatement();
	ResultSet results = statement.executeQuery(getAllStatement);
	List<Datafile> resultList = new ArrayList<>();

	while (results.next()) {
	resultList.add(new Datafile(results.getString(1), results.getLong(2)));
	}
	return resultList.iterator();

	} catch (SQLException | IOException e) {
	throw new RuntimeException("Error reading database", e);
	}
	}

	// ———————————————————————- private

	private Connection createConnection() throws IOException {
	try {

	return DriverManager.getConnection(conf.get(JDBC_URL),
	conf.get(JDBC_USERNAME, ""),
	conf.get(JDBC_PASSWORD, ""));
	} catch (Exception e) {
	throw new IOException("Exception creating connection to "
	+ conf.get(JDBC_URL), e);
	}
	}

	// Counts the number of rows for a set.
	String makeCountStatement() {
	return "select count(*) from datafiles where "
	+ "table_name = '" + tableName + "' "
	+ "and storage_type = '" + storageType + "';";
	}

	// Counts the number of rows for a set and a given datafile.
	String makeCountStatement(String filename, long timestamp) {
	return
	"select count(*) from datafiles where "
	+ "filename = '" + filename + "' "
	+ "and timestamp = '" + timestamp + "' "
	+ "and table_name = '" + tableName + "' "
	+ "and storage_type = '" + storageType + "';";
	}

	private int countRows(Statement statement) throws SQLException {
	String countStatement = makeCountStatement();
	ResultSet previousValue = statement.executeQuery(countStatement);
	previousValue.next();

	return previousValue.getInt(1);
	}

	private int countRows(Statement statement,
	String filename,
	long timestamp) throws SQLException {

	String countStatement = makeCountStatement(filename, timestamp);
	ResultSet previousValue = statement.executeQuery(countStatement);
	previousValue.next();

	return previousValue.getInt(1);
	}

	String makeInsertStatement(String filename, long timestamp) {
	return
	"insert into datafiles "
	+ "(filename, timestamp, table_name, storage_type) "
	+ "values ("
	+ "'" + filename + "', "
	+ timestamp + ", "
	+ "'" + tableName + "', "
	+ "'" + storageType + "');";
	}

	// Get all rows for a given set.
	String makeGetAllStatement() {
	return "select filename, timestamp from datafiles where "
	+ "table_name = '" + tableName + "' "
	+ "and storage_type = '" + storageType + "';";

	}
	}


Pros:

Straightforward, explicit, easy to understand

Cons:

Connection and statement creation, together with error handling, is repeated four times, once in each of the Set methods
Knowledge of the structure of the underlying table is required, explicitly or implicitly, in multiple places
The class has too many responsibilities – implementing Set, creating SQL strings, managing DB connections, performing DB operations

It could be argued that the repetition and mixing of concerns are on such a small scale that they are acceptable and are outweighed by the ease of understanding.

2. More involved implementation with separated concerns and minimized duplication

	public class PersistentDatafileSetStructured extends AbstractSet<Datafile> {

	public static enum StorageType {Hive, JDBC};

	private final StatementExecutor db;

	private final String tableName;
	private final StorageType storageType;

	private final String selectAll;
	private final String countOneDatafileSql;
	private final String countAllSql;
	private final String insertSql;

	public PersistentDatafileSetStructured(Configuration conf, String tableName, StorageType storageType) {
	this.db = new StatementExecutor(conf);
	this.tableName = tableName;
	this.storageType = storageType;

	countAllSql = "select count(*) from datafiles where "
	+ "table_name = '" + tableName + "' "
	+ "and storage_type = '" + storageType + "' ";

	countOneDatafileSql = countAllSql
	+ "and filename = ? "
	+ "and timestamp = ? ";

	insertSql = "insert into datafiles "
	+ "(filename, timestamp, table_name, storage_type) "
	+ "values (?, ?, "
	+ "'" + tableName + "', "
	+ "'" + storageType + "')";

	selectAll = "select filename, timestamp from datafiles where "
	+ "table_name = '" + tableName + "' "
	+ "and storage_type = '" + storageType + "'";

	}

	@Override
	public boolean add(Datafile e) {
	return db.executeInsertStatement(insertSql, e.getFilename(), e.getTimestamp());
	}

	@Override
	public boolean contains(Object o) {
	Datafile e = (Datafile) o;
	return executeSelectCount(countOneDatafileSql, e.getFilename(), e.getTimestamp()) > 0;
	}

	@Override
	public int size() {
	return executeSelectCount(countAllSql);
	}

	@Override
	public Iterator<Datafile> iterator() {

	// Extractor ResultSet => Iterator over Datafiles
	ResultExtractor<Iterator<Datafile>> datafileListExtractor = new ResultExtractor<Iterator<Datafile>>() {

	public Iterator<Datafile> extract(ResultSet results) throws SQLException {

	List<Datafile> resultList = new ArrayList<>();
	while (results.next()) {
	resultList.add(new Datafile(results.getString(1), results.getLong(2)));
	}
	return resultList.iterator();
	}
	};

	return db.executeSelectStatement(datafileListExtractor, selectAll);
	}

	private int executeSelectCount(String sql, Object... params) {

	ResultExtractor<Integer> oneIntExtractor = new ResultExtractor<Integer>() {

	public Integer extract(ResultSet rs) throws SQLException {
	rs.next(); // count(*) always returns 1 row
	return rs.getInt(1);
	}
	};

	return db.executeSelectStatement(oneIntExtractor, sql, params);
	}
	}


	/** For passing argument-to-result transformations around. */
	interface ResultExtractor<T> {
	T extract(ResultSet rs) throws SQLException;
	}


	/** A generic select/insert/update/create statement executor, independent of business logic. */
	class StatementExecutor {

	public static final String JDBC_URL = "jdbc.url";
	public static final String JDBC_USERNAME = "jdbc.username";
	public static final String JDBC_PASSWORD = "jdbc.password";

	private final Configuration conf;

	public StatementExecutor(Configuration conf) {
	this.conf = conf;
	}

	public boolean executeInsertStatement(String sql, Object... params) {
	// Inserts ignore PK violations and return false when they happen
	return executeUpdateStatementOrFail(true, sql, params);
	}

	public boolean executeUpdateStatement(String sql, Object... params) {
	// Updates should not ignore PK violations
	return executeUpdateStatementOrFail(false, sql, params);
	}

	private boolean executeUpdateStatementOrFail(boolean ignorePkViolation, String sql, Object... params) {

	try (Connection connection = createConnection();
	PreparedStatement statement = connection.prepareStatement(sql)) {

	for (int i = 0; i < params.length; i++) {
	statement.setObject((i+1), params[i]);
	}

	return statement.executeUpdate() == 1;

	} catch (SQLException | IOException e) {
	// For inserts it is enough to return false when not inserted due to integrity constraints
	if (e instanceof SQLIntegrityConstraintViolationException && ignorePkViolation) {
	return false;
	}
	throw new RuntimeException("Error executing '" + sql + "' with " + Arrays.toString(params), e);
	}
	}

	public <T> T executeSelectStatement(ResultExtractor<T> extractor, String sql, Object... params) {
	try (Connection connection = createConnection();
	PreparedStatement statement = connection.prepareStatement(sql)) {

	for (int i = 0; i < params.length; i++) {
	statement.setObject((i+1), params[i]);
	}

	return extractor.extract(statement.executeQuery());

	} catch (SQLException | IOException e) {
	throw new RuntimeException("Error executing " + sql + " with " + Arrays.toString(params), e);
	}
	}

	private Connection createConnection() throws IOException {
	try {
	return DriverManager.getConnection(conf.get(JDBC_URL),
	conf.get(JDBC_USERNAME, ""),
	conf.get(JDBC_PASSWORD, ""));
	} catch (Exception e) {
	throw new IOException("Exception creating connection to "
	+ conf.get(JDBC_URL), e);
	}
	}

	}


Pros:

Simplification/Abstraction: all the public (Set) methods are now one- or two-liners (aside from iterator) and it’s clear how they are implemented.
Concern separation, de-duplication: the details of statement creation and execution have been moved to a generic StatementExecutor independent of the “business logic,” and most of the repetition has been removed; the remaining duplication is at least isolated and co-located
More context in error messages (SQL, parameters) simplifies troubleshooting
A positive side-effect is that the code could be made unit-testable by hiding StatementExecutor behind an interface and using a fake implementation during tests, though the benefit of a non-integration test for a class highly dependent on a database is questionable.
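The testability point can be sketched as follows. This is an illustration under stated assumptions rather than a drop-in change: InsertExecutor, FakeInsertExecutor, and DatafileRegistry are hypothetical names, only the insert path is covered, and the SQL is shortened to two columns. Faking the select path would additionally need a stub ResultSet to hand to the ResultExtractor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical interface extracted from StatementExecutor (insert path only). */
interface InsertExecutor {
    boolean executeInsertStatement(String sql, Object... params);
}

/** Fake for tests: remembers parameter tuples instead of talking to a database. */
class FakeInsertExecutor implements InsertExecutor {
    final List<String> executedSql = new ArrayList<>();
    private final Set<List<Object>> rows = new HashSet<>();

    @Override
    public boolean executeInsertStatement(String sql, Object... params) {
        executedSql.add(sql);
        // Mimics a primary-key constraint: a repeated tuple is rejected with false.
        return rows.add(Arrays.asList(params));
    }
}

/** Simplified stand-in for the add() path of the structured set (hypothetical). */
class DatafileRegistry {
    private static final String INSERT_SQL =
            "insert into datafiles (filename, timestamp) values (?, ?)";

    private final InsertExecutor db;

    DatafileRegistry(InsertExecutor db) {
        this.db = db;
    }

    boolean add(String filename, long timestamp) {
        return db.executeInsertStatement(INSERT_SQL, filename, timestamp);
    }
}
```

A test can then pass a FakeInsertExecutor to the class under test and assert on both the return values and the recorded SQL. Such a test checks the wiring only; it says nothing about whether the SQL is valid for the real database.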

Cons:

Considerably harder to read and understand (?) – The creation of ResultSet processing code looks complex and is unnecessarily verbose due to Java’s lack of closures (lambdas); we have three classes instead of one
The knowledge about the target table’s structure is still repeated in several places, mainly in the SQL strings; the insert/select methods are also implicitly coupled to the SQL strings defined elsewhere by the need to pass the right parameters in the right order (=> bad locality of change)
There is still some duplication between executeUpdateStatementOrFail and executeSelectStatement; the cost of its removal (in readability and code complexity) would be higher than the benefit (though I could have extracted & reused the setting of statement parameters)
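The extraction mentioned in the last point, reusing the code that sets the statement parameters, could look roughly like this. StatementParams and bind are hypothetical names; the loop body is the one that appears twice in StatementExecutor.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper holding the parameter-binding loop in one place. */
final class StatementParams {
    private StatementParams() {
        // static helper, not meant to be instantiated
    }

    /** Binds params to the statement's placeholders in order; JDBC indexes start at 1. */
    static void bind(PreparedStatement statement, Object... params) throws SQLException {
        for (int i = 0; i < params.length; i++) {
            statement.setObject(i + 1, params[i]);
        }
    }
}
```

Both executeUpdateStatementOrFail and executeSelectStatement could then replace their loops with a single call such as StatementParams.bind(statement, params), leaving only the connection handling and the exception wrapping duplicated.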

Other comments:

The idea of a ResultExtractor isn’t mine; it is a copy of SpringJDBC’s ResultSetExtractor. Using SpringJDBC would actually simplify the code considerably, and if the choice was mine, I would nearly always prefer SpringJDBC over plain JDBC (it can be used perfectly well on its own, without anything else from Spring)
You might have noticed that I have changed the detection of duplicates in add to rely on the database’s integrity constraints and SQLIntegrityConstraintViolationException rather than performing a costly select prior to an insert (though it might be regarded as less straightforward than an explicit check, and if performance doesn’t matter, the explicit check might be preferred)
I have also switched to PreparedStatement since it made the implementation simpler and also protects against (even inadvertent) SQL injection
I would normally comment the classes and methods but have left that out for the sake of space; do not take that as a good example!
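To illustrate the SQL injection point, the sketch below builds a count query by string concatenation in the style of the first implementation's makeCountStatement, reduced to a single column for brevity. InjectionDemo and its members are hypothetical names used only for this illustration.

```java
/** Hypothetical demo of why concatenated SQL is fragile compared to placeholders. */
final class InjectionDemo {
    private InjectionDemo() {
        // static helper, not meant to be instantiated
    }

    /** Parameterized form: the SQL text never contains the value itself. */
    static final String PARAMETERIZED_COUNT =
            "select count(*) from datafiles where filename = ?";

    /** Concatenated form: the value becomes part of the SQL text. */
    static String concatenatedCount(String filename) {
        return "select count(*) from datafiles where filename = '" + filename + "'";
    }
}
```

Calling InjectionDemo.concatenatedCount("x' or '1'='1") yields `select count(*) from datafiles where filename = 'x' or '1'='1'`, a query whose condition is true for every row. With PARAMETERIZED_COUNT the same value is passed through PreparedStatement.setObject and is treated as data only. Note that the table name and storage type are still concatenated into the SQL in the structured version, so they get no such protection.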

Summary

We have seen two implementations: The former one has straightforward, easy to understand code but is repetitive and has multiple mixed responsibilities. The latter one has factored out low-level database access methods (execute*Statement, createConnection) and implements the high-level Set methods in terms of these primitives, while also mostly removing the repetitive connection creation and exception management – at the expense of understandability due to the use of higher-order functions and Java’s cumbersome syntax for that.
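For comparison, on Java 8 or later the syntax overhead largely disappears, because ResultExtractor has a single abstract method and can therefore be implemented by a lambda. The sketch below assumes Java 8+ and restates the interface so that it is self-contained; Extractors and ONE_INT are hypothetical names.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

/** Same single-method interface as in the post, restated to keep the sketch self-contained. */
interface ResultExtractor<T> {
    T extract(ResultSet rs) throws SQLException;
}

/** Hypothetical holder for reusable extractors written as lambdas. */
final class Extractors {
    private Extractors() {
        // constants only
    }

    /** Lambda equivalent of the anonymous oneIntExtractor class in executeSelectCount. */
    static final ResultExtractor<Integer> ONE_INT = rs -> {
        rs.next(); // count(*) always returns one row
        return rs.getInt(1);
    };
}
```

executeSelectCount would then shrink to a single statement, `return db.executeSelectStatement(Extractors.ONE_INT, sql, params);`, and the extractor in iterator() could be rewritten in the same way.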

I have discussed the code with a couple of people and the opinions differ, as you might have guessed. Some, especially those with a background in Smalltalk, prefer the more structured code. Some would only agree to the introduction of the abstraction (the StatementExecutor) if it were actually used in at least two or three places, since abstractions make code harder to read: you need to invest time to learn them. Some people don’t like to navigate through multiple levels to find out what the code really does, while others prefer details abstracted away. It is largely a matter of personal experience (have you been bitten more by duplication or by debugging onion-like code?), expectations (Smalltalkers are used to many small, focused objects and find it easier to work with them thanks to the widespread conventions, while PHPers are driven crazy by that approach), and trust (do you trust the author that StatementExecutor.executeUpdateStatement(sql, params…) does what its signature indicates, or do you need to drill into it to find out for yourself?).

The main principles guiding the refactoring were Don’t Repeat Yourself (DRY) and Single Responsibility Principle. Other concerns such as performance were not considered (we might want to use connection pooling, caching etc., based on our actual performance needs and tests).


4 thoughts on “Straightforward vs. Structured, Non-repetitive Code: Which Would You Choose? (DB-Backed Set)”

Michael May 30, 2013 at 1:09 am

Interesting comparison. I would much rather maintain the second implementation, as it is much easier to know where to make a change if you need to. Additionally, if something has to change in, for example, the connection logic, the first implementation forces you to change it in several places.

One other thing: the structured code example is still vulnerable to SQL injection through the table name and storage type parameters, which may or may not be an issue depending upon where these are coming from.

1. Jakub Holý (post author) June 1, 2013 at 4:15 pm
  
  Thank you, Michael! And you are right regarding the sql injection, but in this case it is not a problem [1].
  [1] http://www.funny.com/cgi-bin/WebObjects/Funny.woa/wa/funny?fn=CBW8Q&Funny_Jokes=Famous_Last_Words
  
Anonymous Coward May 30, 2013 at 2:18 pm

As long as you don’t get to reuse the result extractor and the statement executor, you have just added cost with no benefit. It may look better, but it actually is just gold plating.

From another point of view, refactoring the first solution into the second one when there’s no obvious need for it is a lot like premature optimization – which, according to Knuth, is the root of all evil.

As programmers, we should learn to look at cost. Switching from the first solution to the second, we have not only invested additional effort without any visible gain, we have also forced future maintainers to do the same thing – maintain two classes instead of one, with one of them being most likely quite expensive to maintain, due to its generality (the statement executor).

There are definitely some things to improve in the first version. I’d extract the SQL templates into private static final members, and not inline parameters – you open your code up for SQL injection that way – and I’d gather the knowledge about running an SQL statement in a single place – this would somewhat shorten the code, which means less code for bugs to hide in. Both of these refactorings would provide immediate gains. But I would definitely not add another interface and a class to the design, as long as there is no other place where to use them – these would just provide more places for bugs to hide.

1. Jakub Holý (post author) June 1, 2013 at 4:32 pm
  
  Dear Coward, thank you for your comment. I am very happy to have readers with varied opinions 🙂 I absolutely agree that we should always consider costs of what we do (though we might all end up with a different result). And I’d love to see your solution of ” I’d gather the knowledge about running an SQL statement in a single place”.
  I have already experienced that different people have different opinion on this topic so I do not expect to persuade you but I would still like to address some of the points.
  
  – “… added cost with no benefit” – less duplication is a clear benefit for me (though the gain/cost calculation may differ for you), and code focused on one thing is easier for me to read than code doing everything (though other people prefer the opposite); YMMV, but for me this isn’t a “no benefit” change
  – “.. refactoring … when there’s no obvious need for it” – well, it was mostly a learning exercise for me; I didn’t like the code and wanted to understand why, and what code I would like more, so for me the refactoring certainly had value
  – “maintain two classes instead of one, with one of them being most likely quite expensive to maintain, due to its generality” – it is debatable how much more difficult it is to maintain two classes, each focused on one thing, than one do-it-all class; as you can see, Michael, for example, estimates the cost differently than you do
  
  Some friends of mine have argued in accord with you that S.E. is unnecessarily generic. But there is another way to look at it. If you look at a PreparedStatement, what does it do? It takes a SQL string and a number of parameters (depending on the string) in a particular order and turns them into a result. This is exactly what S.E. does, its interface reflects this directly.
  Making S.E. more specific (e.g. taking “filename” and “timestamp” parameters) would hide the fact that it only wraps a Prep.St. and would push some logic from the set implementation into it, making the two more coupled. *For me* it is easier to understand a generic but otherwise simple class by just looking at its interface than if it contains some business-specific logic.
  But that is just me. YMMV.
  Thank you!
  

Wonders of Code

A study of good and bad code – to become better programmers
