Duplication-First Programming

Nick Gard
Feb 7, 2018

Begin with duplication, abstract later

This is not a new idea. It is probably better said by Sandi or Matt, but I want to coin a phrase to encompass it: Duplication-First Programming. The basic idea is that until there is enough code to detect a pattern, duplication is unavoidable, even necessary. Going further, duplication remains even after abstraction: what is left behind after “de-duplicating” some code is a reference to the abstraction, be it a function call or an object name. The goal in abstracting repeated logic (a pattern) is to reduce the duplication, not eliminate it. Giving the pattern a compact name shrinks it both cognitively and spatially. If the abstraction is neither smaller nor more readable than the code it replaces, then it is probably not necessary.
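A minimal sketch of that idea follows. The amounts and the name formatUsd are hypothetical, chosen only for illustration. The point is that after the pattern is extracted, four references to it are still written out.

```javascript
// Hypothetical amounts, for illustration only.
const subtotal = 19.5;
const shipping = 4;
const tax = 1.56;
const total = 25.06;

// Duplication first: the pattern is written out wherever it is needed.
const before = [
  '$' + subtotal.toFixed(2),
  '$' + shipping.toFixed(2),
  '$' + tax.toFixed(2),
  '$' + total.toFixed(2),
];

// Once the pattern is obvious, give it a compact name.
function formatUsd(amount) {
  return '$' + amount.toFixed(2);
}

// The duplication is reduced, not eliminated: four calls remain.
const after = [
  formatUsd(subtotal),
  formatUsd(shipping),
  formatUsd(tax),
  formatUsd(total),
];
```

Here the name earns its place because formatUsd(tax) is both shorter and easier to read than the expression it replaces.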

This tenet is YAGNI taken to the extreme. Don’t create a variable just to use it once or twice; instead, use the value directly in those locations. Don’t create a function that only calls another function with some hard-coded arguments. Write all of your code as it is needed first, then determine whether there is an actual pattern to abstract away.
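As a rough illustration of both points, compare the two versions below. The input string and the names separator and parseDecimal are hypothetical; split and parseInt are the standard built-ins.

```javascript
const csv = '7,11,13'; // hypothetical input

// Premature: a single-use variable, and a wrapper that only hard-codes an argument.
const separator = ',';
function parseDecimal(text) {
  return parseInt(text, 10);
}
const premature = csv.split(separator).map(parseDecimal);

// Written as needed: the values appear where they are used.
const direct = csv.split(',').map((text) => parseInt(text, 10));
```

Both versions produce the same array. The second simply has fewer names to look up before you can read it.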

This is not to say that you should avoid all abstractions. Most likely, the tools that you are using, including the language you are coding in, are enough of an abstraction. Use them. Follow good coding practices and patterns. However, at some point, you will need to write bespoke code that is not repeated and cannot be abstracted. This uniqueness is what makes your program yours. It is unavoidable.

When should you abstract some code into a new module?

  • The code has been repeated verbatim (or with minor, cohesive changes) more than three times.
  • The abstraction is shorter to write than what is abstracted.
  • The abstraction is clearer to read than what is abstracted.
  • The abstracted code will change significantly less (or more) often than the surrounding code from which it was removed (e.g. store a regular expression in a variable, even if it is only referred to once, because it will likely change more often than the code consuming it).
  • The abstracted code isn’t just an implementation of an existing abstraction (e.g. don’t create function warn(msg){ log('warn', msg); }; instead, inline that log call).
  • The idea of the abstraction is easily named. This indicates that the code is semantically connected and not merely a collection of repeated lines of code.
  • The abstraction wouldn’t require an extensive list of parameters or a large configuration object/file. That is, there aren’t instances of the code that are just different enough that they can’t be easily replaced by an abstraction. (In cases like that, reduce the scope of the abstraction to cover all the cases if possible, or abandon the abstraction if not possible.)
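Two of these criteria can be sketched together. Everything named here is an assumption for illustration: the log function is a stand-in for whatever log(level, msg) your codebase provides, and the order-id format is invented.

```javascript
// Hypothetical logger standing in for an existing log(level, msg) abstraction.
const entries = [];
function log(level, msg) {
  entries.push(`[${level}] ${msg}`);
}

// Worth naming even though it is referred to once: the pattern is likely to
// change more often than the code that consumes it.
const ORDER_ID = /^ORD-\d{6}$/;

function checkOrderId(id) {
  if (!ORDER_ID.test(id)) {
    // Inlined rather than wrapped in warn(msg): a wrapper would only restate log().
    log('warn', `Malformed order id: ${id}`);
    return false;
  }
  return true;
}
```

Calling checkOrderId('ORD-123456') returns true and logs nothing, while a malformed id returns false and records a single warning.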

Caveat

Modularization is not always about reusability. It is often (probably more often) about readability. It is easier to read ten 100-line files than one 1,000-line file, even if each file is used only once. It is also sometimes easier to read through complex logic with variable names than with raw values.
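The last point might look like the sketch below. The eligibility rule, the user fields, and the thresholds are all hypothetical; the two functions are meant to behave identically and differ only in how much the reader has to decode.

```javascript
// Raw values: correct, but each number has to be deciphered.
function isEligibleRaw(user, now) {
  return user.age >= 18 && user.orders > 3 && now - user.joined > 2592000000;
}

// Named values: each name is used once, yet the logic reads as a sentence.
function isEligibleNamed(user, now) {
  const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;
  const isAdult = user.age >= 18;
  const isRepeatCustomer = user.orders > 3;
  const isEstablished = now - user.joined > THIRTY_DAYS_MS;
  return isAdult && isRepeatCustomer && isEstablished;
}
```

These single-use names are exactly the exception the caveat describes: they exist for the reader, not for reuse.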

P.S.

As with all programming principles:

They’re more what you call… guidelines
