Highly repetitive code is generally a bad thing, and there are design patterns that can help minimize it. However, sometimes it's simply inevitable due to the constraints of the language itself. Take the following example from java.util.Arrays:
/**
 * Assigns the specified long value to each element of the specified
 * range of the specified array of longs. The range to be filled
 * extends from index <tt>fromIndex</tt>, inclusive, to index
 * <tt>toIndex</tt>, exclusive. (If <tt>fromIndex==toIndex</tt>, the
 * range to be filled is empty.)
 *
 * @param a the array to be filled
 * @param fromIndex the index of the first element (inclusive) to be
 *        filled with the specified value
 * @param toIndex the index of the last element (exclusive) to be
 *        filled with the specified value
 * @param val the value to be stored in all elements of the array
 * @throws IllegalArgumentException if <tt>fromIndex > toIndex</tt>
 * @throws ArrayIndexOutOfBoundsException if <tt>fromIndex < 0</tt> or
 *         <tt>toIndex > a.length</tt>
 */
public static void fill(long[] a, int fromIndex, int toIndex, long val) {
    rangeCheck(a.length, fromIndex, toIndex);
    for (int i=fromIndex; i<toIndex; i++)
        a[i] = val;
}
The above snippet appears in the source code 8 times, with very little variation in the documentation/method signature but exactly the same method body, one for each of the root array types int[], short[], char[], byte[], boolean[], double[], float[], and Object[].
I believe that unless one resorts to reflection (which is an entirely different subject in itself), this repetition is inevitable. I understand that for a utility class, such a high concentration of repetitive Java code is highly atypical, but even with the best practices, repetition does happen! Refactoring doesn't always work because it's not always possible (the obvious case is when the repetition is in the documentation).
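To make the reflection alternative concrete, here is a minimal sketch of a single fill method that handles every array type through java.lang.reflect.Array. The class name ReflectiveFill and its fill helper are hypothetical (they are not part of java.util.Arrays), and the range check is written inline on the assumption that it should behave like the documented one.

```java
import java.lang.reflect.Array;
import java.util.Arrays;

public class ReflectiveFill {
    /**
     * Fills array[fromIndex, toIndex) with val for any array type.
     * Hypothetical helper for illustration; not part of java.util.Arrays.
     */
    public static void fill(Object array, int fromIndex, int toIndex, Object val) {
        // Array.getLength throws IllegalArgumentException if 'array' is not an array.
        int length = Array.getLength(array);
        if (fromIndex > toIndex) {
            throw new IllegalArgumentException(
                "fromIndex(" + fromIndex + ") > toIndex(" + toIndex + ")");
        }
        if (fromIndex < 0 || toIndex > length) {
            throw new ArrayIndexOutOfBoundsException(
                "range [" + fromIndex + ", " + toIndex + ") is outside length " + length);
        }
        for (int i = fromIndex; i < toIndex; i++) {
            // Array.set unwraps boxed values when the target is a primitive array.
            Array.set(array, i, val);
        }
    }

    public static void main(String[] args) {
        long[] longs = new long[5];
        fill(longs, 1, 4, 7L);
        System.out.println(Arrays.toString(longs)); // [0, 7, 7, 7, 0]

        char[] chars = new char[3];
        fill(chars, 0, 3, 'x');
        System.out.println(new String(chars)); // xxx
    }
}
```

One body replaces the eight copies, but the cost is real: type errors surface only at run time, and every element write goes through boxing and a reflective call, which is likely why a utility class like this one would not take that route.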
Obviously maintaining this source code is a nightmare. A slight typo in the documentation, or a minor bug in the implementation, is multiplied by however many repetitions were made. In fact, the best example happens to involve this exact class:
Google Research Blog - Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken (by Joshua Bloch, Software Engineer)
The bug is a surprisingly subtle one, occurring in what many thought to be just a simple and straightforward algorithm.
// int mid = (low + high) / 2; // the bug
int mid = (low + high) >>> 1; // the fix
The above line appears 11 times in the source code!
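For anyone who hasn't seen why the first form is a bug, here is a small sketch that reproduces it. The class and method names (MidpointOverflow, brokenMid, fixedMid) are made up for the demonstration; only the two expressions come from the lines above.

```java
public class MidpointOverflow {
    // The original expression: low + high can exceed Integer.MAX_VALUE and wrap negative.
    static int brokenMid(int low, int high) {
        return (low + high) / 2;
    }

    // The fix: the unsigned shift treats the wrapped sum as a 32-bit unsigned value.
    static int fixedMid(int low, int high) {
        return (low + high) >>> 1;
    }

    public static void main(String[] args) {
        int low = 1_500_000_000;
        int high = 2_000_000_000;
        System.out.println(brokenMid(low, high)); // -397483648
        System.out.println(fixedMid(low, high));  // 1750000000
    }
}
```

With indices this large the sum wraps around to a negative int, so the broken version yields a negative midpoint and the subsequent array access would throw.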
So my questions are:
A comment requested another example, so I pulled this one from Google Collections: com.google.common.base.Predicates lines 276-310 (AndPredicate) vs lines 312-346 (OrPredicate).
The source for these two classes is identical, except for:
- AndPredicate vs OrPredicate (each appears 5 times in its class)
- "And(" vs "Or(" (in the respective toString() methods)
- #and vs #or (in the @see Javadoc comments)
- true vs false (in apply; ! can be rewritten out of the expression)
- -1 /* all bits on */ vs 0 /* all bits off */ in hashCode()
- &= vs |= in hashCode()
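Since the differences listed above boil down to a name and a boolean, the two classes could in principle share one parameterized implementation. The following is only a sketch of that idea, not the Google Collections code: it uses java.util.function.Predicate instead of the library's own Predicate interface, the names CompositePredicate, allOf, and anyOf are hypothetical, and equals()/hashCode() are left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class CompositePredicate<T> implements Predicate<T> {
    private final List<Predicate<? super T>> components;
    private final boolean decidingValue; // component result that settles the outcome early
    private final String name;           // used only by toString()

    private CompositePredicate(List<Predicate<? super T>> components,
                               boolean decidingValue, String name) {
        this.components = components;
        this.decidingValue = decidingValue;
        this.name = name;
    }

    /** "And": the first false component decides the result. */
    @SafeVarargs
    public static <T> Predicate<T> allOf(Predicate<? super T>... components) {
        return new CompositePredicate<T>(Arrays.asList(components), false, "And");
    }

    /** "Or": the first true component decides the result. */
    @SafeVarargs
    public static <T> Predicate<T> anyOf(Predicate<? super T>... components) {
        return new CompositePredicate<T>(Arrays.asList(components), true, "Or");
    }

    @Override
    public boolean test(T t) {
        for (Predicate<? super T> p : components) {
            if (p.test(t) == decidingValue) {
                return decidingValue;
            }
        }
        return !decidingValue;
    }

    @Override
    public String toString() {
        return name + components;
    }

    public static void main(String[] args) {
        Predicate<Integer> positive = x -> x > 0;
        Predicate<Integer> even = x -> x % 2 == 0;
        System.out.println(allOf(positive, even).test(3)); // false
        System.out.println(anyOf(positive, even).test(3)); // true
    }
}
```

This removes the code duplication but not the documentation duplication: each factory method still needs its own Javadoc, which is the part the question says refactoring cannot reach.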
For people that absolutely need performance, boxing and unboxing and generified collections and whatnot are big no-no's.
The same problem happens in performance computing, where you need the same complex algorithm to work for both float and double (say, some of the methods shown in Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic" paper).
There's a reason why Trove's TIntIntHashMap runs circles around Java's HashMap<Integer,Integer> when working with a similar amount of data.
Now how is the Trove collections' source code written?
By using source code instrumentation of course :)
There are several Java libraries for higher performance (much higher than the default Java ones) that use code generators to create the repeated source code.
We all know that "source code instrumentation" is evil and that code generation is crap, but still that's how people who really know what they're doing (i.e. the kind of people that write stuff like Trove) do it :)
For what it is worth we generate source code that contains big warnings like:
/*
* This .java source file has been auto-generated from the template xxxxx
*
* DO NOT MODIFY THIS FILE FOR IT SHALL GET OVERWRITTEN
*
*/
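To give an idea of what template-based generation looks like, here is a deliberately tiny sketch. It is not Trove's actual tooling: the FillGenerator class, the ${type} placeholder syntax, and the choice of the fill method as the template are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class FillGenerator {
    // Hypothetical template: ${type} stands for the primitive element type.
    static final String TEMPLATE =
          "public static void fill(${type}[] a, int fromIndex, int toIndex, ${type} val) {\n"
        + "    rangeCheck(a.length, fromIndex, toIndex);\n"
        + "    for (int i = fromIndex; i < toIndex; i++)\n"
        + "        a[i] = val;\n"
        + "}\n";

    static final List<String> PRIMITIVE_TYPES = Arrays.asList(
        "long", "int", "short", "char", "byte", "boolean", "double", "float");

    /** Expands the template for a single element type. */
    static String generate(String type) {
        return TEMPLATE.replace("${type}", type);
    }

    /** Expands the template once per primitive type. */
    static String generateAll() {
        StringBuilder out = new StringBuilder();
        for (String type : PRIMITIVE_TYPES) {
            out.append(generate(type)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A real build step would write this to a .java file, header warning included.
        System.out.print(generateAll());
    }
}
```

The point is that a fix made once in the template propagates to every generated copy on the next build, which is exactly what hand-maintained duplicates can't guarantee.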
If you absolutely must duplicate code, follow the great examples you've given and group all of that code in one place where it's easy to find and fix when you have to make a change. Document the duplication and, more importantly, the reason for the duplication so that everyone who comes after you is aware of both.
From Wikipedia, Don't Repeat Yourself (DRY) or Duplication is Evil (DIE):
In some contexts, the effort required to enforce the DRY philosophy may be greater than the effort to maintain separate copies of the data. In some other contexts, duplicated information is immutable or kept under a control tight enough to make DRY not required.
There is probably no answer or technique to prevent problems like that.