Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Managing highly repetitive code and documentation in Java

Highly repetitive code is generally a bad thing, and there are design patterns that can help minimize this. However, sometimes it's simply inevitable due to the constraints of the language itself. Take the following example from java.util.Arrays:

/**
 * Assigns the specified long value to each element of the specified
 * range of the specified array of longs.  The range to be filled
 * extends from index <tt>fromIndex</tt>, inclusive, to index
 * <tt>toIndex</tt>, exclusive.  (If <tt>fromIndex==toIndex</tt>, the
 * range to be filled is empty.)
 *
 * @param a the array to be filled
 * @param fromIndex the index of the first element (inclusive) to be
 *        filled with the specified value
 * @param toIndex the index of the last element (exclusive) to be
 *        filled with the specified value
 * @param val the value to be stored in all elements of the array
 * @throws IllegalArgumentException if <tt>fromIndex &gt; toIndex</tt>
 * @throws ArrayIndexOutOfBoundsException if <tt>fromIndex &lt; 0</tt> or
 *         <tt>toIndex &gt; a.length</tt>
 */
public static void fill(long[] a, int fromIndex, int toIndex, long val) {
    rangeCheck(a.length, fromIndex, toIndex);
    for (int i=fromIndex; i<toIndex; i++)
        a[i] = val;
}

The above snippet appears in the source code 8 times, with very little variation in the documentation/method signature but exactly the same method body, one for each of the root array types int[], short[], char[], byte[], boolean[], double[], float[], and Object[].

I believe that unless one resorts to reflection (which is an entirely different subject in itself), this repetition is inevitable. I understand that as a utility class, such high concentration of repetitive Java code is highly atypical, but even with the best practice, repetition does happen! Refactoring doesn't always work because it's not always possible (the obvious case is when the repetition is in the documentation).

Obviously maintaining this source code is a nightmare. A slight typo in the documentation, or a minor bug in the implementation, is multiplied by however many repetitions was made. In fact, the best example happens to involve this exact class:

Google Research Blog - Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken (by Joshua Bloch, Software Engineer)

The bug is a surprisingly subtle one, occurring in what many thought to be just a simple and straightforward algorithm.

    // int mid =(low + high) / 2; // the bug
    int mid = (low + high) >>> 1; // the fix

The above line appears 11 times in the source code!

So my questions are:

  • How are these kinds of repetitive Java code/documentation handled in practice? How are they developed, maintained, and tested?
    • Do you start with "the original", and make it as mature as possible, and then copy and paste as necessary and hope you didn't make a mistake?
    • And if you did make a mistake in the original, then just fix it everywhere, unless you're comfortable with deleting the copies and repeating the whole replication process?
    • And you apply this same process for the testing code as well?
  • Would Java benefit from some sort of limited-use source code preprocessing for this kind of thing?
    • Perhaps Sun has their own preprocessor to help write, maintain, document and test these kind of repetitive library code?

A comment requested another example, so I pulled this one from Google Collections: com.google.common.base.Predicates lines 276-310 (AndPredicate) vs lines 312-346 (OrPredicate).

The source for these two classes are identical, except for:

  • AndPredicate vs OrPredicate (each appears 5 times in its class)
  • "And(" vs Or(" (in the respective toString() methods)
  • #and vs #or (in the @see Javadoc comments)
  • true vs false (in apply; ! can be rewritten out of the expression)
  • -1 /* all bits on */ vs 0 /* all bits off */ in hashCode()
  • &= vs |= in hashCode()
like image 236
polygenelubricants Avatar asked Feb 25 '10 19:02

polygenelubricants


3 Answers

For people that absolutely need performance, boxing and unboxing and generified collections and whatnot are big no-no's.

The same problem happens in performance computing where you need the same complex to work both for float and double (say some of the method shown in Goldberd's "What every computer scientist should know about floating-point numbers" paper).

There's a reason why Trove's TIntIntHashMap runs circles around Java's HashMap<Integer,Integer> when working with a similar amount of data.

Now how are Trove collection's source code written?

By using source code instrumentation of course :)

There are several Java libraries for higher performance (much higher than the default Java ones) that use code generators to create the repeated source code.

We all know that "source code instrumentation" is evil and that code generation is crap, but still that's how people who really know what they're doing (i.e. the kind of people that write stuff like Trove) do it :)

For what it is worth we generate source code that contains big warnings like:

/*
 * This .java source file has been auto-generated from the template xxxxx
 * 
 * DO NOT MODIFY THIS FILE FOR IT SHALL GET OVERWRITTEN
 * 
 */
like image 193
SyntaxT3rr0r Avatar answered Nov 08 '22 01:11

SyntaxT3rr0r


If you absolutely must duplicate code, follow the great examples you've given and group all of that code in one place where it's easy to find and fix when you have to make a change. Document the duplication and, more importantly, the reason for the duplication so that everyone who comes after you is aware of both.

like image 41
Bill the Lizard Avatar answered Nov 08 '22 03:11

Bill the Lizard


From Wikipedia Don't Repeat Yourself (DRY) or Duplication is Evil (DIE)

In some contexts, the effort required to enforce the DRY philosophy may be greater than the effort to maintain separate copies of the data. In some other contexts, duplicated information is immutable or kept under a control tight enough to make DRY not required.

There is probably no answer or technique to prevent problems like that.

like image 6
stacker Avatar answered Nov 08 '22 01:11

stacker