Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ to Java: searching a collection efficiently

Coming from a mostly C++ background I am now writing some Java in anger. Something I find basic in C++ using the STL seems to be more cumbersome in Java than I think it should be. My conclusion is that there is probably a better Java idiom I don't grok yet. Here is an example using pseudo-code.

I have a collection of things that have a natural ordering relation based on some member variables which happen to be strings.

class Thing
{
   String key1;
   String key2;
}

In C++ I might define an ordering operator<(Thing,Thing) and put these in a std::set. E.g.

///
/// @brief
/// provide a total order for 'Things' using key1 and key2
///
bool operator<(const Thing& a, const Thing& b)
{
  if (a.key1 < b.key1) return true; 
  else if (a.key1 > b.key1) return false; 
  else return a.key2 < b.key2;
} 

I can then find elements in O(log N) time using set::find for the case of having a Thing. Using additional overloads of operator<(). I can search having just key1 or having both key1 and key2 using std::lower_bound or std::equal_range. For example:

struct Ordering
{
   /// A strict weak ordering not a total ordering
   bool operator()(const Thing& A,const std::string& key1) const;
}

const_iterator iter = std::lower_bound(someThings.begin(),
                                       someThings.end(),
                                       key1,
                                       Ordering());

To make this less abstract imagine key1 is name and key2 is version. I can ask do we have any software called Foobar or more specifically do we have Foobar v1.0.

On the face of it, the most direct equivalent of std::set in Java appears to be TreeSet The ordering can be implemented by subclassing the Comparator interface. However for what I'm talking about it looks like multiple Maps would be needed to do this in Java. In C++ would only bother using an associative container like std::map if I wanted to change the value. In a C++ std::set as in a Java TreeSet the value is its own key. However, in C++ I can write comparators to compare "Thing" with "std::string" using key1 or key2 as appropriate and find a specific thing in a std::set of them. It looks to me like you have to use Map to do this in Java. Otherwise (because Comparator has only one type parameter) you end up with a mess like:

public static class Order implements Comparator<Object>
{
  @Override
  @Constant
  public int compare(Object a, Object b)
  {
     String aString;
     String bString;         
     if (a instanceof String)
     {
        aString = (String)a;
     }
     else if (a instanceof Thing)
     {
        aString = ((Field)a).getKey1();
     }
     else
     {
        throw new ClassCastException("String or Field object expected.");
     }
     if (b instanceof String)
     {
        bString = (String)b;
     }
     else if (b instanceof Thing)
     {
        bString = ((Field)b).getKey1();
     }
     else
     {
        throw new ClassCastException("String or Field object expected.");
     }
     return aString.compareTo(bString);
  }
};

However, If you do that you can (in class Thing) write:

Set<Thing> things = new TreeSet<Thing>(new Order());

boolean hasFieldWithKey1(final String key1) 
{
   return this.fields.contains(key1);
}

with a Java Set you can only test for existence but not retrieve the object you are searching for. e.g. you can't do

Field getFieldWithKey1(final String key1) 
{
   return this.fields.floor(key1);
}

because methods like floor() only accept objects of the value type (i.e. Thing)

The obvious solution is to use a Map for each key.

Map<String,Thing> thingsByKey1 = new TreeMap<Thing>(new Order());

Coming from a C++ background this seems unnecessarily bloated. Why should I store the key again when thing already contains it? If I have two keys its even worse. I will need two maps.

Map<String,Thing> thingsByKey1 = new TreeMap<Thing>(new OrderByKey1());
Map<String,Thing> thingsByKey2 = new TreeMap<Thing>(new OrderByKey2());

I am now duplicating not just the key but creating additional unnecessary tree data structures (or HashMaps with better runtime performance) as well. For the ordering implementation as above this is could also be 'just plain wrong' as by itself each key forms only a partial order not a total order on a set of things.

I have seen questions about searching here answered with use of linear search which is almost always the worst choice. E.g.

Finding all objects that have a given property inside a collection

I note that there is a version of BinarySearch that accepts a Comparator object as an argument but returns it the index of the element rather than the element itself. This means there is an unnecessary call to get() after using it (assuming the collection supports it).

So what is the Java way of doing this efficiently in both time and space?

like image 522
Bruce Adams Avatar asked Aug 01 '12 18:08

Bruce Adams


1 Answers

The Java way to do this is, yes, to use Map.

Coming from a C++ background this seems unnecessarily bloated. Why should I store the key again when thing already contains it?

This isn't as much overhead as you think. You're storing one extra reference to the String, for a total cost of...4 bytes. (Actually, the cost is zero: the TreeSet implementation takes exactly as much memory as TreeMap.)

If you want to search with both keys, you can use a Comparator<Thing> that compares both keys, or make Thing implement Comparable<Thing>, and then maintain a TreeSet<Thing>. That's much more compact than the...unpleasant Comparator you wrote above. If you want to search with one key, just use a Map<String, Thing>. If you really, really want to search with both, then maintain them both. (In practice, I've almost never had to do this...and the authors of the JDK Collections framework didn't think you'd need to do this very often, either.)

like image 61
Louis Wasserman Avatar answered Sep 21 '22 04:09

Louis Wasserman