First N values of a Map<K, V> sorted by value

I have a list of Strings. I want to evaluate each string based on a function that returns a double. Then I want the first 5 strings, based on their calculated values. If there are fewer than 5, I want all of them (in order). Let's say the strings are chemical compounds and the function computes the mass. The function is computationally expensive; I need to evaluate it once per string. (I'm just making up data here, though.)

H2O => 18.5
C12H11O22 => 109.1
HeNe => 32.0
H2SO4 => 54.37
HCl => 19.11
4FeO3 => 82.39
Xe6 => 281.9

The program should return the first five strings arranged in order by their respective values. For this sample data: H2O, HCl, HeNe, H2SO4, 4FeO3. Actually, I don't really care about the order; I just need the five lowest in any order.

I thought about how I'd do this in Perl. It's just a few lines:

foreach $s (@str) {
    $strmap{$s} = f($s);
}
@sorted = sort { $strmap{$a} <=> $strmap{$b} } keys %strmap;
return @sorted[0 .. 4];

But I need to do it in Java. And it's driving me crazy.

First I tried populating a HashMap<String, Double>, then using Collections.sort with a custom comparator, just like the Perl version. But scoping on the Comparator prevented it from referring to the HashMap to look up the values.
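
For reference, here is a minimal sketch of how that first attempt can be made to compile: an anonymous Comparator may read a local variable of the enclosing method as long as that variable is declared final. The class name SortKeysByValue, the method lowest, and the stand-in function f (which just returns the string length) are assumptions made for illustration, not part of the original program.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SortKeysByValue {
    // Hypothetical stand-in for the expensive function; it only returns the length.
    static double f(String s) {
        return s.length();
    }

    static List<String> lowest(List<String> strings, int n) {
        // Evaluate f exactly once per distinct string and remember the result.
        final Map<String, Double> values = new HashMap<String, Double>();
        for (String s : strings) {
            if (!values.containsKey(s)) {
                values.put(s, f(s));
            }
        }
        List<String> keys = new ArrayList<String>(values.keySet());
        // The anonymous class can read 'values' because the local variable is final.
        Collections.sort(keys, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Double.compare(values.get(a), values.get(b));
            }
        });
        // Clamp the end index so that fewer than n strings does not throw.
        return new ArrayList<String>(keys.subList(0, Math.min(n, keys.size())));
    }
}
```

Calling SortKeysByValue.lowest(strings, 5) would then mirror the Perl version: the map is filled once, and the comparator only performs lookups.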

Then I tried a TreeMap<String, Double>, but it only sorts by key and no amount of coercing could get it to order the entries by value.

So I tried a TreeMap<Double, String>. It will discard entries with the same Double. However, the likelihood of having Strings that map to the same Double is low, so I pressed forward. Adding the entries to the TreeMap is no problem, but I ran into issues trying to extract the values from it.

TreeMap supplies a method called subMap, but its parameters are the keys that delimit the subset. I don't know what they are; I just want the first five of them. So I tried using the values method to get all the values out of the TreeMap, hoping they'd be in order. Then I can just get the first five.

ArrayList<String> strs = (ArrayList<String>)(treemap.values());
return new ArrayList<String>(strs.subList(0, 5));

Nope. Runtime error: cannot cast TreeMap$Values to ArrayList.

List<String> strs = (List<String>)(treemap.values());
return new ArrayList<String>(strs.subList(0, 5));

Same. Runtime error trying to do the cast. OK, let's just assign to a Collection...

Collection<String> strs = treemap.values();
return new ArrayList<String>(strs.subList(0, 5));

Sorry, subList isn't a method of Collection.

Collection<String> strs = treemap.values();
ArrayList<String> a = new ArrayList<String>(strs);
return new ArrayList<String>(a.subList(0, 5));

Finally, something that works! But two extra data structures just to get the first five elements? And I'm not too wild about using Double as the key for TreeMap.
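
One way to avoid the intermediate copy, sketched here under the assumption that the TreeMap<Double, String> from above is kept: the collection returned by values() iterates in ascending order of the corresponding keys, so the loop can simply stop after five elements. The names FirstValues and firstValues are hypothetical. Unlike subList(0, 5), this does not throw when the map holds fewer than five entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class FirstValues {
    // Copies at most n values from the map, in ascending key order (n must be >= 0).
    static List<String> firstValues(TreeMap<Double, String> treemap, int n) {
        List<String> result = new ArrayList<String>(n);
        for (String s : treemap.values()) {
            if (result.size() == n) {
                break; // enough elements collected, stop iterating
            }
            result.add(s);
        }
        return result;
    }
}
```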

Is there a better solution?

Barry Brown asked Apr 30 '13 09:04


2 Answers

I don't think you'll get more compact than the three lines above, not in Java.

Apart from that, I have the impression that a Map is the wrong data structure in the first place, since you do not seem to need lookups by string (unless you want to handle multiple occurrences of the same string in some way, but you didn't say so). An alternative approach would be to declare your own comparable data record class:

private static class Record implements Comparable<Record> {
    // public final fields ok for this small example
    public final String string;
    public final double value;

    public Record(String string, double value) {
        this.string = string;
        this.value = value;
    }

    @Override
    public int compareTo(Record other) {
        // define sorting according to double fields
        return Double.compare(value, other.value); 
    }
}

// provide size to avoid reallocations
List<Record> records = new ArrayList<Record>(stringList.size());
for (String s : stringList)
    records.add(new Record(s, calculateFitness(s))); // one evaluation per string
Collections.sort(records); // sort according to compareTo method
int max = Math.min(5, records.size()); // number of results: at most five
List<String> result = new ArrayList<String>(max);
for (int i = 0; i < max; i++)
    result.add(records.get(i).string);
return result;

This is now much more verbose than the three lines above (this is Java, after all), but also includes the code that would be required to insert the key/value pairs into the map.
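
Since the question says the order of the five results does not really matter, a full sort is not strictly needed. The following is only a sketch of an alternative, not the approach above: a bounded max-heap (java.util.PriorityQueue with a reversed comparator) that never holds more than n candidates. The names LowestN, Scored, and lowest are made up for illustration, and calculateFitness is a placeholder that returns the string length.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class LowestN {
    // Pairs a string with its precomputed value.
    static final class Scored {
        final String string;
        final double value;

        Scored(String string, double value) {
            this.string = string;
            this.value = value;
        }
    }

    // Hypothetical placeholder for the expensive function.
    static double calculateFitness(String s) {
        return s.length();
    }

    static List<String> lowest(List<String> strings, int n) {
        if (n <= 0) {
            return new ArrayList<String>();
        }
        // Max-heap on value: the head is the largest of the candidates kept so far.
        PriorityQueue<Scored> heap = new PriorityQueue<Scored>(n, new Comparator<Scored>() {
            @Override
            public int compare(Scored a, Scored b) {
                return Double.compare(b.value, a.value);
            }
        });
        for (String s : strings) {
            heap.add(new Scored(s, calculateFitness(s))); // one evaluation per string
            if (heap.size() > n) {
                heap.poll(); // drop the current largest
            }
        }
        List<String> result = new ArrayList<String>(heap.size());
        while (!heap.isEmpty()) {
            result.add(heap.poll().string); // comes out largest first
        }
        Collections.reverse(result); // ascending order
        return result;
    }
}
```

For a handful of strings the difference from Collections.sort is negligible; the heap only starts to pay off when the input list is much longer than n.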

misberner answered Oct 16 '22 21:10


Would something like the following work for you?

Note that I've assumed you don't require the double value other than to sort the data.

public static void main(String[] args) throws Exception {
  List<String> data = new ArrayList<>(Arrays.asList("t", "h", "i", "s", "i", "s", "t", "e", "s", "t", "d", "a", "t", "a"));

  Collections.sort(data, new Comparator<String>() {
    @Override
    public int compare(String o1, String o2) {
      double o1Value = evaluate(o1);
      double o2Value = evaluate(o2);
      return Double.compare(o1Value, o2Value);
    }
  });

  List<String> result = data.subList(0, Math.min(10, data.size())); // End index is exclusive; clamped so short lists don't throw

  for (String s : result) {
    System.out.println(s);
  }
}

private static double evaluate(String s) {
  return s.codePointAt(0); // Nonsense, I know
}

This example prints:

a
a
d
e
h
i
i
s
s
s
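
Note that the comparator above calls evaluate again on every comparison, whereas the question asks for one evaluation per string. A possible variation, assuming Java 8 or later is available, pairs each element with its value once and sorts the pairs. The class name StreamTopN and the method lowest are hypothetical; evaluate is the same placeholder as above.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class StreamTopN {
    // Same placeholder scoring as in the example above.
    static double evaluate(String s) {
        return s.codePointAt(0);
    }

    static List<String> lowest(List<String> data, int n) {
        return data.stream()
                // Pair each element with its value; evaluate runs once per list element.
                .<Map.Entry<String, Double>>map(s -> new SimpleImmutableEntry<>(s, evaluate(s)))
                .sorted(Map.Entry.comparingByValue())
                .limit(n) // yields fewer than n elements if the list is shorter
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Duplicate strings in the input are still evaluated once per occurrence here; caching by string would need an extra map.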

Duncan Jones answered Oct 16 '22 21:10