Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it considered bad form to convert between collection types?

I hope that this question is specific enough to be deemed fit for StackOverflow. I checked the FAQ and I think this qualifies, since it is specific and related to programming.

I'm implementing a complex data mining algorithm (FP-growth) in Java. Some of the initial phases of the algorithm require me to scan a large database and keep a running count of each item type found. This seems perfectly suited to a Hashbag interface. I found one in Apache Commons which seems to work for me.

So now, my HashBag is filled with [itemType, count] entries (pairs). Later on in the algorithm, I'm required to do a lot of list-like operations on these pairs. In some cases, I must sort the collection by itemType. In others, I must sort by count. This seems perfectly suited to a List interface.

I'm left with the conclusion that I must convert my Hasbag to a List. Yet it feels dirty somehow, like a waste of space and time. Is there a smarter way to do this, or is it a common situation to have a programming problem where you must treat your collection differently at different times, and conversions are a necessary evil?

One alternative is to make my own interface which is truly a list, but allows "bag-style" adds. I'd have to keep the list sorted and perform binary searches with a custom comparator every time I wanted to add something. Building that collection would probably take longer than building a Hashbag, but I'd save on the conversion step at the end. Any thoughts as to which is preferable?

Thanks!

like image 792
The111 Avatar asked Nov 02 '12 02:11

The111


1 Answers

If you used Guava's Multiset instead of Apache's Bag -- roughly analogous, but in a different style -- you can do most of this without converting. Multiset.entrySet() returns a Set<Entry<E>>, with Entry<E> effectively representing a pair of an element and a count -- that sounds like it's probably the best way to address your need to operate on the element-count pairs, maybe? You can iterate over that like you'd iterate over a Map.entrySet().

You can use Multisets.copyHighestCountFirst(Multiset) to get a multiset reordered in highest-frequency-first order, and use TreeMultiset to order by the elements directly.

(Disclosure: I contribute to Guava.)

like image 60
Louis Wasserman Avatar answered Sep 22 '22 05:09

Louis Wasserman