I hope that this question is specific enough to be deemed fit for StackOverflow. I checked the FAQ and I think this qualifies, since it is specific and related to programming.
I'm implementing a complex data mining algorithm (FP-growth) in Java. Some of the initial phases of the algorithm require me to scan a large database and keep a running count of each item type found. This seems perfectly suited to a Hashbag
interface. I found one in Apache Commons which seems to work for me.
So now, my HashBag is filled with [itemType, count] entries (pairs). Later on in the algorithm, I'm required to do a lot of list-like operations on these pairs. In some cases, I must sort the collection by itemType. In others, I must sort by count. This seems perfectly suited to a List
interface.
I'm left with the conclusion that I must convert my Hasbag to a List. Yet it feels dirty somehow, like a waste of space and time. Is there a smarter way to do this, or is it a common situation to have a programming problem where you must treat your collection differently at different times, and conversions are a necessary evil?
One alternative is to make my own interface which is truly a list, but allows "bag-style" adds. I'd have to keep the list sorted and perform binary searches with a custom comparator every time I wanted to add something. Building that collection would probably take longer than building a Hashbag, but I'd save on the conversion step at the end. Any thoughts as to which is preferable?
Thanks!
If you used Guava's Multiset
instead of Apache's Bag
-- roughly analogous, but in a different style -- you can do most of this without converting. Multiset.entrySet()
returns a Set<Entry<E>>
, with Entry<E>
effectively representing a pair of an element and a count -- that sounds like it's probably the best way to address your need to operate on the element-count pairs, maybe? You can iterate over that like you'd iterate over a Map.entrySet()
.
You can use Multisets.copyHighestCountFirst(Multiset)
to get a multiset reordered in highest-frequency-first order, and use TreeMultiset
to order by the elements directly.
(Disclosure: I contribute to Guava.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With