
Java: Sorted collection which allows duplicates, is memory efficient and provides fast insert + update

Specifically, I need a collection which uses one field (A) for access and a different one (field S) for sorting, but a sorted collection which accepts duplicates would be sufficient.

I often come to this point where I need exactly this collection, and TreeMap is not an option as it does not allow duplicate keys. So now it is time to ask here. Several workarounds have been pointed out on Stack Overflow (here and here), namely:

  • PriorityQueue: slow update (remove(Object) + add(Object)), and boxing of primitive keys
  • Fibonacci heap: memory waste (?)
  • TreeMap<Field_S, List<Value>>: problem for me is the memory overhead of the list, and boxing of primitive keys
  • sorted list or array: problem is the slow insert and remove -> should I implement a segmented sorted list?
  • TreeMultimap from Guava (docs): external dependency and probably memory inefficient (?)
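
For reference, the TreeMap<Field_S, List<Value>> workaround from the list above could look roughly like the sketch below. This is only an illustration, not code from the thread: the class name SortedBag and its methods are made up, and both the sort key and the value are assumed to be ints. It also shows where the overhead comes from, since every key and value is boxed and every distinct key carries its own list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical helper (not part of any library): a sorted collection that
// allows duplicate sort keys by storing a list of values per key.
class SortedBag {
    // Keys and values are boxed Integers; this is the memory overhead
    // the question complains about.
    private final TreeMap<Integer, List<Integer>> buckets = new TreeMap<>();
    private int size;

    int size() {
        return size;
    }

    void add(int sortKey, int value) {
        buckets.computeIfAbsent(sortKey, k -> new ArrayList<>()).add(value);
        size++;
    }

    boolean remove(int sortKey, int value) {
        List<Integer> bucket = buckets.get(sortKey);
        // remove(Object) on the boxed value, not remove(int index)
        if (bucket == null || !bucket.remove(Integer.valueOf(value))) {
            return false;
        }
        if (bucket.isEmpty()) {
            buckets.remove(sortKey); // do not keep empty lists around
        }
        size--;
        return true;
    }

    // An update of the sort key is a remove followed by an add.
    boolean update(int oldKey, int newKey, int value) {
        if (!remove(oldKey, value)) {
            return false;
        }
        add(newKey, value);
        return true;
    }

    // Removes and returns one value that has the smallest sort key.
    int pollFirst() {
        Map.Entry<Integer, List<Integer>> first = buckets.firstEntry();
        if (first == null) {
            throw new NoSuchElementException("bag is empty");
        }
        List<Integer> bucket = first.getValue();
        int value = bucket.remove(bucket.size() - 1); // remove by index
        if (bucket.isEmpty()) {
            buckets.remove(first.getKey());
        }
        size--;
        return value;
    }
}
```

Note that update needs the old sort key to find the right bucket, so the caller has to track it somewhere (for example in an array indexed by field A).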

Anyone with better suggestions? Or should I roll my own sorted data structure (which one?)? Pointers to other sources (in Java, open source, with unit tests and few dependencies) would also be nice.


Update

More details on my current use case (although I have had similar needs several times recently). I have a collection of millions of references on which I want to be able

  • to poll or get the smallest element regarding field S
  • and update field S with the help of field A
  • identical values of field S can happen. Field A is actually an integer pointing into another array
  • the only dependency I want is trove4j. I could use a different one, like the Mahout collections, if that were required. But not Guava: although it is a nice library, its collections are not tuned to be memory efficient (boxing/unboxing).

So everything points to a Fibonacci heap, but I fear it has too much overhead per element. That is why I thought about a more memory-efficient "sorted+segmented array" solution.
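
One option that is not mentioned in the question, but fits the requirements listed in the update, is an indexed binary min-heap stored in primitive arrays. The following is a minimal sketch under some assumptions: field A is a dense int id in the range [0, capacity), field S is an int, and the class name IntIndexedMinHeap with its methods is hypothetical. It is a plain binary heap, not a Fibonacci heap and not the segmented sorted array the question considers.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Hypothetical sketch: binary min-heap keyed by field S (int), addressed by
// field A (an int id in [0, capacity)). No boxing; duplicate keys are allowed.
class IntIndexedMinHeap {
    private final int[] heap; // heap[p] = id stored at heap position p
    private final int[] pos;  // pos[id] = heap position of id, or -1 if absent
    private final int[] key;  // key[id] = current sort key (field S) of id
    private int size;

    IntIndexedMinHeap(int capacity) {
        heap = new int[capacity];
        pos = new int[capacity];
        key = new int[capacity];
        Arrays.fill(pos, -1);
    }

    int size() {
        return size;
    }

    boolean contains(int id) {
        return pos[id] >= 0;
    }

    void insert(int id, int sortKey) {
        if (contains(id)) {
            throw new IllegalArgumentException("id already present: " + id);
        }
        key[id] = sortKey;
        heap[size] = id;
        pos[id] = size;
        siftUp(size++);
    }

    // Changes field S of the element identified by field A, in O(log n).
    void update(int id, int newKey) {
        if (!contains(id)) {
            throw new NoSuchElementException("id not present: " + id);
        }
        int oldKey = key[id];
        key[id] = newKey;
        if (newKey < oldKey) {
            siftUp(pos[id]);
        } else {
            siftDown(pos[id]);
        }
    }

    int peekId() {
        if (size == 0) {
            throw new NoSuchElementException("heap is empty");
        }
        return heap[0];
    }

    int peekKey() {
        return key[peekId()];
    }

    // Removes and returns the id with the smallest key.
    int pollId() {
        int top = peekId();
        int last = heap[--size];
        pos[top] = -1;
        if (size > 0) {
            heap[0] = last;
            pos[last] = 0;
            siftDown(0);
        }
        return top;
    }

    private void siftUp(int i) {
        int id = heap[i];
        while (i > 0) {
            int parent = (i - 1) >>> 1;
            int parentId = heap[parent];
            if (key[parentId] <= key[id]) {
                break;
            }
            heap[i] = parentId;
            pos[parentId] = i;
            i = parent;
        }
        heap[i] = id;
        pos[id] = i;
    }

    private void siftDown(int i) {
        int id = heap[i];
        int half = size >>> 1; // positions >= half have no children
        while (i < half) {
            int child = 2 * i + 1;
            int right = child + 1;
            if (right < size && key[heap[right]] < key[heap[child]]) {
                child = right;
            }
            int childId = heap[child];
            if (key[id] <= key[childId]) {
                break;
            }
            heap[i] = childId;
            pos[childId] = i;
            i = child;
        }
        heap[i] = id;
        pos[id] = i;
    }
}
```

With three int arrays this costs 12 bytes per possible id and needs no external dependency. The trade-off is that it only gives access to the smallest element, not a full sorted iteration, and that ids must be dense; for sparse ids the pos and key arrays would have to be replaced by a primitive int-to-int map.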

asked Oct 10 '12 20:10 by Karussell



1 Answer

When you need a sorted collection, you should analyze your needs carefully.
If the majority of operations are inserts and only a few are searches, then using a sorted collection, i.e. keeping the elements sorted at all times, would not be a good option (because of the overhead of keeping the elements sorted on every insert, which would be the most common operation).
In this case it would be best to keep an unsorted collection and sort only when needed, i.e. before the search. You could even use a simple List and sort it (using Collections.sort, a merge sort variant) when needed. But I recommend this with caution: for it to be efficient, the assumption is that you work on large data. On really small data even a linear search is good enough.
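
A minimal sketch of this "sort only when needed" idea, assuming int elements and a hypothetical wrapper class LazySortedList (the name and methods are illustrative only): inserts just append, and the list is sorted lazily before the first lookup that needs order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical wrapper: cheap appends, sorting deferred until a query needs it.
class LazySortedList {
    private final List<Integer> items = new ArrayList<>();
    private boolean dirty; // true if items were added since the last sort

    int size() {
        return items.size();
    }

    // O(1) amortized: no ordering work is done on insert.
    void add(int value) {
        items.add(value);
        dirty = true;
    }

    private void ensureSorted() {
        if (dirty) {
            Collections.sort(items);
            dirty = false;
        }
    }

    // Sorts if necessary, then does a binary search.
    boolean contains(int value) {
        ensureSorted();
        return Collections.binarySearch(items, value) >= 0;
    }

    int min() {
        if (items.isEmpty()) {
            throw new NoSuchElementException("list is empty");
        }
        ensureSorted();
        return items.get(0);
    }
}
```

This only pays off when many inserts happen between queries; if inserts and queries alternate, every query triggers a full sort.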

If the majority of operations are searches, then you could use a sorted collection. From my point of view there are several data structures to choose from (some of which you already mention), and you could benchmark them to see which one fits your needs.

answered Sep 28 '22 03:09 by Cratylus