I'm looking for an efficient way of storing an ordered list/set of items where:
Performance would ideally be biased toward retrieval of the first N items of any subset (or merged subset), and storage would be in-memory (and maybe eventually persistent on disk)
I'm a new member of this forum; I hope you haven't forgotten about this old question :)
Store the master set in an indexed data structure, such as an array (or an ArrayList, if your library supports one). Let's assume you can associate an ID with each set (if not, how do you know which set to retrieve?). So, we now need a way to find out which elements of your array participate in a given set and which ones don't.
Use an n x m matrix, with n being the number of elements in your array and m being the initial number of sets; i refers to the row index and j to the column index.

A[i][j] = 0 if the ith element is not in the jth set
A[i][j] = 1 if the ith element is in the jth set

Don't use a plain two-dimensional array; go for an ArrayList<ArrayList<Integer>> so that rows and columns can grow. Java, C#, and C++ support such generic constructs, and it shouldn't be terribly hard to do the same in other languages such as Perl. In C# you could even use a DataTable.
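As a minimal sketch of that layout (class and method names here are illustrative, not from the answer; fixed-size lists are used for brevity where the answer would use growable ArrayLists):

```java
import java.util.List;

public class SetMatrix {
    // Sorted master array; row i of the matrix describes elements.get(i).
    static final List<Integer> elements = List.of(10, 20, 30);
    // matrix.get(i).get(j) == 1 iff the i-th element belongs to the j-th set.
    // Two sets here: set 0 = {10, 30}, set 1 = {20, 30}.
    static final List<List<Integer>> matrix = List.of(
            List.of(1, 0),   // 10
            List.of(0, 1),   // 20
            List.of(1, 1));  // 30

    // Membership test is a single matrix lookup.
    static boolean contains(int i, int j) {
        return matrix.get(i).get(j) == 1;
    }

    public static void main(String[] args) {
        System.out.println("10 in set 0: " + contains(0, 0)); // true
        System.out.println("20 in set 0: " + contains(1, 0)); // false
    }
}
```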
You can add a new set in O(n) time: simply append a new column for that set and set the appropriate rows to 1 in that column. There is no need to sort the new set, as long as the master array is sorted.
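A sketch of that column append (the helper name and sample values are my own):

```java
import java.util.ArrayList;
import java.util.List;

public class AddSetDemo {
    // Append one membership column for a new set: one pass over the rows, O(n).
    static void addSet(List<Integer> elements,
                       List<List<Integer>> matrix,
                       List<Integer> members) {
        for (int i = 0; i < elements.size(); i++) {
            matrix.get(i).add(members.contains(elements.get(i)) ? 1 : 0);
        }
    }

    public static void main(String[] args) {
        List<Integer> elements = new ArrayList<>(List.of(5, 7, 9));
        List<List<Integer>> matrix = new ArrayList<>();
        for (int i = 0; i < elements.size(); i++) matrix.add(new ArrayList<>());

        addSet(elements, matrix, List.of(5, 9)); // register the new set {5, 9}
        System.out.println(matrix); // [[1], [0], [1]]
    }
}
```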
In a simple sorted array, finding the insertion point takes O(log n), though shifting the later elements makes the insertion itself O(n). In our case, we first add the element to the master array (and at whatever index we inserted it, the matrix also gets an all-0 row at that index). Then we set the entries in that row to 1 for each set the element belongs to, which is O(m). The worst-case runtime is therefore O(n + m).
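A sketch of that insertion, assuming the caller supplies the column indices of the sets the new element belongs to (the method name and sample data are illustrative):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class InsertElementDemo {
    // Insert `value` into the sorted master list and give the matrix a matching
    // row; `setsOfValue` lists the column indices to flip to 1, m is the
    // current number of columns.
    static int insert(List<Integer> elements, List<List<Integer>> matrix,
                      int value, List<Integer> setsOfValue, int m) {
        int pos = Collections.binarySearch(elements, value); // O(log n) to locate
        if (pos < 0) pos = -pos - 1;                         // insertion point
        elements.add(pos, value);                            // shifting: O(n)
        List<Integer> row = new ArrayList<>(Collections.nCopies(m, 0)); // all-0 row
        for (int j : setsOfValue) row.set(j, 1);             // O(m) worst case
        matrix.add(pos, row);
        return pos;
    }

    public static void main(String[] args) {
        List<Integer> elements = new ArrayList<>(List.of(10, 30));
        List<List<Integer>> matrix = new ArrayList<>();
        matrix.add(new ArrayList<>(List.of(1, 0))); // 10 is in set 0
        matrix.add(new ArrayList<>(List.of(0, 1))); // 30 is in set 1

        insert(elements, matrix, 20, List.of(0), 2); // 20 joins set 0
        System.out.println(elements); // [10, 20, 30]
        System.out.println(matrix);   // [[1, 0], [1, 0], [0, 1]]
    }
}
```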
Pick the column corresponding to the set in O(1) time, then scan down it and collect the first N entries that are 1. This is linear in n in the worst case.
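That scan might look like this (helper name and sample data are mine). Because the master array is sorted, the members come out in order with no extra sorting step:

```java
import java.util.ArrayList;
import java.util.List;

public class FirstNDemo {
    // Walk the rows in sorted order, keeping the first `wanted` members of set j.
    static List<Integer> firstN(List<Integer> elements,
                                List<List<Integer>> matrix, int j, int wanted) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < elements.size() && out.size() < wanted; i++) {
            if (matrix.get(i).get(j) == 1) out.add(elements.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> elements = List.of(1, 2, 3, 4, 5);
        // One column (set 0) holding {1, 3, 4}.
        List<List<Integer>> matrix = List.of(
                List.of(1), List.of(0), List.of(1), List.of(1), List.of(0));
        System.out.println(firstN(elements, matrix, 0, 2)); // [1, 3]
    }
}
```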
Let's say we are merging the sets in columns j1 and j2 into a third set in column j3:

for (int i = 0; i < n; i++) {
    A[i][j3] = A[i][j1] | A[i][j2]; // bitwise OR: element is in either set
}

This is again linear.
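The fragment above, filled in as a runnable sketch (the result is written into a pre-allocated spare column; the data is illustrative):

```java
public class MergeDemo {
    // Union of membership columns j1 and j2 into column j3, one pass over rows.
    static void merge(int[][] A, int j1, int j2, int j3) {
        for (int i = 0; i < A.length; i++) {
            A[i][j3] = A[i][j1] | A[i][j2];
        }
    }

    public static void main(String[] args) {
        // Rows = elements, columns = sets; column 2 is reserved for the result.
        int[][] A = { {1, 0, 0}, {0, 1, 0}, {0, 0, 0} };
        merge(A, 0, 1, 2);
        for (int[] row : A) System.out.print(row[2]); // prints 110
        System.out.println();
    }
}
```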
First find the element in the master array; this takes O(log n) time. Then remove it from that array and remove the row at that index from the matrix.

Better still, don't remove anything outright; just mark rows and columns defunct, and consolidate once a threshold number of defunct columns/rows has accumulated. Similarly, start the arrays with a generous initial capacity. Modern implementations should do this automatically, though.
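A sketch of that lazy-deletion idea, tracking defunct rows in a parallel flag list (the matrix rows, which would be dropped in the same pass, are omitted for brevity; names and the consolidation trigger are mine):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LazyDeleteDemo {
    // Flag the element defunct instead of shifting the array: O(log n).
    static void remove(List<Integer> elements, List<Boolean> defunct, int value) {
        int pos = Collections.binarySearch(elements, value);
        if (pos >= 0) defunct.set(pos, true);
    }

    // Compact once enough rows are flagged; walk backwards so the
    // remaining indices stay valid while removing.
    static void consolidate(List<Integer> elements, List<Boolean> defunct) {
        for (int i = elements.size() - 1; i >= 0; i--) {
            if (defunct.get(i)) {
                elements.remove(i);
                defunct.remove(i);
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> elements = new ArrayList<>(List.of(10, 20, 30));
        List<Boolean> defunct = new ArrayList<>(Collections.nCopies(3, false));
        remove(elements, defunct, 20);
        System.out.println(elements); // still [10, 20, 30]; row is only flagged
        consolidate(elements, defunct);
        System.out.println(elements); // [10, 30]
    }
}
```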