Data structure for set of (non-disjoint) sets

Tags: algorithm, data-structures

I'm looking for a data structure that roughly corresponds to (in Java terms) Map<Set<Integer>, Double>: essentially a set of sets of labeled marbles, where each set of marbles is associated with a scalar. I want it to handle the following operations efficiently:

- Add a given integer to every set.
- Remove every set that contains (or does not contain) a given integer, or at least set the associated double to 0.
- Union two of the maps, adding together the doubles for sets that appear in both.
- Multiply all of the doubles by a given double.
- Rarely, iterate over the entire map.

under the following conditions:

- The integers will fall within a constrained range (between 1 and 10,000 or so); the exact range will be known at compile time.
- Most of the integers within the range (80-90%) will never be used, but which ones cannot easily be determined until the end of the calculation.
  - The number of integers used will almost always still be over 100.
- Many of the sets will be very similar, differing only by a few elements.
- It may be possible to identify certain groups of integers that frequently appear only in sequential order: for example, if a set contains the integers 27 and 29, then it (almost?) certainly contains 28 as well.
  - It may be possible to identify these groups prior to running the calculation.
  - These groups would typically have 100 or so integers.

I've considered tries, but I don't see a good way to handle the "remove every set that contains a given integer" operation.

The purpose of this data structure would be to represent discrete random variables and permit addition, multiplication, and scalar multiplication operations on them. Each of these discrete random variables would ultimately have been created by applying these operations to a fixed (at compile-time) set of independent Bernoulli random variables (i.e. each takes the value 1 or 0 with some probability).

The systems being modeled are close to being representable as time-inhomogeneous Markov chains (which would of course simplify this immensely) but, unfortunately, it is essential to track the duration since various transitions.

asked Apr 02 '14 23:04

Alex Godofsky

1 Answer

Here's a data structure that can handle all of your operations fairly efficiently.

I'm going to refer to it as a BitmapArray for this explanation.

For just the operations you have described, a sorted array with bitmaps as keys and weights (your doubles) as values should be fairly efficient.

The bitmaps record membership in each set. Since you said the integers in the sets fall between 1 and 10,000, any set can be represented by a bitmap of length 10,000.
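To make the bitmap idea concrete, here is a minimal sketch that packs one set into 32-bit words. The names `RANGE`, `bitmapFromSet`, `hasMember` and `membersOf` are made up for illustration, and packing into a `Uint32Array` is my assumption, not something the approach requires (a plain array of 0/1 values works too, just with more memory).

```javascript
// Assumption: one bit per integer, packed 32 to a word.
const RANGE = 10000;                        // integers 1..RANGE, as in the question
const WORDS = Math.ceil((RANGE + 1) / 32);  // bit 0 is simply left unused

// Build a bitmap from a list of integers.
function bitmapFromSet(members) {
    const bitmap = new Uint32Array(WORDS);
    for (const n of members) {
        bitmap[n >>> 5] |= 1 << (n & 31);   // word n/32, bit n%32
    }
    return bitmap;
}

// Membership test: is integer n in the set?
function hasMember(bitmap, n) {
    return (bitmap[n >>> 5] & (1 << (n & 31))) !== 0;
}

// Recover the members in increasing order.
function membersOf(bitmap) {
    const out = [];
    for (let n = 1; n <= RANGE; n++) {
        if (hasMember(bitmap, n)) out.push(n);
    }
    return out;
}

const s = bitmapFromSet([27, 28, 29, 9999]);
console.log(hasMember(s, 28), hasMember(s, 30)); // true false
console.log(membersOf(s));                       // 27, 28, 29, 9999
```

With this layout each set costs about 1.25 KB regardless of how many integers it holds.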

Sorting an array whose keys can be as large as 2^10000 sounds tough, but you can be smart about the comparison function and implement it as follows:

1. Iterate from left to right over the two bitmaps.
2. XOR the bits at each index.
3. Say you get a 1 at the ith position.
4. Whichever bitmap has a 1 at the ith position is the greater one.
5. If you never get a 1, the bitmaps are equal.
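The steps above can be sketched as a three-way comparator. This version assumes the bitmaps are packed into equal-length `Uint32Array`s and compares 32 bits per step, so within a word the order differs from the strict bit-by-bit order described above. For keeping the array sorted any consistent total order will do. `compareBitmaps` and the `{ bitmap, weight }` entry layout are hypothetical names.

```javascript
// Compare two equal-length packed bitmaps, one 32-bit word at a time.
// Returns a negative number, zero, or a positive number, so it can be
// used from an Array.prototype.sort callback.
function compareBitmaps(a, b) {
    for (let i = 0; i < a.length; i++) {
        if (a[i] !== b[i]) {
            // Uint32Array elements read back as unsigned numbers.
            return a[i] < b[i] ? -1 : 1;
        }
    }
    return 0; // never differed: the sets are equal
}

// Hypothetical entries: a bitmap key plus its weight.
const entries = [
    { bitmap: Uint32Array.of(0, 5), weight: 0.2 },
    { bitmap: Uint32Array.of(1, 0), weight: 0.5 },
    { bitmap: Uint32Array.of(0, 1), weight: 0.3 },
];
entries.sort((x, y) => compareBitmaps(x.bitmap, y.bitmap));
console.log(entries.map(e => e.weight)); // weights in key order: 0.3, 0.2, 0.5
```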

I know this is still a slow comparison, but not too slow. Here's a benchmark fiddle I did on bitmaps of length 10,000. This is in JavaScript; if you're going to write it in Java, it will likely perform even better.

function runTest() {
    var num = document.getElementById("txtValue").value;
    num = isNaN(num * 1) ? 0 : num * 1;

    /* The worst case for the comparison is two equal bitmaps, which forces
       it to iterate over the whole bit array. */
    var bitmap1 = convertToBitmap(10000, num);
    var bitmap2 = convertToBitmap(10000, num);

    // performance.now() measures elapsed time; Date#getMilliseconds() only
    // returns the 0-999 millisecond field of the current time.
    var before = performance.now();
    var result = firstIsGreater(bitmap1, bitmap2, 10000);
    var after = performance.now();
    alert(result + " in time: " + (after - before) + " ms");
}

// Returns an array of `size` bits holding `number` in binary,
// most significant bit first, zero-padded on the left.
function convertToBitmap(size, number) {
    var bits = [];
    var q = number;
    do {
        bits.push(q % 2); // least significant bit first
        q = Math.floor(q / 2);
    } while (q > 0);

    var xbitArray = [];
    for (var i = 0; i < size; i++) {
        xbitArray.push(0);
    }

    // Copy so that the least significant bit lands in the last slot.
    var j = xbitArray.length - 1;
    for (var i = 0; i < bits.length && j >= 0; i++) {
        xbitArray[j] = bits[i];
        j--;
    }
    return xbitArray;
}

// True if bitArray1 is greater than bitArray2, comparing from index 0.
function firstIsGreater(bitArray1, bitArray2, lengthOfArrays) {
    for (var i = 0; i < lengthOfArrays; i++) {
        if (bitArray1[i] ^ bitArray2[i]) {
            if (bitArray1[i]) return true;
            else return false;
        }
    }
    return false;
}

document.getElementById("btnTest").onclick = function (e) {
    runTest();
};

Also, remember that you only pay for these comparisons when building your BitmapArray (or while taking unions). After that, it becomes fairly efficient for the operations you'd do most often:

Note: N is the length of the BitmapArray, i.e. the number of sets it holds.

Add integer to every set: Worst/best case O(N) time. Set the bit for that integer to 1 in each bitmap.
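Here is a rough sketch of that operation, assuming packed `Uint32Array` bitmaps and a hypothetical `{ bitmap, weight }` entry layout. One thing the O(N) figure does not cover: two sets that differed only in the added integer become the same set, and the relative order of keys can change, so this sketch re-sorts and merges duplicates afterwards. That extra pass is my addition and costs O(N log N) comparisons. `addToEverySet` and `compareBitmaps` are illustrative names.

```javascript
// Word-by-word three-way comparison of equal-length packed bitmaps.
function compareBitmaps(a, b) {
    for (let i = 0; i < a.length; i++) {
        if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}

// Set bit n in every bitmap (mutating the entries), then restore the
// "sorted, no duplicate keys" invariant and return the resulting array.
function addToEverySet(entries, n) {
    const word = n >>> 5;
    const mask = 1 << (n & 31);
    for (const e of entries) {
        e.bitmap[word] |= mask;              // the O(N) part
    }

    entries.sort((x, y) => compareBitmaps(x.bitmap, y.bitmap));
    const merged = [];
    for (const e of entries) {
        const last = merged[merged.length - 1];
        if (last && compareBitmaps(last.bitmap, e.bitmap) === 0) {
            last.weight += e.weight;         // same set now: add the weights
        } else {
            merged.push(e);
        }
    }
    return merged;
}

// {1}, {1,2} and {3}, stored in a single word for brevity.
const sets = [
    { bitmap: Uint32Array.of(0b0010), weight: 0.25 },
    { bitmap: Uint32Array.of(0b0110), weight: 0.5 },
    { bitmap: Uint32Array.of(0b1000), weight: 0.25 },
];
const result = addToEverySet(sets, 2);
console.log(result.map(e => [e.bitmap[0], e.weight])); // [6, 0.75] and [12, 0.25]
```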

Remove every set that contains a given integer: Worst case O(N) time.

- For each bitmap, check the bit that represents the given integer; if it is 1, mark its index.
- Compress the array by deleting all marked indices.

If you're okay with just setting the weights to 0, it'll be even more efficient. The same approach also makes it very easy to remove all sets that share any element with a given set.
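A minimal sketch of the removal variants, again assuming packed `Uint32Array` bitmaps and hypothetical `{ bitmap, weight }` entries. All function names here are illustrative. `Array.prototype.filter` preserves the relative order, so a sorted array stays sorted.

```javascript
// Is integer n in the set?
function hasMember(bitmap, n) {
    return (bitmap[n >>> 5] & (1 << (n & 31))) !== 0;
}

// Drop every set that contains n; pass removeIfPresent = false to drop
// every set that does NOT contain n instead.
function removeSets(entries, n, removeIfPresent = true) {
    return entries.filter(e => hasMember(e.bitmap, n) !== removeIfPresent);
}

// Cheaper variant: leave the array as it is and zero the weight in place.
function zeroSets(entries, n, zeroIfPresent = true) {
    for (const e of entries) {
        if (hasMember(e.bitmap, n) === zeroIfPresent) e.weight = 0;
    }
}

// True if the bitmap shares at least one member with the mask bitmap.
function intersects(bitmap, mask) {
    for (let i = 0; i < bitmap.length; i++) {
        if ((bitmap[i] & mask[i]) !== 0) return true;
    }
    return false;
}

// {1}, {1,2} and {3}, stored in a single word for brevity.
const example = [
    { bitmap: Uint32Array.of(0b0010), weight: 0.25 },
    { bitmap: Uint32Array.of(0b0110), weight: 0.5 },
    { bitmap: Uint32Array.of(0b1000), weight: 0.25 },
];
console.log(removeSets(example, 1).length);        // 1 (only {3} is left)
console.log(removeSets(example, 1, false).length); // 2 ({1} and {1,2})
```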

Union of two maps: Worst case O(N1+N2) time. Just like merging two sorted arrays, except you have to be smart about comparisons once more.
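The union can be sketched as a standard two-pointer merge. This assumes both inputs are sorted by the same comparator and have no duplicate keys. `unionMaps`, `compareBitmaps` and the `{ bitmap, weight }` layout are hypothetical names. Entries that appear in only one input are reused by reference rather than copied.

```javascript
// Word-by-word three-way comparison of equal-length packed bitmaps.
function compareBitmaps(a, b) {
    for (let i = 0; i < a.length; i++) {
        if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}

// Merge two sorted, duplicate-free entry arrays into a new sorted array.
// Weights of sets that appear in both inputs are added together.
function unionMaps(left, right) {
    const out = [];
    let i = 0;
    let j = 0;
    while (i < left.length && j < right.length) {
        const c = compareBitmaps(left[i].bitmap, right[j].bitmap);
        if (c < 0) {
            out.push(left[i++]);
        } else if (c > 0) {
            out.push(right[j++]);
        } else {
            out.push({
                bitmap: left[i].bitmap,
                weight: left[i].weight + right[j].weight,
            });
            i++;
            j++;
        }
    }
    while (i < left.length) out.push(left[i++]);   // leftovers from one side
    while (j < right.length) out.push(right[j++]);
    return out;
}

const a = [
    { bitmap: Uint32Array.of(2), weight: 0.25 },
    { bitmap: Uint32Array.of(6), weight: 0.5 },
];
const b = [
    { bitmap: Uint32Array.of(6), weight: 0.25 },
    { bitmap: Uint32Array.of(8), weight: 1 },
];
console.log(unionMaps(a, b).map(e => [e.bitmap[0], e.weight]));
// [2, 0.25], [6, 0.75], [8, 1]
```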

Multiply all of the doubles by a given double: Worst/best case O(N) time. Iterate and multiply each value by the input double.

Iterate over the BitmapArray: Worst/best case O(1) time for next element.

answered Oct 14 '22 05:10

aa333
