If you know of any numerical method that may be relevant, please post it here!
I have an array of values, one for each set, and the index of each value identifies the set the value is bound to. I therefore represent a set as an integer in which each element corresponds to a bit position. E.g. a set with only element one in it is represented as ...001, where 1 is the LSB.
So the set is only an index and is never stored; it is generated on the fly, and it is the key into the array that holds the values of the sets.
What I do is, given a set, check whether the summed value of any pair of disjoint subsets whose union is the set is greater than the value of that set. E.g. if set 0111 has a value of 3, and the two subsets have the values 0100 = 2 and 0011 = 2, then this split is more beneficial. I do this for all subsets of the set.
Given three agents, with the sets ordered by their number representation:
val[8] = {0,1,2,4,3,2,4,2} the values are not important, only how they are ordered
0 0 0 0 1 1 1 1 MSB bit representation of the index
0 0 1 1 0 0 1 1
0 1 0 1 0 1 0 1 LSB
The best splitting of 111 is 011 and 100, with a sum of 7. So to get the value of the set which contains only the first element, i.e. 001, you use val[1]; for the set with elements 1 and 3 (101), you use val[5].
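To make the check concrete, here is a minimal host-side sketch of the top-down split, assuming the values are indexed directly by the set's integer as above. The name best_split is made up for illustration, and the submask loop is a standard bit trick rather than anything taken from the question.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the best value obtainable for `set`: either val[set] itself or the
 * largest val[a] + val[b] over all splits of `set` into two non-empty,
 * disjoint parts a and b. `val` is indexed directly by the set's integer.
 * best_split is a hypothetical name, not part of the original code. */
static int best_split(const int *val, uint32_t set)
{
    assert(set != 0);
    int best = val[set];
    /* (sub - 1) & set steps through every non-empty proper submask of set. */
    for (uint32_t sub = (set - 1) & set; sub != 0; sub = (sub - 1) & set) {
        uint32_t rest = set ^ sub;   /* the complementary, disjoint part */
        if (sub < rest)
            continue;                /* each pair shows up twice; keep one */
        int sum = val[sub] + val[rest];
        if (sum > best)
            best = sum;
    }
    return best;
}
```

With the example array, best_split(val, 7) picks 100 and 011 (3 + 4). Note that the reads val[sub] and val[rest] jump around the array, which is exactly the random access problem described here.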
How the val array is ordered when grouped by cardinality
val[8] = {0,1,2,3,4,2,4,2}
0 0 0 1 0 1 1 1 MSB bit representation of the index
0 0 1 0 1 0 1 1
0 1 0 0 1 1 0 1 LSB
Here you have to translate the index to the right bin in the array, so for a set with only the third element in it (100) it would look like val[translate(4)]. Think arrays of more than 2^25 elements. See Improving random memory access when random access is needed for further clarification.
However, this results in a high degree of random access in memory, even if I group the sets by cardinality. Currently, grouping them by cardinality and generating an index is slower than ordering them by the number the set represents.
The way I generate an index with the sets grouped by cardinality is by using Pascal's triangle in constant memory, as described by the answer in Determin the lexicographic distance between two integers:
n index  1  2  4  8 |  3  5  6  9 10 12 |  7 11 13 14 | 15
-----------------------------------------------------------
MSB      0  0  0  1 |  0  0  0  1  1  1 |  0  1  1  1 |  1
         0  0  1  0 |  0  1  1  0  0  1 |  1  0  1  1 |  1
         0  1  0  0 |  1  0  1  0  1  0 |  1  1  0  1 |  1
LSB      1  0  0  0 |  1  1  0  1  0  0 |  1  1  1  0 |  1
n index represents the index the value would have if the array were not ordered by cardinality. This is just to show where the value for each set is located.
The integer set represents an index in the value array, either directly (what I am currently doing, which gives random access) or through a translation from the set to an index.
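For readers who want to see what such a translation can look like, here is a rough sketch based on the combinatorial number system. It is not the implementation referred to above; the names translate, binom, init_binom and MAX_N are assumptions, and on a GPU the table would live in constant memory instead of a static array.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_N 32

/* Pascal's triangle: binom[n][k] = C(n, k). Entries with k > n stay zero
 * because static storage is zero-initialised. */
static uint64_t binom[MAX_N + 1][MAX_N + 1];

static void init_binom(void)
{
    for (int n = 0; n <= MAX_N; n++) {
        binom[n][0] = 1;
        for (int k = 1; k <= n; k++)
            binom[n][k] = binom[n - 1][k - 1] + binom[n - 1][k];
    }
}

/* Maps a set (as an integer over n agents) to its position in an array that
 * is grouped by cardinality and sorted by integer value within each group. */
static uint64_t translate(uint32_t set, int n)
{
    assert(n >= 0 && n <= MAX_N);
    int card = 0;
    uint64_t rank = 0;   /* position among the sets of equal cardinality */
    for (int pos = 0; pos < n; pos++) {
        if ((set >> pos) & 1u) {
            card++;
            rank += binom[pos][card];
        }
    }
    uint64_t offset = 0; /* number of sets with a smaller cardinality */
    for (int k = 0; k < card; k++)
        offset += binom[n][k];
    return offset + rank;
}
```

After calling init_binom() once, translate(4, 3) gives 3 and translate(9, 4) gives 8, which matches the two orderings shown above.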
Instead of splitting a set into subsets, I thought of generating the sets bottom up. E.g. instead of splitting 0111 into all of its pairwise disjoint subsets, I would at some point generate it from the sets {0100,0011},{0010,0101},{0001,0110}.
Say we want to evaluate all the splittings of the sets with a cardinality of 3, i.e. sets 7,11,13,14. As the only way to split a set of cardinality 3 is into sets of cardinality 1 and 2, we need to evaluate whether the sum of any pair of disjoint subsets of cardinality 1 and 2 is greater than the value of the union of those sets.
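As a side note, the sets of one cardinality can be listed in increasing order without storing them. The sketch below uses Gosper's hack, a well-known bit trick that is not part of the question; the function name is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Gosper's hack: returns the next larger integer with the same number of
 * set bits as x. x must be non-zero. */
static uint32_t next_same_cardinality(uint32_t x)
{
    assert(x != 0);
    uint32_t lowest = x & (~x + 1u);   /* isolate the lowest set bit */
    uint32_t ripple = x + lowest;      /* carry through the lowest run of ones */
    return (((ripple ^ x) >> 2) / lowest) | ripple;
}
```

Starting from 7 (0111), repeated calls give 11, 13 and 14, and the next result (19) no longer fits in four bits, so the loop over four agents stops there.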
Notation of what is required (may be a little flawed):
|C| = n, ∀ a,b : a ∪ b = C, a ∩ b = Ø, |a| + |b| = n
So each thread reads in the values using coalesced memory access and, for each pair of subsets that forms a set of cardinality n, checks whether their summed value is greater than the value of the formed set; if so, it updates the value.
Simple example: if n = 2, then you should read in all values with cardinality 1, form all combinations of those sets and update accordingly. This example is easy, as all the sets are disjoint:
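pseudo code for one block of 4 threads, input card1 is a pointer to the values of the sets with |s| = 1
    __device__ int getindex(int tid, int i, int card); // index function, defined elsewhere

    __global__ void combine_card1(const int *card1, int *output)
    {
        __shared__ int value[4];
        int tid = threadIdx.x;
        value[tid] = card1[tid];  // coalesced memory access
        __syncthreads();          // all values must be visible before they are combined
        int thvalue = value[tid]; // holds the value for the thread, to avoid bank conflicts
        int rvalue[2] = {0, 0};   // holds the sums (blockDim.x / 2 entries)
        int x = 0;
        // reduction loop that does not generate duplicate sets;
        // it stops at i > 1, otherwise thread 0 would repeat 1000 + 0001
        for (int i = blockDim.x; i > 1; i >>= 1) {
            if (tid < i) {
                x++;
                rvalue[x - 1] = value[(tid + x) % blockDim.x] + thvalue;
            }
        }
        for (int i = 0; i < x; i++) {
            int index = getindex(tid, i, 1); // index of the generated set; 1 is the cardinality
            if (output[index] < rvalue[i])
                output[index] = rvalue[i];
        }
    }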
Iterations of the reduction loop:
Thread   set specific for thread   first iteration   second iteration
0        0001                      0001 + 0010       0001 + 0100
1        0010                      0010 + 0100       0010 + 1000
2        0100                      0100 + 1000       none
3        1000                      1000 + 0001       none
As you can see, it has fetched the values for all the subsets that form sets of cardinality 2.
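The pairing can be checked on the host with a small emulation of the four threads. This is only a sketch: simulate_pairs and THREADS are made-up names, and the loop stops at i > 1 so that it performs exactly the two iterations listed in the table (a third pass would let thread 0 repeat 1000 + 0001).

```c
#include <stdint.h>

#define THREADS 4

/* Emulates the reduction loop for every thread id and counts how often each
 * two-element set is produced. Returns the total number of generated sets. */
static int simulate_pairs(int count[1 << THREADS])
{
    int generated = 0;
    for (int s = 0; s < (1 << THREADS); s++)
        count[s] = 0;
    for (int tid = 0; tid < THREADS; tid++) {
        int x = 0;
        for (int i = THREADS; i > 1; i >>= 1) {
            if (tid < i) {
                x++;
                uint32_t own = 1u << tid;                      /* the thread's set */
                uint32_t other = 1u << ((tid + x) % THREADS);  /* its partner */
                count[own | other]++;
                generated++;
            }
        }
    }
    return generated;
}
```

For four threads this produces the six sets 3, 5, 6, 9, 10 and 12 once each. The same halving scheme does not cover all pairs for larger blocks (8 threads would give 14 of the 28 pairs), so it should be read as an illustration of the four-thread case only.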
The problem, however, is that generating sets of cardinality greater than 2 is trickier, because not all sets are disjoint. E.g. 0001 and 0011 are not disjoint.
Keep in mind that I do not store the sets anywhere, only the value for the sets.
How would you go about, having this in mind, creating an algorithm that reads memory coalesced and generates all sets from disjoint subsets? It should work without checking whether the subsets are disjoint, and it should be completely deterministic.
The algorithm should be described either as text with distinct steps marked out, or as pseudo code.
It should be shown with examples that it works. Note that this algorithm goes up to 2^32 sets, so it needs to scale well.
The algorithm may be split into two or more instances, e.g. one for even numbers and one for odd.
I would gladly be referred to sources about the technique you use.
The algorithm should use as few assignments and instructions as possible and should avoid any divergence. But if you think you have one even though it has a lot of these, try and post it; I will be happy with any information.
If it is ordered in another way but still works as I have described, I urge you to post it here; any help is really appreciated.
Please ask if there is anything unclear.
I have an array Z with values; the index i, as in Z[i], represents an integer set, depending on the ordering of Z. The values are grouped by cardinality and ordered by binary lexicographical permutation, so the sets' values are located in the order 1,2,4,3,5,6,7, and I use a function (which I have implemented) to translate the set to the correct index. E.g. set 3 -> index 4.
With the values for the sets grouped by cardinality, what I want is to see whether the summed value of any pair of disjoint sets is greater than the value of the set they form.
E.g. |a| = 3, |b| + |c| = 3, b ∩ c = Ø, |b| = 1
So, reading in X values of type b and X values of type c, find all the disjoint subsets of b and c that form type a (sets of cardinality 3) and get their sum. Continue until all the sets have been "generated".
Hamming weight based indexing
Determin the lexicographic distance between two integers
Improving random memory access when random access is needed
I don't know whether this will help you or not, but I found a branchless count-all-the-1-bits-in-a-word function in Hacker's Delight that seems like it might be useful in helping you determine the cardinality of a set:
int pop(unsigned int x) {
    x = x - ((x >> 1) & 0x55555555);
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
    x = (x + (x >> 4)) & 0x0F0F0F0F;
    x = x + (x >> 8);
    x = x + (x >> 16);
    return x & 0x0000003F;
}
In the text, Warren claims that the above sequence can be compiled down to as little as 21 instructions. However, using MSVC 2010 on an i7 dev machine I checked the disassembly for this function and found that it clocked in at around 22 instructions for the actual computation and 33 instructions in total (counting stack ops). On a modern CPU or GPU, it should be pretty fast, since it has no branching.
For terminology I am calling "value" your set valuation function and "target" your target function which is max of sum of values over every binary partition.
Every splitting of a binary number B into two disjoint parts, L and R, can be represented by a ternary number C, where
B = L | R (bitwise OR)
L & R = 0 (bitwise AND)
C[i] = 0 means B[i] = 0 and L[i] = R[i] = 0
C[i] = 1 means B[i] = 1 and L[i] = 1
C[i] = 2 means B[i] = 1 and R[i] = 1
Then "simply" enumerate numbers from 0 to 3**n - 1 in ternary: eg (n=3): 000, 001, 002, 010, 011, 012, 020, ...
OK, actually, efficiently counting in ternary is not completely trivial when all you have at hand is binary. But bear with me, I will explain that bit after going over the high level algo ...
So you count in ternary in order, and given a ternary number C, you obtain L and R - how? I'll explain that below too, trust me :)
Given L and R you can now look up your valuation at L and R and update the target at B: target[B] = max(target[B], val[L] + val[R]).
OK that's the high-level algo. I can't prove it on such short notice but it does seem like it has very good cache locality properties. In other words value[L] and value[R] will tend to stay in a small number of cache lines at a time.
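Here is a rough host-side sketch of the high-level algorithm, assuming plain division and modulo to read the ternary digits (the faster counting is discussed further down). The name ternary_targets is made up, and starting each target at the set's own value is my assumption about how the "no split" case should be handled.

```c
#include <assert.h>
#include <stdint.h>

/* For every set B over n agents, stores in target[B] the larger of val[B] and
 * the best val[L] + val[R] over all splits of B into non-empty L and R.
 * Both arrays hold 2^n entries and are indexed directly by the set. */
static void ternary_targets(const int *val, int *target, int n)
{
    assert(n > 0 && n <= 20);            /* 3^20 still fits in 32 bits */
    uint32_t total = 1;
    for (int i = 0; i < n; i++)
        total *= 3;
    for (uint32_t b = 0; b < (1u << n); b++)
        target[b] = val[b];              /* assumption: start from "no split" */
    for (uint32_t c = 0; c < total; c++) {
        uint32_t l = 0, r = 0, t = c;
        for (int i = 0; i < n; i++, t /= 3) {
            uint32_t digit = t % 3;      /* ternary digit C[i] */
            if (digit == 1)
                l |= 1u << i;
            else if (digit == 2)
                r |= 1u << i;
        }
        if (l == 0 || r == 0)
            continue;                    /* not a proper split */
        int sum = val[l] + val[r];
        if (sum > target[l | r])
            target[l | r] = sum;
    }
}
```

With the example array {0,1,2,4,3,2,4,2} from the question, this ends with target[7] = 7, coming from 011 and 100. Each split is visited twice (once with L and R swapped), which a tuned version would want to avoid.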
Also I think the best bet for parallelizing is to split the counter i into values modulo 3, or values modulo 9, etc.
efficient ternary counting in binary
How can we count in ternary efficiently? Try the following: Count in base 4, and skip some.
In other words a ternary digit will be represented by two bits, and we will disallow the combination 11.
repr | value
0 0 | 0
0 1 | 1
1 0 | 2
1 1 | *undefined*
Now, how do we efficiently know when to skip ? Well, the pattern of increments is easy enough to figure out:
1 1 2 1 1 2 1 1 6 1 1 2 1 1 2 1 1 6 1 1 2 1 1 2 1 1 22 1 1 2 ...
My suggestion would be to precalculate a large chunk of size a power of 3 (eg 3 ** 7 = 2187) and compute the nth power of 3 on the fly once in a while [hint : it's related to cubes of n ..].
So you start with 00.00.00. You add 1 that's 00.00.01. You add 1 that's 00.00.10. Now you have to add 2 in order to skip the 11 combination, that leaves you with 00.01.00. Etc.
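A simple way to get these increments without a precalculated table is to add 1 and then push every 11 pair up into the next digit. This is only a sketch with a made-up name, and it uses a short loop instead of being branchless.

```c
#include <assert.h>
#include <stdint.h>

/* Increments a ternary number stored with two bits per digit (00, 01, 10),
 * skipping every pattern that contains the forbidden pair 11. */
static uint64_t next_ternary(uint64_t c)
{
    /* precondition: no digit of c is 11 */
    assert((c & (c >> 1) & 0x5555555555555555ULL) == 0);
    c += 1;
    for (int shift = 0; shift < 64; shift += 2) {
        if (((c >> shift) & 3u) != 3u)
            break;                       /* digit is valid, nothing left to carry */
        c += (uint64_t)1 << shift;       /* 11 + 1 clears the digit and carries */
    }
    return c;
}
```

Starting from 0 this visits 1, 2, 4, 5, 6, 8, 9, 10, 16, ..., which reproduces the increment pattern 1 1 2 1 1 2 1 1 6 listed above, and the step from 42 to 64 is the increment of 22.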
how to obtain L and R from C
Now C in our quaternary representation of ternary is actually simply L and R interleaved. To get L and R back efficiently, you can check the answer to this S/O question or apply other bit twiddling hacks.
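One such bit twiddling approach is the usual "compact every other bit" sequence known from Morton code decoding. The sketch below assumes the encoding from the table above (01 puts the element in L, 10 puts it in R), so L sits in the even bits of C and R in the odd bits; the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Gathers the bits at even positions of x into the low 32 bits. */
static uint32_t compact_even_bits(uint64_t x)
{
    x &= 0x5555555555555555ULL;
    x = (x | (x >> 1)) & 0x3333333333333333ULL;
    x = (x | (x >> 2)) & 0x0F0F0F0F0F0F0F0FULL;
    x = (x | (x >> 4)) & 0x00FF00FF00FF00FFULL;
    x = (x | (x >> 8)) & 0x0000FFFF0000FFFFULL;
    x = (x | (x >> 16)) & 0x00000000FFFFFFFFULL;
    return (uint32_t)x;
}

/* Splits C (two bits per ternary digit) into the two disjoint sets L and R. */
static void split_ternary(uint64_t c, uint32_t *l, uint32_t *r)
{
    /* precondition: no digit of c is 11 */
    assert((c & (c >> 1) & 0x5555555555555555ULL) == 0);
    *l = compact_even_bits(c);        /* low bit of each pair: digit 1 */
    *r = compact_even_bits(c >> 1);   /* high bit of each pair: digit 2 */
}
```

For example, C = 01.10.01 (decimal 25) yields L = 101 and R = 010, so B = L | R = 111.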
afterthought
All in all, I'm not sure whether we've really been using base 3 or base 4. Oh well ...
Have fun, and good luck!