Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient And Conditional Tuples or Subsets

Stepping back from the following question :

Selecting with Cases

I need to generate a random Set (1 000 000 items would be enough)

Subsets[Flatten[ParallelTable[{i, j}, {i, 1, 96}, {j, 1, 4}], 1], {4}]

Further, I need to reject any quadruples with non-unique first elements, such as {{1,1},{1,2},{2,3},{6,1}}.

But the above is impossible on a laptop. How could I just draw uniformly one millions sets avoiding killing my machine ?

like image 912
500 Avatar asked Oct 26 '11 13:10

500


3 Answers

Provided you have a base set you need to generate 4-element subsets of,

baseSet = Flatten[Table[{i, j}, {i, 1, 96}, {j, 1, 4}], 1];

you can use RandomSample as follows:

RandomSample[baseSet, 4]

This gives you a length-4 random subset of baseSet. Generating a million of them takes 2.5 seconds on my very old machine:

Timing[subsets = Table[RandomSample[baseSet, 4], {1000000}];]

Not all of what we get are going to be different subsets, so we need to remove duplicates using Union:

subsets = Union[subsets];

After this I'm still left with 999 971 items in a sample run, thanks to the much larger number of possible subsets (Binomial[Length[baseSet], 4] == 891 881 376)

like image 174
Szabolcs Avatar answered Oct 16 '22 01:10

Szabolcs


This should also do the trick, and it runs faster than Szabolcs' proposal.

(t=Table[{RandomInteger[{1, 96}], RandomInteger[{1, 4}]}, {10^6}, {4}]); //Timing

I saw no need to remove duplicate subsets since we're sampling, not trying to produce the entire population. (But you can easily remove duplicates if you so wish.)

BTW, for this case, Table runs faster than ParallelTable.

like image 34
DavidC Avatar answered Oct 16 '22 02:10

DavidC


I believe a slight variation of David's method will produce the duplicate-free form requested in the original post.

set = 
  With[{r = Range@96},
    {RandomSample[r, 4], RandomInteger[{1, 4}, 4]}\[Transpose] ~Table~ {1*^6}
  ];

This of course does not produce 10^6 unique samples, but Szabolcs showed how that may be done, and the cost is not great.

like image 37
Mr.Wizard Avatar answered Oct 16 '22 02:10

Mr.Wizard