Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimally picking one element from each list

I came across an old problem that you Mathematica/StackOverflow folks will probably like and that seems valuable to have on StackOverflow for posterity.

Suppose you have a list of lists and you want to pick one element from each and put them in a new list so that the number of elements that are identical to their next neighbor is maximized. In other words, for the resulting list l, minimize Length@Split[l]. In yet other words, we want the list with the fewest interruptions of identical contiguous elements.

For example:

pick[{ {1,2,3}, {2,3}, {1}, {1,3,4}, {4,1} }]
 --> {    2,      2,    1,     1,      1   }

(Or {3,3,1,1,1} is equally good.)

Here's a preposterously brute force solution:

pick[x_] := argMax[-Length@Split[#]&, Tuples[x]]

where argMax is as described here:
posmax: like argmax but gives the position(s) of the element x for which f[x] is maximal

Can you come up with something better? The legendary Carl Woll nailed this for me and I'll reveal his solution in a week.

like image 378
dreeves Avatar asked Apr 13 '11 06:04

dreeves


3 Answers

Not an answer, but a comparison of the methods proposed here. I generated test sets with a variable number of subsets this number varying from 5 to 100. Each test set was generated with this code

Table[RandomSample[Range[10], RandomInteger[{1, 7}]], {rl}]

with rl the number of subsets involved.

For every test set that was generated this way I had all the algorithms do their thing. I did this 10 times (with the same test set) with the algorithms operating in a random order so as to level out order effects and the effects of random background processes on my laptop. This results in mean timing for the given data set. The above line was used 20 times for each rl length, from which a mean (of means) and a standard deviation were calculated.

The results are below (horizontally the number of subsets and vertically the mean AbsoluteTiming):

enter image description here

It seems that Mr.Wizard is the (not so clear) winner. Congrats!


Update
As requested by Timo here the timings as a function of the number of distinct subset elements that can be chosen from as well as the maximum number of elements in each subset. The data sets are generated for a fixed number of subsets (50) according to this line of code:

lst = Table[RandomSample[Range[ch], RandomInteger[{1, ch}]], {50}];

I also increased the number of datasets I tried for each value from 20 to 40.

enter image description here


Here for 5 subsets:

enter image description here

like image 142
Sjoerd C. de Vries Avatar answered Oct 14 '22 00:10

Sjoerd C. de Vries


I'll toss this into the ring. I am not certain it always gives an optimal solution, but it appears to work on the same logic as some other answers given, and it is fast.

f@{} := (Sow[m]; m = {i, 1})
f@x_ := m = {x, m[[2]] + 1}

findruns[lst_] :=
  Reap[m = {{}, 0}; f[m[[1]] ⋂ i] ~Do~ {i, lst}; Sow@m][[2, 1, 2 ;;]]

findruns gives run-length-encoded output, including parallel answers. If output as strictly specified is required, use:

Flatten[First[#]~ConstantArray~#2 & @@@ #] &

Here is a variation using Fold. It is faster on some set shapes, but a little slower on others.

f2[{}, m_, i_] := (Sow[m]; {i, 1})
f2[x_, m_, _] := {x, m[[2]] + 1}

findruns2[lst_] :=
  Reap[Sow@Fold[f2[#[[1]] ⋂ #2, ##] &, {{}, 0}, lst]][[2, 1, 2 ;;]]
like image 44
Mr.Wizard Avatar answered Oct 13 '22 23:10

Mr.Wizard


This is my take on it, and does pretty much the same thing as Sjoerd, just in a less amount of code.

LongestRuns[list_List] := 
 Block[{gr, f = Intersection}, 
  ReplaceRepeated[
    list, {a___gr, Longest[e__List] /; f[e] =!= {}, b___} :> {a, 
      gr[e], b}] /. 
   gr[e__] :> ConstantArray[First[f[e]], Length[{e}]]]

Some gallery:

In[497]:= LongestRuns[{{1, 2, 3}, {2, 3}, {1}, {1, 3, 4}, {4, 1}}]

Out[497]= {{2, 2}, {1, 1, 1}}

In[498]:= LongestRuns[{{3, 10, 6}, {8, 2, 10, 5, 9, 3, 6}, {3, 7, 10, 
   2, 8, 5, 9}, {6, 9, 1, 8, 3, 10}, {1}, {2, 9, 4}, {9, 5, 2, 6, 8, 
   7}, {6, 9, 4, 5}}]

Out[498]= {{3, 3, 3, 3}, {1}, {9, 9, 9}}

In[499]:= pickPath[{{3, 10, 6}, {8, 2, 10, 5, 9, 3, 6}, {3, 7, 10, 2, 
   8, 5, 9}, {6, 9, 1, 8, 3, 10}, {1}, {2, 9, 4}, {9, 5, 2, 6, 8, 
   7}, {6, 9, 4, 5}}]

Out[499]= {{10, 10, 10, 10}, {{1}, {9, 9, 9}}}

In[500]:= LongestRuns[{{2, 8}, {4, 2}, {3}, {9, 4, 6, 8, 2}, {5}, {8, 
   10, 6, 2, 3}, {9, 4, 6, 3, 10, 1}, {9}}]

Out[500]= {{2, 2}, {3}, {2}, {5}, {3, 3}, {9}}

In[501]:= LongestRuns[{{4, 6, 18, 15}, {1, 20, 16, 7, 14, 2, 9}, {12, 
   3, 15}, {17, 6, 13, 10, 3, 19}, {1, 15, 2, 19}, {5, 17, 3, 6, 
   14}, {5, 17, 9}, {15, 9, 19, 13, 8, 20}, {18, 13, 5}, {11, 5, 1, 
   12, 2}, {10, 4, 7}, {1, 2, 14, 9, 12, 3}, {9, 5, 19, 8}, {14, 1, 3,
    4, 9}, {11, 13, 5, 1}, {16, 3, 7, 12, 14, 9}, {7, 4, 17, 18, 
   6}, {17, 19, 9}, {7, 15, 3, 12}, {19, 12, 5, 14, 8}, {1, 10, 12, 
   8}, {18, 16, 14, 19}, {2, 7, 10}, {19, 2, 5, 3}, {16, 17, 3}, {16, 
   2, 6, 20, 1, 3}, {12, 18, 11, 19, 17}, {12, 16, 9, 20, 4}, {19, 20,
    10, 12, 9, 11}, {10, 12, 6, 19, 17, 5}}]

Out[501]= {{4}, {1}, {3, 3}, {1}, {5, 5}, {13, 13}, {1}, {4}, {9, 9, 
  9}, {1}, {7, 7}, {9}, {12, 12, 12}, {14}, {2, 2}, {3, 3}, {12, 12, 
  12, 12}}

EDIT given that Sjoerd's Dreeves's brute force approach fails on large samples due to inability to generate all Tuples at once, here is another brute force approach:

bfBestPick[e_List] := Block[{splits, gr, f = Intersection},
  splits[{}] = {{}};
  splits[list_List] := 
   ReplaceList[
    list, {a___gr, el__List /; f[el] =!= {}, 
      b___} :> (Join[{a, gr[el]}, #] & /@ splits[{b}])]; 
  Module[{sp = 
     Cases[splits[
        e] //. {seq__gr, 
         re__List} :> (Join[{seq}, #] & /@ {re}), {__gr}, Infinity]}, 
   sp[[First@Ordering[Length /@ sp, 1]]] /. 
    gr[args__] :> ConstantArray[First[f[args]], Length[{args}]]]]

This brute-force-best-pick might generate different splitting, but it is length that matters according to the original question.

test = {{4, 6, 18, 15}, {1, 20, 16, 7, 14, 2, 9}, {12, 3, 15}, {17, 6,
     13, 10, 3, 19}, {1, 15, 2, 19}, {5, 17, 3, 6, 14}, {5, 17, 
    9}, {15, 9, 19, 13, 8, 20}, {18, 13, 5}, {11, 5, 1, 12, 2}, {10, 
    4, 7}, {1, 2, 14, 9, 12, 3}, {9, 5, 19, 8}, {14, 1, 3, 4, 9}, {11,
     13, 5, 1}, {16, 3, 7, 12, 14, 9}, {7, 4, 17, 18, 6}, {17, 19, 
    9}, {7, 15, 3, 12}, {19, 12, 5, 14, 8}, {1, 10, 12, 8}, {18, 16, 
    14, 19}, {2, 7, 10}, {19, 2, 5, 3}, {16, 17, 3}, {16, 2, 6, 20, 1,
     3}, {12, 18, 11, 19, 17}, {12, 16, 9, 20, 4}, {19, 20, 10, 12, 9,
     11}, {10, 12, 6, 19, 17, 5}};

pick fails on this example.

In[637]:= Length[bfBestPick[test]] // Timing

Out[637]= {58.407, 17}

In[638]:= Length[LongestRuns[test]] // Timing

Out[638]= {0., 17}

In[639]:= 
Length[Cases[pickPath[test], {__Integer}, Infinity]] // Timing

Out[639]= {0., 17}

I am posting this in case somebody might want to search for counterexamples that the code like pickPath or LongestRuns does indeed generate a sequence with smallest number of interruptions.

like image 27
Sasha Avatar answered Oct 13 '22 22:10

Sasha