Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replacing a sublist with first item in sublist

I'm pretty new to Mathematica and am stumped by this problem. I have a list that looks like this:

{{1, 1, 1}, {0}, {1}}

I want to replace each sublist with its first element. So, the above list should be converted to:

{1,0,1}

I've looked through the documentation repeatedly and Googled for hours. I'm sure that this is fairly simple, but I can't figure it out. I started with this list:

{1, 1, 1, 0, 1}

I need to know how many runs of 1's there are, which is obviously 2. So, I used Split to separate the list into groups of consecutive 1's and 0's. By using Length on this list I can get the total number of runs, which is 3. Now, I just need to calculate the number of runs of 1's. If I can convert the list as mentioned above, I can just sum the items in the list to get the answer.

I hope that makes sense. Thanks for any help!

like image 815
Tim Mayes Avatar asked Nov 26 '11 00:11

Tim Mayes


2 Answers

The proposed solutions are pretty fast, however if you want extreme efficiency (huge lists), here is another one which would be order of magnitude faster (formulated as a pure function):

Total[Clip[Differences@#,{0, 1}]] + First[#] &

For example:

In[86]:= 
largeTestList = RandomInteger[{0,1},{10^6}];
Count[Split[largeTestList],{1..}]//Timing
Count[Split[largeTestList][[All,1]],1]//Timing
Total[Clip[Differences@#,{0, 1}]] + First[#] &@largeTestList//Timing

Out[87]= {0.328,249887}
Out[88]= {0.203,249887}
Out[89]= {0.015,249887}

EDIT

I did not indend to initiate the "big shootout", but while we are at it, let me pull the biggest gun - compilation to C:

runsOf1C = 
 Compile[{{lst, _Integer, 1}},
   Module[{r = Table[0, {Length[lst] - 1}], i = 1, ctr = First[lst]},
    For[i = 2, i <= Length[lst], i++,
      If[lst[[i]] == 1 && lst[[i - 1]] == 0, ctr++]];
      ctr],
  CompilationTarget -> "C", RuntimeOptions -> "Speed"]

Now,

In[157]:= 
hugeTestList=RandomInteger[{0,1},{10^7}];
Total[Clip[ListCorrelate[{-1,1},#],{0,1}]]+First[#]&@hugeTestList//AbsoluteTiming
runsOf1C[hugeTestList]//AbsoluteTiming

Out[158]= {0.1872000,2499650}
Out[159]= {0.0780000,2499650}

Of course, this is not an elegant solution, but it is straightforward.

EDIT 2

Improving on the optimization of @Sjoerd, this one will be about 1.5 faster than runsOf1C still:

runsOf1CAlt = 
Compile[{{lst, _Integer, 1}},
  Module[{r = Table[0, {Length[lst] - 1}], i = 1, ctr = First[lst]},
    For[i = 2, i <= Length[lst], i++,
     If[lst[[i]] == 1,
      If[lst[[i - 1]] == 0, ctr++];
      i++
     ]];
    ctr],
  CompilationTarget -> "C", RuntimeOptions -> "Speed"]
like image 72
Leonid Shifrin Avatar answered Sep 27 '22 21:09

Leonid Shifrin


You have actually two questions, the one from the title and the question lurking behind it. The first one is answered by:

First/@ list

The second one, counting the number of runs of 1's, has been answered many times, but this solution

Total[Clip[ListCorrelate[{-1, 1}, #], {0, 1}]] + First[#] &

is about 50% faster than Leonid's solution. Note I increased the length of the test list for better timing:

largeTestList = RandomInteger[{0, 1}, {10000000}];
Count[Split[largeTestList], {1 ..}] // AbsoluteTiming
Count[Split[largeTestList][[All, 1]], 1] // AbsoluteTiming
Total[Clip[Differences@#, {0, 1}]] + First[#] &@ largeTestList // AbsoluteTiming
(Tr@Unitize@Differences@# + Tr@#[[{1, -1}]])/2 &@ largeTestList // AbsoluteTiming
Total[Clip[ListCorrelate[{-1, 1}, #], {0, 1}]] + First[#] &@
  largeTestList // AbsoluteTiming


Out[680]= {3.4361965, 2498095}

Out[681]= {2.4531403, 2498095}

Out[682]= {0.2710155, 2498095}

Out[683]= {0.2530145, 2498095}

Out[684]= {0.1710097, 2498095}

After Leonid's compilation attack I was about to throw in the towel, but I spotted a possible optimization, so onwards goes the battle... [Mr.Wizard, Leonid and I should be thrown in jail for disturbing the peace on SO]

runsOf1Cbis = 
 Compile[{{lst, _Integer, 1}}, 
  Module[{r = Table[0, {Length[lst] - 1}], i = 1, ctr = First[lst]}, 
   For[i = 2, i <= Length[lst], i++, 
    If[lst[[i]] == 1 && lst[[i - 1]] == 0, ctr++; i++]];
   ctr], CompilationTarget -> "C", RuntimeOptions -> "Speed"]

largeTestList = RandomInteger[{0, 1}, {10000000}]; 
Total[Clip[ListCorrelate[{-1, 1}, #], {0, 1}]] + First[#] &@
    largeTestList // AbsoluteTiming
runsOf1C[largeTestList] // AbsoluteTiming
runsOf1Cbis[largeTestList] // AbsoluteTiming


Out[869]= {0.1770101, 2500910}

Out[870]= {0.0960055, 2500910}

Out[871]= {0.0810046, 2500910}

The results vary, but I get an improvement between 10 and 30%.

The optimization may be hard to spot, but it's the extra i++ if the {0,1} test succeeds. You can't have two of these in successive locations.


And, here, an optimization of Leonid's optimization of my optimization of his optimization (I hope this isn't going to drag on, or I'm going to suffer a stack overflow):

runsOf1CDitto = 
 Compile[{{lst, _Integer, 1}}, 
  Module[{i = 1, ctr = First[lst]}, 
   For[i = 2, i <= Length[lst], i++, 
    If[lst[[i]] == 1, If[lst[[i - 1]] == 0, ctr++];
     i++]];
   ctr], CompilationTarget -> "C", RuntimeOptions -> "Speed"]

largeTestList = RandomInteger[{0, 1}, {10000000}]; 
Total[Clip[ListCorrelate[{-1, 1}, #], {0, 1}]] + First[#] &@
  largeTestList // AbsoluteTiming
runsOf1C[largeTestList] // AbsoluteTiming
runsOf1Cbis[largeTestList] // AbsoluteTiming
runsOf1CAlt[largeTestList] // AbsoluteTiming
runsOf1CDitto[largeTestList] // AbsoluteTiming


Out[907]= {0.1760101, 2501382}

Out[908]= {0.0990056, 2501382}

Out[909]= {0.0780045, 2501382}

Out[910]= {0.0670038, 2501382}

Out[911]= {0.0600034, 2501382}

Lucky for me, Leonid had a superfluous initialization in his code that could be removed.

like image 23
Sjoerd C. de Vries Avatar answered Sep 27 '22 20:09

Sjoerd C. de Vries