
Why is Cases[] so slow here? Are there any tricks to speed it up?

While trying to paste images, I noticed that Cases[] is very slow.

To reproduce, first copy a large image to the clipboard (just press Print Screen), then evaluate the following:

In[33]:= SetSystemOptions["PackedArrayOptions" -> "UnpackMessage" -> True];

In[34]:= AbsoluteTiming[nb = NotebookGet@ClipboardNotebook[];]
Out[34]= {0.4687500, Null}

In[35]:= AbsoluteTiming[d1 = nb[[1, 1, 1, 1, 1, 1, 1]];]
Out[35]= {0., Null}

In[36]:= AbsoluteTiming[d2 = First@Cases[nb, r_RasterBox :> First[r], Infinity, 1];]

During evaluation of In[36]:= Developer`FromPackedArray::unpack: Unpacking array in call to Notebook. >>

Out[36]= {0.9375000, Null}

(I did this on Windows, not sure if the paste code is the same on other systems.)

Note that extracting the data using Cases is extremely slow compared to using Part directly, even though I explicitly tell Cases that I need only one match.

I did find out (as shown above) that Cases triggers unpacking for some reason, even though the search should stop before it reaches the packed array inside. Using a shallower level specification than Infinity might avoid unpacking.
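For experimenting without a clipboard image, here is a minimal sketch on a toy expression. The heads container and cell, and the names data, toy, packedBefore and viaCases, are hypothetical stand-ins for the real notebook structure. Whether Cases emits the unpack message on this toy input may depend on the Mathematica version, so no output is claimed for that part.

```mathematica
(* Report unpacking, using the same system option as in the session above *)
SetSystemOptions["PackedArrayOptions" -> "UnpackMessage" -> True];

(* Toy stand-in for the pasted notebook; container and cell are made-up inert heads *)
data = RandomInteger[255, {200, 200, 3}];
toy = container[cell[RasterBox[data]]];

(* RandomInteger returns a packed array, and it stays packed inside toy *)
packedBefore = Developer`PackedArrayQ[toy[[1, 1, 1]]];

(* The call from the question; watch for a Developer`FromPackedArray::unpack message *)
viaCases = First@Cases[toy, r_RasterBox :> First[r], Infinity, 1];

(* Both routes return the same data, whether or not unpacking happened *)
viaCases === toy[[1, 1, 1]]
```

Wrapping the two extraction routes in AbsoluteTiming, as in the session above, is a simple way to compare them on a given system.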

Question: Using Cases here is both easier and more reliable than Part (what if the subexpression can appear in different positions?). Is there a way to make Cases fast here, perhaps by using a different pattern or different options?


Possibly related question: Mathematica's pattern matching poorly optimized? (This is why I changed the Cases rule from RasterBox[data_, ___] -> data to r_RasterBox :> First[r].)

Szabolcs asked Jan 02 '12 12:01



1 Answer

I don't have access to Mathematica right now, so what follows is untested. My guess is that Cases unpacks here because it searches depth-first, and so sees the packed array first. If this is correct, then you could use rules instead (ReplaceAll, not Replace), and throw an exception upon first match:

Module[{tag},
   Catch[
     nb /. r_RasterBox :> Block[{}, Throw[First[r], tag] /; True]; 
     $Failed, 
     tag]
]

As I said, this is just an untested guess.
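The difference in traversal order that this guess relies on can be seen on a small nested expression. This is only an illustrative sketch: the symbols f and g, the string tag "found", and the result names are made up for the example and are assumed to have no definitions.

```mathematica
(* A nested expression in which both the whole and a part match _f *)
expr = f[f[1]];

(* Cases visits sub-expressions before the expression containing them *)
innerFirst = Cases[expr, x_f :> x, {0, Infinity}]
(* {f[1], f[f[1]]} *)

(* ReplaceAll tries the whole expression first and does not descend into a match *)
outerFirst = expr /. x_f :> g[x]
(* g[f[f[1]]] *)

(* Catch/Throw around ReplaceAll hands back that outermost match *)
firstMatch = Catch[expr /. x_f :> Throw[x, "found"]; $Failed, "found"]
(* f[f[1]] *)
```

Because ReplaceAll stops at the outermost match, a rule for r_RasterBox never needs to look inside the RasterBox, which is where the packed array lives.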

Edit 2: an approach based on shielding parts of expression from the pattern-matcher

Preamble

The first edit (below) presents a rather heavy approach. In many cases, one can take an alternative route. In this particular problem (and many others like it), the main task is to shield certain sub-expressions from the pattern-matcher. This can also be achieved with rules that temporarily replace the parts of interest with dummy symbols.

Code

Here is a modification of Cases which does just that:

Clear[casesShielded];
casesShielded[expr_,pt_,shieldPattern_,levspec_,n_,opts:OptionsPattern[]]:=
   Module[{dummy,inverseShieldingRules, shielded, i=0},
      inverseShieldingRules =
        If[#==={},#,Dispatch@First@#]&@
           Reap[shielded= expr/.(p:shieldPattern):>
             With[{eval = With[{ind = ++i},Sow[dummy[ind]:>p];dummy[ind]]},
                eval/;True];
           ][[2]];
      Cases[shielded,pt,levspec,n,opts]/.inverseShieldingRules]; 

This version of Cases has one additional parameter shieldPattern (third one), which indicates which sub-expressions must be shielded from the pattern-matcher.
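The shielding idea can be walked through by hand on a toy expression. This is a simplified sketch, not a replacement for casesShielded: the heads container and placeholder and the names expr, k, shielded, inverseRules, found and restored are hypothetical, the symbols a and b are assumed to have no values, and the sketch assumes at least one packed array is present (otherwise the destructuring of the Reap result would fail).

```mathematica
(* Toy expression: one packed array (Range[10]) and one ordinary list *)
expr = container[RasterBox[Range[10]], RasterBox[{a, b}]];

(* Step 1: swap each packed array for placeholder[k] and Sow the rule that undoes the swap *)
k = 0;
{shielded, {inverseRules}} = Reap[
   expr /. (p_List /; Developer`PackedArrayQ[p]) :>
     With[{ph = placeholder[++k]}, Sow[ph -> p]; ph]];

(* Step 2: run the ordinary Cases on the shielded expression, which holds no packed arrays *)
found = Cases[shielded, r_RasterBox :> First[r], Infinity, 1];

(* Step 3: put the original data back in place of the placeholders *)
restored = found /. inverseRules
```

Here shielded is container[RasterBox[placeholder[1]], RasterBox[{a, b}]], so Cases only ever sees a small symbolic expression, and restored contains the original list again.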

Advantages and applicability

The code above is pretty light-weight (compared to the suggestion of edit1 below), and it allows one to fully reuse and leverage the existing Cases functionality. This will work for cases when the main pattern (or rule) is insensitive to shielding of the relevant parts, which is a rather common situation (and in particular, covers patterns of the type _h, including the case at hand). This may also be faster than the application of myCases (described below).

The case at hand

Here, we need this call:

In[55]:=    
(d4=First@casesShielded[nb,x_RasterBox:>First@x,
    p_List/;Developer`PackedArrayQ[p],Infinity,1]);//Timing

Out[55]= {0.,Null}

and the result is of course the same as before:

In[61]:= d2===d4
Out[61]= True

Edit: an alternative Cases-like function

Motivation and code

It took me a while to produce this function, and I am not 100 percent sure it always works correctly, but here is a version of Cases which, while still working depth-first, analyzes an expression as a whole before its sub-expressions:

ClearAll[myCases];
myCases[expr_, lhs_ :> rhs_, upToLevel_: 1, max : (_Integer | All) : All, 
    opts : OptionsPattern[]] :=
 Module[{tag, result, f, found = 0, aux},
   With[{
    mopts = FilterRules[{opts}, {Heads -> False}],
    frule =
       Apply[
         RuleDelayed,
         Hold[lhs, With[{eval =  aux}, Null /; True]] /.
            {aux :> Sow[rhs, tag] /; max === All, 
             aux :> (found++; Sow[rhs, tag])}
       ]
    },
    SetAttributes[f, HoldAllComplete];
    If[max =!= All,
       _f /; found >= max := Throw[Null, tag]
    ];
    f[x_, n_] /; n > upToLevel := Null;
    f[x_, n_] :=
      Replace[
       HoldComplete[x],
       {
          frule,
          ex : _[___] :> 
            With[{ev = 
              Replace[
                HoldComplete[ex],
                y_ :> With[{eval = f[y, n + 1]}, Null /; True],
                {2},
                Sequence @@ mopts
              ]}, 
              Null /; True
            ]
       },
       {1}
      ]
   ]; (* external With *)
   result = 
     If[# === {}, #, First@#] &@
        Reap[Catch[f[expr, 0], tag], tag, #2 &][[2]];
   (* For proper garbage-collection of f *)
   ClearAll[f]; 
   result
 ]

How it works

This is not the most trivial piece of code, so here are some remarks. This version of Cases is based on the same idea I suggested first: use rule-substitution semantics to attempt the pattern match on the entire expression first, and go to sub-expressions only if that fails. I stress that this is still a depth-first traversal, but different from the standard one (which is used in most expression-traversing functions like Map, Scan, Cases, etc). I use Reap and Sow to collect the intermediate results (matches).

The trickiest part here is to prevent sub-expressions from evaluating, so I had to wrap sub-expressions in HoldComplete. Consequently, I had to use (a nested version of) the Trott-Strzebonski technique (perhaps there are simpler ways, but I wasn't able to see them) to enable evaluation of the rules' right-hand sides inside held (sub)expressions, and I used Replace with a proper level specification that accounts for the extra HoldComplete wrappers. I return Null in the rules, since the main action is to Sow the parts, so it does not matter what is injected into the original expression at the end.

Some extra complexity comes from the code that supports the level specification (I only support a single integer level indicating the maximal level up to which to search, not the full range of possible level specifications), the maximal number of results, and the Heads option. The code for frule avoids the overhead of counting found elements when we want to find all of them. I use the same Module-generated tag both as a tag for Sow and as a tag for exceptions (which I use to stop the process when enough matches have been found, just like in my original suggestion).
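For readers unfamiliar with the Trott-Strzebonski technique mentioned above, here is a minimal standalone sketch of its basic (non-nested) form. The names held, naive and evaluated are made up for the illustration.

```mathematica
(* Two unevaluated sums kept inert by Hold *)
held = Hold[1 + 1, 2 + 3];

(* A plain delayed rule inserts its right-hand side unevaluated, so nothing changes *)
naive = held /. x_Plus :> x
(* Hold[1 + 1, 2 + 3] *)

(* Trott-Strzebonski: With computes eval while the rule is applied, and the
   Condition form makes that already-evaluated value the replacement *)
evaluated = held /. x_Plus :> With[{eval = x}, eval /; True]
(* Hold[2, 5] *)
```

In myCases the same With[{eval = ...}, Null /; True] shape appears, except that the interesting work (the Sow, or the recursive call to f) happens as a side effect of computing eval, and Null is what gets injected.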

Tests and benchmarks

To have a non-trivial test of this functionality, we can for example find all symbols in the DownValues of myCases, and compare to Cases:

In[185]:= 
And@@Flatten[
    Outer[
       myCases[DownValues[myCases],s_Symbol:>Hold[s],#1,Heads->#2]  ===
       Cases[DownValues[myCases],s_Symbol:>Hold[s],#1,Heads->#2]&,
       Range[0,20],
       {True,False}
    ]]

Out[185]= True

The myCases function is about 20-30 times slower than Cases though:

In[186]:= 
Do[myCases[DownValues[myCases],s_Symbol:>Hold[s],20,Heads->True],{500}];//Timing
Out[186]= {3.188,Null}

In[187]:= Do[Cases[DownValues[myCases],s_Symbol:>Hold[s],20,Heads->True],{500}];//Timing
Out[187]= {0.125,Null}

The case at hand

It is easy to check that myCases solves the original problem of unpacking:

In[188]:= AbsoluteTiming[d3=First@myCases[nb,r_RasterBox:>First[r],Infinity,1];]
Out[188]= {0.0009766,Null}

In[189]:= d3===d2
Out[189]= True

It is hoped that myCases can be generally useful for situations like this, although the performance penalty of using it in place of Cases is substantial and has to be taken into account.

Leonid Shifrin answered Nov 16 '22 03:11
