While trying to paste images, I noticed that Cases[]
is very slow.
To reproduce, first copy a large image to the clipboard (just press Print Screen), then evaluate the following:
In[33]:= SetSystemOptions["PackedArrayOptions" -> "UnpackMessage" -> True];
In[34]:= AbsoluteTiming[nb = NotebookGet@ClipboardNotebook[];]
Out[34]= {0.4687500, Null}
In[35]:= AbsoluteTiming[d1 = nb[[1, 1, 1, 1, 1, 1, 1]];]
Out[35]= {0., Null}
In[36]:= AbsoluteTiming[d2 = First@Cases[nb, r_RasterBox :> First[r], Infinity, 1];]
During evaluation of In[36]:= Developer`FromPackedArray::unpack: Unpacking array in call to Notebook. >>
Out[36]= {0.9375000, Null}
(I did this on Windows, not sure if the paste code is the same on other systems.)
Note that extracting the data using Cases
is extremely slow compared to using Part
directly, even though I explicitly tell Cases
that I need only one match.
I did find out (as shown above) that Cases
triggers unpacking for some reason, even though the search should stop before it reaches the packed array inside. Using a shallower level specification than Infinity
might avoid unpacking.
Question: Using Cases
here is both easier and more reliable than Part
(what if the subexpression can appear in different positions?) Is there a way to make Cases
fast here, perhaps by using a different pattern or different options?
Possibly related question: Mathematica's pattern matching poorly optimized?
(This is why I changed the Cases
rule from RasterBox[data_, ___] -> data
to r_RasterBox :> First[r]
.)
I don't have access to Mathematica right now, so what follows is untested. My guess is that Cases
unpacks here because it searches depth-first, and so sees the packed array first. If this is correct, then you could use rules instead (ReplaceAll
, not Replace
), and throw an exception upon first match:
Module[{tag},
Catch[
nb /. r_RasterBox :> Block[{}, Throw[First[r], tag] /; True];
$Failed,
tag]
]
As I said, this is just an untested guess.
In the first edit (below) a rather heavy approach is presented. In many cases, one can take an alternative route. In this particular problem (and many others like it), the main problem is to somehow shield certain sub-expressions from the pattern-matcher. This can be achieved also by using rules, to temporarily replace the parts of interest by some dummy symbols.
Here is a modification of Cases
which does just that:
Clear[casesShielded];
casesShielded[expr_,pt_,shieldPattern_,levspec_,n_,opts:OptionsPattern[]]:=
Module[{dummy,inverseShieldingRules, shielded, i=0},
inverseShieldingRules =
If[#==={},#,Dispatch@First@#]&@
Reap[shielded= expr/.(p:shieldPattern):>
With[{eval = With[{ind = ++i},Sow[dummy[ind]:>p];dummy[ind]]},
eval/;True];
][[2]];
Cases[shielded,pt,levspec,n,opts]/.inverseShieldingRules];
This version of Cases
has one additional parameter shieldPattern
(third one), which indicates which sub-expressions must be shielded from the pattern-matcher.
The code above is pretty light-weight (compared to the suggestion of edit1 below), and it allows one to fully reuse and leverage the existing Cases
functionality. This will work for cases when the main pattern (or rule) is insensitive to shielding of the relevant parts, which is a rather common situation (and in particular, covers patterns of the type _h
, including the case at hand). This may also be faster than the application of myCases
(described below).
Here, we need this call:
In[55]:=
(d4=First@casesShielded[nb,x_RasterBox:>First@x,
p_List/;Developer`PackedArrayQ[p],Infinity,1]);//Timing
Out[55]= {0.,Null}
and the result is of course the same as before:
In[61]:= d2===d4
Out[61]= True
It took me a while to produce this function, and I am not 100 percent sure it always works correctly, but here is a version of Cases
which, while still working depth-first, analyzes expression as a whole before sub-expressions:
ClearAll[myCases];
myCases[expr_, lhs_ :> rhs_, upToLevel_: 1, max : (_Integer | All) : All,
opts : OptionsPattern[]] :=
Module[{tag, result, f, found = 0, aux},
With[{
mopts = FilterRules[{opts}, {Heads -> False}],
frule =
Apply[
RuleDelayed,
Hold[lhs, With[{eval = aux}, Null /; True]] /.
{aux :> Sow[rhs, tag] /; max === All,
aux :> (found++; Sow[rhs, tag])}
]
},
SetAttributes[f, HoldAllComplete];
If[max =!= All,
_f /; found >= max := Throw[Null, tag]
];
f[x_, n_] /; n > upToLevel := Null;
f[x_, n_] :=
Replace[
HoldComplete[x],
{
frule,
ex : _[___] :>
With[{ev =
Replace[
HoldComplete[ex],
y_ :> With[{eval = f[y, n + 1]}, Null /; True],
{2},
Sequence @@ mopts
]},
Null /; True
]
},
{1}
]
]; (* external With *)
result =
If[# === {}, #, First@#] &@
Reap[Catch[f[expr, 0], tag], tag, #2 &][[2]];
(* For proper garbage-collection of f *)
ClearAll[f];
result
]
This is not the most trivial piece of code, so here are some remarks. This version of Cases
is based on the same idea I suggested first - namely, use rule-substitution semantics to first attempt the pattern-match on an entire expression and only if that fails, go to sub-expressions. I stress that this is still the depth-first traversal, but different from the standard one (which is used in most expression-traversing functions like Map
, Scan
, Cases
, etc). I use Reap
and Sow
to collect the intermediate results (matches). The trickiest part here is to prevent sub-expressions from evaluation, and I had to wrap sub-expressions into HoldComplete
. Consequently, I had to use (a nested version of the) Trott-Strzebonski technique (perhaps, there are simpler ways, but I wasn't able to see them), to enable evauation of rules' r.h.sides inside held (sub)expressions, and used Replace
with proper level spec, accounting for extra added HoldComplete
wrappers. I return Null
in rules, since the main action is to Sow
the parts, so it does not matter what is injected into the original expression at the end. Some extra complexity was added by the code to support the level specification (I only support the single integer level indicating the maximal level up to which to search, not the full range of possible lev.specs), the maximal number of found results, and the Heads
option. The code for frule
serves to not introduce the overhead of counting found elements in cases when we want to find all of them. I am using the same Module
-generated tag both as a tag for Sow
, and as a tag for exceptions (which I use to stop the process when enough matches have been found, just like in my original suggestion).
To have a non-trivial test of this functionality, we can for example find all symbols in the DownValues
of myCases
, and compare to Cases
:
In[185]:=
And@@Flatten[
Outer[
myCases[DownValues[myCases],s_Symbol:>Hold[s],#1,Heads->#2] ===
Cases[DownValues[myCases],s_Symbol:>Hold[s],#1,Heads->#2]&,
Range[0,20],
{True,False}
]]
Out[185]= True
The myCases
function is about 20-30 times slower than Cases
though:
In[186]:=
Do[myCases[DownValues[myCases],s_Symbol:>Hold[s],20,Heads->True],{500}];//Timing
Out[186]= {3.188,Null}
In[187]:= Do[Cases[DownValues[myCases],s_Symbol:>Hold[s],20,Heads->True],{500}];//Timing
Out[187]= {0.125,Null}
It is easy to check that myCases
solves the original problem of unpacking:
In[188]:= AbsoluteTiming[d3=First@myCases[nb,r_RasterBox:>First[r],Infinity,1];]
Out[188]= {0.0009766,Null}
In[189]:= d3===d2
Out[189]= True
It is hoped that myCases
can be generally useful for situations like this, although the performance penalty of using it in place of Cases
is substantial and has to be taken into account.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With