In preparing a response to An unexpected behavior of PatternTest in Mathematica I came across an unexpected Mathematica behavior of my own.
Please consider:
test = (Print[##]; False) &;
MatchQ[{1, 2, 3, 4, 5}, {x__?test, y__}]
During evaluation of In[1]:= 1 During evaluation of In[1]:= 1 During evaluation of In[1]:= 1 During evaluation of In[1]:= 1
False
Since, as Simon's quote of the documentation concisely states:
In a form such as
__?test
every element in the sequence matched by__
must yieldTrue
when test is applied.
I am wondering why Mathematica is testing the first element of the list four separate times. Of course there are four ways to make up the underlying pattern {x__, y__}
and if this were a Condition
test then all elements that make up the sequence x
would need to be tested, but I don't think that is the case here.
Does the logic not hold that if the first element of the list fails PatternTest
then given pattern cannot match?
Borrowing an example from yoda's answer, here is another example of what appears to be excess evaluation:
In[1]:= test2 = (Print@##; Positive@##) &;
MatchQ[{1, 2, 3, 4, -5}, {x__?test2, y__?Negative}]
During evaluation of In[1]:= 1
During evaluation of In[1]:= 1
During evaluation of In[1]:= 2
During evaluation of In[1]:= 1
During evaluation of In[1]:= 2
During evaluation of In[1]:= 3
During evaluation of In[1]:= 1
During evaluation of In[1]:= 2
During evaluation of In[1]:= 3
During evaluation of In[1]:= 4
Out[2]= True
I admit to having never explored this aspect of pattern matching before, and I am disturbed by the seeming inefficiency of this. Is this really as bad as it looks, or is some kind of automatic caching taking place? Print
appears to indicate against that.
Is there a more efficient pattern based way to write this?
Is this level of redundancy required for correct pattern matching behavior, and why?
I made a false assertion in haste, but I leave it because good answers below address it. Please ignore it in future answers.
It is easy to demonstrate that a similar optimization is made in other cases:
MatchQ[{-1, 2, 3, 4, 5}, {__?Positive, y__?test}]
(Nothing is printed.)
FalseHere Mathematica correctly never even tests any elements for
y
.
I think everyone is forgetting about the side effects that are possible in test functions. Here is what I think is happening: as Mr.Wizard and others mentioned, there are several ways the pattern may match, just combinatorially. For each combination of {x}
and {y}
, the x
pattern is first tested. By the way, there is no point in defining functions of multiple arguments (##
), since, as @Simon explained, the test function is applied separately to each element in the sequence. And this also explains why only the first element (-1) is printed: as soon as the first non-matching element is found, the pattern-matcher stops and goes on to test the next available combination.
Here is a more illustrative example:
In[20]:=
MatchQ[{-1,2,3,4,5},{_,x__?(Function[Print[#];Positive[#]])}]
During evaluation of In[20]:= 2
During evaluation of In[20]:= 3
During evaluation of In[20]:= 4
During evaluation of In[20]:= 5
Out[20]= True
Now it prints all of them, since the function is applied to them one by one, as it has to in this case.
Now to the crux of the matter. Here I set up a test function with a side effect, which decides to change its mind after it tests the very first element:
Module[{flag = False},
ClearAll[test3];
test3[x_] :=
With[{fl = flag},
If[! flag, flag = True];
Print[x];
fl
]
];
The first combination (which is {-1},{2,3,4,5}
will be rejected since the function gives False
at first. The second one ({-1,2},{3,4,5}
) will be accepted however. And this is exactly what we observe:
In[22]:=
MatchQ[{-1,2,3,4,5},{x__?test3,y__}]
During evaluation of In[22]:= -1
During evaluation of In[22]:= -1
During evaluation of In[22]:= 2
Out[22]= True
The printing stopped as soon as the pattern-matcher found a match.
Now, from here it must be obvious that the optimization mentioned in the question and some other answers is not generally possible, since pattern-matcher has no control over the mutable state possibly present in the testing functions.
When thinking about the pattern-matching, we usually view it as a process separate from evaluation, which is largely true, since the pattern-matcher is a built-in component of the system which, once patterns and expressions evaluate, takes over and largely bypasses the main evaluation loop. There are however notable exceptions, which make the pattern-matching more powerful at the price of getting it entangled with the evaluator. These include the uses of Condition
and PatternTest
, since these two are the "entry points" of the main evaluation process into the otherwise isolated from it pattern-matching process. Once the pattern-matcher hits one of these, it invokes the main evaluator on the condition to be tested, and anything is possible then. Which brings me once again to the observation that the pattern-matcher is most efficient when tests using PatternTest
and Condition
are absent, and the patterns are completely syntactic - in that case, it can optimize.
Just a quick guess here, but {__?Positive, y__?test}
must match the beginning of the list through to the end. So, {-1, 2, 3, 4, 5}
fails on the first element, hence no print out. Try replacing Positive
with ((Print[##];Positive[#])&)
, and you'll see it behaves in the same manner as the first pattern.
I think that your premise that Mathematica optimizes the pattern test is incorrect. Consider the pattern {x__, y__}
. As you rightly say, there are 4 ways in which {1,2,3,4,5}
can fit into this pattern. If you now add a pattern test to this with ?test
, MatchQ
should return True
if any of the 4 ways match the pattern. So it must necessarily test all possible options until a match has been found. So the optimization is only when there are positives, and not when the pattern fails.
Here's a modification of your example with Positive
, that shows you that Mathematica does not make the optimization as you claim:
test2 = (Print@##; Positive@##) &;
MatchQ[{-1, 2, 3, 4, 5}, {x__?test2, y__}]
It prints it 4 times! It terminates correctly with one print if the first element is indeed positive (thereby not having to test the rest). The test for the second element is applied only if the first is True
, which is why the test for y
was not applied in yours. Here's a different example:
test3 = (Print@##; Negative@##) &;
MatchQ[{1, 2, 3, 4, -5}, {x__?test2, y__?test3}]
I agree with your logic that if the first fails, then given that every element ought to match, all subsequent possibilities will also fail. My guess is that Mathematica is not that smart yet, and it is simpler to implement identical behaviour for any of _
, __
, or ___
, rather than a different one for each. Implementing a different behaviour depending on whether you use __
or ___
will mask the true nature of the test and make debugging harder.
ok I'll have a go.
MatchQ[{1, 2, 3, 4, 5}, {x__?test, y__}]//Trace
shows that slot sequence is redundant because only one element, in this case the first one in the sequence, is tested. e.g.
MatchQ[{1, 2, 3, 4, 5}, {_, x__?test, y__}] // Trace
Now prints 3 twos. If you change the first pattern to BlankSequence more permutations are generated and some of these have different first elements
MatchQ[{1, 2, 3, 4, 5}, {__, x__?test, y__}] // Trace
In your final example
MatchQ[{-1, 2, 3, 4, 5}, {__?Positive, y__?test}]
the first element always fails the test. If you change this to
MatchQ[{1, 2, -3, 4, 5}, {__?Positive, y__?test}]
you will see something more along the lines of what you expected.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With