Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PatternTest not optimized?

In preparing a response to An unexpected behavior of PatternTest in Mathematica I came across an unexpected Mathematica behavior of my own.

Please consider:

test = (Print[##]; False) &;
MatchQ[{1, 2, 3, 4, 5}, {x__?test, y__}]
During evaluation of In[1]:= 1

During evaluation of In[1]:= 1

During evaluation of In[1]:= 1

During evaluation of In[1]:= 1
False

Since, as Simon's quote of the documentation concisely states:

In a form such as __?test every element in the sequence matched by __ must yield True when test is applied.

I am wondering why Mathematica is testing the first element of the list four separate times. Of course there are four ways to make up the underlying pattern {x__, y__} and if this were a Condition test then all elements that make up the sequence x would need to be tested, but I don't think that is the case here.

Does the logic not hold that if the first element of the list fails PatternTest then given pattern cannot match?

If it does hold, why is Mathematica not making this simple optimization?


Borrowing an example from yoda's answer, here is another example of what appears to be excess evaluation:

In[1]:= test2 = (Print@##; Positive@##) &;
MatchQ[{1, 2, 3, 4, -5}, {x__?test2, y__?Negative}]

During evaluation of In[1]:= 1

During evaluation of In[1]:= 1

During evaluation of In[1]:= 2

During evaluation of In[1]:= 1

During evaluation of In[1]:= 2

During evaluation of In[1]:= 3

During evaluation of In[1]:= 1

During evaluation of In[1]:= 2

During evaluation of In[1]:= 3

During evaluation of In[1]:= 4

Out[2]= True

I admit to having never explored this aspect of pattern matching before, and I am disturbed by the seeming inefficiency of this. Is this really as bad as it looks, or is some kind of automatic caching taking place? Print appears to indicate against that.

  • Is there a more efficient pattern based way to write this?

  • Is this level of redundancy required for correct pattern matching behavior, and why?


I made a false assertion in haste, but I leave it because good answers below address it. Please ignore it in future answers.

It is easy to demonstrate that a similar optimization is made in other cases:

MatchQ[{-1, 2, 3, 4, 5}, {__?Positive, y__?test}]

(Nothing is printed.)

False

Here Mathematica correctly never even tests any elements for y.

like image 278
Mr.Wizard Avatar asked Dec 13 '11 04:12

Mr.Wizard


4 Answers

I think everyone is forgetting about the side effects that are possible in test functions. Here is what I think is happening: as Mr.Wizard and others mentioned, there are several ways the pattern may match, just combinatorially. For each combination of {x} and {y}, the x pattern is first tested. By the way, there is no point in defining functions of multiple arguments (##), since, as @Simon explained, the test function is applied separately to each element in the sequence. And this also explains why only the first element (-1) is printed: as soon as the first non-matching element is found, the pattern-matcher stops and goes on to test the next available combination.

Here is a more illustrative example:

In[20]:= 
MatchQ[{-1,2,3,4,5},{_,x__?(Function[Print[#];Positive[#]])}]
During evaluation of In[20]:= 2
During evaluation of In[20]:= 3
During evaluation of In[20]:= 4
During evaluation of In[20]:= 5

Out[20]= True

Now it prints all of them, since the function is applied to them one by one, as it has to in this case.

Now to the crux of the matter. Here I set up a test function with a side effect, which decides to change its mind after it tests the very first element:

Module[{flag = False}, 
  ClearAll[test3]; 
  test3[x_] := 
     With[{fl = flag}, 
        If[! flag, flag = True]; 
        Print[x]; 
        fl
     ]
];

The first combination (which is {-1},{2,3,4,5} will be rejected since the function gives False at first. The second one ({-1,2},{3,4,5}) will be accepted however. And this is exactly what we observe:

In[22]:= 
MatchQ[{-1,2,3,4,5},{x__?test3,y__}]
During evaluation of In[22]:= -1
During evaluation of In[22]:= -1
During evaluation of In[22]:= 2

Out[22]= True

The printing stopped as soon as the pattern-matcher found a match.

Now, from here it must be obvious that the optimization mentioned in the question and some other answers is not generally possible, since pattern-matcher has no control over the mutable state possibly present in the testing functions.

When thinking about the pattern-matching, we usually view it as a process separate from evaluation, which is largely true, since the pattern-matcher is a built-in component of the system which, once patterns and expressions evaluate, takes over and largely bypasses the main evaluation loop. There are however notable exceptions, which make the pattern-matching more powerful at the price of getting it entangled with the evaluator. These include the uses of Condition and PatternTest, since these two are the "entry points" of the main evaluation process into the otherwise isolated from it pattern-matching process. Once the pattern-matcher hits one of these, it invokes the main evaluator on the condition to be tested, and anything is possible then. Which brings me once again to the observation that the pattern-matcher is most efficient when tests using PatternTest and Condition are absent, and the patterns are completely syntactic - in that case, it can optimize.

like image 75
Leonid Shifrin Avatar answered Mar 02 '23 14:03

Leonid Shifrin


Just a quick guess here, but {__?Positive, y__?test} must match the beginning of the list through to the end. So, {-1, 2, 3, 4, 5} fails on the first element, hence no print out. Try replacing Positive with ((Print[##];Positive[#])&), and you'll see it behaves in the same manner as the first pattern.

like image 41
rcollyer Avatar answered Mar 02 '23 15:03

rcollyer


I think that your premise that Mathematica optimizes the pattern test is incorrect. Consider the pattern {x__, y__}. As you rightly say, there are 4 ways in which {1,2,3,4,5} can fit into this pattern. If you now add a pattern test to this with ?test, MatchQ should return True if any of the 4 ways match the pattern. So it must necessarily test all possible options until a match has been found. So the optimization is only when there are positives, and not when the pattern fails.

Here's a modification of your example with Positive, that shows you that Mathematica does not make the optimization as you claim:

test2 = (Print@##; Positive@##) &;
MatchQ[{-1, 2, 3, 4, 5}, {x__?test2, y__}]

It prints it 4 times! It terminates correctly with one print if the first element is indeed positive (thereby not having to test the rest). The test for the second element is applied only if the first is True, which is why the test for y was not applied in yours. Here's a different example:

test3 = (Print@##; Negative@##) &; 
MatchQ[{1, 2, 3, 4, -5}, {x__?test2, y__?test3}]

I agree with your logic that if the first fails, then given that every element ought to match, all subsequent possibilities will also fail. My guess is that Mathematica is not that smart yet, and it is simpler to implement identical behaviour for any of _, __, or ___, rather than a different one for each. Implementing a different behaviour depending on whether you use __ or ___ will mask the true nature of the test and make debugging harder.

like image 42
abcd Avatar answered Mar 02 '23 14:03

abcd


ok I'll have a go.

MatchQ[{1, 2, 3, 4, 5}, {x__?test, y__}]//Trace

shows that slot sequence is redundant because only one element, in this case the first one in the sequence, is tested. e.g.

MatchQ[{1, 2, 3, 4, 5}, {_, x__?test, y__}] // Trace

Now prints 3 twos. If you change the first pattern to BlankSequence more permutations are generated and some of these have different first elements

MatchQ[{1, 2, 3, 4, 5}, {__, x__?test, y__}] // Trace

In your final example

MatchQ[{-1, 2, 3, 4, 5}, {__?Positive, y__?test}]

the first element always fails the test. If you change this to

MatchQ[{1, 2, -3, 4, 5}, {__?Positive, y__?test}]

you will see something more along the lines of what you expected.

like image 43
Mike Honeychurch Avatar answered Mar 02 '23 15:03

Mike Honeychurch