Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

finding sublist in lists with map/select in mathematica

I have, in Wolfram Mathematica 8.0, a nested list like

nList = {{a,b},{f,g},{n,o}}

and a normal list like

lList = {a,b,c,k,m,n,o,z}

and i want to check if all the sublists in nList are in lList (in the example a,b and n,o are there but not f,g)

I've done it using For[,,,] and using index... can someone enlighten me in using functions like Map/Thread/Select to do it in one pass.

Edit: If nList contains a,b, lList must contain a,b and not a,c,b or b,a or b,c,a

like image 770
Ferlo Avatar asked Jun 01 '11 13:06

Ferlo


1 Answers

Assuming that you don't care about element ordering, here is one way:

In[20]:= Complement[Flatten[nList],lList] ==={}

Out[20]= False

EDIT

If the order matters, then here is one way:

In[29]:= And@@(MatchQ[lList,{___,PatternSequence[##],___}]&@@@nList)

Out[29]= False

For large number of sub-lists, this may be faster:

In[34]:= 
Union[ReplaceList[lList,
       {___,x:Alternatives@@(PatternSequence@@@nList),___}:>{x}]]===Union[nList]

Out[34]= False

This works as follows: ReplaceList is a very nice but often ignored command which returns a list of all possible expressions which could be obtained with the pattern-matcher trying to apply the rules in all possible ways to an expression. This is in contrast with the way the pattern-matcher usually works, where it stops upon the first successful match. The PatternSequence is a relatively new addition to the Mathematica pattern language, which allows us to give an identity to a given sequence of expression, treating it as a pattern. This allowed us to construct the alternative pattern, so the resulting pattern is saying: the sequence of any sublist in any place in the main list is collected and put back to list braces, forming back the sublist. We get as many newly formed sublists as there are sequences of the original sublists in the larger list. If all sublists are present, then Union on the resulting list should be the same as Union of the original sublist list.

Here are the benchmarks (I took a list of integers, and overlapping sublists generated by Partition):

In[39]:= tst = Range[1000];

In[41]:= sub = Partition[tst, 2, 1];

In[43]:= 
And @@ (MatchQ[tst, {___, PatternSequence[##], ___}] & @@@ sub) // Timing

Out[43]= {3.094, True}

In[45]:= 
Union[ReplaceList[tst, {___,x : Alternatives @@ (PatternSequence @@@ sub), ___} 
     :> {x}]] ===  Union[sub] // Timing

Out[45]= {0.11, True}

Conceptually, the reason why the second method is faster is that it does its work in the single run through the list (performed internally by ReplaceList), while the first solution explicitly iterates through the big list for each sub-list.

EDIT 2 - Performance

If performance is really an issue, then the following code is yet much faster:

And @@ (With[{part = Partition[lList, Length[#[[1]]], 1]},
 Complement[#, part] === {}] & /@SplitBy[SortBy[nList, Length], Length])

For example, on our benchmarks:

In[54]:= And@@(With[{part = Partition[tst,Length[#[[1]]],1]},
       Complement[#,part]==={}]&/@SplitBy[SortBy[sub,Length],Length])//Timing

Out[54]= {0.,True}

EDIT 3

Per suggestion of @Mr.Wizard, the following performance improvement can be made:

Scan[
 If[With[{part = Partition[lList, Length[#[[1]]], 1]},
   Complement[#, part] =!= {}], Return[False]] &,
 SplitBy[SortBy[nList, Length], Length]
] === Null

Here, the as soon as we get a negative answer from sub-lists of a given length, sublists of other lengths will not be checked, since we already know that the answer is negative (False). If Scan completes without Return, it will return Null, which will mean that lList contains all of the sublists in nList.

like image 198
Leonid Shifrin Avatar answered Sep 22 '22 06:09

Leonid Shifrin