I want to check if a sublist is present in another (larger) list, in the exact same order of elements. I also want it to allow wildcards. For example I have the following lists:
>>> my_lists
[[0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 2, 2],
[1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 0, 0, 1, 1, 1, 1],
[1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 1, 0, 0, 1, 1, 1, 1],
[1, 1, 1, 1, 0, 2, 1, 2, 1, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 1, 1, 2, 1, 2, 2, 1, 1],
[0, 1, 1, 1, 1, 2, 2, 2, 1, 0, 0, 0, 2, 2, 1, 1, 0, 0, 1, 1, 0],
[1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 0, 0, 1, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0],
[0, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 1, 1, 1, 1, 2, 1, 1, 0, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 0, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1],
[0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
And the sublist: [0, 0, 0, 1]
. If I want to find which lists contain this exact sublist I can do (taken from here):
def my_func(_list, sub_list):
n = len(sub_list)
return any((sub_list== _list[i:i+n]) for i in range(len(_list)-n+1))
for l in my_lists:
if my_func(l, [0, 0, 0, 1]):
print(l)
... which basically makes all possible sublists of the same length as the sub_list
, and checks whether or not any are equal. And I would get the following output since these lists contain [0, 0, 0, 1]
:
[1, 1, 1, 1, 0, 2, 1, 2, 1, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 0]
[0, 0, 0, 0, 1, 1, 2, 1, 2, 2, 1, 1]
[1, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0]
[0, 0, 0, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Now I also want to add wildcards, meaning that I can give the sublist wildcard elements. For example, now I want to find the sublist [*, *, 0, 0, 0, 1, *]
. The asterisks here mean that for those elements, the value could be anything in the list. But for those asterisks there must be a value. The sublist [*, *, 0, 0, 0, 1, *]
would now output:
[1, 1, 1, 1, 0, 2, 1, 2, 1, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 0]
[1, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0]
Note that now [0, 0, 0, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
is not included since this list doesn't have two values before the [0, 0, 0, 1]
sequence starts. The same goes for [0, 0, 0, 0, 1, 1, 2, 1, 2, 2, 1, 1]
, which also doesn't have two values before the sequence. Note that the asterisk could be anything such as np.nan.
How would I extend above code to allow for the wildcards?
issubset() function. The most used and recommended method to check for a sublist. This function is tailor made to perform the particular task of checking if one list is a subset of another.
The difference between a sentence and a list is that the elements of a sentence must be words, whereas the elements of a list can be anything at all: words, #t , procedures, or other lists. (A list that's an element of another list is called a sublist.
To get the subarray we can use slicing to get the subarray. Step 1: Run a loop till length+1 of the given list. Step 2: Run another loop from 0 to i. Step 3: Slice the subarray from j to i.
The original list : [5, 6, 3, 8, 2, 1, 7, 1] Is sublist present in list ? : True The combination of above functions is used to solve this problem. In this, we perform the task of checking for any sublist equating to desired using any () and list slicing is used to slice incremental list of desired length.
Print out the final list of sublists. We will use one nested for loop to find out all combinations of a list. Run one loop in the range of 0 to length of the list. Run one inner loop in the range of current outer loop to length of the list. Get the slice of the list in between the current indices pointed by the outer loop and inner loop.
Below is the complete python program: given_list is the original list entered by the user. result_list is the final list i.e. lists of list. size holds the size of the list. We are reading this value from the user.
List in Java contains index-based methods. This enables us to maintain the order collection. So this enables us to search, insert, delete, and even update the elements. This enables us to store multiple copies of the same element already existing on our list. Also, in addition, null elements are allowed to be a part of the List.
One way is to use all
when checking against sublists and an if
that skips asterisks:
def my_func(a_list, sub_list):
n = len(sub_list)
# list-level comparison is now via element-wise
return any(all(sub_item == chunk_item
for sub_item, chunk_item in zip(sub_list, a_list[i:i+n])
if not np.isnan(sub_item)) # "is_not_asterisk" condition
for i in range(len(a_list)-n+1))
where I used not np.isnan(...)
as the asterisk condition as mentioned in the question; but it could be many things: e.g., if asterisk is literally "*"
in sublists, then the condition there is changed to if sub_item != "*"
.
sample with np.nan
as asterisk:
for a_list in my_lists:
if my_func(a_list, [np.nan, np.nan, 0, 0, 0, 1, np.nan]):
print(a_list)
gives
[1, 1, 1, 1, 0, 2, 1, 2, 1, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 0]
[1, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0]
all
returns True
if the iterable is empty so if an all-asterisk sub-list is passed, it will return True
for all candidates (as long as their length permits, which any
will handle because any
with empty iterable is False
!).
Another solution, with custom compare function:
def custom_cmp(l1, l2):
if len(l1) != len(l2):
return False
for a, b in zip(l1, l2):
if a == "*": # you can check here for b=='*' if you wish
continue
if a != b:
return False
return True
def my_func(_list, sub_list):
n = len(sub_list)
return any(
custom_cmp(sub_list, _list[i : i + n])
for i in range(len(_list) - n + 1)
)
for l in my_lists:
if my_func(l, ["*", "*", 0, 0, 0, 1, "*"]):
print(l)
Prints:
[1, 1, 1, 1, 0, 2, 1, 2, 1, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 0]
[1, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0]
If we create a SuperInt
class that allows us to wrap int
but make it equal to another instance with a same value (or a 'normal' int
with the same value) and the string '*'
, we can use the same code you already have.
WILDCARD = '*'
class SuperInt(int):
def __eq__(self, other):
if not isinstance(other, self.__class__) and other == WILDCARD:
# or
# if isinstance(other, str) and other == '*': but there might be a caveat with that
return True
return super().__eq__(other)
Converting your my_lists
to use SuperInt
instances:
for i, li in enumerate(my_lists):
my_lists[i] = list(map(SuperInt, li))
running the exact same code you already have (just replacing *
with WILDCARD
as defined above):
def my_func(_list, sub_list):
n = len(sub_list)
return any((sub_list == _list[i:i+n]) for i in range(len(_list)-n+1))
for l in my_lists:
if my_func(l, [WILDCARD, WILDCARD, 0, 0, 0, 1, WILDCARD]):
print(l)
Outputs
[1, 1, 1, 1, 0, 2, 1, 2, 1, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 0]
[1, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With