Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Check for presence of a sublist in list with order kept, and allow wildcards

Tags:

python

list

I want to check if a sublist is present in another (larger) list, in the exact same order of elements. I also want it to allow wildcards. For example I have the following lists:

>>> my_lists
[[0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 2, 2],
[1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 0, 0, 1, 1, 1, 1],
[1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 1, 0, 0, 1, 1, 1, 1],
[1, 1, 1, 1, 0, 2, 1, 2, 1, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 1, 1, 2, 1, 2, 2, 1, 1],
[0, 1, 1, 1, 1, 2, 2, 2, 1, 0, 0, 0, 2, 2, 1, 1, 0, 0, 1, 1, 0],
[1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 0, 0, 1, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0],
[0, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 1, 1, 1, 1, 2, 1, 1, 0, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 0, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1],
[0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]

And the sublist: [0, 0, 0, 1]. If I want to find which lists contain this exact sublist I can do (taken from here):

def my_func(_list, sub_list):
    n = len(sub_list)
    return any((sub_list== _list[i:i+n]) for i in range(len(_list)-n+1))

for l in my_lists:
    if my_func(l, [0, 0, 0, 1]):
        print(l)

... which basically makes all possible sublists of the same length as the sub_list, and checks whether or not any are equal. And I would get the following output since these lists contain [0, 0, 0, 1]:

[1, 1, 1, 1, 0, 2, 1, 2, 1, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 0]
[0, 0, 0, 0, 1, 1, 2, 1, 2, 2, 1, 1]
[1, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0]
[0, 0, 0, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Now I also want to add wildcards, meaning that I can give the sublist wildcard elements. For example, now I want to find the sublist [*, *, 0, 0, 0, 1, *]. The asterisks here mean that for those elements, the value could be anything in the list. But for those asterisks there must be a value. The sublist [*, *, 0, 0, 0, 1, *] would now output:

[1, 1, 1, 1, 0, 2, 1, 2, 1, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 0]
[1, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0]

Note that now [0, 0, 0, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] is not included since this list doesn't have two values before the [0, 0, 0, 1] sequence starts. The same goes for [0, 0, 0, 0, 1, 1, 2, 1, 2, 2, 1, 1], which also doesn't have two values before the sequence. Note that the asterisk could be anything such as np.nan.

How would I extend above code to allow for the wildcards?

like image 908
sandertjuh Avatar asked Jun 22 '21 20:06

sandertjuh


People also ask

How do you check if a list contains a sublist?

issubset() function. The most used and recommended method to check for a sublist. This function is tailor made to perform the particular task of checking if one list is a subset of another.

What is the difference between a list and a sublist?

The difference between a sentence and a list is that the elements of a sentence must be words, whereas the elements of a list can be anything at all: words, #t , procedures, or other lists. (A list that's an element of another list is called a sublist.

How do you take a sublist from a list in Python?

To get the subarray we can use slicing to get the subarray. Step 1: Run a loop till length+1 of the given list. Step 2: Run another loop from 0 to i. Step 3: Slice the subarray from j to i.

Is sublist present in list?

The original list : [5, 6, 3, 8, 2, 1, 7, 1] Is sublist present in list ? : True The combination of above functions is used to solve this problem. In this, we perform the task of checking for any sublist equating to desired using any () and list slicing is used to slice incremental list of desired length.

How to find all combinations of a list of sublists?

Print out the final list of sublists. We will use one nested for loop to find out all combinations of a list. Run one loop in the range of 0 to length of the list. Run one inner loop in the range of current outer loop to length of the list. Get the slice of the list in between the current indices pointed by the outer loop and inner loop.

What is given_list and result_list in Python?

Below is the complete python program: given_list is the original list entered by the user. result_list is the final list i.e. lists of list. size holds the size of the list. We are reading this value from the user.

What is list in Java and how to use list?

List in Java contains index-based methods. This enables us to maintain the order collection. So this enables us to search, insert, delete, and even update the elements. This enables us to store multiple copies of the same element already existing on our list. Also, in addition, null elements are allowed to be a part of the List.


3 Answers

One way is to use all when checking against sublists and an if that skips asterisks:

def my_func(a_list, sub_list):
    n = len(sub_list)

    # list-level comparison is now via element-wise
    return any(all(sub_item == chunk_item
                   for sub_item, chunk_item in zip(sub_list, a_list[i:i+n])
                   if not np.isnan(sub_item))  # "is_not_asterisk" condition
               for i in range(len(a_list)-n+1))

where I used not np.isnan(...) as the asterisk condition as mentioned in the question; but it could be many things: e.g., if asterisk is literally "*" in sublists, then the condition there is changed to if sub_item != "*".

sample with np.nan as asterisk:

for a_list in my_lists:
    if my_func(a_list, [np.nan, np.nan, 0, 0, 0, 1, np.nan]):
        print(a_list)

gives

[1, 1, 1, 1, 0, 2, 1, 2, 1, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 0]
[1, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0]

all returns True if the iterable is empty so if an all-asterisk sub-list is passed, it will return True for all candidates (as long as their length permits, which any will handle because any with empty iterable is False!).

like image 161
Mustafa Aydın Avatar answered Oct 23 '22 03:10

Mustafa Aydın


Another solution, with custom compare function:

def custom_cmp(l1, l2):
    if len(l1) != len(l2):
        return False

    for a, b in zip(l1, l2):
        if a == "*":  # you can check here for b=='*' if you wish
            continue
        if a != b:
            return False

    return True


def my_func(_list, sub_list):
    n = len(sub_list)
    return any(
        custom_cmp(sub_list, _list[i : i + n])
        for i in range(len(_list) - n + 1)
    )


for l in my_lists:
    if my_func(l, ["*", "*", 0, 0, 0, 1, "*"]):
        print(l)

Prints:

[1, 1, 1, 1, 0, 2, 1, 2, 1, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 0]
[1, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0]
like image 31
Andrej Kesely Avatar answered Oct 23 '22 01:10

Andrej Kesely


If we create a SuperInt class that allows us to wrap int but make it equal to another instance with a same value (or a 'normal' int with the same value) and the string '*', we can use the same code you already have.

WILDCARD = '*'

class SuperInt(int):
    def __eq__(self, other):
        if not isinstance(other, self.__class__) and other == WILDCARD:
        # or
        # if isinstance(other, str) and other == '*': but there might be a caveat with that
            return True
        return super().__eq__(other)

Converting your my_lists to use SuperInt instances:

for i, li in enumerate(my_lists):
    my_lists[i] = list(map(SuperInt, li))

running the exact same code you already have (just replacing * with WILDCARD as defined above):

def my_func(_list, sub_list):
    n = len(sub_list)
    return any((sub_list == _list[i:i+n]) for i in range(len(_list)-n+1))

for l in my_lists:
    if my_func(l, [WILDCARD, WILDCARD, 0, 0, 0, 1, WILDCARD]):
        print(l)

Outputs

[1, 1, 1, 1, 0, 2, 1, 2, 1, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 0]
[1, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0]
like image 43
DeepSpace Avatar answered Oct 23 '22 02:10

DeepSpace