Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get the index of a list items in another list?

Tags:

python

list

numpy

Consider I have these lists:

l = [5,6,7,8,9,10,5,15,20]
m = [10,5]

I want to get the index of m in l. I used list comprehension to do that:

[(i,i+1) for i,j in enumerate(l) if m[0] == l[i] and m[1] == l[i+1]]

Output : [(5,6)]

But if I have more numbers in m, I feel its not the right way. So is there any easy approach in Python or with NumPy?

Another example:

l = [5,6,7,8,9,10,5,15,20,50,16,18]
m = [10,5,15,20]

The output should be:

[(5,6,7,8)]
like image 235
Bharath Avatar asked Aug 22 '17 11:08

Bharath


People also ask

How do you get the index of an element in a list in a list?

To find the index of an element in a list, you use the index() function. It returns 3 as expected. However, if you attempt to find an element that doesn't exist in the list using the index() function, you'll get an error. To fix this issue, you need to use the in operator.

How do you find the index of a list in a list Python?

To facilitate this, Python has an inbuilt function called index(). This function takes in the element as an argument and returns the index. By using this function we are able to find the index of an element in a list in Python.

How do you get the index of an element in a list in Java?

indexOf() in Java. The indexOf() method of ArrayList returns the index of the first occurrence of the specified element in this list, or -1 if this list does not contain the element. Syntax : public int IndexOf(Object o) obj : The element to search for.


2 Answers

The easiest way (using pure Python) would be to iterate over the items and first only check if the first item matches. This avoids doing sublist comparisons when not needed. Depending on the contents of your l this could outperform even NumPy broadcasting solutions:

def func(haystack, needle):  # obviously needs a better name ...
    if not needle:
        return
    # just optimization
    lengthneedle = len(needle)
    firstneedle = needle[0]
    for idx, item in enumerate(haystack):
        if item == firstneedle:
            if haystack[idx:idx+lengthneedle] == needle:
                yield tuple(range(idx, idx+lengthneedle))

>>> list(func(l, m))
[(5, 6, 7, 8)]

In case your interested in speed I checked the performance of the approaches (borrowing from my setup here):

import random
import numpy as np

# strided_app is from https://stackoverflow.com/a/40085052/
def strided_app(a, L, S ):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size-L)//S)+1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows,L), strides=(S*n,n))

def pattern_index_broadcasting(all_data, search_data):
    n = len(search_data)
    all_data = np.asarray(all_data)
    all_data_2D = strided_app(np.asarray(all_data), n, S=1)
    return np.flatnonzero((all_data_2D == search_data).all(1))

# view1D is from https://stackoverflow.com/a/45313353/
def view1D(a, b): # a, b are arrays
    a = np.ascontiguousarray(a)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(),  b.view(void_dt).ravel()

def pattern_index_view1D(all_data, search_data):
    a = strided_app(np.asarray(all_data), L=len(search_data), S=1)
    a0v, b0v = view1D(np.asarray(a), np.asarray(search_data))
    return np.flatnonzero(np.in1d(a0v, b0v))

def find_sublist_indices(haystack, needle):
    if not needle:
        return
    # just optimization
    lengthneedle = len(needle)
    firstneedle = needle[0]
    restneedle = needle[1:]
    for idx, item in enumerate(haystack):
        if item == firstneedle:
            if haystack[idx+1:idx+lengthneedle] == restneedle:
                yield tuple(range(idx, idx+lengthneedle))

def Divakar1(l, m):
    return np.squeeze(pattern_index_broadcasting(l, m)[:,None] + np.arange(len(m)))

def Divakar2(l, m):
    return np.squeeze(pattern_index_view1D(l, m)[:,None] + np.arange(len(m)))

def MSeifert(l, m):
    return list(find_sublist_indices(l, m))

# Timing setup
timings = {Divakar1: [], Divakar2: [], MSeifert: []}
sizes = [2**i for i in range(5, 20, 2)]

# Timing
for size in sizes:
    l = [random.randint(0, 50) for _ in range(size)]
    m = [random.randint(0, 50) for _ in range(10)]
    larr = np.asarray(l)
    marr = np.asarray(m)
    for func in timings:
        # first timings:
        # res = %timeit -o func(l, m)
        # second timings:
        if func is MSeifert:
            res = %timeit -o func(l, m)   
        else:
            res = %timeit -o func(larr, marr) 
        timings[func].append(res)

%matplotlib notebook

import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(1)
ax = plt.subplot(111)

for func in timings:
    ax.plot(sizes, 
            [time.best for time in timings[func]], 
            label=str(func.__name__))
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('size')
ax.set_ylabel('time [seconds]')
ax.grid(which='both')
ax.legend()
plt.tight_layout()

In case your l and m are lists my function outperforms the NumPy solutions for all sizes:

enter image description here

But in case you have these as numpy arrays you'll get faster results for large arrays (size > 1000 elements) when using Divakars NumPy solutions:

enter image description here

like image 59
MSeifert Avatar answered Sep 21 '22 06:09

MSeifert


Just making the point that @MSeifert's approach can, of course, also be implemented in numpy:

def pp(h,n):
    nn = len(n)
    NN = len(h)
    c = (h[:NN-nn+1]==n[0]).nonzero()[0]
    if c.size==0: return
    for i,l in enumerate(n[1:].tolist(),1):
        c = c[h[i:][c]==l]
        if c.size==0: return
    return np.arange(c[0],c[0]+nn)

enter image description here

like image 23
Paul Panzer Avatar answered Sep 19 '22 06:09

Paul Panzer