How to get the indices of a list's items in another list?

Tags: python, list, numpy

Consider I have these lists:

l = [5,6,7,8,9,10,5,15,20]
m = [10,5]

I want to get the index of m in l. I used list comprehension to do that:

[(i,i+1) for i,j in enumerate(l) if m[0] == l[i] and m[1] == l[i+1]]

Output : [(5,6)]

But if I have more numbers in m, I feel it's not the right way. Is there an easier approach in Python or with NumPy?

Another example:

l = [5,6,7,8,9,10,5,15,20,50,16,18]
m = [10,5,15,20]

The output should be:

[(5,6,7,8)]

asked Aug 22 '17 11:08

Bharath

2 Answers

The easiest way (using pure Python) would be to iterate over the items and first only check if the first item matches. This avoids doing sublist comparisons when not needed. Depending on the contents of your l this could outperform even NumPy broadcasting solutions:

def func(haystack, needle):  # obviously needs a better name ...
    if not needle:
        return
    # just optimization
    lengthneedle = len(needle)
    firstneedle = needle[0]
    for idx, item in enumerate(haystack):
        if item == firstneedle:
            if haystack[idx:idx+lengthneedle] == needle:
                yield tuple(range(idx, idx+lengthneedle))

>>> list(func(l, m))
[(5, 6, 7, 8)]
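
As a quick illustration of how the generator above behaves, here is a minimal sketch that restates it under the name find_sublist_indices (the name used in the timing code further down) and runs it on a few inputs, including overlapping matches and a needle that does not occur. The extra test lists are made up for illustration.

```python
def find_sublist_indices(haystack, needle):
    """Yield a tuple of indices for every occurrence of needle in haystack."""
    if not needle:
        return
    n = len(needle)
    first = needle[0]
    for idx, item in enumerate(haystack):
        # Cheap check on the first item before comparing the whole slice.
        if item == first and haystack[idx:idx + n] == needle:
            yield tuple(range(idx, idx + n))


l = [5, 6, 7, 8, 9, 10, 5, 15, 20]
print(list(find_sublist_indices(l, [10, 5])))          # [(5, 6)]
print(list(find_sublist_indices(l, [10, 5, 15])))      # [(5, 6, 7)]
print(list(find_sublist_indices([1, 1, 1], [1, 1])))   # [(0, 1), (1, 2)]
print(list(find_sublist_indices(l, [5, 6, 99])))       # []
```

Note that the slice comparison only works as intended when haystack and needle are both lists (or both tuples); a list never compares equal to a tuple.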

In case you're interested in speed, I checked the performance of the approaches (borrowing from my setup here):

import random
import numpy as np

# strided_app is from https://stackoverflow.com/a/40085052/
def strided_app(a, L, S ):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size-L)//S)+1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows,L), strides=(S*n,n))

def pattern_index_broadcasting(all_data, search_data):
    n = len(search_data)
    all_data = np.asarray(all_data)
    all_data_2D = strided_app(np.asarray(all_data), n, S=1)
    return np.flatnonzero((all_data_2D == search_data).all(1))

# view1D is from https://stackoverflow.com/a/45313353/
def view1D(a, b): # a, b are arrays
    a = np.ascontiguousarray(a)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(),  b.view(void_dt).ravel()

def pattern_index_view1D(all_data, search_data):
    a = strided_app(np.asarray(all_data), L=len(search_data), S=1)
    a0v, b0v = view1D(np.asarray(a), np.asarray(search_data))
    return np.flatnonzero(np.in1d(a0v, b0v))

def find_sublist_indices(haystack, needle):
    if not needle:
        return
    # just optimization
    lengthneedle = len(needle)
    firstneedle = needle[0]
    restneedle = needle[1:]
    for idx, item in enumerate(haystack):
        if item == firstneedle:
            if haystack[idx+1:idx+lengthneedle] == restneedle:
                yield tuple(range(idx, idx+lengthneedle))

def Divakar1(l, m):
    return np.squeeze(pattern_index_broadcasting(l, m)[:,None] + np.arange(len(m)))

def Divakar2(l, m):
    return np.squeeze(pattern_index_view1D(l, m)[:,None] + np.arange(len(m)))

def MSeifert(l, m):
    return list(find_sublist_indices(l, m))

# Timing setup
timings = {Divakar1: [], Divakar2: [], MSeifert: []}
sizes = [2**i for i in range(5, 20, 2)]

# Timing
for size in sizes:
    l = [random.randint(0, 50) for _ in range(size)]
    m = [random.randint(0, 50) for _ in range(10)]
    larr = np.asarray(l)
    marr = np.asarray(m)
    for func in timings:
        # first timings:
        # res = %timeit -o func(l, m)
        # second timings:
        if func is MSeifert:
            res = %timeit -o func(l, m)   
        else:
            res = %timeit -o func(larr, marr) 
        timings[func].append(res)

%matplotlib notebook

import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(1)
ax = plt.subplot(111)

for func in timings:
    ax.plot(sizes, 
            [time.best for time in timings[func]], 
            label=str(func.__name__))
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('size')
ax.set_ylabel('time [seconds]')
ax.grid(which='both')
ax.legend()
plt.tight_layout()
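
The timing loop above relies on the IPython %timeit magic, so it will not run in a plain Python script. A rough equivalent using the standard timeit module might look like the sketch below. The helper names best_time and run are hypothetical, the sizes are a reduced subset, and the numbers it prints will not match the plots from this answer.

```python
import random
import timeit


def find_sublist_indices(haystack, needle):
    if not needle:
        return
    n = len(needle)
    first = needle[0]
    for idx, item in enumerate(haystack):
        if item == first and haystack[idx:idx + n] == needle:
            yield tuple(range(idx, idx + n))


def run(haystack, needle):
    # Consume the generator so the whole search is timed.
    return list(find_sublist_indices(haystack, needle))


def best_time(func, *args, repeat=3, number=10):
    """Return the best per-call time in seconds (hypothetical helper)."""
    runs = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(runs) / number


random.seed(0)  # fixed seed only so that repeated runs use the same data
for size in (2**5, 2**9, 2**13):
    l = [random.randint(0, 50) for _ in range(size)]
    m = [random.randint(0, 50) for _ in range(10)]
    print(f"size={size:6d}  best={best_time(run, l, m):.2e} s")
```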

In case your l and m are lists my function outperforms the NumPy solutions for all sizes:

[Figure: log-log plot of time in seconds against size for Divakar1, Divakar2 and MSeifert, with l and m passed as lists]

But in case you have these as NumPy arrays, you'll get faster results for large arrays (more than about 1000 elements) when using Divakar's NumPy solutions:

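[Figure: log-log plot of time in seconds against size for Divakar1, Divakar2 and MSeifert, with the NumPy functions given arrays]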
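
For readers on NumPy 1.20 or newer, the windowed comparison behind the broadcasting approach can also be sketched with numpy.lib.stride_tricks.sliding_window_view instead of a hand-written as_strided call. This is not one of the functions timed above, the name window_match_indices is made up for illustration, and I have not benchmarked it against the other solutions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # needs NumPy >= 1.20


def window_match_indices(haystack, needle):
    """Return an array of shape (k, len(needle)) holding the indices of each match."""
    haystack = np.asarray(haystack)
    needle = np.asarray(needle)
    n = needle.size
    if n == 0 or n > haystack.size:
        return np.empty((0, n), dtype=np.intp)
    # Read-only view of shape (len(haystack) - n + 1, n); no data is copied.
    windows = sliding_window_view(haystack, n)
    # Start positions where every element of the window equals the needle.
    starts = np.flatnonzero((windows == needle).all(axis=1))
    return starts[:, None] + np.arange(n)


l = [5, 6, 7, 8, 9, 10, 5, 15, 20, 50, 16, 18]
m = [10, 5, 15, 20]
print(window_match_indices(l, m).tolist())  # [[5, 6, 7, 8]]
```

The comparison still materialises a boolean array with one entry per window element, so memory use grows with len(haystack) * len(needle).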

answered Sep 21 '22 06:09

MSeifert

Just making the point that @MSeifert's approach can, of course, also be implemented in numpy:

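import numpy as np

def pp(h, n):  # h and n are 1-D NumPy arrays
    nn = len(n)
    NN = len(h)
    if nn == 0 or nn > NN:
        return
    # candidate starts: positions where the first needle item matches
    c = (h[:NN-nn+1] == n[0]).nonzero()[0]
    if c.size == 0:
        return
    # keep only the candidates that also match each following item
    for i, item in enumerate(n[1:].tolist(), 1):
        c = c[h[i:][c] == item]
        if c.size == 0:
            return
    # indices of the first match
    return np.arange(c[0], c[0]+nn)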
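
Note that pp returns the indices of the first match only, or None when there is none. If every match is wanted, the same candidate-filtering idea can be kept and just the return value changed. The following is a sketch under that assumption; the name pp_all and the example array are mine, not part of the answer.

```python
import numpy as np


def pp_all(h, n):
    """Variant of the candidate-filtering idea that keeps every match."""
    h = np.asarray(h)
    n = np.asarray(n)
    nn, NN = len(n), len(h)
    if nn == 0 or nn > NN:
        return np.empty((0, nn), dtype=np.intp)
    # Candidate start positions: the first needle item matches.
    c = np.flatnonzero(h[:NN - nn + 1] == n[0])
    # Drop the candidates that fail on each following needle item.
    for i in range(1, nn):
        if c.size == 0:
            break
        c = c[h[c + i] == n[i]]
    # One row of indices per surviving start position.
    return c[:, None] + np.arange(nn)


h = np.array([5, 6, 7, 8, 9, 10, 5, 15, 20, 10, 5])
print(pp_all(h, np.array([10, 5])).tolist())  # [[5, 6], [9, 10]]
```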

answered Sep 19 '22 06:09

Paul Panzer

