Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

First common element from two lists

Tags:

python

x = [8,2,3,4,5]
y = [6,3,7,2,1]

How to find out the first common element in two lists (in this case, "2") in a concise and elegant way? Any list can be empty or there can be no common elements - in this case None is fine.

I need this to show python to someone who is new to it, so the simpler the better.

UPD: the order is not important for my purposes, but let's assume I'm looking for the first element in x that also occurs in y.

like image 678
georg Avatar asked Apr 20 '13 09:04

georg


People also ask

How do you find the common element of two lists?

Method 2:Using Set's intersection property Convert the list to set by conversion. Use the intersection function to check if both sets have any elements in common. If they have many elements in common, then print the intersection of both sets.

How do you find the first common element of two arrays?

First Method (Naive Approach) – Find Common Elements in Two Arrays using Two For Loops. In this approach, we take each element of a first array and compare with every element of a second array. If it is found in second array then it's a common element else we move to next element.


1 Answers

A sort is not the fastest way of doing this, this gets it done in O(N) time with a set (hash map).

>>> x = [8,2,3,4,5]
>>> y = [6,3,7,2,1]
>>> set_y = set(y)
>>> next((a for a in x if a in set_y), None)
2

Or:

next(ifilter(set(y).__contains__, x), None)

This is what it does:

>>> def foo(x, y):
        seen = set(y)
        for item in x:
            if item in seen:
                return item
        else:
            return None


>>> foo(x, y)
2

To show the time differences between the different methods (naive approach, binary search an sets), here are some timings. I had to do this to disprove the suprising number of people that believed binary search was faster...:

from itertools import ifilter
from bisect import bisect_left

a = [1, 2, 3, 9, 1, 1] * 100000
b = [44, 11, 23, 9, 10, 99] * 10000

c = [1, 7, 2, 4, 1, 9, 9, 2] * 1000000 # repeats early
d = [7, 6, 11, 13, 19, 10, 19] * 1000000

e = range(50000) 
f = range(40000, 90000) # repeats in the middle

g = [1] * 10000000 # no repeats at all
h = [2] * 10000000

from random import randrange
i = [randrange(10000000) for _ in xrange(5000000)] # some randoms
j = [randrange(10000000) for _ in xrange(5000000)]

def common_set(x, y, ifilter=ifilter, set=set, next=next):
    return next(ifilter(set(y).__contains__, x), None)
    pass

def common_b_sort(x, y, bisect=bisect_left, sorted=sorted, min=min, len=len):
    sorted_y = sorted(y)
    for a in x:
        if a == sorted_y[min(bisect_left(sorted_y, a),len(sorted_y)-1)]:
            return a
    else:
        return None

def common_naive(x, y):
    for a in x:
        for b in y:
            if a == b: return a
    else:
        return None

from timeit import timeit
from itertools import repeat
import threading, thread

print 'running tests - time limit of 20 seconds'

for x, y in [('a', 'b'), ('c', 'd'), ('e', 'f'), ('g', 'h'), ('i', 'j')]:
    for func in ('common_set', 'common_b_sort', 'common_naive'):        
        try:
            timer = threading.Timer(20, thread.interrupt_main)   # 20 second time limit
            timer.start()
            res = timeit(stmt="print '[', {0}({1}, {2}), ".format(func, x, y),
                         setup='from __main__ import common_set, common_b_sort, common_naive, {0}, {1}'.format(x, y),
                         number=1)
        except:
            res = "Too long!!"
        finally:
            print '] Function: {0}, {1}, {2}. Time: {3}'.format(func, x, y, res)
            timer.cancel()

The test data was:

a = [1, 2, 3, 9, 1, 1] * 100000
b = [44, 11, 23, 9, 10, 99] * 10000

c = [1, 7, 2, 4, 1, 9, 9, 2] * 1000000 # repeats early
d = [7, 6, 11, 13, 19, 10, 19] * 1000000

e = range(50000) 
f = range(40000, 90000) # repeats in the middle

g = [1] * 10000000 # no repeats at all
h = [2] * 10000000

from random import randrange
i = [randrange(10000000) for _ in xrange(5000000)] # some randoms
j = [randrange(10000000) for _ in xrange(5000000)]

Results:

running tests - time limit of 20 seconds
[ 9 ] Function: common_set, a, b. Time: 0.00569520707241
[ 9 ] Function: common_b_sort, a, b. Time: 0.0182240340602
[ 9 ] Function: common_naive, a, b. Time: 0.00978832505249
[ 7 ] Function: common_set, c, d. Time: 0.249175872911
[ 7 ] Function: common_b_sort, c, d. Time: 1.86735751332
[ 7 ] Function: common_naive, c, d. Time: 0.264309220865
[ 40000 ] Function: common_set, e, f. Time: 0.00966861710078
[ 40000 ] Function: common_b_sort, e, f. Time: 0.0505980508696
[ ] Function: common_naive, e, f. Time: Too long!!
[ None ] Function: common_set, g, h. Time: 1.11300018578
[ None ] Function: common_b_sort, g, h. Time: 14.9472068377
[ ] Function: common_naive, g, h. Time: Too long!!
[ 5411743 ] Function: common_set, i, j. Time: 1.88894859542
[ 5411743 ] Function: common_b_sort, i, j. Time: 6.28617268396
[ 5411743 ] Function: common_naive, i, j. Time: 1.11231867458

This gives you an idea of how it will scale for larger inputs, O(N) vs O(N log N) vs O(N^2)

like image 129
jamylak Avatar answered Oct 14 '22 07:10

jamylak