
How to group / count list elements by range

If my x list and y list are:

x = [10,20,30]
y = [1,2,3,15,22,27]

I'd like the return value to be a dictionary that maps each x value to the number of elements of y that are less than that x value but not less than the previous one:

{
    10:3,
    20:1,
    30:2,
}

My list is very large, so I was hoping for a better way that doesn't involve a slow nested for loop. I've looked at collections.Counter and itertools, and neither seems to offer a way of grouping by range. Is there a built-in that can do this?

pyInTheSky asked Dec 09 '22 12:12


2 Answers

You can use the bisect module and collections.Counter:

>>> import bisect
>>> from collections import Counter
>>> Counter(x[bisect.bisect_left(x, item)] for item in y)
Counter({10: 3, 30: 2, 20: 1})
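
One caveat with the one-liner above: `bisect_left` returns `len(x)` for any item larger than the last bound, so `x[...]` raises `IndexError`, and bounds that match no items are simply absent from the `Counter`. A minimal sketch of a wrapper that handles both cases follows. The function name `count_by_upper_bound` and the choice to silently skip out-of-range items are assumptions made for illustration, not part of the original answer.

```python
import bisect
from collections import Counter


def count_by_upper_bound(bounds, items):
    """Count items per bucket, where each bucket is labelled by its upper bound.

    An item goes under the first bound that is >= the item (bisect_left
    semantics), so an item equal to a bound is counted under that bound.
    Items larger than the last bound are skipped (an assumption).
    """
    bounds = sorted(bounds)  # bisect requires sorted input
    counts = Counter({b: 0 for b in bounds})  # keep empty buckets visible
    for item in items:
        i = bisect.bisect_left(bounds, item)
        if i < len(bounds):
            counts[bounds[i]] += 1
    return dict(counts)


x = [10, 20, 30]
y = [1, 2, 3, 15, 22, 27]
result = count_by_upper_bound(x, y)
print(result)  # {10: 3, 20: 1, 30: 2}
```

Each lookup is a binary search, so the whole pass costs roughly O(len(y) * log(len(x))) instead of the O(len(y) * len(x)) of a nested loop.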
Ashwini Chaudhary answered Dec 29 '22 00:12


If you're willing to use numpy, basically you are asking for a histogram:

import numpy as np

x = [10,20,30]
y = [1,2,3,15,22,27]

# prepend 0 so the bin edges are 0, 10, 20, 30
np.histogram(y,bins=[0]+x)
#(array([3, 1, 2]), array([ 0, 10, 20, 30]))

To make this a dict:

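b = np.histogram(y,bins=[0]+x)[0]  # keep only the counts, drop the bin edges
d = {k: int(v) for k, v in zip(x, b)}  # int() turns numpy integers into plain Python ints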
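
Note that `np.histogram` treats every bin except the last as half-open on the right (`[0, 10)`, `[10, 20)`), so a value equal to an inner bound lands in the next bucket, whereas the `bisect_left` approach counts it under that bound. If the `bisect_left` behaviour is wanted in vectorized form, `np.searchsorted` combined with `np.bincount` is one way to sketch it. The helper name `count_by_upper_bound_np` and the decision to drop values above the last bound are assumptions for illustration.

```python
import numpy as np


def count_by_upper_bound_np(bounds, items):
    """Vectorized counterpart of bisect_left-based bucketing."""
    bounds = np.sort(np.asarray(bounds))
    items = np.asarray(items)
    # side="left" matches bisect.bisect_left: index of the first bound >= item
    idx = np.searchsorted(bounds, items, side="left")
    idx = idx[idx < len(bounds)]  # drop items above the last bound (assumption)
    counts = np.bincount(idx, minlength=len(bounds))
    # .item() / int() convert numpy scalars to plain Python numbers
    return {b.item(): int(c) for b, c in zip(bounds, counts)}


x = [10, 20, 30]
y = [1, 2, 3, 15, 22, 27]
result = count_by_upper_bound_np(x, y)
print(result)  # {10: 3, 20: 1, 30: 2}
```

Unlike the histogram version, this needs no artificial lower edge such as `0`, so negative values in y are counted under the first bound as well.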

For short lists the numpy overhead isn't worth it, but for long lists it can be considerably faster:

In [292]: y = np.random.randint(0, 30, 1000)

In [293]: %%timeit
   .....: b = np.histogram(y, bins=[0]+x)[0]
   .....: d = { k:v for k,v in zip(x, b)}
   .....: 
1000 loops, best of 3: 185 µs per loop

In [294]: y = list(y)

In [295]: timeit Counter(x[bisect.bisect_left(x, item)] for item in y)
100 loops, best of 3: 3.84 ms per loop

In [311]: timeit dict(zip(x, [[n_y for n_y in y if n_y < n_x] for n_x in x]))
100 loops, best of 3: 3.75 ms per loop

Note that the last expression collects, for each bound, a list of every value below it rather than a per-range count, so it serves only as a rough pure-Python timing baseline.
askewchan answered Dec 29 '22 00:12