 

Using NumPy to Find Median of Second Element of List of Tuples

Let's say I have a list of tuples, as follows:

list = [('a',1), ('b',3), ('c',5)]

My goal is to find the tuple whose second element is the median of all the second elements, and return that tuple's first element. In the above case, I want the output b, because the median is 3. I tried using NumPy with the following code, to no avail:

import numpy as np

list = [('a',1), ('b',3), ('c',5)]
np.median(list, key=lambda x:x[1])
Wally asked Aug 05 '15 15:08



People also ask

How do I get the median of an array in NumPy?

numpy.median() computes the median of the array elements, either over the whole (flattened) array, which is the default, or along a specified axis, and returns the result.
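A minimal sketch of both cases; the arrays are made-up example data. With the default axis=None, np.median flattens the input before taking the median.

```python
import numpy as np

flat = np.array([1, 3, 5])
grid = np.array([[1, 3, 5],
                 [2, 4, 6]])

# Default axis=None: the 2-D array is treated as the flat list 1..6.
print(np.median(flat))  # 3.0
print(np.median(grid))  # (3 + 4) / 2 -> 3.5
```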

How do I use np.median on a 2-D array?

Suppose np_array_2d is a 2-D NumPy array with 2 rows and 3 columns. Calling np.median(np_array_2d, axis=1) computes the medians along axis 1, that is, across the columns of each row, so you get one median per row.
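A small sketch of the axis=1 case. The values in np_array_2d are hypothetical; only the 2 x 3 shape comes from the description above.

```python
import numpy as np

# Hypothetical 2-row, 3-column array standing in for np_array_2d.
np_array_2d = np.array([[0, 2, 10],
                        [3, 5, 7]])

# axis=1 collapses the columns, leaving one median per row.
row_medians = np.median(np_array_2d, axis=1)
print(row_medians)  # [2. 5.]
```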

What does keepdims do in the NumPy median function?

The keepdims parameter keeps each reduced axis in the result as a dimension of size one, so the output has the same number of dimensions as the input. Because np_array_2d has 2 dimensions, setting keepdims=True makes the output of np.median 2-dimensional as well.
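A sketch comparing the result shapes with and without keepdims, using the same hypothetical 2 x 3 array. The kept size-one axis is what lets the result broadcast back against the input, shown here by subtracting each row's median.

```python
import numpy as np

np_array_2d = np.array([[0, 2, 10],
                        [3, 5, 7]])

reduced = np.median(np_array_2d, axis=1)               # axis 1 is dropped
kept = np.median(np_array_2d, axis=1, keepdims=True)   # axis 1 kept with size 1

print(reduced.shape)  # (2,)
print(kept.shape)     # (2, 1)

# The (2, 1) shape broadcasts against (2, 3), centering each row on its median.
centered = np_array_2d - kept
print(centered)
```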

Does np.median reduce the number of dimensions?

Yes. When you pass the axis parameter (without keepdims=True), np.median removes that axis from the result, so the output has one dimension fewer than the input. For a 2-dimensional array, axis=0 computes the column medians and axis=1 computes the row medians; both results are 1-dimensional.
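A sketch of the axis=0 case on the same hypothetical 2 x 3 array, checking that the result drops from 2 dimensions to 1.

```python
import numpy as np

np_array_2d = np.array([[0, 2, 10],
                        [3, 5, 7]])

col_medians = np.median(np_array_2d, axis=0)  # one value per column

print(np_array_2d.ndim, col_medians.ndim)  # 2 1
print(col_medians)                         # [1.5 3.5 8.5]
```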




2 Answers

You could calculate the median like this:

np.median(list(dict(list_of_tuples).values()))
# Python 3; in Python 2.7, where dict.values() returns a list, np.median(dict(list_of_tuples).values()) is enough

That converts your list to a dictionary first and then calculates the median of its values.
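A short sketch of what that conversion produces, with made-up data. One caveat follows from how dict() works: if the same first element appears more than once, only its last value is kept, so the median can silently change.

```python
import numpy as np

list_of_tuples = [('a', 1), ('b', 3), ('c', 5)]
values = list(dict(list_of_tuples).values())
print(values, np.median(values))  # [1, 3, 5] 3.0

# Caveat: the repeated key 'a' keeps only its last value, so ('a', 1) is lost.
with_dupes = [('a', 1), ('b', 3), ('a', 5)]
print(dict(with_dupes))                               # {'a': 5, 'b': 3}
print(np.median(list(dict(with_dupes).values())))     # 4.0, not 3.0
```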

When you want to get the actual key, you can do it like this:

dl = dict(list_of_tuples)  # {'a': 1, 'b': 3, 'c': 5}
keys, values = list(dl.keys()), list(dl.values())
keys[values.index(np.median(values))]

which returns 'b'. This assumes that the median is one of the values; if it is not (which can happen with an even number of values), list.index raises a ValueError. You could therefore use a try/except like this, with the example data from @Anand S Kumar's answer:

import numpy as np

l = [('a', 1), ('b', 3), ('c', 5), ('d', 22), ('e', 11), ('f', 3)]

# l = [('a', 1), ('b', 3), ('c', 5)]

dl = dict(l)
keys, values = list(dl.keys()), list(dl.values())
median = np.median(values)
try:
    print(keys[values.index(median)])
except ValueError:
    print('The median is not in this list. Its value is', median)
    # index of the value closest to the median; on ties, min keeps the first one
    closest = min(range(len(values)), key=lambda i: abs(values[i] - median))
    print('The closest key is', keys[closest])

For the first list you will then obtain:

The median is not in this list. Its value is 4.0

The closest key is b

For your original example, it just prints:

b
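If you always want a tuple that is actually in the list, a different technique (not from either answer) is to take the lower median: sort the second elements with np.argsort and pick the middle position, using the lower of the two middle positions for an even count instead of averaging. A minimal sketch; median_key is a hypothetical helper name.

```python
import numpy as np

def median_key(pairs):
    """Return the first element of the tuple holding the lower median
    of the second elements. For an even count this takes the lower
    middle value rather than averaging, so the result always exists."""
    values = np.array([v for _, v in pairs])
    order = np.argsort(values, kind="stable")  # indices that sort the values
    middle = order[(len(values) - 1) // 2]     # lower middle position
    return pairs[middle][0]

print(median_key([('a', 1), ('b', 3), ('c', 5)]))  # b
# Lower median is 3; 'b' and 'f' both hold 3, and the stable sort places 'f' second.
print(median_key([('a', 1), ('b', 3), ('c', 5), ('d', 22), ('e', 11), ('f', 3)]))  # f
```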

Cleb answered Sep 28 '22 10:09



np.median does not accept a key argument. Instead, you can use a list comprehension to take just the second element of each tuple. Example:

In [3]: l = [('a',1), ('b',3), ('c',5)]

In [4]: np.median([x[1] for x in l])
Out[4]: 3.0

In [5]: l = [('a',1), ('b',3), ('c',5), ('d',22),('e',11),('f',3)]

In [6]: np.median([x[1] for x in l])
Out[6]: 4.0
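To go from the median back to a first element, as the question asks, one possible extension (a sketch, not part of this answer) is to split the tuples with zip(*l) and use np.argmin on the distance to the median, which returns the first index with the smallest distance.

```python
import numpy as np

l = [('a', 1), ('b', 3), ('c', 5), ('d', 22), ('e', 11), ('f', 3)]

# Split the tuples into two parallel sequences: names and values.
names, values = zip(*l)
values = np.array(values)

median = np.median(values)                         # 4.0
closest = int(np.argmin(np.abs(values - median)))  # first index at minimal distance
print(median, names[closest])  # 4.0 b
```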

Also, unless it is just for the example, do not use list as a variable name: it shadows the built-in list.

Anand S Kumar answered Sep 28 '22 09:09
