Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas table lookup

I have a pandas lookup table which looks like this

Grade   Lower_Boundary  Upper_Boundary
1   -110    -96
2   -96 -91
3   -91 -85
4   -85 -81
5   -81 -77
6   -77 -72
7   -72 -68
8   -68 -63
9   -63 -58
10  -58 -54
11  -54 -50
12  -50 -46
13  -46 -42
14  -42 -38
15  -38 -34
16  -34 -28
17  -28 -18
18  -18 -11
19  -11 -11
20  -11 -9

I have another pandas dataframe that looks contains score. I want to assign 'Grade' to the score column, by looking up the look up table. So based on which interval of lower and upper boundary the score falls, the grade should be assigned from that row in the lookup table. Is there a way to do it without typing a bunch of if then else statements? I am thinking just of excel's index match.

Score   Grade
-75 6
-75 6
-60 9
-66 8
-66 8
-98 1
-60 9
-82 4
-70 7
-60 9
-60 9
-60 9
-56 10
-70 7
-70 7
-70 7
-66 8
-56 10
-66 8
-66 8
like image 423
Zenvega Avatar asked Feb 17 '16 22:02

Zenvega


People also ask

Do Pandas do lookups?

Pandas DataFrame: lookup() functionThe lookup() function returns label-based "fancy indexing" function for DataFrame. Given equal-length arrays of row and column labels, return an array of the values corresponding to each (row, col) pair.

How do you do a Vlookup in Pandas?

We can use merge() function to perform Vlookup in pandas. The merge function does the same job as the Join in SQL We can perform the merge operation with respect to table 1 or table 2. There can be different ways of merging the 2 tables.

What is lookup table in python?

Lookup Table is used to access the values of the database from tables easily. Using this, we can quickly get the output values of corresponding input values from the given table. In python, lookup tables are also known as dictionaries.

What does .values in Pandas do?

The values property is used to get a Numpy representation of the DataFrame. Only the values in the DataFrame will be returned, the axes labels will be removed. The values of the DataFrame. A DataFrame where all columns are the same type (e.g., int64) results in an array of the same type.


1 Answers

A one-line solution (I call your lookup table lookup):

df['Score'].apply(lambda score: lookup['Grade'][(lookup['Lower_Boundary'] <= score) & (lookup['Upper_Boundary'] > score)].values[0])

Explanation:

For a given score, here is how to find the grade:

score = -75
match = (lookup['Lower_Boundary'] <= score) & (lookup['Upper_Boundary'] > score)
grade = lookup['Grade'][match]

This return a series of length 1. You can get its value with, for instance:

grade.values[0]

All you need to do is apply the above to the score column. If you want a one-liner, use a lambda function:

df['Score'].apply(lambda score: lookup['Grade'][(lookup['Lower_Boundary'] <= score) & (lookup['Upper_Boundary'] > score)].values[0])

Otherwise the following would be more readable:

def lookup_grade(score):
    match = (lookup['Lower_Boundary'] <= score) & (lookup['Upper_Boundary'] > score)
    grade = lookup['Grade'][match]
    return grade.values[0]

df['Score'].apply(lookup_grade)

This approach would also make it easier to deal with cases when no match is found.

like image 138
IanS Avatar answered Oct 09 '22 11:10

IanS