Comparing a value from one dataframe with values from columns in another dataframe and getting the data from third column

Tags: python, pandas

The title is a bit confusing, but I'll do my best to explain my problem here. I have two pandas dataframes, a and b:

>> print a

id | value
 1 | 250
 2 | 150
 3 | 350
 4 | 550
 5 | 450

>> print b

low | high | class
100 | 200  | 'A' 
200 | 300  | 'B' 
300 | 500  | 'A' 
500 | 600  | 'C' 
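
For anyone who wants to reproduce this, the two tables above can be built roughly as follows. This is only a sketch: the integer dtypes are an assumption, and the quotes around the class labels are treated as display only.

```python
import pandas as pd

# Table a: one value per id (integer dtype is an assumption).
a = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "value": [250, 150, 350, 550, 450],
})

# Table b: one class label per [low, high] range.
b = pd.DataFrame({
    "low": [100, 200, 300, 500],
    "high": [200, 300, 500, 600],
    "class": ["A", "B", "A", "C"],
})

print(a)
print(b)
```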

I want to create a new column called class in table a that contains the class of the value in accordance with table b. Here's the result I want:

>> print a

id | value | class
 1 | 250   | 'B'
 2 | 150   | 'A'
 3 | 350   | 'A'
 4 | 550   | 'C'
 5 | 450   | 'A'

I have the following code written that sort of does what I want:

a['class'] = None
for i in range(len(a)):
    val = a['value'][i]
    # class of the first row in b whose [low, high] range contains val
    cl = (b['class'][ (b['low'] <= val) &
                      (b['high'] >= val) ].iat[0])
    a.at[i, 'class'] = cl

The problem is that this is quick for tables of length 10 or so, but I am trying to do this with 100,000+ rows in both a and b. Is there a quicker way to do this, using some function or attribute in pandas?

asked Jul 28 '17 02:07 by rbae


1 Answer

Here is a way to do a range join inspired by @piRSquared's solution:

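import numpy as np
import pandas as pd

A = a['value'].values
bh = b.high.values
bl = b.low.values

# Compare every value in a with every (low, high) pair in b via broadcasting;
# i and j are the row positions in a and b where the value falls in range.
i, j = np.where((A[:, None] >= bl) & (A[:, None] <= bh))

# Put the matching rows of a and b side by side.
pd.DataFrame(
    np.column_stack([a.values[i], b.values[j]]),
    columns=a.columns.append(b.columns)
)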

Output:

  id value  low high  class
0  1   250  200  300   'B' 
1  2   150  100  200   'A' 
2  3   350  300  500   'A' 
3  4   550  500  600   'C' 
4  5   450  300  500   'A'
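
One caveat on size: the broadcast comparison builds a len(a) x len(b) boolean array, so with 100,000 rows in each table every intermediate holds 10^10 elements (about 10 GB at one byte each). If that does not fit in memory, a sorted join may work instead. Below is a minimal sketch using `pd.merge_asof`, not part of the approach above. It assumes the ranges in b do not overlap except at shared endpoints, and at a shared endpoint (for example 200) it picks the range with the larger `low`, whereas the loop in the question picks the first match. The names `left`, `right`, `merged`, `in_range` and `result` are just illustrative.

```python
import pandas as pd

a = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                  "value": [250, 150, 350, 550, 450]})
b = pd.DataFrame({"low": [100, 200, 300, 500],
                  "high": [200, 300, 500, 600],
                  "class": ["A", "B", "A", "C"]})

# merge_asof needs both frames sorted on their join keys.
# reset_index() keeps a's original row labels in an "index" column.
left = a.reset_index().sort_values("value")
right = b.sort_values("low")

# For each value, take the last row of b whose low <= value.
merged = pd.merge_asof(left, right, left_on="value", right_on="low")

# Blank out matches where the value lies above that row's high bound
# (assumption: ranges in b do not overlap except at shared endpoints).
in_range = merged["value"] <= merged["high"]
merged["class"] = merged["class"].where(in_range)

# Restore a's original row order and keep only the wanted columns.
result = merged.set_index("index").sort_index()[["id", "value", "class"]]
print(result)
```

Values that fall in no range end up with a missing class rather than raising an error. Because the work is a sort plus a single pass, memory use grows with len(a) + len(b) rather than with their product.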

answered Nov 14 '22 22:11 by Scott Boston