
Iterate over rows to check whether any value of an array lies between two columns

Tags: python, pandas

I have a dataframe and an array like this:

df
x y z
1 10 1
10 20 2
20 30 3
30 40 4
40 50 5

my_array= 5 35 36 40 41 45 46 47 48

How can I iterate over the dataframe so that a row is kept only if at least one value of my_array lies between its x and y? The final df would be:

x y z
1 10 1
30 40 4
40 50 5

I have tried df=df[(my_array <= df['x']) and (df['y'] <= my_array)]

But it raises a ValueError: Lengths must match to compare.

The length of my_array is larger than the number of rows. Any help?

zillur rahman asked May 18 '21 13:05


2 Answers

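Use NumPy broadcasting:

# Compare each row's x and y against every value of my_array,
# then keep the rows where at least one value falls in [x, y].
df[((df['x'].values[:, None] <= my_array) &
    (df['y'].values[:, None] >= my_array)).any(axis=1)]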

    x   y  z
0   1  10  1
3  30  40  4
4  40  50  5
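
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same broadcasting idea. It rebuilds the question's data and assumes my_array is a NumPy array; the names in_range, keep and result are illustrative only and do not come from the answer.

```python
import numpy as np
import pandas as pd

# Data copied from the question; my_array is assumed to be a NumPy array.
df = pd.DataFrame({
    "x": [1, 10, 20, 30, 40],
    "y": [10, 20, 30, 40, 50],
    "z": [1, 2, 3, 4, 5],
})
my_array = np.array([5, 35, 36, 40, 41, 45, 46, 47, 48])

# [:, None] reshapes each column to (5, 1). Comparing that against the
# (9,) array broadcasts to a (5, 9) boolean matrix: one row per df row,
# one column per value of my_array.
in_range = (df["x"].to_numpy()[:, None] <= my_array) & (
    df["y"].to_numpy()[:, None] >= my_array
)

# Keep a row if at least one value falls inside its [x, y] interval.
keep = in_range.any(axis=1)
result = df[keep]
print(result)
```

Naming the intermediate matrix makes it easier to inspect which values matched which row, for example with in_range[3] for the row where x is 30.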
Shubham Sharma answered Nov 11 '22 15:11


There is no need to iterate; we can use NumPy broadcasting (which can be memory-heavy for large datasets, because it builds a boolean array of shape len(df) by len(my_array)):

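import numpy as np

# Row positions of every (row, value) pair where x <= value <= y
idx = np.where(
    (df["x"].to_numpy()[:, None] <= my_array) &
    (df["y"].to_numpy()[:, None] >= my_array)
)[0]

# A row can match several values, so deduplicate before selecting
df.iloc[np.unique(idx)]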
    x   y  z
0   1  10  1
3  30  40  4
4  40  50  5
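
If the memory cost mentioned above becomes a problem, one possible alternative (not part of this answer) is to sort the array once and use np.searchsorted, which avoids building the full rows-by-values matrix. The sketch below assumes my_array is a non-empty NumPy array and uses the question's data; sorted_vals, pos, candidate and keep are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Data copied from the question; my_array is assumed non-empty.
df = pd.DataFrame({
    "x": [1, 10, 20, 30, 40],
    "y": [10, 20, 30, 40, 50],
    "z": [1, 2, 3, 4, 5],
})
my_array = np.array([5, 35, 36, 40, 41, 45, 46, 47, 48])

sorted_vals = np.sort(my_array)
x = df["x"].to_numpy()
y = df["y"].to_numpy()

# For each row, position of the smallest array value that is >= x.
pos = np.searchsorted(sorted_vals, x, side="left")

# pos == len(sorted_vals) means no array value is >= x for that row.
has_candidate = pos < len(sorted_vals)

# Clip the index so the lookup is always valid; rows without a
# candidate are masked out by has_candidate below.
candidate = sorted_vals[np.minimum(pos, len(sorted_vals) - 1)]

# The row matches if that smallest value >= x is also <= y.
keep = has_candidate & (candidate <= y)
result = df[keep]
print(result)
```

This keeps only a few arrays of length len(df) in memory, at the cost of one sort of my_array.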
Erfan answered Nov 11 '22 15:11