Python: what is a fast way to find an index in a pandas DataFrame?

I have a DataFrame like the following:

df = 
    a   ID1         ID2         Proximity
0   0   900000498   NaN         0.000000
1   1   900000498   900004585   3.900000
2   2   900000498   900005562   3.900000
3   3   900000498   900008613   0.000000
4   4   900000498   900012333   0.000000
5   5   900000498   900019524   3.900000
6   6   900000498   900019877   0.000000
7   7   900000498   900020141   3.900000
8   8   900000498   900022133   3.900000
9   9   900000498   900022919   0.000000

Given an ID1-ID2 pair, I want to find the corresponding Proximity value. For instance, given the input [900000498, 900022133], I want 3.900000 as output.

asked Jan 30 '16 22:01

emax

1 Answer

If this is a common operation, I'd set those columns as the index; you can then do the lookup with loc by passing a tuple of the column values:

In [60]:
df1 = df.set_index(['ID1','ID2'])

In [61]:
%timeit df1.loc[(900000498,900022133), 'Proximity']
%timeit df.loc[(df['ID1']==900000498)&(df['ID2']==900022133), 'Proximity']
1000 loops, best of 3: 565 µs per loop
100 loops, best of 3: 1.69 ms per loop

You can see that once the columns form the index, the lookup is about 3x faster than the boolean filter operation (565 µs vs. 1.69 ms in this run).

The output is pretty much the same, except that the index lookup returns a scalar while the filter returns a one-element Series:

In [63]:
print(df1.loc[(900000498,900022133), 'Proximity'])
print(df.loc[(df['ID1']==900000498)&(df['ID2']==900022133), 'Proximity'])

3.9
8    3.9
Name: Proximity, dtype: float64
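
For anyone who wants to reproduce this outside IPython, here is a minimal self-contained sketch. It rebuilds the sample frame from the question by hand; the variable names indexed_value, mask and filtered are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (ID2 is float because of the NaN).
df = pd.DataFrame({
    "a": range(10),
    "ID1": [900000498] * 10,
    "ID2": [np.nan, 900004585, 900005562, 900008613, 900012333,
            900019524, 900019877, 900020141, 900022133, 900022919],
    "Proximity": [0.0, 3.9, 3.9, 0.0, 0.0, 3.9, 0.0, 3.9, 3.9, 0.0],
})

# Set the two ID columns as the index once, then reuse df1 for every lookup.
df1 = df.set_index(["ID1", "ID2"])

# MultiIndex lookup: the (ID1, ID2) pair is unique here, so .loc returns a scalar.
indexed_value = df1.loc[(900000498, 900022133), "Proximity"]

# Boolean-mask lookup: returns a Series, even when only one row matches.
mask = (df["ID1"] == 900000498) & (df["ID2"] == 900022133)
filtered = df.loc[mask, "Proximity"]

print(indexed_value)
print(filtered.iloc[0])
```

The set_index call has its own cost, so this approach probably only pays off when the indexed frame is reused for many lookups.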

answered Nov 03 '22 19:11

EdChum

Tags: python, find, pandas, dataframe
