I have a dataframe like the following
df =
   a        ID1        ID2  Proximity
0  0  900000498        NaN   0.000000
1  1  900000498  900004585   3.900000
2  2  900000498  900005562   3.900000
3  3  900000498  900008613   0.000000
4  4  900000498  900012333   0.000000
5  5  900000498  900019524   3.900000
6  6  900000498  900019877   0.000000
7  7  900000498  900020141   3.900000
8  8  900000498  900022133   3.900000
9  9  900000498  900022919   0.000000
For a given ID1-ID2 pair, I want to find the corresponding Proximity value.
For instance, given the input [900000498, 900022133], I want 3.900000 as the output.
To get the index of a Pandas DataFrame, use the DataFrame.index property. It returns an Index object representing the row labels of the DataFrame.
Pandas is built on top of NumPy, whose operations run in pre-compiled, optimized code. This means that working directly on NumPy arrays can be significantly faster than going through Pandas for some computations. Converting a DataFrame to a NumPy array is straightforward with DataFrame.to_numpy().
Like a Python dictionary (or a relational database's index), Pandas indexing provides a fast way to turn a key into a value. For example, we can create a DataFrame whose index is the column alpha and then turn the key b into the row of interest.
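As a minimal sketch of that key-to-row lookup: the index name alpha and the key b come from the text above, while the variable name frame, the value column, and the data are hypothetical and only there to make the example runnable.

```python
import pandas as pd

# Hypothetical data; only the index name "alpha" and the key "b" come from the text.
frame = pd.DataFrame(
    {"alpha": ["a", "b", "c"], "value": [10, 20, 30]}
).set_index("alpha")

# Label-based lookup: the key "b" selects the matching row as a Series.
row_b = frame.loc["b"]

# Adding a column label returns the single cell as a scalar.
value_b = frame.loc["b", "value"]

print(row_b)
print(value_b)
```
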
If this is a common operation, I'd set the index to those two columns. You can then do the lookup with loc by passing a tuple of the column values:
In [60]:
df1 = df.set_index(['ID1','ID2'])
In [61]:
%timeit df1.loc[(900000498,900022133), 'Proximity']
%timeit df.loc[(df['ID1']==900000498)&(df['ID2']==900022133), 'Proximity']
1000 loops, best of 3: 565 µs per loop
100 loops, best of 3: 1.69 ms per loop
You can see that once the columns form the index, the lookup is about 3x faster than the filter operation.
The output is essentially the same, except that the index lookup returns a scalar while the filter returns a one-element Series:
In [63]:
print(df1.loc[(900000498,900022133), 'Proximity'])
print(df.loc[(df['ID1']==900000498)&(df['ID2']==900022133), 'Proximity'])
3.9
8 3.9
Name: Proximity, dtype: float64
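For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the sample frame from the question and runs both lookups. The helper lookup_proximity and its default parameter are my own additions for illustration, not part of pandas; it just wraps the loc call and returns a fallback when the pair is not in the index.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "a": range(10),
    "ID1": [900000498] * 10,
    "ID2": [np.nan, 900004585, 900005562, 900008613, 900012333,
            900019524, 900019877, 900020141, 900022133, 900022919],
    "Proximity": [0.0, 3.9, 3.9, 0.0, 0.0, 3.9, 0.0, 3.9, 3.9, 0.0],
})

# Build the two-level index once and reuse it for every lookup.
indexed = df.set_index(["ID1", "ID2"])


def lookup_proximity(indexed_df, id1, id2, default=None):
    """Hypothetical helper: return Proximity for (id1, id2), or default if absent."""
    try:
        return indexed_df.loc[(id1, id2), "Proximity"]
    except KeyError:
        return default


result = lookup_proximity(indexed, 900000498, 900022133)
missing = lookup_proximity(indexed, 900000498, 123)

# Equivalent boolean-mask filter; this returns a Series rather than a scalar.
mask = (df["ID1"] == 900000498) & (df["ID2"] == 900022133)
filtered = df.loc[mask, "Proximity"]

print(result)
print(missing)
print(filtered)
```

Note that set_index is only worth it if you do many lookups; for a one-off query the boolean mask avoids building the index at all.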