I have a DataFrame that's created from a list of dictionaries, which I obtained from a database. I'm trying to use that as an in-memory database, which I query using the following function:
def filter_entities(df, name1, name2):
    key = ((df.name1 == name1) &
           (df.name2 == name2))
    rows = df.loc[key]
    if len(rows) == 0:
        return None
    return rows.iloc[0]
Doing that seems to be much slower than I expected. Even when testing on a couple of hundred rows, it takes about 1ms per call. I've tried setting an index on those columns when I create the dataframe but it didn't affect the performance:
entities.set_index(['name1', 'name2'], drop=False, inplace=True)
Here's a quick way to create a dataset for testing:
import random, string
import pandas as pd
df = pd.DataFrame([{
    'name1': ''.join([random.choice(string.ascii_letters) for i in range(10)]),
    'name2': ''.join([random.choice(string.ascii_letters) for i in range(10)]),
    'val1': random.randint(0, 2**16),
    'val2': random.randint(0, 2**16),
    'val3': random.randint(0, 2**16),
} for j in range(1000)])
In [27]: %timeit filter_entities(df, df['name1'][100], df['name2'][100])
1000 loops, best of 3: 1.91 ms per loop
I'm trying to find an efficient way to make queries on my data. Is there a better way to do this in pandas?
Setting the columns as the index does improve the performance for me.
With your original filter_entities function:
In [25]: %timeit filter_entities(df, df['name1'][100], df['name2'][100])
1000 loops, best of 3: 1.36 ms per loop
And after setting the columns as the index, and then indexing that frame:
In [26]: df2 = df.set_index(['name1', 'name2'])
In [27]: %timeit df2.loc[df['name1'][100], df['name2'][100]]
10000 loops, best of 3: 160 µs per loop
Note that a large portion of the time spent in the filter_entities function goes into the boolean comparison (the creation of the key mask), not into the indexing itself.
Second note: if performance at this scale matters to you, it is often worth looking at the larger picture: do you really need to access individual rows one at a time, or can the operation be vectorized over many keys at once?