I have a dataframe d with about 100,000,000 rows and 3 columns. It looks something like this:
import pandas as pd
In [17]: d = pd.DataFrame({'id': ['a', 'b', 'c', 'd', 'e'], 'val': [1, 2, 3, 4, 5], 'n': [34, 22, 95, 86, 44]})
In [18]: d.set_index(['id', 'val'], inplace = True)
I have another dataframe with values of id and val that I want to keep in d. There are around 600,000 combinations of id and val that I want to keep:
In [20]: keep = pd.DataFrame({'id':['a', 'b'], 'val' : [1, 2]})
I have tried this in the following way:
In [21]: keep.set_index(['id', 'val'], inplace = True)
In [22]: d.loc[d.index.isin(keep.index), :]
Out[22]:
         n
id val
a  1    34
b  2    22
This works, but it seems clunky and is very slow. Is there a better approach here? What is the fastest way to slice on a MultiIndex in pandas?
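You can use reindex, passing the keys as a MultiIndex. This assumes keep still has id and val as ordinary columns; if you have already called set_index on it, pass keep.index instead: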
d.reindex(pd.MultiIndex.from_frame(keep))
Out[151]:
         n
id val
a  1    34
b  2    22
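One caveat worth checking before relying on reindex: it returns a row for every requested key, so any (id, val) pair in keep that is absent from d comes back filled with NaN, and reindex also expects the index of d to be unique. The sketch below illustrates this with a made-up key ('z', 9) that is not in d, and shows an inner merge as one possible alternative that drops absent keys. The names keys, via_reindex and via_merge are only for illustration, and the relative speed on 100,000,000 rows is not measured here.

```python
import pandas as pd

# Small stand-in for the real frame, indexed by (id, val).
d = pd.DataFrame({'id': ['a', 'b', 'c', 'd', 'e'],
                  'val': [1, 2, 3, 4, 5],
                  'n': [34, 22, 95, 86, 44]}).set_index(['id', 'val'])

# Hypothetical keep list: the pair ('z', 9) does not exist in d.
keep = pd.DataFrame({'id': ['a', 'b', 'z'], 'val': [1, 2, 9]})
keys = pd.MultiIndex.from_frame(keep)

# reindex returns one row per requested key; the missing key gets NaN in n.
via_reindex = d.reindex(keys)

# An inner merge on the key columns keeps only pairs present in both frames.
via_merge = (d.reset_index()
               .merge(keep, on=['id', 'val'], how='inner')
               .set_index(['id', 'val']))

print(via_reindex)
print(via_merge)
```

If every key in keep is guaranteed to exist in d, the two results contain the same rows; otherwise, calling dropna() on the reindex result or using the merge avoids the NaN rows.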