I have the following two dataframes DF1 and DF2. I would like to filter DF1 based on the multi-index of DF2.
DF1:
                            Value
Date        ID      Name       
2014-04-30  1001    n1        1
2014-05-31  1002    n2        2
2014-06-30  1003    n3        3
2014-07-31  1004    n4        4
DF2 (index = Date, ID, Name):
Date        ID      Name       
2014-05-31  1002    n2        
2014-06-30  1003    n3        
What I would like is this:
                            Value
Date        ID      Name       
2014-05-31  1002    n2        2
2014-06-30  1003    n3        3
To do this I simply use:
f_df = df1.ix[df2.index]
However, when I do this, I get the following (notice the tuple index):
                            Value
(2014-05-31, 1002, n2)      2
(2014-06-30, 1003, n3)      4
How can I get the result I am looking for, that is, a resulting DataFrame without a tuple index?
In Pandas version 0.14 you can use df1.loc[df2.index]:
import io
import pandas as pd
print(pd.__version__)
# 0.14.0
# Bytes literals (b'''...''') keep io.BytesIO working on both Python 2 and 3.
df1 = io.BytesIO(b'''\
Date        ID      Name    Value
2014-04-30  1001    n1        1
2014-05-31  1002    n2        2
2014-06-30  1003    n3        3
2014-07-31  1004    n4        4
''')
df2 = io.BytesIO(b'''\
Date        ID      Name    Value
2014-05-31  1002    n2        2
2014-06-30  1003    n3        3
''')
# Parse the whitespace-separated text and build the three-level index.
df1 = pd.read_table(df1, sep=r'\s+').set_index(['Date', 'ID', 'Name'])
df2 = pd.read_table(df2, sep=r'\s+').set_index(['Date', 'ID', 'Name'])
# Select the rows of df1 whose index keys appear in df2.index.
print(df1.loc[df2.index])
yields
                      Value
Date       ID   Name       
2014-05-31 1002 n2        2
2014-06-30 1003 n3        3
I believe this is because as of version 0.14 df.loc can accept a list of labels, and df2.index is list-like:
In [88]: list(df2.index)
Out[88]: [('2014-05-31', 1002L, 'n2'), ('2014-06-30', 1003L, 'n3')]
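For newer pandas versions (where .ix has been removed), the same idea can be sketched as follows. This is only a minimal sketch, assuming Python 3 and a recent pandas; the names by_loc, by_mask and by_list are illustrative and not from the question. It shows the .loc selection, an alternative boolean mask built with index.isin, and the list-of-tuples form mentioned above:

```python
import io

import pandas as pd

# Same sample data as in the question; io.StringIO is used so this runs on Python 3.
df1 = pd.read_csv(io.StringIO("""\
Date        ID      Name    Value
2014-04-30  1001    n1        1
2014-05-31  1002    n2        2
2014-06-30  1003    n3        3
2014-07-31  1004    n4        4
"""), sep=r"\s+").set_index(["Date", "ID", "Name"])

# df2 carries only the index levels, as described in the question.
df2 = pd.read_csv(io.StringIO("""\
Date        ID      Name
2014-05-31  1002    n2
2014-06-30  1003    n3
"""), sep=r"\s+").set_index(["Date", "ID", "Name"])

# Label-based selection using the other frame's MultiIndex.
by_loc = df1.loc[df2.index]

# Boolean mask: keep rows of df1 whose full index key appears in df2.index.
by_mask = df1[df1.index.isin(df2.index)]

# A plain list of tuples selects the same complete keys.
keys = [("2014-05-31", 1002, "n2"), ("2014-06-30", 1003, "n3")]
by_list = df1.loc[keys]

print(by_mask)
```

The isin mask simply skips keys in df2 that are missing from df1, while .loc may raise a KeyError for them, so which one fits depends on how missing keys should be handled.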