Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filter Pandas DataFrame using another DataFrame

I have a multi-index DataFrame with the first level as the group id and the second level as the element name. There are many more groups but only the first is shown below.

                   2000-01-04  2000-01-05 
Group Element                                     
1       A          -0.011374    0.035895 
        X          -0.006910    0.047714 
        C          -0.016609    0.038705 
        Y          -0.088110   -0.052775 
        H           0.000000    0.008082 

I have another DataFrame containing only 1 index that is the group id. The columns for both are the same and they are dates.

         2000-01-04  2000-01-05 
Group                                     
1        -0.060623   -0.025429 
2        -0.066765   -0.005318 
3        -0.034459   -0.011243 
4        -0.051813   -0.019521 
5        -0.064367    0.014810 

I want to use the second DataFrame to filter the first one by checking if each element is smaller than the value of the group on that date to get something like this:

                   2000-01-04  2000-01-05 
Group Element                                     
1       A          False        False     
        X          False        False     
        C          False        False     
        Y          True         True
        H          False        False     

Ultimately, I am only interested in the elements that were True and the dates in which they were True. A list of elements that were true over an iteration of dates would be great, which I've though to do by making the False NaN and then using dropNa().

I know I can write bunch of nested for loops to do this but time is of crucial importance; I can't think of a way to use pandas dataframe structure intrinsically and pythonically to do this. Any help would greatly appreciated!

like image 880
rmalhotra Avatar asked Oct 02 '22 20:10

rmalhotra


1 Answers

You could use a groupby apply for this:

In [11]: g = df1.groupby(level='Group')

In [12]: g.apply(lambda x: x <= df2.loc[x.name])
Out[12]: 
              2000-01-04 2000-01-05
Group Element                      
1     A            False      False
      X            False      False
      C            False      False
      Y             True       True
      H            False      False
like image 163
Andy Hayden Avatar answered Oct 10 '22 01:10

Andy Hayden