
Pandas Panel fancy indexing: How to return (index of) all DataFrames in Panel based on Boolean of multiple columns in each df

I've got a Pandas Panel containing many DataFrames that share the same row/column labels. I want to make a new Panel containing only the DataFrames that fulfill certain criteria based on a couple of columns.

This is easy with dataframes and rows: Say I have a df, zHe_compare. I can get the suitable rows with:

zHe_compare[((zHe_compare['zHe_calc'] > 100) & (zHe_compare['zHe_med'] > 100)) | ((zHe_obs_lo_2s <= zHe_compare['zHe_calc']) & (zHe_compare['zHe_calc'] <= zHe_obs_hi_2s))]

but how do I do (pseudocode, simplified boolean):

good_results_panel = results_panel[ all_dataframes[ sum ('zHe_calc' < 'zHe_obs') > min_num ] ]

I know how to write the inner boolean part, but how do I specify it for each DataFrame in a Panel? Because I need multiple columns from each df, I haven't had success with the panel.minor_xs slicing techniques.

Thanks!

cossatot asked Nov 22 '12 04:11


1 Answer

As mentioned in its documentation, Panel is currently a bit under-developed, so the sweet syntax you've come to rely on when working with DataFrame isn't there yet.

Meanwhile, I would suggest using the Panel.select method:

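def is_good_result(item_label):
    # Look up the DataFrame stored under this item label,
    # then apply whatever condition you need to it.
    df = results_panel[item_label]
    return df['col1'].sum() > 5

# select() calls is_good_result on each item label and keeps the items where it returns True.
good_results = results_panel.select(is_good_result)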

The is_good_result function returns a boolean value. Note that its argument is an item label, not a DataFrame instance: Panel.select calls the criterion on each item label rather than on the DataFrame stored under that label, so the function has to look the DataFrame up itself.

Of course, you can stuff that whole criterion function into a lambda in one statement, if you're into the whole brevity thing:

good_results = results_panel.select(
    lambda item_label: results_panel[item_label]['col1'].sum() > 5
)
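
For anyone reading this on a newer pandas: Panel has since been removed (as of pandas 0.25), so the select approach above only runs on old versions. Here is a minimal sketch of the same idea using a plain dict of DataFrames as a stand-in for the Panel, with the count-based criterion from the question. The sample data, the labels (run_a, run_b, run_c), and the min_num value are all made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for results_panel: item label -> DataFrame.
results = {
    "run_a": pd.DataFrame({"zHe_calc": [10, 20, 30], "zHe_obs": [15, 25, 35]}),
    "run_b": pd.DataFrame({"zHe_calc": [10, 20, 50], "zHe_obs": [15, 25, 35]}),
    "run_c": pd.DataFrame({"zHe_calc": [90, 80, 70], "zHe_obs": [15, 25, 35]}),
}
min_num = 1  # assumed threshold, not from the original question


def is_good_result(df):
    # Count the rows where the calculated value is below the observed one,
    # then compare that count against the threshold.
    return (df["zHe_calc"] < df["zHe_obs"]).sum() > min_num


# Keep only the DataFrames that satisfy the criterion.
good_results = {label: df for label, df in results.items() if is_good_result(df)}
good_labels = list(good_results)
```

Unlike the Panel.select criterion, the predicate here receives the DataFrame itself, because the dict comprehension iterates over (label, DataFrame) pairs. With this sample data, run_a and run_b pass and run_c is dropped.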
assaflavi answered Oct 05 '22 00:10