I've got a pandas Panel containing many DataFrames that share the same row and column labels. I want to make a new Panel containing only the DataFrames that fulfill certain criteria based on a couple of columns.
This is easy with dataframes and rows: Say I have a df, zHe_compare. I can get the suitable rows with:
zHe_compare[((zHe_compare['zHe_calc'] > 100) & (zHe_compare['zHe_med'] > 100)) | ((zHe_obs_lo_2s <= zHe_compare['zHe_calc']) & (zHe_compare['zHe_calc'] <= zHe_obs_hi_2s))]
but how do I do (pseudocode, simplified boolean):
good_results_panel = results_panel[ all_dataframes[ sum ('zHe_calc' < 'zHe_obs') > min_num ] ]
I know the inner boolean part, but how do I specify this for each DataFrame in a Panel? Because I need multiple columns from each df, I haven't had success with the panel.minor_xs slicing techniques.
thanks!
As mentioned in its documentation, Panel is currently a bit under-developed, so the sweet syntax you've come to rely on when working with DataFrame isn't there yet.
Meanwhile, I would suggest using the Panel.select method:
def is_good_result(item_label):
    # Look up the DataFrame stored under this item label
    df = results_panel[item_label]
    # Whatever condition you need over the selected item
    return df['col1'].sum() > 5

good_results = results_panel.select(is_good_result)
The is_good_result function returns a boolean value. Note that its argument is not a DataFrame instance, because Panel.select applies its argument to the item label, rather than the DataFrame content of that item.
Of course, you can stuff that whole criterion function into a lambda in one statement, if you're into the whole brevity thing:
good_results = results_panel.select(
    lambda item_label: results_panel[item_label]['col1'].sum() > 5
)
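If Panel is not available in your pandas version (it was deprecated and then removed in pandas 0.25), the same idea can be sketched with a plain dict of DataFrames keyed by item label. This is only a minimal sketch, not the Panel API: the dict name results_by_label, the sample values, and the min_num threshold are made up for illustration, and the criterion is the simplified one from the question's pseudocode.

```python
import pandas as pd

# Hypothetical stand-in for the panel: item label -> DataFrame,
# all sharing the same column labels (sample values are invented).
results_by_label = {
    "run_a": pd.DataFrame({"zHe_calc": [90, 110, 120], "zHe_obs": [100, 100, 100]}),
    "run_b": pd.DataFrame({"zHe_calc": [80, 85, 130], "zHe_obs": [100, 100, 100]}),
    "run_c": pd.DataFrame({"zHe_calc": [70, 60, 50], "zHe_obs": [100, 100, 100]}),
}
min_num = 1  # assumed threshold, as in the question's pseudocode


def is_good_result(df):
    # Count the rows where the calculated value is below the observed one,
    # then compare that count against the threshold.
    return (df["zHe_calc"] < df["zHe_obs"]).sum() > min_num


# Keep only the items whose DataFrame passes the criterion.
good_results = {
    label: df for label, df in results_by_label.items() if is_good_result(df)
}
```

Here the criterion function receives the DataFrame itself rather than the item label, which is the main difference from Panel.select; several columns of each DataFrame can be combined freely inside it.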