
Pandas bin dataframe

Tags: python, pandas

I have a dataframe with a depth column sampled on a 0.1 m grid.

import pandas as pd

df1 = pd.DataFrame(
    {
        'depth': [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1],
        '350': [7.898167, 6.912074, 6.049002, 5.000357, 4.072320, 3.070662, 2.560458, 2.218879, 1.892131, 1.588389, 1.573693],
        '351': [8.094912, 7.090584, 6.221289, 5.154516, 4.211746, 3.217615, 2.670147, 2.305846, 1.952723, 1.641423, 1.622722],
        '352': [8.291657, 7.269095, 6.393576, 5.308674, 4.351173, 3.364569, 2.779837, 2.392813, 2.013316, 1.694456, 1.671752],
        '353': [8.421007, 7.374317, 6.496641, 5.403691, 4.439815, 3.412494, 2.840625, 2.443868, 2.069017, 1.748445, 1.718081],
        '354': [8.535562, 7.463452, 6.584512, 5.485725, 4.517310, 3.438680, 2.890678, 2.487039, 2.123644, 1.802643, 1.763818],
        '355': [8.650118, 7.552586, 6.672383, 4.517310, 4.594806, 3.464867, 2.940732, 2.530211, 2.178271, 1.856841, 1.809555],
    },
    index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
)

My question is: how do I bin the data to get a new dataframe on a 0.5 m depth grid?

Or rather, how do I average the column values of df1 (which has data every 0.1 m) over bins of dz = 0.5 m?

The point is to keep the same df structure and the same columns (350-355), but with the rows averaged per column over a given dz interval (a fixed number of rows), say 0.5 m.

So in this case my new dataframe would have only two rows, with depth values of 1.35 and 1.85 m, keeping each column as in df1. The first row would hold the averages over the 1.1-1.6 m interval, the second those over the 1.6-2.1 m interval.

PEBKAC asked Jan 27 '23 10:01


1 Answer

Use a combination of df.groupby and pd.cut:

import pandas as pd
import numpy as np

# Specify your desired dz step size
step = 0.5
dz = np.arange(1,3,step)

# rebin dataframe
df2 = df1.groupby(pd.cut(df1.depth, dz, labels=False), as_index=False).mean()

# refill 'depth' column
df2.depth = dz[:-1]

gives

   depth       350       351       352       353       354       355
0    1.0  5.986384  6.154609  6.322835  6.427094  6.517312  6.397441
1    1.5  2.266104  2.357551  2.448998  2.502890  2.548537  2.594184
2    2.0  1.573693  1.622722  1.671752  1.718081  1.763818  1.809555

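where each row holds the mean of columns 350-355 over the depth intervals 1 < depth <= 1.5, 1.5 < depth <= 2, and so on. The depth column holds the left edge of each bin, and the last row contains only the single sample at 2.1 m.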
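To see which rows land in which bin, it can help to look at what pd.cut returns on its own. The following is only a small sketch; the shortened depth series and the names edges, codes and intervals are made up for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

# Shortened depth series, made up for illustration
depth = pd.Series([1.1, 1.5, 1.6, 2.0, 2.1], name="depth")

# Same bin edges as in the answer: 1.0, 1.5, 2.0, 2.5
edges = np.arange(1, 3, 0.5)

# labels=False returns the integer bin number of each value
codes = pd.cut(depth, edges, labels=False)
print(codes.tolist())  # [0, 0, 1, 1, 2]

# Without labels=False, each value is mapped to its interval instead;
# by default the intervals are closed on the right, e.g. (1.0, 1.5]
intervals = pd.cut(depth, edges)
print(intervals.iloc[0])
```

Because the bins are closed on the right by default, 1.5 falls into the first bin and 2.0 into the second.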

You can easily change the rebinning by selecting a desired value for the step variable.
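If the bins should instead start at the first sample and be labelled by their midpoint (1.35, 1.85, ... as in the question), one possible variant is sketched below. It is not the method from the answer above: the helper rebin_by_depth and the names bin_no and demo are hypothetical, the bins are assumed to be half-open ([1.1, 1.6), [1.6, 2.1), ...), and only the '350' column from the question is used to keep the example short. Since the depth labels are computed from the bin numbers that actually occur, this also does not rely on every bin being non-empty.

```python
import numpy as np
import pandas as pd


def rebin_by_depth(df, step, depth_col="depth"):
    """Average all columns over half-open depth bins of width `step`.

    Bins start at the smallest depth; each output row is labelled
    with the midpoint of its bin. (Hypothetical helper, for illustration.)
    """
    start = df[depth_col].min()
    # Integer bin number per row; the rounding guards against
    # floating-point noise in ratios that should be whole numbers
    ratio = np.round((df[depth_col] - start) / step, 9)
    bin_no = np.floor(ratio).astype(int).to_numpy()

    # Average every non-depth column within each bin
    out = df.drop(columns=depth_col).groupby(bin_no).mean()

    # Label each bin with its midpoint
    out.insert(0, depth_col, start + (out.index.to_numpy() + 0.5) * step)
    return out.reset_index(drop=True)


# Demo with the depth and '350' columns from the question
demo = pd.DataFrame({
    "depth": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1],
    "350": [7.898167, 6.912074, 6.049002, 5.000357, 4.072320, 3.070662,
            2.560458, 2.218879, 1.892131, 1.588389, 1.573693],
})
binned = rebin_by_depth(demo, step=0.5)
print(binned)
```

With this data the averaged values should match the output shown above. Only the depth labels differ (bin midpoints instead of left edges), and the single sample at 2.1 m still forms a third bin of its own.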

gehbiszumeis answered Jan 30 '23 00:01