Pandas bin dataframe

Question

I have a dataframe whose depth column is sampled on a 0.1 m grid.

import pandas as pd

df1 = pd.DataFrame({'depth': [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1],
                    '350': [7.898167, 6.912074, 6.049002, 5.000357, 4.072320, 3.070662, 2.560458, 2.218879, 1.892131, 1.588389, 1.573693],
                    '351': [8.094912, 7.090584, 6.221289, 5.154516, 4.211746, 3.217615, 2.670147, 2.305846, 1.952723, 1.641423, 1.622722],
                    '352': [8.291657, 7.269095, 6.393576, 5.308674, 4.351173, 3.364569, 2.779837, 2.392813, 2.013316, 1.694456, 1.671752],
                    '353': [8.421007, 7.374317, 6.496641, 5.403691, 4.439815, 3.412494, 2.840625, 2.443868, 2.069017, 1.748445, 1.718081],
                    '354': [8.535562, 7.463452, 6.584512, 5.485725, 4.517310, 3.438680, 2.890678, 2.487039, 2.123644, 1.802643, 1.763818],
                    '355': [8.650118, 7.552586, 6.672383, 4.517310, 4.594806, 3.464867, 2.940732, 2.530211, 2.178271, 1.856841, 1.809555]},
                   index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])

My question is: how do I bin the data to get a new dataframe at a 0.5 m depth resolution?

Or rather, how do I average the column values of df1 (one row per 0.1 m) over dz = 0.5 m bins?

The point is to keep the same df structure and the same columns (350-355), but with the rows averaged per column over a given dz interval (a fixed number of rows), say 0.5 m.

So in this case my new dataframe would have only two rows, with depth values of 1.35 and 1.85 m, and the same columns as df1. The first row would hold the averages over the 1.1-1.6 m interval, the second over 1.6-2.1 m.
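Because the depths in df1 sit on a regular 0.1 m grid with no gaps, the "number of rows" reading of the question means each 0.5 m bin is simply five consecutive rows. A minimal sketch of that interpretation, assuming the grid really is regular and gap-free; it groups by row position instead of using pd.cut, reproduces only the depth and 350 columns for brevity, and the names grid, rows_per_bin and bin_id are illustrative only:

```python
import numpy as np
import pandas as pd

# Sample data from the question (depth and the '350' column only, for brevity)
df1 = pd.DataFrame({
    'depth': [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1],
    '350': [7.898167, 6.912074, 6.049002, 5.000357, 4.072320, 3.070662,
            2.560458, 2.218879, 1.892131, 1.588389, 1.573693],
}, index=range(1, 12))

grid = 0.1                               # assumed regular sample spacing (m)
dz = 0.5                                 # desired bin size (m)
rows_per_bin = int(round(dz / grid))     # 5 rows per 0.5 m bin

# Bin number of each row from its position, not from its depth value
bin_id = np.arange(len(df1)) // rows_per_bin

# Keep only complete bins; here that drops the lone 2.1 m row
n_full = len(df1) // rows_per_bin
keep = bin_id < n_full
df2 = df1[keep].groupby(bin_id[keep]).mean()

# Label each bin by its center: first depth in the bin plus dz / 2
df2['depth'] = df1['depth'].to_numpy()[::rows_per_bin][:n_full] + dz / 2
print(df2)
```

Grouping by position avoids comparing floating-point depths against bin edges, but it silently produces wrong bins if a depth is missing or duplicated, so it only suits perfectly regular data.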

gehbiszumeis · Accepted Answer

Use a combination of df.groupby and pd.cut:

import pandas as pd
import numpy as np

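# Specify your desired dz step size (m)
step = 0.5
# Bin edges 1.0, 1.5, 2.0, 2.5, i.e. the bins (1.0, 1.5], (1.5, 2.0], (2.0, 2.5]
dz = np.arange(1, 3, step)

# Rebin the dataframe: pd.cut gives each row an integer bin number,
# and groupby averages every column within each bin
df2 = df1.groupby(pd.cut(df1.depth, dz, labels=False), as_index=False).mean()

# Overwrite the averaged 'depth' column with each bin's left edge
# (this assumes every bin contains at least one row)
df2.depth = dz[:-1]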

gives

   depth       350       351       352       353       354       355
0    1.0  5.986384  6.154609  6.322835  6.427094  6.517312  6.397441
1    1.5  2.266104  2.357551  2.448998  2.502890  2.548537  2.594184
2    2.0  1.573693  1.622722  1.671752  1.718081  1.763818  1.809555

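where each row holds the mean of columns 350-355 over the depths x with 1 < x <= 1.5, 1.5 < x <= 2 and 2 < x <= 2.5 (pd.cut bins are closed on the right by default). On the 0.1 m grid the first two bins contain the rows for 1.1-1.5 m and 1.6-2.0 m; the depth column shows each bin's left edge rather than its center, and the 2.1 m row ends up alone in the third bin.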
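To check which interval each depth falls into, the bins can be inspected directly. A minimal sketch that rebuilds only the question's depth values as a Series (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# Depth values from the question's df1
depth = pd.Series([1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1], name='depth')
edges = np.arange(1, 3, 0.5)   # bin edges 1.0, 1.5, 2.0, 2.5, as in the answer

# Default right=True: each interval is open on the left and closed on the right
intervals = pd.cut(depth, edges)
# labels=False returns the integer bin number instead of the Interval
codes = pd.cut(depth, edges, labels=False)

print(pd.DataFrame({'depth': depth, 'interval': intervals, 'code': codes}))
```

The 1.5 m and 2.0 m rows sit exactly on edges and, because the right side is closed, belong to the lower bin; passing right=False would move them into the upper one.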

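To rebin at a different resolution, change the value of step, and adjust the start and stop passed to np.arange so that the edges still span all depths. With step = 1, for example, the edges are 1 and 2, so the 2.1 m row falls outside every bin and is dropped.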
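If the result should match the question exactly, with the bins [1.1, 1.6) and [1.6, 2.1) labelled by their centers 1.35 and 1.85, pd.cut can be given right-open edges and the bin numbers mapped back to midpoints. The sketch below assumes that reading of the question; rebin_depth is a hypothetical helper, not part of pandas, and only the first two value columns of df1 are reproduced:

```python
import numpy as np
import pandas as pd

# Sample data from the question (depth and the '350' and '351' columns only)
df1 = pd.DataFrame({
    'depth': [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1],
    '350': [7.898167, 6.912074, 6.049002, 5.000357, 4.072320, 3.070662,
            2.560458, 2.218879, 1.892131, 1.588389, 1.573693],
    '351': [8.094912, 7.090584, 6.221289, 5.154516, 4.211746, 3.217615,
            2.670147, 2.305846, 1.952723, 1.641423, 1.622722],
}, index=range(1, 12))


def rebin_depth(df, start, step, n_bins):
    """Hypothetical helper: average every non-depth column in the right-open
    bins [start, start + step), [start + step, start + 2 * step), ...
    and label each bin by its midpoint. Rows outside all bins are dropped."""
    edges = start + step * np.arange(n_bins + 1)
    # right=False closes each bin on the left; labels=False gives bin numbers
    codes = pd.cut(df['depth'], edges, right=False, labels=False)
    # groupby skips the NaN code of out-of-range rows and never creates empty bins
    out = df.drop(columns='depth').groupby(codes).mean()
    # Replace bin numbers with bin midpoints and turn them into a 'depth' column
    out.index = edges[out.index.to_numpy().astype(int)] + step / 2
    out.index.name = 'depth'
    return out.reset_index()


df2 = rebin_depth(df1, start=1.1, step=0.5, n_bins=2)
print(df2)
```

Because empty bins simply do not appear in the groupby result, mapping each surviving bin number to its midpoint avoids the length mismatch that assigning a fixed array such as dz[:-1] would hit. The edges are still computed in floating point; here they equal the literal depths 1.6 and 2.1, but for other grids it may be safer to round depths and edges to the grid precision before binning.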

Tags: python, pandas

PEBKAC
