
Pandas dataframe conditional mean based on column names

It is easiest to explain by starting with a sample of the dataframe:

    TimeStamp   382.098     382.461     383.185     383.548
    10:28:00    0.012448    0.012362    0.0124485   0.012362
    10:30:00    0.0124135   0.0123965   0.0124135   0.012431
    10:32:00    0.0551035   0.0551725   0.055931    0.0563105
    10:34:00    0.055586    0.0557245   0.056655    0.0569485
    10:36:00    0.055586    0.055776    0.0568105   0.057362

I want my output to be:

    TimeStamp   382         383
    10:28:00    0.012405    0.01240525
    10:30:00    0.012405    0.01242225
    10:32:00    0.055138    0.05612075
    10:34:00    0.05565525  0.05680175
    10:36:00    0.055681    0.05708625

That is, I want to compare the column names and, where they share the same whole-number part, have the output column hold the mean of those columns' values for each timestamp.

My idea was to use df.round to round the column headers to the nearest whole number and then use .mean() to somehow take the mean on axis = 0 across columns with the same header. However, I get an error when using the round function on the dataframe's index type.

EDIT: based on the answers, I used

    df.rename(columns=dict(zip(df.columns[0:], df.columns[0:]\
              .values.astype(float).round().astype(str))),inplace=True)
    df = df.groupby(df.columns[0:], axis=1).mean()

But it messes up the column names as well as the values instead of giving me the mean based on the column names, and I have no idea why!
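
A likely cause, sketched in plain Python on the assumption that the column labels are the strings shown in the sample: rounding to the nearest whole number sends 383.548 to 384 rather than 383, and converting the floats to strings leaves labels such as '382.0'. The groups then no longer match the desired 382 and 383. The `labels` list is illustrative and simply copies the sample headers.

```python
import math

# Column labels copied from the sample; assumed to be strings.
labels = ["382.098", "382.461", "383.185", "383.548"]

# Round to nearest and keep the float form, mimicking
# .astype(float).round().astype(str) from the EDIT.
rounded = [str(float(round(float(label)))) for label in labels]

# Keep only the whole-number part, which is what the desired output implies.
floored = [str(math.floor(float(label))) for label in labels]

print(rounded)  # ['382.0', '382.0', '383.0', '384.0']
print(floored)  # ['382', '382', '383', '383']
```

Rounding yields three distinct labels where two are wanted, so taking the floor (or truncating) looks like the better fit here.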

Brain_overflowed asked Oct 15 '17 21:10


1 Answer

Use groupby along axis 1 (the columns) with a lambda that maps each column name to its whole-number part.

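    # Make TimeStamp the index so only the numeric columns are grouped.
    df.set_index('TimeStamp', inplace=True)
    # Group columns by the integer part of each (string) column name, then average.
    df.groupby(by=lambda x: int(x.split('.')[0]), axis=1).mean()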

                    382       383
    TimeStamp
    10:28:00   0.012405  0.012405
    10:30:00   0.012405  0.012422
    10:32:00   0.055138  0.056121
    10:34:00   0.055655  0.056802
    10:36:00   0.055681  0.057086
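
For a self-contained check, here is a minimal sketch that rebuilds the sample frame and applies the same idea. It assumes the column labels are strings, and it groups the transposed frame instead of passing axis=1, since the `axis=1` argument to `groupby` is, as far as I know, deprecated in recent pandas releases. The names `df` and `result` are illustrative.

```python
import pandas as pd

# Rebuild the sample from the question; column labels are assumed to be strings.
df = pd.DataFrame(
    {
        "TimeStamp": ["10:28:00", "10:30:00", "10:32:00", "10:34:00", "10:36:00"],
        "382.098": [0.012448, 0.0124135, 0.0551035, 0.055586, 0.055586],
        "382.461": [0.012362, 0.0123965, 0.0551725, 0.0557245, 0.055776],
        "383.185": [0.0124485, 0.0124135, 0.055931, 0.056655, 0.0568105],
        "383.548": [0.012362, 0.012431, 0.0563105, 0.0569485, 0.057362],
    }
).set_index("TimeStamp")

# Transpose so the column labels become the index, group them by their
# whole-number part, average each group, then transpose back.
result = df.T.groupby(lambda label: int(float(label))).mean().T

print(result)
```

Using `int(float(label))` truncates toward zero, which matches the floor for the positive labels in this sample.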
cs95 answered Sep 22 '22 01:09