
Pandas: Group by rounded floating number

I have a dataframe with a column of floating numbers. For example:

df = pd.DataFrame({'A' : np.random.randn(100), 'B': np.random.randn(100)})

What I want to do is to group by column A after rounding column A to 2 decimal places.

The way I do it is highly inefficient:

df.groupby(df.A.map(lambda x: "%.2f" % x))

I particularly don't want to convert everything to a string, as speed becomes a huge problem. But I don't feel it is safe to do the following:

df.groupby(np.around(df.A, 2))

I am not sure, but I suspect there may be cases where two float64 numbers have the same string representation after rounding to 2 decimal places, yet end up with slightly different values after np.around(., 2). For example, could a value whose string representation is 1.52 come out of np.around(., 2) as 1.52000001 in some cases and as 1.51999999 in others?

My question is: what is a better and more efficient way to do this?

Tom Bennett asked Jan 08 '16 18:01



1 Answer

I think you do not need to convert the floats to strings.

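import pandas as pd
from random import random

# Build two columns of 100,000 random floats in [0, 1); lists are used because map() is lazy in Python 3.
n = 100000
df = pd.DataFrame({'A': [random() for _ in range(n)], 'B': [random() for _ in range(n)]})
# Group on A rounded to 1 decimal place (use round(x, 2) for 2 places) and count the rows in each group.
df.groupby(df['A'].apply(lambda x: round(x, 1))).count()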
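
If the per-element lambda is too slow, or float group labels still feel unsafe, here is a minimal sketch of two vectorized alternatives. It is not from the original answer. The first rounds the column with Series.round. The second groups on an integer number of hundredths, so group labels are compared as exact integers. The four-row frame and the name A_hundredths are made up for illustration. Values that sit extremely close to a half-hundredth boundary may land in a different bin than "%.2f" formatting would choose.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: three values near 1.52 and one value at 2.0.
small = pd.DataFrame({"A": [1.519, 1.521, 1.524, 2.0],
                      "B": [10, 20, 30, 40]})

# Option 1: vectorized rounding, no Python-level lambda and no strings.
rounded = small["A"].round(2)
by_float = small.groupby(rounded)["B"].count()

# Option 2: integer key (hundredths). Scale, round to the nearest
# integer, then cast, so the group labels are int64 rather than float64.
key = np.round(small["A"] * 100).astype("int64").rename("A_hundredths")
by_int = small.groupby(key)["B"].count()

print(by_float)
print(by_int)
```

Dividing the integer labels by 100 afterwards recovers a readable value for display, while the grouping itself never depends on comparing floats.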
xmduhan answered Oct 09 '22 21:10
