I have a DataFrame with a column of floating-point numbers. For example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': np.random.randn(100), 'B': np.random.randn(100)})
What I want to do is group by column A after rounding it to 2 decimal places.
The way I do it is highly inefficient:
df.groupby(df.A.map(lambda x: "%.2f" % x))
I especially want to avoid converting everything to strings, because speed becomes a huge problem. But I don't feel it is safe to do the following:
df.groupby(np.around(df.A, 2))
I am not sure, but I suspect there might be cases where two float64 numbers have the same string representation after rounding to 2 decimal places, but slightly different values after np.around to 2 decimal places. For example, is it possible for a value whose string representation is 1.52 to come out of np.around(x, 2) as 1.52000001 sometimes but as 1.51999999 other times?
My question is: what is a better, more efficient way to do this?
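The concern about np.around can be checked directly. The NumPy documentation for np.around notes that, for positive decimals, it is equivalent to rint(a * 10**decimals) / 10**decimals. If that holds, every input whose scaled value rounds to the same integer k ends up as the single float64 value k / 100, so two values in the same bucket cannot drift to 1.52000001 and 1.51999999. The sketch below assumes made-up standard-normal data standing in for column A; the names a, rounded and k are illustrative only.

```python
import numpy as np

# Hypothetical sample data standing in for column A.
rng = np.random.default_rng(0)
a = rng.standard_normal(100_000)

# np.around(a, 2) scales by 100, rounds to the nearest integer, then divides by 100.
rounded = np.around(a, 2)
k = np.rint(a * 100)  # integer-valued float64 "bucket numbers"

# Every input in bucket k maps to the one float64 value k / 100,
# so the rounded keys match k / 100 exactly.
print(np.array_equal(rounded, k / 100))

# Hence the number of distinct rounded keys equals the number of buckets.
print(len(np.unique(rounded)) == len(np.unique(k)))
```

The catch is elsewhere: because a * 100 is itself rounded to the nearest float64, a value very close to a halfway point (such as x.xx5) can land in a different bucket than "%.2f" would put it in. The keys are consistent; only the assignment of borderline values may differ from the string approach.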
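One suggested answer groups on the values rounded with Python's built-in round (here to 1 decimal place):
import pandas as pd
from random import random
# 100,000 random floats per column (list comprehensions also work on Python 3, where map() returns an iterator)
df = pd.DataFrame({'A': [random() for _ in range(100000)], 'B': [random() for _ in range(100000)]})
# Round each value of A to 1 decimal place and count the rows in each group
df.groupby(df['A'].apply(lambda x: round(x, 1))).count()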
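The answer above calls Python's round once per row through apply. A vectorized sketch of the same idea, rounding to 2 decimals as the question asks, uses Series.round on the whole column, or groups on integer hundredths so the keys are not floats at all. The data is made up, and the names by_round, bucket and by_bucket are just for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 100_000
df = pd.DataFrame({'A': rng.random(n), 'B': rng.random(n)})

# Vectorized rounding: Series.round processes the whole column at once
# instead of making one Python-level round() call per row.
by_round = df.groupby(df['A'].round(2)).count()

# Alternative: group on integer bucket numbers (hundredths), which avoids
# comparing float keys for equality entirely.
bucket = (df['A'] * 100).round().astype('int64')
by_bucket = df.groupby(bucket).count()

# Both produce the same group sizes, in the same ascending key order.
print((by_round['B'].to_numpy() == by_bucket['B'].to_numpy()).all())
```

Both variants compute the same underlying bucket, rint(A * 100), so they group identically; the integer version is simply more explicit about what the key is.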