I have a dataframe that looks like this:
D X Y Z
A 22 16 23
A 21 16 22
A 20 17 21
B 33 50 11
B 34 53 12
B 34 55 13
C 44 34 11
C 45 33 11
C 45 33 10
D 55 35 60
D 57 34 61
E 66 36 13
E 67 38 14
E 67 37 13
I want to get the minimum and maximum values of the categorical variable D
across all the column values and so the output dataframe should look something like this:
D Xmin Xmax Ymin Ymax Zmin Zmax
A 20 22 16 17 21 23
B 33 34 50 55 11 13
C 44 45 33 34 10 11
D 55 57 34 35 60 61
E 66 67 36 38 13 14
I have tried this, but no luck:
min_max_df = dfObj.groupby('D').agg({'X': [dfObj.min(axis=0), dfObj.max(axis=0)]})
The basic strategy is to convert each category value into a new column and assign a 1 or 0 (True/False) value to the column. This has the benefit of not weighting a value improperly. There are many libraries out there that support one-hot encoding but the simplest one is using pandas ' . get_dummies() method.
from itertools import product
aggs = {f"{col}{fn}": (col, fn) for col,fn in product(['X', 'Y', 'Z'], ['min', 'max'])}
df.groupby('D').agg(**aggs)
>>>
Xmin Xmax Ymin Ymax Zmin Zmax
D
A 20 22 16 17 21 23
B 33 34 50 55 11 13
C 44 45 33 34 10 11
D 55 57 34 35 60 61
E 66 67 36 38 13 14
df = df.groupby('D').agg(['min', 'max'])
Output:
>>> df
X Y Z
min max min max min max
D
A 20 22 16 17 21 23
B 33 34 50 55 11 13
C 44 45 33 34 10 11
D 55 57 34 35 60 61
E 66 67 36 38 13 14
>>> df['X']['min']
D
A 20
B 33
C 44
D 55
E 66
Name: min, dtype: int64
You can flatten the columns as well:
df.columns = df.columns.map(''.join)
df.rename_axis(None)
Xmin Xmax Ymin Ymax Zmin Zmax
A 20 22 16 17 21 23
B 33 34 50 55 11 13
C 44 45 33 34 10 11
D 55 57 34 35 60 61
E 66 67 36 38 13 14
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With