I have a DataFrame with an ID column and some feature columns. I'd like to see a description of how many unique IDs there are per value of each column.
The following code works, but I wonder if there is a better way than the to_frame().unstack().unstack() line, which transposes the Series returned by .describe() into a DataFrame whose columns are the percentiles, max, min, ...
import pandas as pd

id_col = 'ID'  # assumed name of the ID column; not shown in the question

def unique_ids(df):
    rows = []
    for col in sorted(c for c in df.columns if c != id_col):
        v = df.groupby(col)[id_col].nunique().describe()
        v = v.to_frame().unstack().unstack()  # Transpose
        v.index = [col]
        rows.append(v)
    return pd.concat(rows)
It seems you need to change:
v = v.to_frame().unstack().unstack()
to
v = v.to_frame().T
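As a quick check of this replacement, the sketch below builds both versions for a single column and compares them. It reuses the sample data from this answer; the names via_unstack and via_T are illustrative only.

```python
import pandas as pd

# Sample data taken from the answer's example.
df = pd.DataFrame({'ID': [1, 1, 3],
                   'E': [4, 5, 5],
                   'C': [7, 8, 9]})

# Summary statistics of the number of unique IDs per value of column 'E'.
v = df.groupby('E')['ID'].nunique().describe()

via_unstack = v.to_frame().unstack().unstack()  # original approach
via_T = v.to_frame().T                          # suggested replacement

# via_T is a one-row DataFrame whose row label is the Series name ('ID').
print(via_T)

# Compare the values with columns sorted, so the check does not depend on column order.
same = (via_unstack.sort_index(axis=1).values == via_T.sort_index(axis=1).values).all()
print(same)
```

Since .T only swaps rows and columns, it states the intent more directly than two chained unstack calls.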
Alternatively, you can transpose the final DataFrame instead; rename is also added so that each Series is labelled by col:
df = pd.DataFrame({'ID':[1,1,3],
                   'E':[4,5,5],
                   'C':[7,8,9]})
print(df)
   C  E  ID
0  7  4   1
1  8  5   1
2  9  5   3
def unique_ids(df):
    rows = []
    id_col = 'ID'
    for col in sorted(c for c in df.columns if c != id_col):
        v = df.groupby(col)[id_col].nunique().describe().rename(col)
        rows.append(v)
    return pd.concat(rows, axis=1).T
print(unique_ids(df))
   count  mean       std  min   25%  50%   75%  max
C    3.0   1.0  0.000000  1.0  1.00  1.0  1.00  1.0
E    2.0   1.5  0.707107  1.0  1.25  1.5  1.75  2.0
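The same idea can also be written without the explicit list and rename, by passing a dict to pd.concat so that the dict keys become the labels. This is only a sketch of a variant, not part of the original answer: making id_col a parameter and the name stats are assumptions made for illustration.

```python
import pandas as pd

def unique_ids(df, id_col='ID'):
    # One describe() Series per feature column, keyed by column name.
    stats = {
        col: df.groupby(col)[id_col].nunique().describe()
        for col in sorted(df.columns.drop(id_col))
    }
    # The dict keys become column labels; transpose so each feature is a row.
    return pd.concat(stats, axis=1).T

df = pd.DataFrame({'ID': [1, 1, 3],
                   'E': [4, 5, 5],
                   'C': [7, 8, 9]})
result = unique_ids(df)
print(result)
```

Each row of result holds the describe() statistics for one feature column, as in the loop version above.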