I have a DataFrame like the following:
df = pd.DataFrame({'col1': ['a','b','c','c','d','e','a','h','i','a'],'col2':['3:00','3:00','4:00','4:00','3:00','5:00','5:00','3:00','3:00','2:00']})
df
Out[83]:
  col1  col2
0    a  3:00
1    b  3:00
2    c  4:00
3    c  4:00
4    d  3:00
5    e  5:00
6    a  5:00
7    h  3:00
8    i  3:00
9    a  2:00
What I'd like to do is group by 'col1' and assign a unique ID to each distinct value of 'col2' within each group, like this:
col1  col2  ID
   a  2:00   0
   a  3:00   1
   a  5:00   2
   b  3:00   0
   c  4:00   0
   c  4:00   0
...
I tried to use pd.Categorical but couldn't quite get to where I wanted to be.
We can use the pd.factorize() method inside a groupby-transform:
In [170]: df['ID'] = df.groupby('col1')['col2'].transform(lambda x: pd.factorize(x)[0])
In [171]: df
Out[171]:
  col1  col2  ID
0    a  3:00   0
1    b  3:00   0
2    c  4:00   0
3    c  4:00   0
4    d  3:00   0
5    e  5:00   0
6    a  5:00   1
7    h  3:00   0
8    i  3:00   0
9    a  2:00   2
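Note that pd.factorize() numbers values in order of first appearance within each group, so for group 'a' you get 3:00 → 0, 5:00 → 1, 2:00 → 2. If you instead want IDs that follow the sorted order of 'col2' (2:00 → 0, 3:00 → 1, 5:00 → 2, as in the desired output above), one possible sketch is a dense rank per group. This relies on the time strings sorting correctly as text ('2:00' < '3:00' < '5:00'), which holds here:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a','b','c','c','d','e','a','h','i','a'],
                   'col2': ['3:00','3:00','4:00','4:00','3:00','5:00','5:00','3:00','3:00','2:00']})

# Dense rank within each group: equal values share a rank and ranks are
# consecutive (1, 2, 3, ...); subtracting 1 makes the IDs start at 0.
df['ID'] = (df.groupby('col1')['col2']
              .transform(lambda s: s.rank(method='dense'))
              .astype(int) - 1)
```

Sorting the result by ['col1', 'col2'] then reproduces the desired table, with IDs increasing with the time within each group.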