I have df:
domain orgid
csyunshu.com 108299
dshu.com 108299
bbbdshu.com 108299
cwakwakmrg.com 121303
ckonkatsunet.com 121303
I would like to add a new column that replaces the domain column with numeric ids, numbered per orgid:
domain orgid domainid
csyunshu.com 108299 1
dshu.com 108299 2
bbbdshu.com 108299 3
cwakwakmrg.com 121303 1
ckonkatsunet.com 121303 2
I have already tried this line but it does not give the result I want:
df.groupby('orgid').count['domain'].reset_index()
Can anybody help?
You can call rank on the groupby object and pass param method='first':
In [61]:
df['domainId'] = df.groupby('orgid')['orgid'].rank(method='first')
df
Out[61]:
domain orgid domainId
0 csyunshu.com 108299 1
1 dshu.com 108299 2
2 bbbdshu.com 108299 3
3 cwakwakmrg.com 121303 1
4 ckonkatsunet.com 121303 2
If you want to overwrite the column you can do:
df['domain'] = df.groupby('orgid')['orgid'].rank(method='first')
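For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the df from the question by hand and assumes the rows are already in the order you want them numbered. Note that rank returns floats, so the sketch casts to int; it also shows cumcount() + 1 as an alternative that yields integers directly. The column name domainid_alt is made up for illustration.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "domain": ["csyunshu.com", "dshu.com", "bbbdshu.com",
               "cwakwakmrg.com", "ckonkatsunet.com"],
    "orgid": [108299, 108299, 108299, 121303, 121303],
})

# rank(method='first') breaks ties by order of appearance within each group.
# It returns floats, so cast to int for clean ids.
df["domainid"] = df.groupby("orgid")["orgid"].rank(method="first").astype(int)

# Alternative: cumcount() numbers rows within each group starting at 0,
# so add 1 to start at 1. "domainid_alt" is a hypothetical column name.
df["domainid_alt"] = df.groupby("orgid").cumcount() + 1

print(df)
```

Both columns should come out the same here; cumcount is arguably the more direct choice when all you need is a running counter per group.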