I would like to generate an integer-based unique ID for users (in my df).
Let's say I have:
index first last dob
0 peter jones 20000101
1 john doe 19870105
2 adam smith 19441212
3 john doe 19870105
4 jenny fast 19640822
I would like to generate an ID column like so:
index first last dob id
0 peter jones 20000101 1244821450
1 john doe 19870105 1742118427
2 adam smith 19441212 1841181386
3 john doe 19870105 1742118427
4 jenny fast 19640822 1687411973
10 digit ID, but it's based on the value of the fields (john doe identical row values get the same ID).
I've looked into hashing, encrypting, UUID's but can't find much related to this specific non-security use case. It's just about generating an internal identifier.
Feel like I may be tackling this the wrong way as I can't find much literature on it!
Thanks
And you can use the following syntax to select unique rows across specific columns in a pandas DataFrame: df = df. drop_duplicates(subset=['col1', 'col2', ...])
uuid1() is defined in UUID library and helps to generate the random id using MAC address and time component. bytes : Returns id in form of 16 byte string. int : Returns id in form of 128-bit integer. hex : Returns random id as 32 character hexadecimal string.
The simplest way to generate identifiers is by a serial number. A steadily increasing number that is assigned to whatever you need to identify next. This is the approached used in most internal databases as well as some commonly encountered public identifiers.
List of all unique values in a pandas dataframe column. You can use the pandas unique() function to get the different unique values present in a column. It returns a numpy array of the unique values in the column.
You can try using hash function.
df['id'] = df[['first', 'last']].sum(axis=1).map(hash)
Please note the hash id is greater than 10 digits and is a unique integer sequence.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With