Hello I have the following dataframe
df =
A B
John Tom
Homer Bart
Tom Maggie
Lisa John
I would like to assign to each name a unique ID and returns
df =
A B C D
John Tom 0 1
Homer Bart 2 3
Tom Maggie 1 4
Lisa John 5 0
What I have done is the following:
LL1 = pd.concat([df.a,df.b],ignore_index=True)
LL1 = pd.DataFrame(LL1)
LL1.columns=['a']
nameun = pd.unique(LL1.a.ravel())
LLout['c'] = 0
LLout['d'] = 0
NN = list(nameun)
for i in range(1,len(LLout)):
LLout.c[i] = NN.index(LLout.a[i])
LLout.d[i] = NN.index(LLout.b[i])
But since I have a very large dataset this process is very slow.
creating a unique random id To create a random id, you call the uuid4 () method and it will automatically generate a unique id for you just as shown in the example below; Example of usage.
You can get unique values in column (multiple columns) from pandas DataFrame using unique() or Series. unique() functions. unique() from Series is used to get unique values from a single column and the other one is used to get from multiple columns.
To get the unique values in multiple columns of a dataframe, we can merge the contents of those columns to create a single series object and then can call unique() function on that series object i.e. It returns the count of unique elements in multiple columns.
Here's one way. First get the array of unique names:
In [11]: df.values.ravel()
Out[11]: array(['John', 'Tom', 'Homer', 'Bart', 'Tom', 'Maggie', 'Lisa', 'John'], dtype=object)
In [12]: pd.unique(df.values.ravel())
Out[12]: array(['John', 'Tom', 'Homer', 'Bart', 'Maggie', 'Lisa'], dtype=object)
and make this a Series, mapping names to their respective numbers:
In [13]: names = pd.unique(df.values.ravel())
In [14]: names = pd.Series(np.arange(len(names)), names)
In [15]: names
Out[15]:
John 0
Tom 1
Homer 2
Bart 3
Maggie 4
Lisa 5
dtype: int64
Now use applymap
and names.get
to lookup these numbers:
In [16]: df.applymap(names.get)
Out[16]:
A B
0 0 1
1 2 3
2 1 4
3 5 0
and assign it to the correct columns:
In [17]: df[["C", "D"]] = df.applymap(names.get)
In [18]: df
Out[18]:
A B C D
0 John Tom 0 1
1 Homer Bart 2 3
2 Tom Maggie 1 4
3 Lisa John 5 0
Note: This assumes that all the values are names to begin with, you may want to restrict this to some columns only:
df[['A', 'B']].values.ravel()
...
df[['A', 'B']].applymap(names.get)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With