
Assign unique id to columns pandas data frame

Tags: python, pandas

Hello, I have the following DataFrame:

df = 
A      B   
John   Tom
Homer  Bart
Tom    Maggie
Lisa   John 

I would like to assign a unique ID to each name and get back:

df = 
A      B         C    D

John   Tom       0    1
Homer  Bart      2    3
Tom    Maggie    1    4 
Lisa   John      5    0

What I have done is the following:

import pandas as pd

# Stack both name columns and collect the unique names
LL1 = pd.concat([df.A, df.B], ignore_index=True)
nameun = pd.unique(LL1)
NN = list(nameun)
df['C'] = 0
df['D'] = 0
# Look up each name's position row by row (starting at row 0)
for i in range(len(df)):
    df.loc[i, 'C'] = NN.index(df.A[i])
    df.loc[i, 'D'] = NN.index(df.B[i])

But since I have a very large dataset, this process is very slow.

emax asked Oct 22 '15 14:10



1 Answer

Here's one way. First get the array of unique names:

In [11]: df.values.ravel()
Out[11]: array(['John', 'Tom', 'Homer', 'Bart', 'Tom', 'Maggie', 'Lisa', 'John'], dtype=object)

In [12]: pd.unique(df.values.ravel())
Out[12]: array(['John', 'Tom', 'Homer', 'Bart', 'Maggie', 'Lisa'], dtype=object)

and make this a Series, mapping names to their respective numbers:

In [13]: names = pd.unique(df.values.ravel())

In [14]: names = pd.Series(np.arange(len(names)), names)

In [15]: names
Out[15]:
John      0
Tom       1
Homer     2
Bart      3
Maggie    4
Lisa      5
dtype: int64

Now use applymap with names.get to look up these numbers (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):

In [16]: df.applymap(names.get)
Out[16]:
   A  B
0  0  1
1  2  3
2  1  4
3  5  0
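
The lookup above relies on how Series.get behaves: it returns the value stored under a label, or a default (None unless you pass one) when the label is missing. A small sketch with a made-up three-name Series (the name "Marge" is only there to illustrate a missing key):

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened version of the `names` Series built above
names = pd.Series(np.arange(3), index=["John", "Tom", "Homer"])

known = names.get("John")          # ID stored under the label "John"
unknown = names.get("Marge")       # label not in the index: default is None
fallback = names.get("Marge", -1)  # an explicit default can be supplied instead

print(known, unknown, fallback)
```

So any value in the frame that is not among the unique names would come back as None rather than raising an error.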

and assign it to the correct columns:

In [17]: df[["C", "D"]] = df.applymap(names.get)

In [18]: df
Out[18]:
       A       B  C  D
0   John     Tom  0  1
1  Homer    Bart  2  3
2    Tom  Maggie  1  4
3   Lisa    John  5  0
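
The steps above can be collected into one script. This is a minimal sketch that rebuilds the example frame from the question; it maps each column with Series.map instead of applymap, on the assumption that you may be running a pandas version where applymap is deprecated.

```python
import numpy as np
import pandas as pd

# Example frame from the question
df = pd.DataFrame({
    "A": ["John", "Homer", "Tom", "Lisa"],
    "B": ["Tom", "Bart", "Maggie", "John"],
})

# Unique names, in row-by-row order of first appearance
unique_names = pd.unique(df[["A", "B"]].values.ravel())

# Series indexed by name, holding the integer ID of each name
names = pd.Series(np.arange(len(unique_names)), index=unique_names)

# Series.map with a Series argument looks each value up in that Series' index
df["C"] = df["A"].map(names)
df["D"] = df["B"].map(names)

print(df)
```

Both lookups are vectorized, so there is no Python-level loop over the rows.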

Note: This assumes that every value in the DataFrame is a name. If that is not the case, you may want to restrict this to specific columns:

df[['A', 'B']].values.ravel()
...
df[['A', 'B']].applymap(names.get)
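
A different route to the same IDs, not part of the answer above, is pd.factorize, which returns integer codes in order of first appearance together with the unique values. The sketch below assumes only columns A and B hold names and that the new columns should be called C and D, as in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["John", "Homer", "Tom", "Lisa"],
    "B": ["Tom", "Bart", "Maggie", "John"],
})

cols = ["A", "B"]  # only these columns are treated as names

# Flatten row by row, then encode each name as an integer code
codes, uniques = pd.factorize(df[cols].to_numpy().ravel())

# Reshape the flat codes back to one row per original row
codes = codes.reshape(len(df), len(cols))
df["C"] = codes[:, 0]
df["D"] = codes[:, 1]

print(df)
```

Because the values are flattened row by row, the codes follow the same numbering as the mapping Series built earlier.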
Andy Hayden answered Sep 22 '22 16:09