Assign unique id to columns pandas data frame

Tags:

python

pandas

Hello, I have the following dataframe:

df = 
A      B   
John   Tom
Homer  Bart
Tom    Maggie
Lisa   John

I would like to assign each name a unique ID and get back:

df = 
A      B         C    D
John   Tom       0    1
Homer  Bart      2    3
Tom    Maggie    1    4
Lisa   John      5    0

What I have done is the following:

# Stack both name columns into one Series and collect the unique names
LL1 = pd.concat([df['A'], df['B']], ignore_index=True)
nameun = pd.unique(LL1)
df['C'] = 0
df['D'] = 0
NN = list(nameun)
# Row-by-row lookup; list.index is a linear scan, which is why this is slow
for i in range(len(df)):
    df.loc[i, 'C'] = NN.index(df.loc[i, 'A'])
    df.loc[i, 'D'] = NN.index(df.loc[i, 'B'])

But since I have a very large dataset this process is very slow.

asked Oct 22 '15 14:10

emax

1 Answer

Here's one way. First get the array of unique names:

In [11]: df.values.ravel()
Out[11]: array(['John', 'Tom', 'Homer', 'Bart', 'Tom', 'Maggie', 'Lisa', 'John'], dtype=object)

In [12]: pd.unique(df.values.ravel())
Out[12]: array(['John', 'Tom', 'Homer', 'Bart', 'Maggie', 'Lisa'], dtype=object)

and make this a Series, mapping names to their respective numbers:

In [13]: names = pd.unique(df.values.ravel())

In [14]: names = pd.Series(np.arange(len(names)), names)

In [15]: names
Out[15]:
John      0
Tom       1
Homer     2
Bart      3
Maggie    4
Lisa      5
dtype: int64

Now use applymap (deprecated since pandas 2.1 in favor of DataFrame.map, which takes the same function) and names.get to look up these numbers:

In [16]: df.applymap(names.get)
Out[16]:
   A  B
0  0  1
1  2  3
2  1  4
3  5  0

and assign it to the correct columns:

In [17]: df[["C", "D"]] = df.applymap(names.get)

In [18]: df
Out[18]:
       A       B  C  D
0   John     Tom  0  1
1  Homer    Bart  2  3
2    Tom  Maggie  1  4
3   Lisa    John  5  0
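
The steps above can be collected into one self-contained script. This is only a sketch, not the answerer's exact code: it swaps applymap for a per-column Series.map lookup so that it does not depend on the pandas version, and the names name_cols, unique_names and ids are introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": ["John", "Homer", "Tom", "Lisa"],
    "B": ["Tom", "Bart", "Maggie", "John"],
})

# Hypothetical helper name: the columns that hold names
name_cols = ["A", "B"]

# Unique names, in row-major order of first appearance
unique_names = pd.unique(df[name_cols].to_numpy().ravel())

# Series indexed by name, whose values are the IDs 0..n-1
names = pd.Series(np.arange(len(unique_names)), index=unique_names)

# Series.map with a Series argument looks each value up in that Series' index
ids = df[name_cols].apply(lambda col: col.map(names))

df["C"] = ids["A"]
df["D"] = ids["B"]
print(df)
```

Because every name in A and B is also in the index of names, no lookup misses and the new columns stay integer-typed.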

Note: This assumes that every value in the DataFrame is a name to begin with. If that is not the case, you may want to restrict the lookup to the name columns only:

df[['A', 'B']].values.ravel()
...
df[['A', 'B']].applymap(names.get)
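
Since the question is about speed on a large dataset, another option worth trying is pd.factorize. This is a different technique from the one in the answer above, not a restatement of it: it numbers the values in order of first appearance in a single vectorized pass, with no per-element Python function call. Whether it is actually faster on your data is an assumption you should time yourself. The names name_cols, id_cols and codes_2d are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": ["John", "Homer", "Tom", "Lisa"],
    "B": ["Tom", "Bart", "Maggie", "John"],
})

name_cols = ["A", "B"]  # columns holding names (illustrative)
id_cols = ["C", "D"]    # columns to receive the IDs (illustrative)

# 2-D array of names, flattened row by row
values = df[name_cols].to_numpy()

# codes[i] is the ID of the i-th flattened value; uniques[k] is the name with ID k
codes, uniques = pd.factorize(values.ravel())

# Restore the original (rows, columns) shape and assign one column at a time
codes_2d = codes.reshape(values.shape)
for j, col in enumerate(id_cols):
    df[col] = codes_2d[:, j]

print(df)
```

The uniques array doubles as the reverse lookup: uniques[k] gives the name that was assigned ID k.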

answered Sep 22 '22 16:09

Andy Hayden
