How can I replace Python Pandas table text values with unique IDs?

I am using Pandas to read a file in this format:

fp = pandas.read_table("Measurements.txt")
fp.head()

"Aaron", 3, 5, 7  
"Aaron", 3, 6, 9  
"Aaron", 3, 6, 10 
"Brave", 4, 6, 0 
"Brave", 3, 6, 1

I want to replace each name with a unique ID so output looks like:

"1", 3, 5, 7 
"1", 3, 6, 9 
"1", 3, 6, 10 
"2", 4, 6, 0 
"2", 3, 6, 1

How can I do that?

Thanks!

asked Aug 14 '16 19:08

Kingua

1 Answer

I would make use of the categorical dtype:

In [97]: x['ID'] = x.name.astype('category').cat.rename_categories(range(1, x.name.nunique()+1))

In [98]: x
Out[98]:
    name  v1  v2  v3 ID
0  Aaron   3   5   7  1
1  Aaron   3   6   9  1
2  Aaron   3   6  10  1
3  Brave   4   6   0  2
4  Brave   3   6   1  2
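For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt by hand from the sample rows in the question, and the column names (name, v1, v2, v3) are taken from the output above rather than from the original file, so treat them as assumptions.

```python
import pandas as pd

# Sample rows from the question; column names are assumed from the answer's output.
x = pd.DataFrame(
    {
        "name": ["Aaron", "Aaron", "Aaron", "Brave", "Brave"],
        "v1": [3, 3, 3, 4, 3],
        "v2": [5, 6, 6, 6, 6],
        "v3": [7, 9, 10, 0, 1],
    }
)

# Convert the names to a categorical, then relabel the categories as 1..n.
n_names = x["name"].nunique()
x["ID"] = (
    x["name"]
    .astype("category")
    .cat.rename_categories(list(range(1, n_names + 1)))
)

ids = x["ID"].tolist()
print(ids)
```

Bracket access (x["name"]) is used here instead of x.name only to avoid any confusion with attributes; both select the same column in this example.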

If you need string IDs instead of numeric ones, you can use:

x.name.astype('category').cat.rename_categories([str(i) for i in range(1, x.name.nunique()+1)])

Or, as @MedAli mentioned in his answer, use the factorize() method. Demo:

In [141]: x['cat'] = pd.Categorical((pd.factorize(x.name)[0] + 1).astype(str))

In [142]: x
Out[142]:
    name  v1  v2  v3 ID cat
0  Aaron   3   5   7  1   1
1  Aaron   3   6   9  1   1
2  Aaron   3   6  10  1   1
3  Brave   4   6   0  2   2
4  Brave   3   6   1  2   2

In [143]: x.dtypes
Out[143]:
name      object
v1         int64
v2         int64
v3         int64
ID      category
cat     category
dtype: object

In [144]: x['cat'].cat.categories
Out[144]: Index(['1', '2'], dtype='object')

Or, to have the categories as integers:

In [154]: x['cat'] = pd.Categorical((pd.factorize(x.name)[0] + 1))

In [155]: x
Out[155]:
    name  v1  v2  v3 ID cat
0  Aaron   3   5   7  1   1
1  Aaron   3   6   9  1   1
2  Aaron   3   6  10  1   1
3  Brave   4   6   0  2   2
4  Brave   3   6   1  2   2

In [156]: x['cat'].cat.categories
Out[156]: Int64Index([1, 2], dtype='int64')
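The question asks for the names themselves to be replaced by quoted IDs, rather than for a new column. A rough sketch of that variant with factorize() follows; the frame name fp comes from the question, while the column name "name" and the lookup dictionary are assumptions added for illustration.

```python
import pandas as pd

# Sample rows from the question; the column name "name" is an assumption.
fp = pd.DataFrame(
    {
        "name": ["Aaron", "Aaron", "Aaron", "Brave", "Brave"],
        "v1": [3, 3, 3, 4, 3],
    }
)

# factorize() returns zero-based integer codes plus the unique values it found.
codes, uniques = pd.factorize(fp["name"])

# Keep a mapping from original name to ID so the replacement can be undone later.
lookup = dict(zip(uniques, range(1, len(uniques) + 1)))

# Overwrite the names with one-based string IDs.
fp["name"] = (codes + 1).astype(str)

replaced = fp["name"].tolist()
print(replaced)
print(lookup)
```

Keeping the lookup is optional, but once the column is overwritten there is no other way to recover which ID belonged to which name.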

Explanation of the categorical approach:

In [99]: x.name.astype('category')
Out[99]:
0    Aaron
1    Aaron
2    Aaron
3    Brave
4    Brave
Name: name, dtype: category
Categories (2, object): [Aaron, Brave]

In [100]: x.name.astype('category').cat.categories
Out[100]: Index(['Aaron', 'Brave'], dtype='object')

In [101]: x.name.astype('category').cat.rename_categories([1,2])
Out[101]:
0    1
1    1
2    1
3    2
4    2
dtype: category
Categories (2, int64): [1, 2]
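As a side note that is not part of the original answer: a categorical Series also exposes the zero-based position of each value's category through .cat.codes, so adding 1 gives the same integer IDs without renaming anything. A small sketch, assuming the same sample names:

```python
import pandas as pd

names = pd.Series(["Aaron", "Aaron", "Aaron", "Brave", "Brave"], name="name")
cat = names.astype("category")

# Categories are the sorted unique values of the column.
categories = cat.cat.categories.tolist()

# Each code is the zero-based position of the value's category.
codes = cat.cat.codes.tolist()

# Shifting the codes by one yields the same IDs as rename_categories([1, 2]).
ids = (cat.cat.codes + 1).tolist()

print(categories, codes, ids)
```

Unlike rename_categories(), this returns a plain integer Series rather than a categorical one.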

Explanation of the factorize() method:

In [157]: (pd.factorize(x.name)[0] + 1)
Out[157]: array([1, 1, 1, 2, 2])

In [158]: pd.Categorical((pd.factorize(x.name)[0] + 1))
Out[158]:
[1, 1, 1, 2, 2]
Categories (2, int64): [1, 2]
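One difference worth knowing, which the sample data hides because it is already sorted: factorize() numbers values in order of first appearance, while astype('category') sorts the categories first. The sketch below uses a made-up, deliberately unsorted list of names to show the two approaches disagreeing, and that sort=True brings factorize() in line with the categorical result.

```python
import pandas as pd

# Hypothetical unsorted input, chosen so that the two orderings differ.
names = pd.Series(["Brave", "Aaron", "Brave"], name="name")

# Order of first appearance: Brave -> 1, Aaron -> 2.
by_appearance = (pd.factorize(names)[0] + 1).tolist()

# Sorted categories: Aaron -> 1, Brave -> 2.
by_category = names.astype("category").cat.rename_categories([1, 2]).tolist()

# sort=True makes factorize() number the sorted unique values instead.
by_sorted_factorize = (pd.factorize(names, sort=True)[0] + 1).tolist()

print(by_appearance, by_category, by_sorted_factorize)
```

Either ordering gives valid unique IDs; it only matters if the IDs need to be reproducible across files or match an existing numbering.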

answered Sep 21 '22 14:09

MaxU - stop WAR against UA

Tags:

python

python-3.x

pandas
