Replace unique values of pandas data-frame

Hi, I'm new to Python and pandas.

I have extracted the unique values of one of the columns using pandas. The unique values are strings:

['Others, Senior Management-Finance, Senior Management-Sales'
 'Consulting, Strategic planning, Senior Management-Finance'
 'Client Servicing, Quality Control - Product/ Process, Strategic planning'
 'Administration/ Facilities, Business Analytics, Client Servicing'
 'Sales & Marketing, Sales/ Business Development/ Account Management, Sales Support']

I want to replace each string value with a unique integer.

For simplicity, here is a dummy input and the expected output.

Input:

Col1
  A
  A
  B
  B
  B
  C
  C

The unique values of the column come out as below:

['A' 'B' 'C']

After replacing, the column should look like this:

Col1
  1
  1
  2
  2
  2
  3
  3

Please suggest how I can do this, with a loop or any other way, because I have more than 300 unique values.

asked Jun 25 '16 05:06

JT28

1 Answer

Use factorize:

# factorize returns (codes, uniques); codes start at 0, so add 1
df['Col1'] = pd.factorize(df['Col1'])[0] + 1
print(df)
   Col1
0     1
1     1
2     2
3     2
4     2
5     3
6     3

See "Factorizing values" in the pandas documentation.
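As a minimal sketch of what factorize returns (the frame below just mirrors the dummy input from the question, and the names `Col1_id`, `mapping` and `decoded` are made up for illustration): it gives back both the integer codes and the unique values, so the label-to-integer mapping can be kept and the original strings recovered later.

```python
import pandas as pd

# Dummy data mirroring the question's example
df = pd.DataFrame({'Col1': ['A', 'A', 'B', 'B', 'B', 'C', 'C']})

# codes: one integer per row (starting at 0); uniques: labels in order of first appearance
codes, uniques = pd.factorize(df['Col1'])

# Store the 1-based ids in a separate (hypothetical) column so the strings are kept
df['Col1_id'] = codes + 1

# Lookup table from original string to assigned integer
mapping = {value: i + 1 for i, value in enumerate(uniques)}
print(mapping)

# Decode the integers back to the original strings
decoded = uniques[codes]
print(list(decoded))
```

Keeping `uniques` (or `mapping`) around is useful with hundreds of distinct values, since the integers alone do not say which string they stand for.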

Another solution uses numpy.unique, but it is slower on a huge DataFrame:

# return_inverse gives, for each row, the index of its value among the unique values
_, idx = np.unique(df['Col1'], return_inverse=True)
df['Col1'] = idx + 1
print(df)
   Col1
0     1
1     1
2     2
3     2
4     2
5     3
6     3
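The two approaches give the same result here only because the sample data is already in sorted order. The sketch below, on a made-up unsorted Series, shows the difference: factorize numbers values in order of first appearance by default, while numpy.unique numbers them in sorted order. Passing `sort=True` to factorize should make the two agree.

```python
import numpy as np
import pandas as pd

# Hypothetical unsorted sample, not from the question
s = pd.Series(['B', 'A', 'B', 'C', 'A'])

# Order of first appearance: B -> 0, A -> 1, C -> 2
fact_codes, fact_uniques = pd.factorize(s)
print(fact_codes, list(fact_uniques))

# Sorted order: A -> 0, B -> 1, C -> 2
uniq_values, inverse = np.unique(s.to_numpy(), return_inverse=True)
print(inverse, list(uniq_values))

# factorize with sort=True numbers the values in sorted order as well
sorted_codes, sorted_uniques = pd.factorize(s, sort=True)
print(sorted_codes, list(sorted_uniques))
```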

Finally, you can convert the values to categorical, mainly to reduce memory usage:

df['Col1'] = pd.factorize(df['Col1'])[0]
df['Col1'] = df['Col1'].astype("category")
print(df)
  Col1
0    0
1    0
2    1
3    1
4    1
5    2
6    2

print(df.dtypes)
Col1    category
dtype: object
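A related variation, not part of the answer above: the original strings can be converted to category directly, and the integer codes read from the `.cat.codes` accessor. This is only a sketch on the question's dummy data; `Col1_code` is a made-up column name.

```python
import pandas as pd

# Dummy data mirroring the question's example
df = pd.DataFrame({'Col1': ['A', 'A', 'B', 'B', 'B', 'C', 'C']})

# Convert the strings themselves to a categorical column
cat = df['Col1'].astype('category')

# .cat.codes holds one integer per row (starting at 0),
# .cat.categories holds the distinct labels
df['Col1_code'] = cat.cat.codes
print(df)
print(cat.cat.categories.tolist())
```

With this variation the labels stay attached to the column, so no separate lookup table is needed to see which string each code represents.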

answered Sep 29 '22 11:09

jezrael

Tags: python, replace, pandas, dataframe, categories
