Replace unique values of pandas data-frame

Hi, I'm new to Python and pandas.

I have extracted the unique values of one of the columns using pandas. The unique values of the column are strings:

['Others, Senior Management-Finance, Senior Management-Sales'
  'Consulting, Strategic planning, Senior Management-Finance'
  'Client Servicing, Quality Control - Product/ Process, Strategic planning'
  'Administration/ Facilities, Business Analytics, Client Servicing'
  'Sales & Marketing, Sales/ Business Development/ Account Management, Sales Support']

I want to replace each string value with a unique integer.

For simplicity, here is a dummy input and output.

Input:

Col1
  A
  A
  B
  B
  B
  C
  C

The unique values of the column are:

[ 'A' 'B' 'C' ]
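For reference, a minimal sketch of how unique values like these can be obtained, assuming the dummy Col1 column above: Series.unique() returns the distinct values in order of first appearance, which is why the result here is [ 'A' 'B' 'C' ].

```python
import pandas as pd

# Dummy data from the question
df = pd.DataFrame({'Col1': ['A', 'A', 'B', 'B', 'B', 'C', 'C']})

# Series.unique() returns the distinct values as an array,
# in order of first appearance (not sorted)
uniques = df['Col1'].unique()
print(uniques)

# nunique() returns only the count, handy for checking how many codes are needed
print(df['Col1'].nunique())
```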

After replacing, the column should look like this:

Col1
  1
  1
  2
  2
  2
  3
  3

Please suggest how I can do this, using a loop or any other approach, since I have more than 300 unique values.
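Because the question mentions a loop, here is a minimal sketch of that approach on the dummy column (the names mapping and df are illustrative): build a dict that assigns each unique value an integer starting at 1, then apply it with Series.map. This works for hundreds of unique values, although a vectorized function such as pd.factorize avoids the Python-level loop.

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'A', 'B', 'B', 'B', 'C', 'C']})

# Build a lookup table: each unique string gets an integer starting at 1,
# numbered in order of first appearance
mapping = {}
for i, value in enumerate(df['Col1'].unique(), start=1):
    mapping[value] = i

# Replace every value in the column using the lookup table
df['Col1'] = df['Col1'].map(mapping)
print(df['Col1'].tolist())  # [1, 1, 2, 2, 2, 3, 3]
```

Keeping mapping around also makes it easy to translate the integers back to the original strings later.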

JT28 asked Jun 25 '16 05:06


1 Answer

Use factorize:

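import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'A', 'B', 'B', 'B', 'C', 'C']})
# factorize returns 0-based codes in order of appearance; +1 makes them start at 1
df['Col1'] = pd.factorize(df['Col1'])[0] + 1
print(df)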
   Col1
0     1
1     1
2     2
3     2
4     2
5     3
6     3

See "Factorizing values" in the pandas documentation.
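As a sketch of what pd.factorize returns, assuming the same dummy column: it gives a pair of 0-based integer codes and the unique values, which is why the answer takes [0] and adds 1. The uniques can be kept as a lookup table (lookup is an illustrative name), and missing values receive the code -1 rather than a new integer.

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'A', 'B', 'B', 'B', 'C', 'C']})

# factorize returns (codes, uniques): codes are 0-based integers per row
codes, uniques = pd.factorize(df['Col1'])
print(codes.tolist())   # [0, 0, 1, 1, 1, 2, 2]
print(list(uniques))    # ['A', 'B', 'C']

# Keep a lookup table so the integers (starting at 1) can be translated back
lookup = dict(zip(uniques, range(1, len(uniques) + 1)))
print(lookup)           # {'A': 1, 'B': 2, 'C': 3}

# Missing values get no code of their own; they are marked with -1
codes_na, _ = pd.factorize(pd.Series(['A', None, 'B']))
print(codes_na.tolist())  # [0, -1, 1]
```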

Another solution uses numpy.unique, but it is slower on huge DataFrames:

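import numpy as np

# return_inverse=True gives, for each row, the position of its value in the sorted uniques
_, idx = np.unique(df['Col1'], return_inverse=True)
df['Col1'] = idx + 1
print(df)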
   Col1
0     1
1     1
2     2
3     2
4     2
5     3
6     3
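One difference worth knowing, shown in a small sketch with deliberately unsorted data (the Series s is made up for illustration): np.unique sorts the values before numbering them, while pd.factorize numbers them in order of first appearance. Both approaches give the same result in the example above only because A, B and C already appear in sorted order. Passing sort=True to pd.factorize reproduces the np.unique numbering.

```python
import numpy as np
import pandas as pd

# Values deliberately NOT in sorted order
s = pd.Series(['C', 'A', 'C', 'B'])

# factorize numbers values by order of first appearance: C=0, A=1, B=2
codes_f, uniques_f = pd.factorize(s)
print(codes_f.tolist())  # [0, 1, 0, 2]

# np.unique sorts first (A, B, C), so the codes follow alphabetical order
uniques_u, codes_u = np.unique(s, return_inverse=True)
print(codes_u.tolist())  # [2, 0, 2, 1]

# factorize(sort=True) matches the np.unique numbering
codes_s, _ = pd.factorize(s, sort=True)
print(codes_s.tolist())  # [2, 0, 2, 1]
```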

Lastly, you can convert the values to categorical, mainly to reduce memory usage:

df['Col1'] = pd.factorize(df['Col1'])[0]
df['Col1'] = df['Col1'].astype("category")
print(df)
  Col1
0    0
1    0
2    1
3    1
4    1
5    2
6    2

print(df.dtypes)
Col1    category
dtype: object
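A sketch of two related points: converting the original strings directly with astype('category') and reading .cat.codes also yields integer codes (numbered by the sorted categories), and the memory saving only becomes visible on a larger column. The 300-label Series below is hypothetical, sized to match the question's "more than 300 unique values"; the int16 result assumes pandas' usual behavior of storing codes in the smallest integer type that fits.

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'A', 'B', 'B', 'B', 'C', 'C']})

# Convert the strings straight to categorical; .cat.codes gives 0-based
# integer codes based on the sorted categories
codes = df['Col1'].astype('category').cat.codes
print(codes.tolist())  # [0, 0, 1, 1, 1, 2, 2]

# Hypothetical larger column: 300 distinct labels repeated 1000 times
s = pd.Series([f'label_{i}' for i in range(300)] * 1000)
as_category = s.astype('category')

# Codes use the smallest integer type that fits 300 categories
print(as_category.cat.codes.dtype)  # int16

# deep=True also counts the memory of the string objects themselves
print(s.memory_usage(deep=True), as_category.memory_usage(deep=True))
```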
jezrael answered Sep 29 '22 11:09