Hi, I'm new to Python and pandas.
I have extracted the unique values of one column using pandas. The unique values are strings:
['Others, Senior Management-Finance, Senior Management-Sales'
'Consulting, Strategic planning, Senior Management-Finance'
'Client Servicing, Quality Control - Product/ Process, Strategic
planning'
'Administration/ Facilities, Business Analytics, Client Servicing'
'Sales & Marketing, Sales/ Business Development/ Account Management,
Sales Support']
I want to replace each string value with a unique integer.
For simplicity, here is a dummy input and output.
Input:
Col1
A
A
B
B
B
C
C
The unique values of the column come out as below:
[ 'A' 'B' 'C' ]
After replacing, the column should look like this:
Col1
1
1
2
2
2
3
3
Please suggest how I can do this, using a loop or any other way, because I have more than 300 unique values.
The DataFrame.replace() function replaces values in a DataFrame, swapping one value for another across all columns. It takes to_replace, value, inplace, limit, regex and method as parameters and returns a new DataFrame. When inplace=True is used, it modifies the existing DataFrame object and returns None.
replace() accepts a regex, string, list, Series, number or dictionary describing what to replace; matching values in the DataFrame are swapped for the new value.
You can also replace values in all or selected columns based on a condition by using the DataFrame.loc[] property. loc[] accesses a group of rows and columns by label(s) or a boolean array, and it can both read and modify the values of a pandas DataFrame.
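As a rough sketch of the dictionary idea for the dummy data in the question: build a value-to-integer dictionary from the unique values, then look each row up in it. The column name Code and the variable mapping are made up for this illustration, and Series.map is used for the lookup here; replace() accepts the same kind of dictionary.

```python
import pandas as pd

# Dummy data from the question
df = pd.DataFrame({"Col1": ["A", "A", "B", "B", "B", "C", "C"]})

# Number the unique values from 1, in order of first appearance
mapping = {value: code for code, value in enumerate(df["Col1"].unique(), start=1)}

# Look every row up in the dictionary; "Code" is a hypothetical new column
df["Code"] = df["Col1"].map(mapping)

print(mapping)
print(df)
```

This works for 300 unique values as well, since the dictionary is built from the data rather than typed by hand.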
Use factorize:
df['Col1'] = pd.factorize(df.Col1)[0] + 1
print (df)
Col1
0 1
1 1
2 2
3 2
4 2
5 3
6 3
See "Factorizing values" in the pandas documentation.
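pd.factorize returns two things, the integer codes and the unique values, so the original strings can be recovered later. A small sketch, assuming the dummy data from the question (Col1_id and decoded are names invented for this example):

```python
import pandas as pd

df = pd.DataFrame({"Col1": ["A", "A", "B", "B", "B", "C", "C"]})

# codes: one integer per row (starting at 0); uniques: the distinct values
codes, uniques = pd.factorize(df["Col1"])

# Shift by 1 so the numbering starts at 1, as asked in the question
df["Col1_id"] = codes + 1

# Map the 0-based codes back to the original strings
decoded = list(uniques.take(codes))

print(df)
print(decoded)
```

Note that by default factorize gives missing values the code -1, so after adding 1 they would end up as 0.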
Another solution uses numpy.unique, but it is slower on a huge DataFrame:
_,idx = np.unique(df['Col1'],return_inverse=True)
df['Col1'] = idx + 1
print (df)
Col1
0 1
1 1
2 2
3 2
4 2
5 3
6 3
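The two approaches give the same numbers for the dummy data only because it is already sorted. As a sketch with made-up unsorted values: factorize numbers values in order of first appearance, while numpy.unique numbers them in sorted order; passing sort=True to factorize makes the two agree.

```python
import numpy as np
import pandas as pd

# Hypothetical unsorted values, to show the difference in numbering
values = np.array(["B", "A", "B", "C"])

# Order of first appearance: B -> 0, A -> 1, C -> 2
fact_codes, fact_uniques = pd.factorize(values)

# Sorted order: A -> 0, B -> 1, C -> 2
np_uniques, inverse = np.unique(values, return_inverse=True)

# factorize with sort=True follows the sorted order too
sorted_codes, sorted_uniques = pd.factorize(values, sort=True)

print(fact_codes, inverse, sorted_codes)
```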
Lastly, you can convert the values to categorical, mainly because it uses less memory:
df['Col1'] = pd.factorize(df.Col1)[0]
df['Col1'] = df['Col1'].astype("category")
print (df)
Col1
0 0
1 0
2 1
3 1
4 1
5 2
6 2
print (df.dtypes)
Col1 category
dtype: object
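A possible variation, shown only as a sketch: convert the string column to categorical directly and read the integer codes from the .cat accessor. This keeps the original labels as the categories, so nothing is lost. The variable names below are made up for the example.

```python
import pandas as pd

s = pd.Series(["A", "A", "B", "B", "B", "C", "C"], name="Col1")

# Convert the strings themselves to a categorical column
cat = s.astype("category")

# .cat.codes holds one integer per row (starting at 0);
# .cat.categories holds the labels those integers stand for
codes_from_1 = cat.cat.codes + 1
labels = list(cat.cat.categories)

print(codes_from_1.tolist())
print(labels)
```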