
str.replace function creating NaN data

Tags:

pandas

I am trying to replace certain strings in a column in pandas, but am getting NaN for some rows. The column is an object datatype.

I want every 'n' in the string replaced with 'N' and every 's' replaced with 'S'. In other words, I am trying to capitalize these letters wherever they appear.

However, I am getting NaN values for rows without 'n' or 's' in the string. How can I replace 'n' and 's' without getting NaN for the other values?

Here is the head of my dataframe:

data_frame['column_name'].head(10)
0      1n
1      1n
2      1n
3      1n
4      2n
5      2s
6       3
7       3
8      4s
9      4s

After replacing, the string '3' is now NaN:

data_frame['column_name'] = data_frame['column_name'].str.replace('n', 'N')
data_frame['column_name'] = data_frame['column_name'].str.replace('s', 'S')

data_frame['column_name'].head(10)
Out[87]: 
0     1N
1     1N
2     1N
3     1N
4     2N
5     2S
6    NaN
7    NaN
8     4S
9     4S
Name: NCU, dtype: object

Please let me know if I can add more information.

Scott Davis asked May 08 '17 19:05



1 Answer

The simplest solution is to cast the column to string. After that you can use str.upper or str.replace:

data_frame['column_name'] = data_frame['column_name'].astype(str)
data_frame['column_name'] = data_frame['column_name'].str.replace('n', 'N')
data_frame['column_name'] = data_frame['column_name'].str.replace('s', 'S')
print (data_frame)
  column_name
0          1N
1          1N
2          1N
3          1N
4          2N
5          2S
6           3
7           3
8          4S
9          4S
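
To see why the cast helps, here is a minimal sketch. It assumes rows 6 and 7 hold the integer 3 rather than the string '3'. The question does not show this, but it is consistent with the NaN output. The DataFrame is rebuilt by hand purely for illustration.

```python
import pandas as pd

# Hypothetical reconstruction of the column: rows 6 and 7 are assumed to be
# the integer 3, not the string '3'.
data_frame = pd.DataFrame(
    {'column_name': ['1n', '1n', '1n', '1n', '2n', '2s', 3, 3, '4s', '4s']}
)

# The .str methods return NaN for every element that is not a string.
broken = data_frame['column_name'].str.replace('n', 'N')
nan_rows = broken[broken.isna()].index.tolist()

# After astype(str) every element is a string, so no row turns into NaN.
fixed = data_frame['column_name'].astype(str)
fixed = fixed.str.replace('n', 'N').str.replace('s', 'S')
fixed_values = fixed.tolist()

print(nan_rows)
print(fixed_values)
```

Note that after the cast the former numbers are strings too, so this only fits if you do not need the numeric values any more.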

But if you need to keep numeric values together with strings:

I think you need Series.replace, because the column has mixed values (numeric and string), and str.replace returns NaN for the numeric ones. The solution with mask further down also works:

data_frame['column_name'] = data_frame['column_name'].replace(['n', 's'],
                                                              ['N', 'S'],
                                                              regex=True)
print (data_frame)
  column_name
0          1N
1          1N
2          1N
3          1N
4          2N
5          2S
6           3
7           3
8          4S
9          4S
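
A small self-contained check of this behaviour, as a sketch on a made-up Series (the values are illustrative, not the asker's data): with regex=True the substitution is applied to the string elements only, and the number is passed through unchanged.

```python
import pandas as pd

# Illustrative mixed-type column: three strings and one integer.
mixed = pd.Series(['1n', '2s', 3, '4s'], name='column_name')

# Element-wise regex replacement: 'n' -> 'N', then 's' -> 'S'.
# Non-string elements are left as they are instead of becoming NaN.
replaced = mixed.replace(['n', 's'], ['N', 'S'], regex=True)
result = replaced.tolist()

print(result)
```

The order of the two lists matters: each pattern in the first list is paired with the replacement at the same position in the second.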

Another solution is to select only the string values and uppercase them with Series.mask and str.upper (this uppercases every letter, not only 'n' and 's'):

mask = data_frame['column_name'].apply(type) == str
data_frame['column_name'] = data_frame['column_name'].mask(mask,
                            data_frame['column_name'].str.upper())
print (data_frame)
  column_name
0          1N
1          1N
2          1N
3          1N
4          2N
5          2S
6           3
7           3
8          4S
9          4S
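
If only 'n' and 's' should change and other letters must stay lowercase, the same type-based mask can be combined with str.replace. This is a sketch on a made-up Series; the value 'x4s' is invented to show that other letters are untouched.

```python
import pandas as pd

# Illustrative data: 'x4s' contains a letter that must not be uppercased.
s = pd.Series(['1n', '2s', 3, 'x4s'], name='column_name')

# Boolean mask that is True only where the element is a Python str.
is_str = s.map(lambda v: isinstance(v, str))

# Run the string replacements on the string rows only and write them back;
# rows where the mask is False keep their original value.
s.loc[is_str] = s.loc[is_str].str.replace('n', 'N').str.replace('s', 'S')
result = s.tolist()

print(result)
```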

Another solution is to fill the resulting NaN values with the original values, using combine_first or fillna:

upper = data_frame['column_name'].str.upper()
data_frame['column_name'] = upper.combine_first(data_frame['column_name'])
# alternative solution
# data_frame['column_name'] = upper.fillna(data_frame['column_name'])
print (data_frame)
  column_name
0          1N
1          1N
2          1N
3          1N
4          2N
5          2S
6           3
7           3
8          4S
9          4S
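
As a quick sanity check that the two variants agree, here is a sketch on a made-up Series (illustrative values only):

```python
import pandas as pd

s = pd.Series(['1n', '2s', 3, '4s'], name='column_name')

# str.upper gives NaN at the position of the integer.
upper = s.str.upper()

# Both calls take the value from s wherever upper is missing.
via_combine = upper.combine_first(s).tolist()
via_fillna = upper.fillna(s).tolist()

print(via_combine)
print(via_fillna)
```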
jezrael answered Oct 03 '22 20:10