I am trying to replace certain strings in a column in pandas, but am getting NaN
for some rows. The column is an object datatype.
I want all rows with 'n'
in the string replaced with 'N'
and and all rows with 's'
in the string replaced with 'S'
. In other words, I am trying to capitalize the string when it appears.
However, I am am getting NaN
values for rows without 'n'
or 's'
in the string. How can I replace 'n'
and 's'
without getting NaN
for the other values?
Here is the head of my dataframe:
data_frame['column_name'].head(10)
0 1n
1 1n
2 1n
3 1n
4 2n
5 2s
6 3
7 3
8 4s
9 4s
After replacing, the string '3'
is now NaN
:
data_frame['column_name'] = data_frame['column_name'].str.replace('n', 'N')
data_frame['column_name'] = data_frame['column_name'].str.replace('s', 'S')
data_frame['column_name'].head(10)
Out[87]:
0 1N
1 1N
2 1N
3 1N
4 2N
5 2S
6 NaN
7 NaN
8 4S
9 4S
Name: NCU, dtype: object
Please let me know if I can add more information.
Simpliest solution is cast column to string
- then is possible use str.upper
or str.replace
:
data_frame['column_name'] = data_frame['column_name'].astype(str)
data_frame['column_name'] = data_frame['column_name'].str.replace('n', 'N')
data_frame['column_name'] = data_frame['column_name'].str.replace('s', 'S')
print (data_frame)
column_name
0 1N
1 1N
2 1N
3 1N
4 2N
5 2S
6 3
7 3
8 4S
9 4S
But if need numeric with strings together:
I think you need Series.replace
, because you have mixed values - numeric with strings and str.replace
return NaN
where numeric values (bur works another solution with mask
):
data_frame['column_name'] = data_frame['column_name'].replace(['n', 's'],
['S','N'],
regex=True)
print (data_frame)
column_name
0 1S
1 1S
2 1S
3 1S
4 2S
5 2N
6 3
7 3
8 4N
9 4N
Another solution is filter only string
and use Series.mask
with str.upper
:
mask = data_frame['column_name'].apply(type) == str
data_frame['column_name'] = data_frame['column_name'].mask(mask,
data_frame['column_name'].str.upper())
print (data_frame)
column_name
0 1N
1 1N
2 1N
3 1N
4 2N
5 2S
6 3
7 3
8 4S
9 4S
Another solution is replace NaN
by combine_first
or fillna
:
upper = data_frame['column_name'].str.upper()
data_frame['column_name'] = upper.combine_first(data_frame['column_name'])
#alternative solution
#data_frame['column_name'] = upper.fillna(data_frame['column_name'])
column_name
0 1N
1 1N
2 1N
3 1N
4 2N
5 2S
6 3
7 3
8 4S
9 4S
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With