I have a pandas DataFrame in which all the values are strings. Some are 'None', and the rest are integers in string form, such as '123456'. How can I convert every 'None' to np.nan and the other values to integers, such as 123456?
df = {'col1': ['1', 'None'], 'col2': ['None', '123']}
Convert df to:
df = {'col1': [1, NaN], 'col2': [NaN, 123]}
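That is not possible with a plain integer dtype. NaN is a floating-point value, whereas every possible value of an int is a number, so a column that contains NaN has to be float.
Use the code below: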
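The point about NaN being a float can be checked directly. This is only a small sketch; the names ints and with_nan are made up for illustration.

```python
import numpy as np
import pandas as pd

# np.nan is an ordinary Python float, not an integer.
print(type(np.nan))

# A Series of whole numbers gets an integer dtype...
ints = pd.Series([1, 2])
print(ints.dtype)

# ...but adding a single NaN forces the whole Series to float64.
with_nan = pd.Series([1, np.nan])
print(with_nan.dtype)
```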
print(df.replace('None', np.nan).astype(float))
Output:
col1 col2
0 1.0 NaN
1 NaN 123.0
You have to use replace.
P.S. If df is a dictionary, convert it first:
df = pd.DataFrame(df)
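Put together, the replace approach from this answer can be run end to end roughly as follows. It is a minimal sketch using the sample data from the question; the name converted is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question: every value is a string.
df = pd.DataFrame({'col1': ['1', 'None'], 'col2': ['None', '123']})

# Swap the 'None' strings for NaN, then cast every column to float.
converted = df.replace('None', np.nan).astype(float)

print(converted)
print(converted.dtypes)
```

The result is float rather than int because each column now holds a NaN.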
You can convert your columns to a nullable integer type (available in pandas 0.24 and later):
d = {'col1': ['1', 'None'], 'col2': ['None', '123']}
res = pd.DataFrame(
    {k: pd.to_numeric(v, errors='coerce') for k, v in d.items()},
    dtype='Int32')
res
col1 col2
0 1 NaN
1 NaN 123
With this solution, numeric data is converted to integers (but missing data remains as NaN):
res.to_dict()
# {'col1': [1, nan], 'col2': [nan, 123]}
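If the data is already in a DataFrame rather than a dictionary, the same idea might be applied column by column. The sketch below assumes a pandas version with nullable integer support and uses 'Int64' instead of 'Int32' purely as an example. Note that errors='coerce' turns any string that is not a number into a missing value, not just 'None'.

```python
import pandas as pd

# Sample data from the question, already held in a DataFrame.
df = pd.DataFrame({'col1': ['1', 'None'], 'col2': ['None', '123']})

# Parse each column as numbers; unparseable strings become NaN.
numeric = df.apply(pd.to_numeric, errors='coerce')

# Cast to the nullable integer dtype so the numbers are stored as integers.
res = numeric.astype('Int64')

print(res)
print(res.dtypes)
```

Recent pandas versions display the missing entries of a nullable integer column as <NA> rather than NaN.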
On older versions, convert to object when initialising the DataFrame:
res = pd.DataFrame(
    {k: pd.to_numeric(v, errors='coerce') for k, v in d.items()},
    dtype=object)
res
col1 col2
0 1 NaN
1 NaN 123
This differs from the nullable-type solution above: only the representation changes, not the actual data, which is still stored as floats.
res.to_dict()
# {'col1': [1.0, nan], 'col2': [nan, 123.0]}