I have a pandas DataFrame in which all the values are strings. Some are the string 'None', and the rest are integers in string form, such as '123456'. How can I convert every 'None' to np.nan and every other value to an integer, such as 123456?
df = {'col1': ['1', 'None'], 'col2': ['None', '123']}
Convert df to:
df = {'col1': [1, NaN], 'col2': [NaN, 123]}
Not with a plain int column. NaN is a floating-point value, while every possible value of an int is a number, so a column that holds NaN has to be stored as float.
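To make that concrete, here is a small sketch (the variable names are just for illustration) showing that pandas falls back to a float dtype as soon as a NaN appears among integers:

```python
import numpy as np
import pandas as pd

# A Series built only from integers keeps an integer dtype.
ints = pd.Series([1, 123])
print(ints.dtype)

# Adding a missing value makes pandas store the column as float64.
with_nan = pd.Series([1, np.nan])
print(with_nan.dtype)

# NaN itself is a Python float, which is why it cannot live in an int array.
print(type(np.nan))
```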
Use the code below (assuming numpy is imported as np and df is a DataFrame):
print(df.replace('None', np.nan).astype(float))
Output:
   col1   col2
0   1.0    NaN
1   NaN  123.0
The key step is replace, which turns the 'None' strings into NaN so that the cast to float can succeed.
P.S. If df is a dictionary, convert it to a DataFrame first:
df = pd.DataFrame(df)
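If the floats in that output are not acceptable, the same chain can probably be taken one step further. The sketch below assumes pandas 0.24 or later, where the nullable 'Int64' dtype exists; the names as_float and as_int are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['1', 'None'], 'col2': ['None', '123']})

# Step 1: turn the 'None' strings into real missing values, then parse as float.
as_float = df.replace('None', np.nan).astype(float)

# Step 2 (assumes pandas 0.24+): cast to the nullable integer dtype,
# so 1.0 becomes 1 while the missing entries stay missing.
as_int = as_float.astype('Int64')

print(as_int)
print(as_int.dtypes)
```

The cast in step 2 only works when every non-missing value is a whole number, which holds for the data in the question.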
You can convert your columns to the nullable integer type (new in pandas 0.24):
d = {'col1': ['1', 'None'], 'col2': ['None', '123']}
res = pd.DataFrame({
    k: pd.to_numeric(v, errors='coerce') for k, v in d.items()}, dtype='Int32')
res
   col1  col2
0     1   NaN
1   NaN   123
With this solution, numeric data is converted to integers, while missing data remains missing (shown as NaN in 0.24; pandas 1.0 and later display it as <NA>):
res.to_dict('list')
# {'col1': [1, nan], 'col2': [nan, 123]}
On older versions, convert to object when initialising the DataFrame:
res = pd.DataFrame({
    k: pd.to_numeric(v, errors='coerce') for k, v in d.items()}, dtype=object)
res
  col1 col2
0    1  NaN
1  NaN  123
This differs from the nullable integer solution above: only the representation changes, not the underlying data, which is still float.
res.to_dict('list')
#  {'col1': [1.0, nan], 'col2': [nan, 123.0]}
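To check the difference between the two approaches yourself, you can inspect the stored values rather than the printed table. This is a minimal sketch; as_object and as_nullable are illustrative names, and the nullable frame is built with astype('Int32') instead of the constructor argument purely to keep the example simple:

```python
import pandas as pd

d = {'col1': ['1', 'None'], 'col2': ['None', '123']}
parsed = {k: pd.to_numeric(v, errors='coerce') for k, v in d.items()}

# object dtype: each cell is still a float (1.0, 123.0) or NaN.
as_object = pd.DataFrame(parsed, dtype=object)

# nullable integer dtype (assumes pandas 0.24+): each cell is a real integer or missing.
as_nullable = pd.DataFrame(parsed).astype('Int32')

print(type(as_object.loc[0, 'col1']))
print(as_nullable.dtypes)
```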