I have a data source where all the values are given as strings. When I create a Pandas dataframe from this data, all the columns are naturally of type object. I then want to let Pandas automatically convert any columns that look like numbers into numeric types (e.g. int64, float64).
Pandas supposedly provides a function to do this automatic type inferencing: pandas.DataFrame.infer_objects(). It's also mentioned in this StackOverflow post. The documentation says:
Attempts soft conversion of object-dtyped columns, leaving non-object and unconvertible columns unchanged. The inference rules are the same as during normal Series/DataFrame construction.
However, the function is not working for me. In the reproducible example below, I have two string columns (value1 and value2) that unambiguously look like int and float values, respectively, but infer_objects() does not convert them from string to the appropriate numeric types.
import pandas as pd
# Create example dataframe.
data = [ ['Alice', '100', '1.1'], ['Bob', '200', '2.1'], ['Carl', '300', '3.1']]
df = pd.DataFrame(data, columns=['name', 'value1', 'value2'])
print(df)
#     name value1 value2
# 0  Alice    100    1.1
# 1    Bob    200    2.1
# 2   Carl    300    3.1
df.info()  # info() prints directly and returns None, so no print() is needed.
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   name    3 non-null      object
#  1   value1  3 non-null      object
#  2   value2  3 non-null      object
# dtypes: object(3)
df = df.infer_objects() # Should convert value1 and value2 columns to numerics.
df.info()
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   name    3 non-null      object
#  1   value1  3 non-null      object
#  2   value2  3 non-null      object
# dtypes: object(3)
Any help would be appreciated.
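For context: as the documentation quoted above says, infer_objects() uses the same inference rules as normal Series/DataFrame construction, and construction does not parse strings into numbers. Soft conversion therefore only helps when an object column already holds Python numbers. A minimal sketch contrasting the two cases (the variable names are illustrative):

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Object column whose elements are real Python ints: soft conversion applies.
numbers = pd.Series([100, 200, 300], dtype=object)
print(numbers.infer_objects().dtype)  # int64

# Object column whose elements are strings that merely look like numbers:
# infer_objects() does not parse them, so the result is still non-numeric.
strings = pd.Series(["100", "200", "300"], dtype=object)
print(is_numeric_dtype(strings.infer_objects()))  # False
```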
Further to @wwnde's answer, here is the same solution written slightly differently:
df["value1"] = pd.to_numeric(df["value1"])
df["value2"] = pd.to_numeric(df["value2"])
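The two assignments above can also be collapsed into one step by applying pd.to_numeric to a list of columns. This sketch writes the column list out by hand, which assumes you know in advance which columns are numeric:

```python
import pandas as pd

data = [["Alice", "100", "1.1"], ["Bob", "200", "2.1"], ["Carl", "300", "3.1"]]
df = pd.DataFrame(data, columns=["name", "value1", "value2"])

# Columns assumed to hold numeric strings; pd.to_numeric is applied to each one.
numeric_cols = ["value1", "value2"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)

print(df.dtypes)
```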
EDIT: This is an interesting question and I'm surprised that pandas doesn't convert obvious string floats and integers as you show.
However, this short loop walks through the dataframe and converts every column whose values can be parsed as numbers.
import pandas as pd

data = [["Alice", "100", "1.1"], ["Bob", "200", "2.1"], ["Carl", "300", "3.1"]]
df = pd.DataFrame(data, columns=["name", "value1", "value2"])
df.info()
# RangeIndex: 3 entries, 0 to 2
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   name    3 non-null      object
#  1   value1  3 non-null      object
#  2   value2  3 non-null      object
# dtypes: object(3)
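# Try to convert every column; columns that can't be parsed as numbers
# (such as "name") raise ValueError and are left unchanged.
for c in df.columns:
    try:
        df[c] = pd.to_numeric(df[c])
    except (ValueError, TypeError):
        pass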
df.info()
# RangeIndex: 3 entries, 0 to 2
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   name    3 non-null      object
#  1   value1  3 non-null      int64
#  2   value2  3 non-null      float64
# dtypes: float64(1), int64(1), object(1)
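The same loop can be packaged as a reusable function that returns a new dataframe instead of modifying df in place. The function name to_numeric_where_possible is hypothetical, and catching only ValueError and TypeError is an assumption about which failures should be skipped:

```python
import pandas as pd


def to_numeric_where_possible(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every fully numeric column converted.

    Hypothetical helper: columns where pd.to_numeric raises ValueError or
    TypeError (e.g. free text such as names) are left unchanged.
    """
    out = df.copy()
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            pass  # Not numeric; keep the original column.
    return out


data = [["Alice", "100", "1.1"], ["Bob", "200", "2.1"], ["Carl", "300", "3.1"]]
raw = pd.DataFrame(data, columns=["name", "value1", "value2"])
converted = to_numeric_where_possible(raw)
print(converted.dtypes)
```

Working on a copy means the original string data in raw stays available, which can help when you want to compare the parsed values against the source.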
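df_new = df.convert_dtypes() may help (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.convert_dtypes.html). Note, however, that convert_dtypes() does not parse numeric strings: on its own it turns value1 and value2 into pandas' string dtype rather than Int64/Float64, so it is most useful once the values have already been converted with pd.to_numeric.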
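A minimal sketch of combining the two approaches: parse the strings with pd.to_numeric first, then let convert_dtypes() move every column to pandas' nullable dtypes (Int64, Float64, string). The column list is an assumption for this example:

```python
import pandas as pd

data = [["Alice", "100", "1.1"], ["Bob", "200", "2.1"], ["Carl", "300", "3.1"]]
df = pd.DataFrame(data, columns=["name", "value1", "value2"])

# Step 1: parse the numeric-looking strings (column list assumed known).
for col in ["value1", "value2"]:
    df[col] = pd.to_numeric(df[col])

# Step 2: switch every column to a nullable extension dtype.
df_new = df.convert_dtypes()
print(df_new.dtypes)
```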