
Pandas infer_objects() doesn't convert string columns to numeric

Tags: python, pandas

I have a data source where all the values are given as strings. When I create a Pandas dataframe from this data, all the columns are naturally of type object. I then want Pandas to automatically convert any columns that look like numbers into a numeric type (e.g. int64, float64).

Pandas supposedly provides a function for this automatic type inference: pandas.DataFrame.infer_objects(). It's also mentioned in this StackOverflow post. The documentation says:

Attempts soft conversion of object-dtyped columns, leaving non-object and unconvertible columns unchanged. The inference rules are the same as during normal Series/DataFrame construction.

However, the function is not working for me. In the reproducible example below, I have two string columns (value1 and value2) that unambiguously look like int and float values, respectively, but infer_objects() does not convert them from string to the appropriate numeric types.

import pandas as pd

# Create example dataframe.
data = [ ['Alice', '100', '1.1'], ['Bob', '200', '2.1'], ['Carl', '300', '3.1']]
df = pd.DataFrame(data, columns=['name', 'value1', 'value2'])

print(df)

#     name value1 value2
# 0  Alice    100    1.1
# 1    Bob    200    2.1
# 2   Carl    300    3.1

df.info()  # info() prints its report itself and returns None

# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype 
# ---  ------  --------------  ----- 
#  0   name    3 non-null      object
#  1   value1  3 non-null      object
#  2   value2  3 non-null      object
# dtypes: object(3)

df = df.infer_objects()  # Expected this to convert value1 and value2 to numeric dtypes.

df.info()

# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype 
# ---  ------  --------------  ----- 
#  0   name    3 non-null      object
#  1   value1  3 non-null      object
#  2   value2  3 non-null      object
# dtypes: object(3)

Any help would be appreciated.

stackoverflowuser2010 asked Mar 14 '26 23:03


2 Answers

Further to @wwnde's answer, here is the same solution written slightly differently:

df["value1"] = pd.to_numeric(df["value1"])
df["value2"] = pd.to_numeric(df["value2"])

EDIT: This is an interesting question and I'm surprised that pandas doesn't convert obvious string floats and integers as you show.
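One likely explanation, offered as a sketch rather than a statement about pandas internals: infer_objects() only narrows object columns whose values are already Python numbers, and it does not parse strings. The Series names below are made up for illustration.

```python
import pandas as pd

# An object column that holds real Python ints: infer_objects() can narrow it.
ints_as_objects = pd.Series([100, 200, 300], dtype="object")
inferred_ints = ints_as_objects.infer_objects()

# A column that holds numeric-looking strings: the values are still str
# objects, so there is no numeric type to infer.
numeric_strings = pd.Series(["100", "200", "300"])
inferred_strings = numeric_strings.infer_objects()

print(inferred_ints.dtype)     # int64
print(inferred_strings.dtype)  # still a non-numeric dtype
```

If that reading is right, the strings have to be parsed explicitly, for example with pd.to_numeric().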

In the meantime, this short loop walks through the dataframe and converts every column that can be parsed as numeric.

import pandas as pd

data = [["Alice", "100", "1.1"], ["Bob", "200", "2.1"], ["Carl", "300", "3.1"]]
df = pd.DataFrame(data, columns=["name", "value1", "value2"])

df.info()

RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   name    3 non-null      object
 1   value1  3 non-null      object
 2   value2  3 non-null      object
dtypes: object(3)

# Try each column in turn; columns that cannot be parsed as numbers stay unchanged.
for c in df.columns:
    try:
        df[c] = pd.to_numeric(df[c])
    except (ValueError, TypeError):
        pass

df.info()

RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   name    3 non-null      object 
 1   value1  3 non-null      int64  
 2   value2  3 non-null      float64
dtypes: float64(1), int64(1), object(1)
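
The same idea can be wrapped in a helper that avoids try/except. This is only a sketch: convert_numeric_columns is a hypothetical name, not a pandas function, and it assumes a column should be converted only when every non-missing value parses as a number.

```python
import pandas as pd


def convert_numeric_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with fully numeric-looking columns converted."""
    result = df.copy()
    for column in result.columns:
        # errors="coerce" turns unparseable values into NaN instead of raising.
        converted = pd.to_numeric(result[column], errors="coerce")
        # Keep the conversion only if coercion created no new missing values.
        if converted.notna().sum() == result[column].notna().sum():
            result[column] = converted
    return result


data = [["Alice", "100", "1.1"], ["Bob", "200", "2.1"], ["Carl", "300", "3.1"]]
df = pd.DataFrame(data, columns=["name", "value1", "value2"])

converted_df = convert_numeric_columns(df)
print(converted_df.dtypes)
```

Returning a copy leaves the original dataframe untouched, which makes it easy to compare dtypes before and after.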
run-out answered Mar 17 '26 12:03


df_new = df.convert_dtypes() may help. See the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.convert_dtypes.html
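
A caveat worth checking on your own pandas version: as far as I can tell, convert_dtypes() does not parse strings either, so numeric-looking string columns come out as a string dtype rather than a numeric one. The sketch below, with illustrative variable names, compares it with parsing the columns via pd.to_numeric() first.

```python
import pandas as pd

data = [["Alice", "100", "1.1"], ["Bob", "200", "2.1"], ["Carl", "300", "3.1"]]
df = pd.DataFrame(data, columns=["name", "value1", "value2"])

# convert_dtypes() on its own: the string values are expected to stay non-numeric.
strings_only = df.convert_dtypes()

# Parse the numeric-looking columns first, then let convert_dtypes()
# choose nullable extension dtypes for the parsed values.
parsed = df.assign(
    value1=pd.to_numeric(df["value1"]),
    value2=pd.to_numeric(df["value2"]),
)
nullable = parsed.convert_dtypes()

print(strings_only.dtypes)
print(nullable.dtypes)
```

So convert_dtypes() seems most useful as a second step after the strings have been parsed, when nullable dtypes are wanted.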

cp0921 answered Mar 17 '26 13:03