I created a pandas DataFrame from a list of lists:
import numpy as np
import pandas as pd
df_list = [["a", "1", "2"], ["b", "3", np.nan]]
df = pd.DataFrame(df_list, columns = list("ABC"))
>>> df
   A  B    C
0  a  1    2
1  b  3  NaN
Is there a way to convert every column that can be converted to float, i.e. B and C? The following works if you know which columns to convert:
df[["B", "C"]] = df[["B", "C"]].astype("float")
But what do you do if you don't know in advance which columns contain numbers? When I tried
df = df.astype("float", errors = "ignore")
all columns are still strings/objects. Similarly,
df[["B", "C"]] = df[["B", "C"]].apply(pd.to_numeric)
converts both columns (though "B" is int and "C" is float, because of the NaN value being present), but
df = df.apply(pd.to_numeric)
obviously raises an error, and I don't see a way to suppress it.
Is it possible to perform this string-to-float conversion without looping through each column and trying .astype("float", errors = "ignore")?
To change column types in pandas, pass pandas.to_numeric, pandas.to_datetime, or pandas.to_timedelta to DataFrame.apply() to convert one or more columns to numeric, datetime, or timedelta types respectively.
To convert a column to float in a pandas DataFrame, use either the Series' astype() method or pandas' to_numeric() function.
We can convert a string to a float in Python using the built-in float() function, which converts an object to a floating-point number. Internally, float() calls the object's __float__() method.
The Python "ValueError: could not convert string to float" occurs when we pass float() a string that cannot be parsed as a number (e.g. an empty string or one containing non-numeric characters). To solve the error, remove the offending characters from the string before converting it.
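As a small illustration of the float() behaviour described above, here is a minimal sketch. The helper name parse_float is made up for this example and is not part of Python or pandas; it simply catches the ValueError instead of letting it propagate.

```python
def parse_float(text):
    """Return float(text), or None if text is not a valid float literal.

    parse_float is a hypothetical helper name used only for illustration.
    """
    try:
        return float(text)
    except ValueError:
        # Raised for strings such as "" or "abc"
        return None


print(parse_float("3.5"))  # 3.5
print(parse_float(""))     # None
print(parse_float("abc"))  # None
```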
I think you need the parameter errors='ignore' in to_numeric:
df = df.apply(pd.to_numeric, errors='ignore')
print (df.dtypes)
A object
B int64
C float64
dtype: object
This works well as long as a column does not mix numeric values with non-numeric strings:
df_list = [["a", "t", "2"], ["b", "3", np.nan]]
df = pd.DataFrame(df_list, columns = list("ABC"))
df = df.apply(pd.to_numeric, errors='ignore')
print (df)
   A  B    C
0  a  t  2.0   <= added t to column B for mixed values
1  b  3  NaN
print (df.dtypes)
A object
B object
C float64
dtype: object
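One caveat: pandas 2.2 deprecated errors='ignore' in to_numeric, so on recent versions it may warn or fail. A possible alternative, sketched here under that assumption, is to let to_numeric raise and catch the exception per column. The helper name to_float_or_keep is made up for this example. It also casts straight to float64, which is what the question asked for.

```python
import numpy as np
import pandas as pd


def to_float_or_keep(column):
    """Return the column as float64 if every value parses, else unchanged.

    to_float_or_keep is a hypothetical helper name, not a pandas function.
    """
    try:
        # Default errors='raise': any unparseable value raises ValueError
        return pd.to_numeric(column).astype("float64")
    except (ValueError, TypeError):
        return column


df = pd.DataFrame([["a", "t", "2"], ["b", "3", np.nan]], columns=list("ABC"))
converted = df.apply(to_float_or_keep)
print(converted.dtypes)
```

Here only C becomes float64; A and the mixed column B are left as they were, which matches the behaviour shown above.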
EDIT:
You can also downcast the int columns to floats:
df = df.apply(pd.to_numeric, errors='ignore', downcast='float')
print (df.dtypes)
A object
B float32
C float32
dtype: object
This is the same as:
df = df.apply(lambda x: pd.to_numeric(x, errors='ignore', downcast='float'))
print (df.dtypes)
A object
B float32
C float32
dtype: object
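Note that downcast='float' gives float32, which keeps only about 7 significant decimal digits. If you would rather have float64, one option (a sketch, not the only way) is to pick out the numeric columns after the conversion with select_dtypes and cast just those. The frame below is built by hand to stand in for the already converted result, with B as int64 and C as float64.

```python
import numpy as np
import pandas as pd

# Stand-in for the frame after to_numeric: B is int64, C is float64
df = pd.DataFrame({"A": ["a", "b"], "B": [1, 3], "C": [2.0, np.nan]})

# Columns that ended up with a numeric dtype
num_cols = df.select_dtypes(include="number").columns

# Cast only those columns to float64; non-numeric columns are untouched
result = df.astype({col: "float64" for col in num_cols})
print(result.dtypes)
```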