When I try to compute the mean of one of my DataFrame's columns, I get this error:
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Here is the code I have:
import pandas as pd
import numpy as np
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"
df = pd.read_csv(url, header=None)
headers = ["symboling","normalized-losses","make","fuel-type","aspiration","num-of-doors","body-style","drive-wheels","engine-location","wheel-base","lenght","width","height","curb-weight","engine-type","num-of-cylinders","engine-size","fuel-system","bore","stroke","compression-ratio","horsepower","peak-rpm","city-mpg","highway-mpg","price"]
df.columns = headers
df.replace('?',np.nan, inplace=True)
mean_val = df['normalized-losses'].mean()
print(mean_val)
The Python "TypeError: unsupported operand type(s) for -: 'str' and 'str'" occurs when we try to use the subtraction (-) operator with two strings. To solve the error, convert the strings to int or float values, e.g. int(my_str_1) - int(my_str_2).
The Python "TypeError: unsupported operand type(s) for /: 'str' and 'int'" occurs when we try to use the division (/) operator with a string and a number. To solve the error, convert the string to an int or a float, e.g. int(my_str) / my_num.
The Python "TypeError: unsupported operand type(s) for +: 'int' and 'str'" occurs when we try to use the addition (+) operator with an integer and a string. To solve the error, convert the string to an integer, e.g. my_int + int(my_str).
The "TypeError: unsupported operand type(s) for +: 'int' and 'str'" error occurs when an integer is added to a string, even if that string contains a valid integer such as '130'. Python does not convert between the two types implicitly: you can add an integer to another number, but you can't add an integer to a string.
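A minimal sketch of that behaviour, using made-up values (my_int and my_str are illustrative names, not taken from the question's data):

```python
my_int = 5
my_str = "130"  # a string that happens to contain digits

try:
    total = my_int + my_str  # int + str is not defined, so this raises
except TypeError as exc:
    error_message = str(exc)

# An explicit conversion makes the addition valid.
total = my_int + int(my_str)

print(error_message)
print(total)
```

The explicit int() call is what Python refuses to do on its own.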
You need to convert the column to a numeric data type with pd.to_numeric(). If you pass the option errors='coerce', any value that cannot be parsed as a number is replaced with NaN.
mean_val = pd.to_numeric(df['normalized-losses'], errors='coerce').mean()
print(mean_val)
> 122.0
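To see what errors='coerce' does in isolation, here is a small sketch on a hypothetical Series that stands in for the column (the values, including the "n/a" entry, are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for 'normalized-losses' after '?' was replaced with NaN.
sample = pd.Series([np.nan, "164", "158", "n/a"], dtype=object)

# Numeric strings become floats; anything unparseable becomes NaN.
converted = pd.to_numeric(sample, errors="coerce")

print(converted.dtype)
print(converted.mean())  # NaN values are skipped by default
```

Because mean() skips NaN by default, the coerced entries simply drop out of the average.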
Adding onto Nathaniel's answer, you have a mix of float and str. You can see this if you run
print(df['normalized-losses'].apply(type))
which will return
0 <class 'float'>
1 <class 'float'>
2 <class 'float'>
3 <class 'str'>
4 <class 'str'>
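For a long column, counting the types may be easier to read than printing one line per row. A rough sketch on hypothetical values (the Series below is made up, not the real column):

```python
import numpy as np
import pandas as pd

# Made-up column with the same kind of float/str mix.
sample = pd.Series([np.nan, np.nan, "164", "158", "192"], dtype=object)

# Map each value to the name of its type, then count the names.
type_counts = sample.map(lambda value: type(value).__name__).value_counts()

print(type_counts)
```

The NaN entries show up as float, which is why the column ends up with two types.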
As your error message suggests, every value in the column needs to be of the float type. You can either use pd.to_numeric as Nathaniel suggested, or alternatively use
df['normalized-losses'] = df['normalized-losses'].astype('float')
mean_val = df['normalized-losses'].mean()
print(mean_val)
Output
122.0
If you are only interested in the normalized-losses column and know that all of your strings can be converted properly (in this case, I believe they can, since they are all strings of numbers such as '130'), you could just do this. Note that astype('float') raises a ValueError if any string cannot be parsed. If you are going to use the rest of the data and want to have all numeric strings converted, then use Nathaniel's implementation.
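The difference between the two approaches only shows up when a string is not a number. A small sketch, assuming two made-up Series (clean and messy are hypothetical names, and "unknown" is an invented bad value):

```python
import numpy as np
import pandas as pd

clean = pd.Series([np.nan, "130", "164"], dtype=object)
messy = pd.Series(["130", "164", "unknown"], dtype=object)

# astype works when every string is numeric (NaN is already a float).
clean_as_float = clean.astype("float")

# astype raises as soon as one string cannot be parsed.
try:
    messy.astype("float")
    astype_failed = False
except ValueError:
    astype_failed = True

# to_numeric with errors='coerce' turns the bad value into NaN instead.
messy_as_float = pd.to_numeric(messy, errors="coerce")

print(clean_as_float.mean())
print(astype_failed)
print(messy_as_float.mean())
```

Which behaviour is preferable depends on whether an unexpected string should stop the script or be treated as missing data.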