Given is a simple CSV file:
A,B,C
Hello,Hi,0
Hola,Bueno,1
Obviously the real dataset is far more complex than this, but this one reproduces the error. I'm attempting to build a random forest classifier for it, like so:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

cols = ['A','B','C']
col_types = {'A': str, 'B': str, 'C': int}
test = pd.read_csv('test.csv', dtype=col_types)
train_y = test['C'] == 1
train_x = test[cols]
clf_rf = RandomForestClassifier(n_estimators=50)
clf_rf.fit(train_x, train_y)
But I just get this traceback when invoking fit():
ValueError: could not convert string to float: 'Bueno'
scikit-learn version is 0.16.1.
The Python "ValueError: could not convert string to float" occurs when a string that cannot be parsed as a number (e.g. an empty string or one containing letters) is passed to float(). If the string is meant to be numeric, remove the extra characters from it. If it is genuinely text, as with 'Bueno' here, it has to be encoded as numbers instead.
If you convert strings to floating-point numbers in Python often, you will sooner or later hit ValueError: could not convert string to float. Usually this happens because the string is not a valid float literal, for example because it contains a comma or embedded spaces, so Python raises ValueError while parsing it.
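For the case where the string really is a number with stray formatting, a small sketch like the following may help. The helper name to_float is hypothetical, and it assumes a comma is only ever a thousands separator, which will not hold for every locale.

```python
def to_float(text):
    """Hypothetical helper: drop thousands separators, then try to convert.

    Returns None instead of raising when the text is not numeric.
    """
    cleaned = text.replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        # Raised for empty strings and for text such as 'Bueno'.
        return None


print(to_float(" 1,234.5 "))
print(to_float("Bueno"))
```

Note that this does nothing useful for a column like B in the question: 'Bueno' is a category, not a badly formatted number, so cleaning cannot fix it.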
We can convert a string to a float in Python using float(), the built-in used to convert an object to a floating-point number. Strings are parsed directly; for other objects, float() calls the object's __float__() method.
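To illustrate the __float__() hook, here is a minimal sketch. The Celsius class is made up for this example and is not part of any library.

```python
class Celsius:
    """Made-up class used only to show how float() uses __float__()."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __float__(self):
        # float(obj) ends up here for objects that are not strings.
        return float(self.degrees)


temperature = Celsius(21)
print(float(temperature))

# A plain string has no such hook; it is parsed, and parsing 'Bueno' fails.
try:
    float("Bueno")
except ValueError as exc:
    print(exc)
```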
A random forest classifier. A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
You have to do some encoding before calling fit(). As has been said, fit() does not accept strings, but you can work around this.
There are several classes that can be used, for example LabelEncoder and OneHotEncoder from sklearn.preprocessing.
Personally, I posted almost the same question on Stack Overflow some time ago. I wanted a scalable solution but didn't get any answer. I settled on OneHotEncoder, which binarizes all the strings. It is quite effective, but if you have a lot of different strings the matrix grows very quickly and needs a lot of memory.
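As a rough sketch of the OneHotEncoder route applied to the CSV from the question: this assumes a scikit-learn release much newer than the 0.16.1 mentioned in the question (the encoder only accepts string columns directly in later versions), reads the data from an in-memory string instead of test.csv, and uses only A and B as features, since C is the target.

```python
import io

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

# Same content as test.csv in the question, inlined so the sketch is self-contained.
csv_text = "A,B,C\nHello,Hi,0\nHola,Bueno,1\n"
col_types = {'A': str, 'B': str, 'C': int}
test = pd.read_csv(io.StringIO(csv_text), dtype=col_types)

train_y = test['C'] == 1
train_x = test[['A', 'B']]  # assumption: leave the target column C out of the features

# One binary column per distinct string; unseen strings at predict time become all zeros.
encoder = OneHotEncoder(handle_unknown='ignore')
encoded = encoder.fit_transform(train_x)  # sparse matrix by default

clf_rf = RandomForestClassifier(n_estimators=50, random_state=0)
clf_rf.fit(encoded, train_y)

print(encoded.shape)
print(clf_rf.predict(encoder.transform(train_x)))
```

With two distinct values in each of A and B, the encoded matrix has four columns. That one-column-per-string growth is where the memory cost comes from on real data.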