
RandomForestClassifier.fit(): ValueError: could not convert string to float

Given is a simple CSV file:

    A,B,C
    Hello,Hi,0
    Hola,Bueno,1

Obviously the real dataset is far more complex than this, but this one reproduces the error. I'm attempting to build a random forest classifier for it, like so:

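    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    cols = ['A', 'B', 'C']
    col_types = {'A': str, 'B': str, 'C': int}
    test = pd.read_csv('test.csv', dtype=col_types)

    train_y = test['C'] == 1
    train_x = test[cols]  # columns A and B still hold raw strings

    clf_rf = RandomForestClassifier(n_estimators=50)
    clf_rf.fit(train_x, train_y)  # raises the ValueError shown below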

But I just get this traceback when invoking fit():

ValueError: could not convert string to float: 'Bueno' 

scikit-learn version is 0.16.1.

nilkn asked May 21 '15 21:05




1 Answer

You have to encode the string columns before calling fit(). As the error says, fit() does not accept strings, but you can work around this.

Several classes in sklearn.preprocessing can be used:

  • LabelEncoder: maps each distinct string to an integer code (0, 1, 2, ...)
  • OneHotEncoder: uses one-of-K encoding, turning each distinct string into its own 0/1 column
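As a minimal sketch of the LabelEncoder route, the snippet below rebuilds the question's test.csv inline and encodes columns A and B before calling fit(). Leaving C out of the features, the `encoders` dict, and `random_state=0` are my assumptions for illustration, not part of the question. Note that LabelEncoder is documented for target labels and that the integer codes imply an ordering the strings do not have; newer scikit-learn releases also provide OrdinalEncoder for feature columns.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Same rows as the question's test.csv, built inline so the sketch is self-contained.
test = pd.DataFrame({"A": ["Hello", "Hola"], "B": ["Hi", "Bueno"], "C": [0, 1]})

feature_cols = ["A", "B"]  # assumption: C is the target, so it is not used as a feature
train_y = test["C"] == 1

# One LabelEncoder per column; keep them so later data can be encoded the same way.
encoders = {}
train_x = pd.DataFrame(index=test.index)
for col in feature_cols:
    encoders[col] = LabelEncoder()
    train_x[col] = encoders[col].fit_transform(test[col])

# Every feature is numeric now, so fit() no longer raises the ValueError.
clf_rf = RandomForestClassifier(n_estimators=50, random_state=0)
clf_rf.fit(train_x, train_y)
```

To encode new rows consistently, call `encoders[col].transform(...)` on them rather than fitting new encoders; strings the encoder never saw will raise an error.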

I posted almost the same question on Stack Overflow some time ago. I wanted a scalable solution but didn't get an answer. I went with OneHotEncoder, which binarizes all the strings. It is quite effective, but if you have many distinct strings the matrix grows very quickly and needs a lot of memory.
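A rough sketch of the one-hot route on the same two-row data follows. It assumes a scikit-learn release whose OneHotEncoder accepts string columns directly (0.20 or later, as far as I know; with an older release such as the asker's 0.16.1 the strings would likely need to be label-encoded to integers first). The `handle_unknown="ignore"` setting and the `n_columns` variable are illustrative choices of mine.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Same rows as the question's test.csv, built inline so the sketch is self-contained.
test = pd.DataFrame({"A": ["Hello", "Hola"], "B": ["Hi", "Bueno"], "C": [0, 1]})

# Assumption: this OneHotEncoder version accepts string input directly.
# handle_unknown="ignore" encodes strings unseen during fit as all zeros.
encoder = OneHotEncoder(handle_unknown="ignore")
train_x = encoder.fit_transform(test[["A", "B"]])  # SciPy sparse matrix by default

# The encoded matrix has one column per distinct string in each input column.
n_columns = sum(len(cats) for cats in encoder.categories_)
print(train_x.shape, n_columns)
```

The output is sparse by default, which keeps memory use manageable when there are many distinct strings, and RandomForestClassifier.fit() can take a sparse matrix in recent releases. The column count still grows with every new distinct value, which is the scaling concern described above.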

RPresle answered Sep 28 '22 12:09
