
How to change datatype of multiple columns in pandas

I'm trying to run a Random Forest on a pandas dataframe. I know there are no nulls or infinities in the dataframe, but I continually get a ValueError when I fit the model. Presumably this is because I have float64 columns rather than float32; I also have a lot of columns of type bool and int. Is there a way to change all the float columns to float32?

I've tried rewriting the CSV and am relatively certain the problem isn't with that. I've never had problems running random forests on float64s before so I'm not sure what's going wrong this time.

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# split the target column off from the feature columns
labels = electric['electric_ratio']
electric = electric[[x for x in electric.columns if x != 'electric_ratio']]
electric_list = electric.columns

first_train, first_test, train_labels, test_labels = train_test_split(electric, labels)
rf = RandomForestRegressor(n_estimators=1000, random_state=88)
rf_1 = rf.fit(first_train, train_labels)

I expect this to fit the model, but instead I consistently get:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
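
A quick sanity check for the "no nulls or infinities" claim might look like the snippet below. This is only a sketch: it assumes the feature frame is still the electric dataframe from the code above, and it imports numpy, which the original code does not.

import numpy as np

# True if any cell in the feature frame is missing
print(electric.isnull().any().any())

# True if every numeric value still fits in float32 without overflowing to inf
numeric = electric.select_dtypes('number').astype('float32')
print(np.isfinite(numeric).all().all())

If the second check prints False, the float64 data contains values too large for float32, which would explain the error even though the original data has no NaNs or infinities.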
1 Answer

You can use df.astype() with a dictionary that maps each column you want to change to its target dtype.

df = df.astype({'col1': 'object', 'col2': 'int'})
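
Since the question asks about converting every float column at once, you can build that dictionary programmatically. The sketch below assumes the dataframe is the electric frame from the question and uses select_dtypes to pick out the float64 columns:

# cast every float64 column to float32 in a single astype() call
float_cols = electric.select_dtypes(include='float64').columns
electric = electric.astype({col: 'float32' for col in float_cols})
print(electric.dtypes)

select_dtypes only matches the float64 columns, so the bool and int columns are left untouched.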