
How to change datatype of multiple columns in pandas

I'm trying to run a Random Forest on a pandas dataframe. I know there are no nulls or infinities in the dataframe, but I continually get a ValueError when I fit the model. Presumably this is because I have float64 columns rather than float32; I also have a lot of columns of type bool and int. Is there a way to change all the float columns to float32?

I've tried rewriting the CSV and am relatively certain the problem isn't with that. I've never had problems running random forests on float64s before so I'm not sure what's going wrong this time.

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# split the target column off from the feature columns
labels = electric['electric_ratio']
electric = electric[[x for x in electric.columns if x != 'electric_ratio']]
electric_list = electric.columns

first_train, first_test, train_labels, test_labels = train_test_split(electric, labels)
rf = RandomForestRegressor(n_estimators=1000, random_state=88)
rf_1 = rf.fit(first_train, train_labels)

I expect this to fit the model, but instead I consistently get:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
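
A quick sanity check for the "no nulls or infinities" claim might look like the snippet below. This is only a sketch: it assumes the feature frame is still the electric dataframe from the code above, and it imports numpy, which the original code does not.

import numpy as np

# True if any cell in the feature frame is missing
print(electric.isnull().any().any())

# True if every numeric value still fits in float32 without overflowing to inf
numeric = electric.select_dtypes('number').astype('float32')
print(np.isfinite(numeric).all().all())

If the second check prints False, the float64 data contains values too large for float32, which would explain the error even though the original data has no NaNs or infinities.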
1 Answer

You can use df.astype() with a dictionary that maps each column you want to change to its target dtype.

df = df.astype({'col1': 'object', 'col2': 'int'})
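
Since the question asks about converting every float column at once, you can build that dictionary programmatically. The sketch below assumes the dataframe is the electric frame from the question and uses select_dtypes to pick out the float64 columns:

# cast every float64 column to float32 in a single astype() call
float_cols = electric.select_dtypes(include='float64').columns
electric = electric.astype({col: 'float32' for col in float_cols})
print(electric.dtypes)

select_dtypes only matches the float64 columns, so the bool and int columns are left untouched.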