I'm trying to run the code provided by yhat in their article about random forests in Python, but I keep getting the following error message:
File "test_iris_with_rf.py", line 11, in <module>
df['species'] = pd.Factor(iris.target, iris.target_names)
AttributeError: 'module' object has no attribute 'Factor'
Code:
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import numpy as np

# Load the iris dataset into a DataFrame
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
print(df)
print(iris.target_names)

# Randomly flag ~75% of the rows for training
df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75
df['species'] = pd.Factor(iris.target, iris.target_names)  # this is the line that raises the error
df.head()
First, rule out an environment issue: a local file named pandas.py will shadow the installed pandas package and raise this same kind of AttributeError, so rename any such file in your project. If that's not it, the cause is an API change in pandas itself.
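As a quick check (a minimal diagnostic snippet, not part of the original question), you can print where Python is importing pandas from:

import pandas as pd
# If this prints a path inside your own project directory rather than
# site-packages, a local pandas.py is shadowing the real library.
print(pd.__file__)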
In newer versions of pandas, Factor has been renamed to Categorical. Change your line to:
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
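With that fix in place, the rest of the yhat example runs. Below is a minimal sketch of how the corrected line fits into the full workflow; the feature selection, classifier settings, and crosstab summary are assumptions based on the imports in the question, not code quoted from the article:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import numpy as np

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)  # the fixed line

# Split into train/test using the is_train flag from the question
train, test = df[df['is_train']], df[~df['is_train']]
features = df.columns[:4]  # the four iris measurement columns

clf = RandomForestClassifier(n_jobs=2)  # n_jobs=2 is an assumed setting
clf.fit(train[features], train['species'])

# Cross-tabulate actual vs. predicted species on the held-out rows
preds = clf.predict(test[features])
print(pd.crosstab(test['species'], preds, rownames=['actual'], colnames=['predicted']))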