I am using an e-commerce dataset to predict product categories, with the product description and supplier code as features:
from sklearn import ensemble, metrics, model_selection, preprocessing
from sklearn.feature_extraction.text import CountVectorizer
# df is a pandas DataFrame with 'description', 'supplier' and 'category' columns
df['joined_features'] = df['description'].astype(str) + ' ' + df['supplier'].astype(str)
# split the dataset into training and validation datasets
train_x, valid_x, train_y, valid_y = model_selection.train_test_split(df['joined_features'], df['category'])
# encode target variable: fit once on all labels so training and validation share one mapping
encoder = preprocessing.LabelEncoder()
encoder.fit(df['category'])
train_y = encoder.transform(train_y)
valid_y = encoder.transform(valid_y)
# count vectorizer object, fitted on the training text only to avoid leaking validation vocabulary
count_vect = CountVectorizer(analyzer='word')
count_vect.fit(train_x)
# transform training and validation data
xtrain_count = count_vect.transform(train_x)
xvalid_count = count_vect.transform(valid_x)
# train the classifier and predict on the validation features
classifier = ensemble.RandomForestClassifier()
classifier.fit(xtrain_count, train_y)
predictions = classifier.predict(xvalid_count)
print(metrics.accuracy_score(valid_y, predictions))
This gives about 90% accuracy. Now I want to predict further categories. These categories are hierarchical: the one I predicted is the top-level (main) category, and I want to predict a couple of levels below it.
For example, having predicted Clothing, I now want to predict the subcategory as well: Clothing -> Shoes.
I tried concatenating both levels (df['category1'] + df['category2']) and predicting the result as a single label, but that gives only about 2% accuracy.
What is the proper way to make a classifier in a hierarchical fashion?
Edit: I put together some fake data to make this clearer:

From the first row: category 1 corresponds to Samsung, category 3 to electronics, and category 7 to TVs.
One idea is to build a single model over all of your level-2 categories, but feed the predicted probabilities for category1 into that model as input features.
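A minimal sketch of this first idea, assuming scikit-learn, a CountVectorizer over the joined description + supplier text, and a tiny hypothetical dataset (the texts and labels below are made up for illustration). Using cross_val_predict to get out-of-fold category1 probabilities for the training rows is an added assumption, not something the answer specifies: it keeps the level-2 model from learning on in-sample probabilities that are more confident than anything it will see on new data.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_predict

# Hypothetical toy data: joined "description + supplier" text and two category levels
texts = [
    "samsung led tv s01", "lg oled tv s02", "sony smart tv s03",
    "nike running shoes s04", "adidas leather shoes s05", "puma sports shoes s06",
    "levis denim jeans s07", "zara slim jeans s08", "mango blue jeans s09",
    "apple iphone smartphone s10", "samsung galaxy smartphone s11", "google pixel smartphone s12",
]
cat1 = ["Electronics"] * 3 + ["Clothing"] * 6 + ["Electronics"] * 3
cat2 = ["TVs"] * 3 + ["Shoes"] * 3 + ["Jeans"] * 3 + ["Phones"] * 3

vect = CountVectorizer()
X = vect.fit_transform(texts)

# Level 1: out-of-fold category1 probabilities for the training rows
level1 = RandomForestClassifier(n_estimators=50, random_state=0)
p1_oof = cross_val_predict(level1, X, cat1, cv=3, method="predict_proba")
level1.fit(X, cat1)  # refit on all rows for use at prediction time

# Level 2: one model over all category2 labels, with the category1 probabilities
# appended to the word counts as extra feature columns
X2 = hstack([X, csr_matrix(p1_oof)]).tocsr()
level2 = RandomForestClassifier(n_estimators=50, random_state=0)
level2.fit(X2, cat2)

def predict_both(new_texts):
    """Return (category1, category2) predictions for raw joined-feature strings."""
    Xn = vect.transform(new_texts)
    p1 = level1.predict_proba(Xn)  # columns follow level1.classes_ (sorted labels)
    Xn2 = hstack([Xn, csr_matrix(p1)]).tocsr()
    return level1.classes_[p1.argmax(axis=1)], level2.predict(Xn2)

top, sub = predict_both(["sony oled tv", "nike shoes"])
print(list(zip(top, sub)))
```

Because the level-2 model still chooses among all category2 labels, it can in principle return a subcategory that does not belong to the predicted category1; the probabilities only push it in the right direction.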
Another idea is to train a category2 model only on the rows where category1 == Clothing. Ideally you would have one such multiclass model per category1 value and call the appropriate one depending on what the category1 prediction was. Obviously this increases the amount of work you have to do, in proportion to how many category1 values there are.
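The routing idea above (sometimes called a local classifier per parent node) could be sketched roughly as follows. This assumes scikit-learn, random forests at both levels, and the same kind of hypothetical toy data; the names top_model, child_models and predict_hierarchy are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data: joined "description + supplier" text and two category levels
texts = [
    "samsung led tv s01", "lg oled tv s02", "sony smart tv s03",
    "nike running shoes s04", "adidas leather shoes s05", "puma sports shoes s06",
    "levis denim jeans s07", "zara slim jeans s08", "mango blue jeans s09",
    "apple iphone smartphone s10", "samsung galaxy smartphone s11", "google pixel smartphone s12",
]
cat1 = np.array(["Electronics"] * 3 + ["Clothing"] * 6 + ["Electronics"] * 3)
cat2 = np.array(["TVs"] * 3 + ["Shoes"] * 3 + ["Jeans"] * 3 + ["Phones"] * 3)

vect = CountVectorizer()
X = vect.fit_transform(texts)

# Top-level model trained on every row
top_model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, cat1)

# One category2 model per category1 value, trained only on that parent's rows.
# A parent with a single child needs no model, so the label itself is stored instead.
child_models = {}
for parent in np.unique(cat1):
    mask = cat1 == parent
    children = np.unique(cat2[mask])
    if len(children) == 1:
        child_models[parent] = children[0]
    else:
        child_models[parent] = RandomForestClassifier(
            n_estimators=50, random_state=0).fit(X[mask], cat2[mask])

def predict_hierarchy(new_texts):
    """Predict category1, then route each row to its parent's category2 model."""
    Xn = vect.transform(new_texts)
    parents = top_model.predict(Xn)
    children = np.empty(len(parents), dtype=object)
    for parent in np.unique(parents):
        rows = np.flatnonzero(parents == parent)
        model = child_models[parent]
        children[rows] = model.predict(Xn[rows]) if hasattr(model, "predict") else model
    return list(zip(parents, children))

print(predict_hierarchy(["sony oled tv", "zara denim jeans"]))
```

With this design the predicted category2 always belongs to the predicted category1, but an error at level 1 carries through: if a TV is predicted as Clothing, its category2 can only be Shoes or Jeans. Deeper levels (category3 and so on) would repeat the same pattern, one model per parent.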