I have a retail dataset with product_description, price, supplier, and category as columns. I used product_description as the only feature:
from sklearn import model_selection, preprocessing, naive_bayes, metrics
from sklearn.feature_extraction.text import TfidfVectorizer

# split the dataset into training and validation datasets
train_x, valid_x, train_y, valid_y = model_selection.train_test_split(df['product_description'], df['category'])

# label encode the target variable (fit on train, reuse the same encoding for validation)
encoder = preprocessing.LabelEncoder()
train_y = encoder.fit_transform(train_y)
valid_y = encoder.transform(valid_y)

# build TF-IDF features from the product descriptions
tfidf_vect = TfidfVectorizer(analyzer='word', token_pattern=r'\w{1,}', max_features=5000)
tfidf_vect.fit(df['product_description'])
xtrain_tfidf = tfidf_vect.transform(train_x)
xvalid_tfidf = tfidf_vect.transform(valid_x)

# train a Multinomial Naive Bayes classifier and predict the labels on the validation set
classifier = naive_bayes.MultinomialNB().fit(xtrain_tfidf, train_y)
predictions = classifier.predict(xvalid_tfidf)
metrics.accuracy_score(valid_y, predictions)  # ~20%, very low
Since the accuracy is very low, I want to add supplier and price as features too. How can I incorporate these in the code?
I have tried other classifiers like Logistic Regression, SVM, and Random Forest, but they had (almost) the same outcome.
Adding bigrams to the feature set can improve the accuracy of a text classification model. Ideally the model would also distinguish word senses: "book" used as a noun means a book of pages, while "book" used as a verb means to book a ticket or something else.
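As a minimal sketch of the bigram part, you can pass ngram_range=(1, 2) to TfidfVectorizer (train_x and valid_x are the splits from the question):

from sklearn.feature_extraction.text import TfidfVectorizer

# include unigrams and bigrams; raise max_features since bigrams enlarge the vocabulary
tfidf_bigram = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), max_features=10000)
tfidf_bigram.fit(df['product_description'])
xtrain_tfidf = tfidf_bigram.transform(train_x)
xvalid_tfidf = tfidf_bigram.transform(valid_x)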
Try these things: apply text preprocessing to your text column (product_description here): remove all stop words, tokenize the text and strip punctuation, lowercase all words, then apply TF-IDF or CountVectorizer. Don't forget to scale these features before training the model.
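Most of that preprocessing is built into TfidfVectorizer itself; a minimal sketch (the default tokenizer already discards punctuation, and TF-IDF rows are l2-normalized by default, which covers the scaling step):

from sklearn.feature_extraction.text import TfidfVectorizer

# lowercase the text and drop English stop words; word tokens only, punctuation is discarded
tfidf_vect = TfidfVectorizer(lowercase=True, stop_words='english', analyzer='word', max_features=5000)
xtrain_tfidf = tfidf_vect.fit_transform(train_x)
xvalid_tfidf = tfidf_vect.transform(valid_x)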
A linear Support Vector Machine is widely regarded as one of the best text classification algorithms.
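For instance, a minimal sketch with scikit-learn's LinearSVC on the TF-IDF features built in the question:

from sklearn.svm import LinearSVC
from sklearn import metrics

# linear SVM on the same TF-IDF features; train_y/valid_y are the label-encoded targets
svm = LinearSVC()
svm.fit(xtrain_tfidf, train_y)
print(metrics.accuracy_score(valid_y, svm.predict(xvalid_tfidf)))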
XGBoost is a gradient-boosted tree method for supervised learning: given examples with known labels, it learns to predict the labels of new ones. It handles many kinds of data and can be used for text classification too.
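A minimal sketch, assuming the xgboost package is installed; its XGBClassifier follows the scikit-learn estimator API and accepts the sparse TF-IDF matrix directly:

from xgboost import XGBClassifier

# gradient-boosted trees on the TF-IDF features; train_y/valid_y are the label-encoded targets
xgb = XGBClassifier()
xgb.fit(xtrain_tfidf, train_y)
predictions = xgb.predict(xvalid_tfidf)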
The TF-IDF vectorizer returns a (sparse) matrix: one row per example with the feature scores. You can modify this matrix as you wish before feeding it into the classifier. Prepare your additional features as an array of shape (number of examples, number of features), stack the two side by side with scipy.sparse.hstack (np.concatenate with axis=1 also works if you first densify the TF-IDF matrix with .toarray()), then fit the classifier as you did before.
It is usually a good idea to normalize real-valued features such as price. Also, you can try different classifiers: Logistic Regression or an SVM might do a better job with real-valued features than Naive Bayes. A sketch of the whole pipeline follows.
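A minimal sketch of that pipeline, under the assumption that supplier is categorical (one-hot encoded) and price is numeric (min-max scaled so the values stay non-negative, since MultinomialNB rejects negative feature values); variable names follow the question's code:

import numpy as np
from scipy import sparse
from sklearn import naive_bayes, metrics
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler

# keep the extra columns aligned with the train/validation split
train_idx, valid_idx = train_x.index, valid_x.index

# one-hot encode the categorical supplier column
supplier_enc = OneHotEncoder(handle_unknown='ignore')
train_supplier = supplier_enc.fit_transform(df.loc[train_idx, ['supplier']])
valid_supplier = supplier_enc.transform(df.loc[valid_idx, ['supplier']])

# scale price into [0, 1]
price_scaler = MinMaxScaler()
train_price = price_scaler.fit_transform(df.loc[train_idx, ['price']])
valid_price = price_scaler.transform(df.loc[valid_idx, ['price']])

# stack TF-IDF, supplier, and price side by side
xtrain_all = sparse.hstack([xtrain_tfidf, train_supplier, train_price]).tocsr()
xvalid_all = sparse.hstack([xvalid_tfidf, valid_supplier, valid_price]).tocsr()

# fit the classifier as before
classifier = naive_bayes.MultinomialNB().fit(xtrain_all, train_y)
predictions = classifier.predict(xvalid_all)
print(metrics.accuracy_score(valid_y, predictions))

Note that handle_unknown='ignore' keeps the one-hot transform from failing on suppliers that appear only in the validation split.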