I have a ton of short stories, each about 500 words long, and I want to categorize each one into one of, let's say, 20 categories.
I can hand-classify a bunch of them, but eventually I want to use machine learning to guess the categories. What's the best way to approach this? Is there a standard machine learning approach I should be using? I don't think a decision tree would work well since it's text data... I'm completely new to this field.
Any help would be appreciated, thanks!
Rule-based approaches classify text into organized groups by using a set of handcrafted linguistic rules. These rules instruct the system to use semantically relevant elements of a text to identify relevant categories based on its content. Each rule consists of an antecedent or pattern and a predicted category.
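To make the "pattern and predicted category" idea concrete, here is a minimal sketch of a rule-based classifier. The rules, category names, and the `classify_by_rules` helper are all made up for illustration; a real rule set for 20 categories would be much larger and hand-tuned.

```python
import re

# Hypothetical handcrafted rules: (pattern, predicted category).
# Rules are checked in order, so earlier rules win when several match.
RULES = [
    (re.compile(r"\b(dragon|wizard|spell)\b", re.IGNORECASE), "fantasy"),
    (re.compile(r"\b(detective|murder|clue)\b", re.IGNORECASE), "mystery"),
    (re.compile(r"\b(spaceship|robot|planet)\b", re.IGNORECASE), "science fiction"),
]


def classify_by_rules(text, rules=RULES, default="uncategorized"):
    """Return the category of the first rule whose pattern occurs in the text."""
    for pattern, category in rules:
        if pattern.search(text):
            return category
    return default


print(classify_by_rules("The detective studied the only clue."))
```

The upside is that no training data is needed; the downside is that someone has to write and maintain every rule by hand, which is why the learned approaches are usually preferred once you have labeled examples.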
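Support vector machine (SVM) is a widely used text classification method, based on statistical learning theory. It was first proposed for binary classification problems; for more than two categories (such as your 20), the usual trick is to train one binary classifier per category (one-vs-rest) and pick the category with the highest score.
A linear SVM is widely regarded as one of the best algorithms for text classification, largely because word-count features are high-dimensional and sparse, which is where linear models tend to do well.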
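As a rough illustration of what a linear SVM does with text, here is a small pure-Python sketch: stories become word-count vectors, and one binary classifier per category is trained by subgradient descent on the L2-regularised hinge loss (one-vs-rest). Everything here is an assumption for demonstration: the toy stories, the function names, the hyperparameters, and the choice to leave out a bias term. In practice you would normally reach for an existing library implementation (for example scikit-learn's `LinearSVC`) rather than write this yourself.

```python
from collections import Counter


def featurize(text, vocab):
    """Turn a text into a bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_binary_svm(X, y, epochs=20, lr=0.1, lam=0.01):
    """Fit a linear SVM (no bias term) for labels y in {-1, +1}."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * dot(w, xi)
            # Shrink weights: gradient step on the L2 regularisation term.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Hinge loss is active only if the example is inside the margin.
            if margin < 1.0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
    return w


def train_one_vs_rest(labeled_stories):
    """Train one binary SVM per category; returns (vocab, weights by label)."""
    vocab = sorted({w for text, _ in labeled_stories for w in text.lower().split()})
    X = [featurize(text, vocab) for text, _ in labeled_stories]
    labels = sorted({label for _, label in labeled_stories})
    weights = {}
    for label in labels:
        y = [1.0 if lab == label else -1.0 for _, lab in labeled_stories]
        weights[label] = train_binary_svm(X, y)
    return vocab, weights


def predict(text, vocab, weights):
    """Pick the category whose classifier gives the highest score."""
    x = featurize(text, vocab)
    return max(weights, key=lambda label: dot(weights[label], x))


# Toy hand-labeled data, invented for this example.
stories = [
    ("the dragon guarded the castle", "fantasy"),
    ("a wizard cast a spell on the dragon", "fantasy"),
    ("the detective found a clue", "mystery"),
    ("a clue led the detective to the thief", "mystery"),
    ("the robot repaired the spaceship", "science fiction"),
    ("a spaceship landed on the planet", "science fiction"),
]
vocab, weights = train_one_vs_rest(stories)
print(predict("the wizard and the dragon", vocab, weights))
```

Words that never appeared in training (like "and" above) are simply ignored, because they have no slot in the vocabulary.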
Text classification is a machine learning technique that automatically assigns tags or categories to text. Using natural language processing (NLP), text classifiers can analyze and sort text by sentiment, topic, or customer intent, usually far faster than people can by hand and, given enough good training data, with comparable accuracy.
A naive Bayes classifier will most probably work for you. The method goes like this:
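Training: for each category, count how many of your hand-classified stories fall into it and how often each word occurs in those stories.
Decision: for a new story, compute a score for each category from the category's share of the training stories and the (smoothed) frequencies of the story's words in that category, then pick the category with the highest score.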
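The training and decision steps of naive Bayes can be sketched roughly as follows. This is a minimal multinomial naive Bayes with add-one (Laplace) smoothing, written under my own assumptions: whitespace tokenization, made-up toy stories, hypothetical function names, and words unseen in training being skipped at decision time. It is not necessarily the exact recipe the answer had in mind.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def train(labeled_stories):
    """Training: count stories per category and word occurrences per category."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, category in labeled_stories:
        doc_counts[category] += 1
        for word in tokenize(text):
            word_counts[category][word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def classify(text, model):
    """Decision: return the category with the highest log-probability score."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        # Log prior: the share of training stories in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # assumption: ignore words never seen in training
            # Add-one smoothing keeps unseen (word, category) pairs above zero.
            count = word_counts[category][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Toy hand-labeled data, invented for this example.
stories = [
    ("the dragon guarded the castle", "fantasy"),
    ("a wizard cast a spell on the dragon", "fantasy"),
    ("the detective found a clue", "mystery"),
    ("a clue led the detective to the thief", "mystery"),
]
model = train(stories)
print(classify("the wizard fought the dragon", model))  # prints "fantasy"
```

Working with log-probabilities instead of multiplying raw probabilities matters for 500-word stories: the product of hundreds of small numbers would underflow to zero.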