Possible Duplicate:
Text Classification into Categories
I am currently working on a solution to determine the type of food served by each of the 10k restaurants in a database, based on their descriptions. I'm using lists of keywords to decide which kind of food is being served.
I read a little bit about machine learning but I have no practical experience with it at all. Can anyone explain to me if/why it would be a better solution to a simple problem like this? I find accuracy more important than performance!
simplified example:
["China", "Chinese", "Rice", "Noodles", "Soybeans"]
["Belgium", "Belgian", "Fries", "Waffles", "Waterzooi"]
a possible description could be:
"Hong's Garden Restaurant offers savory, reasonably priced Chinese to our customers. If you find that you have a sudden craving for rice, noodles or soybeans at 8 o’clock on a Saturday evening, don’t worry! We’re open seven days a week and offer carryout service. You can get fries here as well!"
You are indeed describing a classification problem, which can be solved with machine learning.
In this problem, your features are the words in the description. You should use the bag-of-words model, which basically says that what matters to the classification process is which words appear and how many times each one occurs, not their order.
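As a rough illustration of the bag-of-words idea, here is a minimal sketch using only Python's standard library. The helper names `bag_of_words` and `to_vector` and the two sample sentences are made up for this example. A real pipeline would more likely rely on a library vectorizer.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Map each lowercase word in the text to its number of occurrences."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def to_vector(bag, vocabulary):
    """Turn a bag into a fixed-length count vector over a shared vocabulary."""
    return [bag[word] for word in vocabulary]

# Two made-up descriptions, used only to show the representation.
docs = [
    "Rice and noodles, noodles and rice",
    "Fries and waffles",
]

bags = [bag_of_words(d) for d in docs]

# The vocabulary is every distinct word seen in any document, in sorted order.
vocabulary = sorted(set().union(*bags))

# Each document becomes a list of counts aligned with the vocabulary.
vectors = [to_vector(b, vocabulary) for b in bags]
```

Word order is discarded: "Rice and noodles" and "noodles and rice" produce the same counts, which is exactly the simplification the model makes.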
To solve your problem, follow these steps:
Evaluation:
Evaluation of your algorithm can be done with cross-validation, or by separating a test set out of your labeled examples that will be used only for evaluating how accurate the algorithm is.
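To make the evaluation idea concrete, here is a small k-fold cross-validation sketch in plain Python. The classifier is a tiny multinomial Naive Bayes with add-one smoothing, chosen here purely as a stand-in assumption. Any classifier could be plugged in instead. The function names and the six labeled examples are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def train_naive_bayes(examples):
    """Collect per-label word counts and label counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary

def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            if word in vocabulary:  # words unseen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def cross_validate(examples, k):
    """Average accuracy over k folds; fold i holds out every k-th example."""
    accuracies = []
    for i in range(k):
        test = examples[i::k]
        train = [ex for j, ex in enumerate(examples) if j % k != i]
        model = train_naive_bayes(train)
        correct = sum(predict(model, text) == label for text, label in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k

# Made-up labeled examples; real data would be your hand-labeled descriptions.
examples = [
    ("chinese rice and noodles", "Chinese"),
    ("belgian fries and waffles", "Belgian"),
    ("noodles with soybeans and rice", "Chinese"),
    ("waffles and waterzooi from belgium", "Belgian"),
    ("rice noodles chinese style", "Chinese"),
    ("fries waffles belgian beer", "Belgian"),
]

accuracy = cross_validate(examples, k=3)
```

Each example is used for testing exactly once and is never part of the training data for its own fold, so the averaged accuracy estimates how the classifier would do on descriptions it has not seen.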
Optimizations:
From personal experience - here are some optimizations I found helpful for the feature extraction:
Libraries:
Unfortunately, I am not fluent enough in Python, but here are some libraries that might be helpful: