Can you please let me know how to represent the attributes and the class for text classification in Weka? Which attribute should I use for classification: word frequency or just the word? What would a possible ARFF structure look like? Can you give me several example lines of that structure?
Thank you very much in advance.
Linear Support Vector Machine is widely regarded as one of the best text classification algorithms.
Text classification, also known as text tagging or text categorization, is the process of sorting text into organized groups. Using Natural Language Processing (NLP), a text classifier can automatically analyze a text and assign it one or more pre-defined tags or categories based on its content.
One of the easiest alternatives is to start with an ARFF file for a two-class problem like:
@relation corpus
@attribute text string
@attribute class {pos,neg}
@data
'long text with words ... ',pos
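The text is represented as a string attribute and the class is a nominal attribute with two values (pos and neg). Each line after @data is one document: the quoted text, a comma, then its class label.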
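If your documents are not in ARFF yet, a file with this layout can be generated with a few lines of script. The following is only a minimal sketch in plain Python, not part of Weka: the helper names `quote_arff` and `to_arff` and the sample documents are made up for illustration, and it assumes that escaping backslashes and single quotes with a backslash is enough for your texts.

```python
def quote_arff(text):
    """Quote a string value for ARFF (assumed escaping: backslash and single quote)."""
    # Collapse newlines and repeated whitespace so each instance stays on one line.
    flat = " ".join(text.split())
    escaped = flat.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def to_arff(relation, documents, labels):
    """Build ARFF text with one string attribute and one nominal class attribute.

    documents: list of (text, label) pairs
    labels:    the allowed class values, e.g. ["pos", "neg"]
    """
    lines = [
        "@relation " + relation,
        "@attribute text string",
        "@attribute class {" + ",".join(labels) + "}",
        "@data",
    ]
    for text, label in documents:
        if label not in labels:
            raise ValueError("unknown class label: " + label)
        lines.append(quote_arff(text) + "," + label)
    return "\n".join(lines) + "\n"


# Hypothetical sample documents, for illustration only.
docs = [
    ("great movie, loved it", "pos"),
    ("boring and far too long", "neg"),
    ("it's fine", "pos"),
]
arff_text = to_arff("corpus", docs, ["pos", "neg"])
print(arff_text)
```

Writing `arff_text` to a file ending in .arff should give you something you can open in the Weka Explorer, with one row per document.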
Then you could apply two filters:
You may find more info and other approaches to transforming your data on this Weka wiki page: http://weka.wikispaces.com/Text+categorization+with+WEKA
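On the "word frequency or just word?" part of the question: most classifiers cannot use a raw string attribute directly, so the text is normally converted into one numeric attribute per word. In Weka this conversion is commonly done with the StringToWordVector filter; the answer above does not say which filters it means, so treat that as an assumption. The sketch below is plain Python, not Weka's implementation, and only illustrates the two representations you are asking about. The function names and the simple letters-only tokenizer are made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_vectors(texts, use_counts=True):
    """Turn each text into a vector over a shared, sorted vocabulary.

    use_counts=True  -> each value is the word's frequency in the document
    use_counts=False -> each value is 1 if the word occurs, else 0
    """
    token_lists = [tokenize(t) for t in texts]
    vocabulary = sorted({tok for toks in token_lists for tok in toks})
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        if use_counts:
            vectors.append([counts[w] for w in vocabulary])
        else:
            vectors.append([1 if counts[w] else 0 for w in vocabulary])
    return vocabulary, vectors


texts = ["good good movie", "bad movie"]
vocab, freq = word_vectors(texts, use_counts=True)
_, presence = word_vectors(texts, use_counts=False)
print(vocab)     # ['bad', 'good', 'movie']
print(freq)      # [[0, 2, 1], [1, 0, 1]]
print(presence)  # [[0, 1, 1], [1, 0, 1]]
```

Either representation gives you one numeric attribute per word plus the nominal class. Which one works better depends on your data, so it is worth trying both.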