How to represent text for classification in Weka?

How should I represent the attributes and the class for text classification in Weka? Which attributes should the classifier work with: word frequencies, or just the words themselves? What would a suitable ARFF structure look like? Could you give a few example lines of that structure?

Thank you very much in advance.

asked Nov 29 '11 15:11

Warren

1 Answer

One of the easiest approaches is to start with an ARFF file for a two-class problem, like this:

@relation corpus 

@attribute text string
@attribute class {pos,neg}

@data
'long text with words ... ',pos

The text is represented as a string attribute and the class as a nominal attribute with two values.
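
For a larger corpus it is usually easier to generate this file than to type it. Here is a minimal sketch in plain Java (no Weka dependency) that writes the same structure. The class name ArffSketch, the helper names and the sample texts are made up for illustration, and the escaping rule is an assumption: it relies on the ARFF reader accepting backslash-escaped single quotes inside quoted strings.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ArffSketch {

    /** Wraps a text in single quotes and backslash-escapes characters that would break the line. */
    static String quote(String text) {
        String escaped = text
                .replace("\\", "\\\\")   // escape backslashes first
                .replace("'", "\\'")     // then single quotes
                .replace("\n", "\\n")    // keep each instance on one line
                .replace("\r", "\\r");
        return "'" + escaped + "'";
    }

    /** Builds an ARFF document from rows of {text, label}; labels become the nominal class values. */
    static String toArff(String relation, List<String[]> rows) {
        // Collect the class labels in order of first appearance, e.g. pos, neg.
        Set<String> labels = new LinkedHashSet<>();
        for (String[] row : rows) {
            labels.add(row[1]);
        }

        StringBuilder out = new StringBuilder();
        out.append("@relation ").append(relation).append("\n\n");
        out.append("@attribute text string\n");
        out.append("@attribute class {").append(String.join(",", labels)).append("}\n\n");
        out.append("@data\n");
        for (String[] row : rows) {
            out.append(quote(row[0])).append(',').append(row[1]).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample data; replace with your own labelled texts.
        List<String[]> rows = Arrays.asList(
                new String[] {"it's a great movie", "pos"},
                new String[] {"dull plot and weak acting", "neg"});
        System.out.print(toArff("corpus", rows));
    }
}
```

The labels are written unquoted, so this sketch only works for simple label names without spaces or commas.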

Then you could apply two filters:

StringToWordVector transforms the texts into a word vector representation, with one attribute per word. You can tweak its parameters to choose a binary or frequency representation, stemming, or stopword removal. The best representation depends on the problem. If the texts are not long, a binary representation is usually enough.
Reorder moves the class attribute to the last position, which is where Weka assumes it to be.
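
To make the binary versus frequency choice concrete, here is a small plain-Java sketch of what a word vector looks like. It is not Weka's implementation of StringToWordVector, only an illustration of the idea: the class WordVectorSketch, its method names and the very simple tokenizer (lowercase, split on anything that is not a letter or digit) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

final class WordVectorSketch {

    /** Lowercases the text and splits it on runs of non-letter, non-digit characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Collects every distinct word in the corpus, sorted; each word becomes one attribute. */
    static List<String> vocabulary(List<String> docs) {
        TreeSet<String> words = new TreeSet<>();
        for (String doc : docs) {
            words.addAll(tokenize(doc));
        }
        return new ArrayList<>(words);
    }

    /** Maps a document to one value per vocabulary word: 0/1 if binary, otherwise the word count. */
    static int[] vectorize(String doc, List<String> vocab, boolean binary) {
        int[] vector = new int[vocab.size()];
        for (String token : tokenize(doc)) {
            int index = vocab.indexOf(token);
            if (index < 0) {
                continue; // word not seen when the vocabulary was built
            }
            vector[index] = binary ? 1 : vector[index] + 1;
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("good good movie", "bad movie");
        List<String> vocab = vocabulary(docs);
        System.out.println(vocab);                                                    // [bad, good, movie]
        System.out.println(Arrays.toString(vectorize(docs.get(0), vocab, false)));    // [0, 2, 1]
        System.out.println(Arrays.toString(vectorize(docs.get(0), vocab, true)));     // [0, 1, 1]
    }
}
```

In Weka itself this switch is the outputWordCounts option of StringToWordVector (-C on the command line). When it is off, the filter records only whether each word is present.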

You may find more info and other approaches to transform your data in this Weka wiki page: http://weka.wikispaces.com/Text+categorization+with+WEKA

answered Oct 13 '22 01:10

zdepablo

Tags: java, machine-learning, classification, weka, arff
