ARFF for natural language processing

I'm trying to take a set of reviews and convert them into the ARFF format for use with WEKA. Unfortunately, either I completely misunderstand how the format works, or I'll need an attribute for every possible word, each with a presence indicator. Does anyone know a better way, or ideally have a sample ARFF file?

asked May 28 '11 14:05

Dean Barnes

2 Answers

If you store the reviews as plain text files in separate folders, one folder per class (positive and negative in your case), you can use TextDirectoryLoader.

You can find it in the KnowledgeFlow application in Weka, or run it from the command line. More info here: http://weka.wikispaces.com/ARFF+files+from+Text+Collections
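To make the folder layout concrete, here is a rough Python sketch of the same idea: one subfolder per class, one text file per review, and each file becomes one ARFF row with a string attribute plus a nominal class. This is not Weka's code. The function names, the attribute names `review` and `sentiment`, and the backslash escaping of quotes and newlines are my assumptions, and folder names are assumed to be simple tokens that need no quoting.

```python
import os
import tempfile


def quote_arff(text):
    """Quote a string value, escaping backslashes, double quotes and line breaks."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\r", "\\r")
        .replace("\n", "\\n")
    )
    return '"' + escaped + '"'


def directory_to_arff(root, relation="reviews"):
    """Build ARFF text from root/<label>/<file>, one instance per file."""
    # Each subfolder name becomes one value of the nominal class attribute.
    labels = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    lines = [
        "@relation " + relation,
        "",
        "@attribute review string",
        "@attribute sentiment {" + ",".join(labels) + "}",
        "",
        "@data",
    ]
    for label in labels:
        folder = os.path.join(root, label)
        for name in sorted(os.listdir(folder)):
            with open(os.path.join(folder, name), encoding="utf-8") as handle:
                text = handle.read().strip()
            lines.append(quote_arff(text) + "," + label)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Tiny throwaway corpus: two classes, one review each.
    with tempfile.TemporaryDirectory() as root:
        for label, text in [("neg", "different stuff"), ("pos", "this is some text")]:
            os.makedirs(os.path.join(root, label))
            with open(os.path.join(root, label, "1.txt"), "w", encoding="utf-8") as f:
                f.write(text)
        print(directory_to_arff(root))
```

The result still has a single string attribute per review, so it needs a bag-of-words step before most classifiers can use it.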

answered Nov 16 '22 20:11

zdepablo

Took a while to work out, but with this input.arff:

@relation text_files

@attribute review string
@attribute sentiment {0, 1}

@data
"this is some text", 1
"this is some more text", 1
"different stuff", 0

And this command:

java -classpath "C:\\Program Files\\Weka-3-6\\weka.jar" weka.filters.unsupervised.attribute.StringToWordVector -i input.arff -o output.arff

The following is produced:

@relation 'text_files-weka.filters.unsupervised.attribute.StringToWordVector-R1-W1000-prune-rate-1.0-N0-stemmerweka.core.stemmers.NullStemmer-M1-tokenizerweka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"'

@attribute sentiment {0,1}
@attribute different numeric
@attribute is numeric
@attribute more numeric
@attribute some numeric
@attribute stuff numeric
@attribute text numeric
@attribute this numeric

@data

{0 1,2 1,4 1,6 1,7 1}
{0 1,2 1,3 1,4 1,6 1,7 1}
{1 1,5 1}
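The @data rows are in sparse ARFF: each `index value` pair names an attribute by position (0 is sentiment, 1 is different, and so on), and anything not listed is 0, or the first nominal label for sentiment, which is why the third row has no entry for it. Below is a small Python sketch that rebuilds those three rows from the input, just to show how the indices line up. It is not how StringToWordVector is implemented. The function names are made up, the delimiter set is copied from the relation name above, and it only handles word presence with an alphabetically sorted vocabulary.

```python
# Delimiters assumed to match the tokenizer options shown in the relation name.
DELIMITERS = " \r\n\t.,;:'\"()?!"


def tokenize(text):
    """Split text on the delimiter characters, dropping empty tokens."""
    table = str.maketrans({ch: " " for ch in DELIMITERS})
    return text.translate(table).split()


def to_sparse_rows(instances, class_labels):
    """Return (vocabulary, sparse rows) with the class as attribute 0."""
    vocab = sorted({tok for text, _ in instances for tok in tokenize(text)})
    index = {word: i + 1 for i, word in enumerate(vocab)}  # 0 is the class
    rows = []
    for text, label in instances:
        cells = []
        # Sparse ARFF omits a nominal value when it is the first label.
        if class_labels.index(label) != 0:
            cells.append("0 " + label)
        present = sorted(index[tok] for tok in set(tokenize(text)))
        cells.extend("%d 1" % i for i in present)
        rows.append("{" + ",".join(cells) + "}")
    return vocab, rows


def expand_row(row, n_attributes):
    """Turn a sparse row back into a dense list; unlisted entries become "0"."""
    dense = ["0"] * n_attributes
    for cell in filter(None, row.strip("{}").split(",")):
        position, value = cell.split()
        dense[int(position)] = value
    return dense


if __name__ == "__main__":
    reviews = [
        ("this is some text", "1"),
        ("this is some more text", "1"),
        ("different stuff", "0"),
    ]
    vocab, rows = to_sparse_rows(reviews, ["0", "1"])
    print(vocab)
    for row in rows:
        print(row, expand_row(row, len(vocab) + 1))
```

So there is still one attribute per word, but each row only lists the words that actually occur in that review, which keeps the file manageable for a real vocabulary.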

answered Nov 16 '22 20:11

Dean Barnes

Tags: machine-learning, nlp, weka, arff
