"Train and test set are not compatible" error in Weka?

Tags: machine-learning, weka, arff

I'm trying to test my model with a new dataset. I applied the same preprocessing steps that I used when building the model. I have compared the two files and found no differences: the train and test datasets have all the attributes in the same order, with the same attribute names and data types. Still, I'm not able to resolve the issue. The train and test files seem to be the same in structure, but the Weka Explorer gives me an error saying "Train and test set are not compatible". How can I resolve this error? Is there any way to make the test.arff file format match train.arff? Please, somebody help me.

Here is the screenshot of the file comparison:

asked Jul 16 '13 09:07

Suren Raju

2 Answers

Repeating the comment I left under the question:

All three attributes are nominal, and each is followed by its possible values enclosed in '{}'. My guess is that the sets of possible values are not the same in the two files. For example, for the RESOURCE attribute there is no 199 in the test file, while it is present in the training file.
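To check this guess without reading the headers by eye, something like the following Python sketch could help. It is not part of Weka; it only compares the @attribute declarations of two ARFF texts and reports where names, types, or nominal value lists differ. RESOURCE and 199 come from the question, while the ACTION attribute, the other values, and the function names are made up for illustration. The parser is deliberately simple and assumes nominal values contain no commas.

```python
import re

# Matches "@attribute <name> <type>", where the name may be quoted.
ATTRIBUTE_LINE = re.compile(r"""@attribute\s+('[^']*'|"[^"]*"|\S+)\s+(.+)$""", re.IGNORECASE)


def parse_arff_attributes(text):
    """Return (name, type) pairs; a nominal type is a tuple of its labels in declared order."""
    attributes = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("@data"):
            break  # the header ends where the data section begins
        match = ATTRIBUTE_LINE.match(stripped)
        if not match:
            continue
        name = match.group(1).strip("'\"")
        type_spec = match.group(2).strip()
        if type_spec.startswith("{") and type_spec.endswith("}"):
            # Simplification: assumes labels do not contain commas.
            labels = tuple(v.strip().strip("'\"") for v in type_spec[1:-1].split(","))
            attributes.append((name, labels))
        else:
            attributes.append((name, type_spec.lower()))
    return attributes


def header_differences(train_text, test_text):
    """List human-readable differences between two ARFF headers."""
    train = parse_arff_attributes(train_text)
    test = parse_arff_attributes(test_text)
    problems = []
    if len(train) != len(test):
        problems.append(f"attribute count differs: {len(train)} vs {len(test)}")
    for index, ((tr_name, tr_type), (te_name, te_type)) in enumerate(zip(train, test), start=1):
        if tr_name != te_name:
            problems.append(f"attribute {index}: name {tr_name!r} vs {te_name!r}")
        elif tr_type != te_type:
            if isinstance(tr_type, tuple) and isinstance(te_type, tuple):
                only_train = [v for v in tr_type if v not in te_type]
                only_test = [v for v in te_type if v not in tr_type]
                problems.append(
                    f"{tr_name}: nominal values differ "
                    f"(only in train: {only_train}, only in test: {only_test})"
                )
            else:
                problems.append(f"{tr_name}: type differs ({tr_type} vs {te_type})")
    return problems


# Hypothetical headers modelled on the RESOURCE example above.
TRAIN = """@relation train
@attribute RESOURCE {198,199,200}
@attribute ACTION {0,1}
@data
199,1
"""

TEST = """@relation test
@attribute RESOURCE {198,200}
@attribute ACTION {0,1}
@data
198,0
"""

for problem in header_differences(TRAIN, TEST):
    print(problem)
```

Note that the sketch also flags two nominal attributes that declare the same labels in a different order. As far as I know, Weka treats label order as significant, but that is an assumption worth verifying against your Weka version.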

answered Oct 15 '22 18:10

Annie Kim

After struggling with the same problem for a day, I figured out two ways to make the trained model work on a supplied test set.

Method 1. Use the Knowledge Flow. For example, something like the following:

CSVLoader (for train set) -> ClassAssigner -> TrainingSetMaker -> (classifier of your choice) -> ClassifierPerformanceEvaluator -> TextViewer

CSVLoader (for test set) -> ClassAssigner -> TestSetMaker -> (the same classifier instance as above) -> PredictionAppender -> CSVSaver

Then load the data from the CSVLoader or ArffLoader for the training set; the model will be trained. After that, load the data from the loader for the test set. This evaluates the model (the classifier, for example) on the supplied test set. You can see the result in the TextViewer (connected to the ClassifierPerformanceEvaluator) and get the saved result from the CSVSaver or ArffSaver connected to the PredictionAppender. An additional column, "classified as", will be added to the output file. In my case, I used "?" for the class column in the supplied test set when the class labels were not available.

Method 2. Combine the training and test sets into one file. Then the exact same filter can be applied to both. Afterwards, separate the training set and the test set again by applying an instance filter. Since I use "?" as the class label in the test set, it does not show up among the instance filter indices. So just select the indices that you can see in the attribute values as the ones to be removed when applying the instance filter; only the test data will be left. Save it and load it through the supplied test set option on the classifier page. This time it will work.

I guess it is the class attribute that causes the incompatible train and test set issue, since many classifiers require a nominal class attribute, whose value is converted to an index into the available values of the class attribute, according to http://weka.wikispaces.com/Why+do+I+get+the+error+message+%27training+and+test+set+are+not+compatible%27%3F
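The net effect of Method 2 is that both files end up with one shared header. Outside the Weka GUI, a rough approximation of that outcome is to write the test rows under the training file's header, after checking that every nominal value in the test rows is declared there. The Python sketch below is only an illustration under several assumptions: dense (non-sparse) ARFF, unquoted attribute names, no commas inside values, and the same column order in both files. The function names and the sample data (apart from RESOURCE, 199 and the "?" class label) are hypothetical.

```python
def split_arff(text):
    """Split ARFF text into (header_lines, data_rows) at the @data marker."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().lower() == "@data":
            rows = [
                row for row in lines[i + 1:]
                if row.strip() and not row.lstrip().startswith("%")  # skip blanks and comments
            ]
            return lines[:i + 1], rows
    raise ValueError("no @data section found")


def nominal_values(header_lines):
    """One entry per attribute: a set of labels for nominal attributes, otherwise None."""
    declared = []
    for line in header_lines:
        parts = line.strip().split(None, 2)
        if len(parts) == 3 and parts[0].lower() == "@attribute":
            spec = parts[2].strip()
            if spec.startswith("{") and spec.endswith("}"):
                declared.append({v.strip() for v in spec[1:-1].split(",")})
            else:
                declared.append(None)
    return declared


def apply_train_header(train_text, test_text):
    """Return ARFF text made of the training header followed by the test rows."""
    train_header, _ = split_arff(train_text)
    _, test_rows = split_arff(test_text)
    declared = nominal_values(train_header)
    for row in test_rows:
        cells = [cell.strip() for cell in row.split(",")]
        if len(cells) != len(declared):
            raise ValueError(f"row {row!r} has {len(cells)} values, expected {len(declared)}")
        for cell, allowed in zip(cells, declared):
            # "?" marks a missing value, e.g. an unknown class label.
            if allowed is not None and cell != "?" and cell not in allowed:
                raise ValueError(f"value {cell!r} is not declared in the training header")
    return "\n".join(train_header + test_rows) + "\n"


# Hypothetical files: the test header lacks 199 and the class label is unknown.
TRAIN = """@relation train
@attribute RESOURCE {198,199,200}
@attribute ACTION {0,1}
@data
199,1
"""

TEST = """@relation test
@attribute RESOURCE {198,200}
@attribute ACTION {0,1}
@data
198,?
"""

print(apply_train_header(TRAIN, TEST))
```

If the test data contains a value that the training header does not declare, the sketch raises an error instead of guessing. In that case the model has never seen the value, and combining the files before filtering, as in Method 2, is probably the safer route.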

answered Oct 15 '22 18:10

d0_0b


