Is there any package to load .arff format file into matlab? The .arff format is used in Weka for running machine learning algorithm.
Since Weka is a Java library, you can directly use the API it exposes to read ARFF files:
%## paths
WEKA_HOME = 'C:\Program Files\Weka-3-7';
javaaddpath([WEKA_HOME '\weka.jar']);
fName = [WEKA_HOME '\data\iris.arff'];
%## read file
loader = weka.core.converters.ArffLoader();
loader.setFile( java.io.File(fName) );
D = loader.getDataSet();
D.setClassIndex( D.numAttributes()-1 );
%## dataset
relationName = char(D.relationName);
numAttr = D.numAttributes;
numInst = D.numInstances;
%## attributes
%# attribute names
attributeNames = arrayfun(@(k) char(D.attribute(k).name), 0:numAttr-1, 'Uni',false);
%# attribute types
types = {'numeric' 'nominal' 'string' 'date' 'relational'};
attributeTypes = arrayfun(@(k) D.attribute(k-1).type, 1:numAttr);
attributeTypes = types(attributeTypes+1);
%# nominal attribute values
nominalValues = cell(numAttr,1);
for i=1:numAttr
if strcmpi(attributeTypes{i},'nominal')
nominalValues{i} = arrayfun(@(k) char(D.attribute(i-1).value(k-1)), 1:D.attribute(i-1).numValues, 'Uni',false);
end
end
%## instances
data = zeros(numInst,numAttr);
for i=1:numAttr
data(:,i) = D.attributeToDoubleArray(i-1);
end
%## visualize data
parallelcoords(data(:,1:end-1), ...
'Group',nominalValues{end}(data(:,end)+1), ...
'Labels',attributeNames(1:end-1))
title(relationName)
You can even directly use its functionality from MATLAB. An example:
%## classification
classifier = weka.classifiers.trees.J48();
classifier.buildClassifier( D );
fprintf('Classifier: %s %s\n%s', ...
char(classifier.getClass().getName()), ...
char(weka.core.Utils.joinOptions(classifier.getOptions())), ...
char(classifier.toString()) )
The output C4.5 decision tree:
Classifier: weka.classifiers.trees.J48 -C 0.25 -M 2
J48 pruned tree
------------------
petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
| petalwidth <= 1.7
| | petallength <= 4.9: Iris-versicolor (48.0/1.0)
| | petallength > 4.9
| | | petalwidth <= 1.5: Iris-virginica (3.0)
| | | petalwidth > 1.5: Iris-versicolor (3.0/1.0)
| petalwidth > 1.7: Iris-virginica (46.0/1.0)
Number of Leaves : 5
Size of the tree : 9
Yes, there are a few MATLAB interfaces for WEKA files on MATLAB File Exchange, I normally use this one: http://www.mathworks.com/matlabcentral/fileexchange/21204-matlab-weka-interface where you have a saveARFF() and a loadARFF() functions.
If you only want to load a file stored in "arff" format into Matlab, and don't need any other functionality from Weka, just remove the header part of your "arff" file (those attribute definitions), and save the file as csv format (you should replace class values with a numeric equivalences), and then use the built-in "csvread" function of Matlab. This way there is no need to find a third party package.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With