How to load an .arff file into MATLAB

Tags:

matlab

weka

Is there a package for loading .arff files into MATLAB? The .arff format is used by Weka for running machine learning algorithms.

Learner asked Aug 05 '11 06:08


3 Answers

Since Weka is a Java library, you can directly use the API it exposes to read ARFF files:

%## paths
WEKA_HOME = 'C:\Program Files\Weka-3-7';
javaaddpath([WEKA_HOME '\weka.jar']);
fName = [WEKA_HOME '\data\iris.arff'];

%## read file
loader = weka.core.converters.ArffLoader();
loader.setFile( java.io.File(fName) );
D = loader.getDataSet();
D.setClassIndex( D.numAttributes()-1 );

%## dataset
relationName = char(D.relationName);
numAttr = D.numAttributes;
numInst = D.numInstances;

%## attributes
%# attribute names
attributeNames = arrayfun(@(k) char(D.attribute(k).name), 0:numAttr-1, 'Uni',false);

%# attribute types
types = {'numeric' 'nominal' 'string' 'date' 'relational'};
attributeTypes = arrayfun(@(k) D.attribute(k-1).type, 1:numAttr);
attributeTypes = types(attributeTypes+1);

%# nominal attribute values
nominalValues = cell(numAttr,1);
for i=1:numAttr
    if strcmpi(attributeTypes{i},'nominal')
        nominalValues{i} = arrayfun(@(k) char(D.attribute(i-1).value(k-1)), 1:D.attribute(i-1).numValues, 'Uni',false);
    end
end

%## instances
data = zeros(numInst,numAttr);
for i=1:numAttr
    data(:,i) = D.attributeToDoubleArray(i-1);
end

%## visualize data
parallelcoords(data(:,1:end-1), ...
    'Group',nominalValues{end}(data(:,end)+1), ...
    'Labels',attributeNames(1:end-1))
title(relationName)

[Figure: parallel coordinates plot of the iris data, grouped by class]

You can also call Weka's functionality directly from MATLAB. For example, to train a classifier:

%## classification
classifier = weka.classifiers.trees.J48();
classifier.buildClassifier( D );
fprintf('Classifier: %s %s\n%s', ...
    char(classifier.getClass().getName()), ...
    char(weka.core.Utils.joinOptions(classifier.getOptions())), ...
    char(classifier.toString()) )

The resulting C4.5 decision tree:

Classifier: weka.classifiers.trees.J48 -C 0.25 -M 2
J48 pruned tree
------------------

petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
|   petalwidth <= 1.7
|   |   petallength <= 4.9: Iris-versicolor (48.0/1.0)
|   |   petallength > 4.9
|   |   |   petalwidth <= 1.5: Iris-virginica (3.0)
|   |   |   petalwidth > 1.5: Iris-versicolor (3.0/1.0)
|   petalwidth > 1.7: Iris-virginica (46.0/1.0)

Number of Leaves  :     5

Size of the tree :  9
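
Once the tree is built, it can be used for prediction. The following is only a minimal sketch, assuming the same Weka install path and iris.arff file as above; it repeats the loading and training steps so that it runs on its own. Note that it scores the classifier on its own training data, so the number it prints is a resubstitution accuracy, not an estimate of test error.

```matlab
%## setup (same assumed install location as above)
WEKA_HOME = 'C:\Program Files\Weka-3-7';
javaaddpath([WEKA_HOME '\weka.jar']);

%## load data and train the tree, as in the steps above
loader = weka.core.converters.ArffLoader();
loader.setFile( java.io.File([WEKA_HOME '\data\iris.arff']) );
D = loader.getDataSet();
D.setClassIndex( D.numAttributes()-1 );
classifier = weka.classifiers.trees.J48();
classifier.buildClassifier( D );

%## predict the class of every training instance
numInst = D.numInstances();
predicted = zeros(numInst,1);
for i=1:numInst
    %# classifyInstance returns the 0-based index of the predicted class value
    predicted(i) = classifier.classifyInstance( D.instance(i-1) );
end

%## compare against the true class column (also 0-based indices)
actual = D.attributeToDoubleArray( D.classIndex() );
fprintf('Training accuracy: %.2f%%\n', 100*mean(predicted == actual));

%## map a predicted index back to its label
firstLabel = char( D.classAttribute().value(predicted(1)) );
fprintf('First instance predicted as: %s\n', firstLabel);
```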
Amro answered Sep 19 '22 22:09


Yes, there are a few MATLAB interfaces for Weka files on the MATLAB File Exchange. I normally use this one: http://www.mathworks.com/matlabcentral/fileexchange/21204-matlab-weka-interface which provides saveARFF() and loadARFF() functions.

Matteo De Felice answered Sep 17 '22 22:09


If you only want to load a file stored in ARFF format into MATLAB and don't need any other Weka functionality, remove the header section of the .arff file (the @relation and @attribute declarations and the @data line), replace the class values with numeric equivalents, and save the result as a CSV file. You can then read it with MATLAB's built-in csvread function, which handles numeric data only. This way there is no need for a third-party package.
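
The manual steps above can also be scripted. The following is a rough sketch, not a general ARFF parser: it assumes a dense data section with no quoted commas and no missing values ('?'), numeric attributes in every column except a nominal class in the last one, and it uses a small made-up file (the "toy" relation) so that it runs on its own. Class labels are coded starting from 1 here, whereas Weka's internal indices start from 0.

```matlab
%## write a tiny hypothetical ARFF file so the example is self-contained
arffFile = [tempname '.arff'];
fid = fopen(arffFile, 'w');
fprintf(fid, '%s\n', ...
    '@relation toy', ...
    '@attribute x numeric', ...
    '@attribute y numeric', ...
    '@attribute class {a,b}', ...
    '@data', ...
    '1.0,2.0,a', ...
    '3.5,4.5,b', ...
    '5.0,6.0,a');
fclose(fid);

%## read all lines, dropping blank lines and '%' comment lines
lines = strtrim(regexp(fileread(arffFile), '\r?\n', 'split'));
lines = lines(~cellfun(@isempty, lines));
lines = lines(~strncmp(lines, '%', 1));

%## skip the header: everything after the '@data' line is the data section
dataStart = find(strcmpi(lines, '@data'), 1);
rows = lines(dataStart+1:end);

%## split each row on commas into a numRows-by-numCols cell array
fields = cellfun(@(r) strtrim(regexp(r, ',', 'split')), rows, 'UniformOutput', false);
fields = vertcat(fields{:});

%## replace class labels with numeric codes (1-based, in sorted label order)
[classNames, ~, classIdx] = unique(fields(:, end));
data = [str2double(fields(:, 1:end-1)), classIdx(:)];

%## save as CSV and read it back with csvread, as the answer suggests
csvFile = [tempname '.csv'];
csvwrite(csvFile, data);
data2 = csvread(csvFile);

delete(arffFile);
delete(csvFile);
```

Keeping classNames around makes it possible to map the numeric codes in the last column back to the original labels, for example classNames(data(:,end)).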

user58419 answered Sep 19 '22 22:09