Exporting a pandas DataFrame to an ARFF file in Python

Tags:

python

arff

I am trying to export a pandas DataFrame to an .arff file so I can use it in Weka. I have seen that the liac-arff module can be used for this. Going by the documentation, it seems I have to call arff.dump(obj, fp), but I am struggling with obj (a dictionary), which I am guessing I have to build myself. How would you suggest doing that properly for a big dataset (3,000,000 rows and 95 columns)? Is there an example of exporting a pandas DataFrame to an .arff file using Python (v2.7)?

mina asked Dec 18 '22 00:12


2 Answers

First install the package: $ pip install arff

Then use in Python:

import arff
arff.dump('filename.arff',
          df.values,
          relation='relation name',
          names=df.columns)

Where df is of type pandas.DataFrame. Voila.
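To make the resulting file layout concrete, here is a minimal stdlib-only sketch of a row-by-row ARFF writer. It is not the arff package's implementation: the function name write_arff is hypothetical, every column is declared NUMERIC as a simplifying assumption, and names are assumed to contain no spaces or special characters. Missing values (written as ? in ARFF) are not handled.

```python
import io

def write_arff(fp, relation, names, rows):
    """Write rows to fp as ARFF text, declaring every column NUMERIC (assumption)."""
    fp.write("@RELATION %s\n\n" % relation)
    for name in names:
        fp.write("@ATTRIBUTE %s NUMERIC\n" % name)
    fp.write("\n@DATA\n")
    # Rows are consumed one at a time, so any iterable (including a generator) works.
    for row in rows:
        fp.write(",".join(str(v) for v in row) + "\n")

# Small in-memory demonstration with made-up values.
buf = io.StringIO()
write_arff(buf, "demo", ["a", "b"], [(1, 2.5), (3, 4.0)])
arff_text = buf.getvalue()
```

With pandas, rows could be df.itertuples(index=False, name=None) and fp an open file, so a large frame does not have to be copied into a second list before writing.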

Pero answered Dec 27 '22 02:12


This is how I did it recently using the liac-arff package. Even though the arff package is easier to use, it doesn't let you define column types or the values of categorical attributes.

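import arff  # module installed by the liac-arff package
import pandas as pd

df = pd.DataFrame(...)
t = df.columns[-1]  # name of the target (last) column; must be set before use
# Every column except the last is declared numeric.
attributes = [(c, 'NUMERIC') for c in df.columns.values[:-1]]
# The target is nominal, so its allowed values are listed as strings.
attributes += [('target', df[t].unique().astype(str).tolist())]
data = [df.iloc[i].values[:-1].tolist() + [df[t].iloc[i]] for i in range(df.shape[0])]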

arff_dic = {
    'attributes': attributes,
    'data': data,
    'relation': 'myRel',
    'description': ''
}

with open("myfile.arff", "w", encoding="utf8") as f:
    arff.dump(arff_dic, f)

Values of categorical attributes such as target must be of type str, even if they are numbers.
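The dictionary layout used above can also be assembled from plain column names and rows, which makes it easy to check before dumping. The following is a sketch only: build_arff_dict is a hypothetical helper, the sample values are made up, and it assumes every non-target column is numeric.

```python
def build_arff_dict(relation, columns, rows, target):
    """Build a dictionary with the relation/description/attributes/data keys shown above."""
    t = columns.index(target)
    # Nominal values must be strings; keep first-seen order and drop duplicates.
    classes = []
    for row in rows:
        label = str(row[t])
        if label not in classes:
            classes.append(label)
    # Assumption: every column other than the target is numeric.
    attributes = [(c, classes if c == target else 'NUMERIC') for c in columns]
    # Convert only the target value of each row to str.
    data = [[str(v) if i == t else v for i, v in enumerate(row)] for row in rows]
    return {'relation': relation, 'description': '',
            'attributes': attributes, 'data': data}

columns = ['x1', 'x2', 'target']
rows = [[0.1, 2.0, 1], [0.4, 3.5, 0], [0.9, 1.2, 1]]
arff_dic = build_arff_dict('myRel', columns, rows, 'target')
```

With a DataFrame, columns could be list(df.columns) and rows could be list(df.itertuples(index=False, name=None)); the result is then passed to arff.dump(arff_dic, f) as in the snippet above.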

M. Franklin answered Dec 27 '22 02:12