I am trying to export a pandas DataFrame to an .arff file so I can use it in Weka. I have seen that the liac-arff module can be used for that purpose. Going by the documentation, it seems I have to use
arff.dump(obj,fp)
However, I am struggling with obj (a dictionary); I'm guessing I have to create it myself. How would you suggest doing that properly for a big dataset (3,000,000 rows and 95 columns)? Is there an example of exporting a pandas DataFrame to an .arff file using Python (v2.7)?
First install the package:
$ pip install arff
Then use in Python:
import arff
arff.dump('filename.arff',
          df.values,
          relation='relation name',
          names=df.columns)
where df is of type pandas.DataFrame. Voilà.
This is how I did it recently using the liac-arff package. Even though the arff package is easier to use, it doesn't let you define column types or the values of categorical attributes.
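import arff  # provided by the liac-arff package
import pandas as pd

df = pd.DataFrame(...)
t = df.columns[-1]  # name of the target column (the last one)
# Every column except the last is declared numeric.
attributes = [(c, 'NUMERIC') for c in df.columns.values[:-1]]
# The target is nominal: list its distinct values as strings.
attributes += [('target', df[t].unique().astype(str).tolist())]
# One list per row; the target value goes last, stored as a string.
data = [df.loc[i].values[:-1].tolist() + [str(df[t].loc[i])] for i in range(df.shape[0])]
arff_dic = {
    'attributes': attributes,
    'data': data,
    'relation': 'myRel',
    'description': ''
}
# The encoding argument requires Python 3's open().
with open("myfile.arff", "w", encoding="utf8") as f:
    arff.dump(arff_dic, f)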
Values of categorical attributes such as target must be of type str, even if they are numbers.
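For a large frame, indexing the DataFrame once per row gets slow, so it may help to build the dictionary from plain Python lists instead. The following is only a sketch of that idea, not part of liac-arff: build_arff_dict is a hypothetical helper name, and it assumes (as the answer above does) that every column except the last is numeric and that the last column is the categorical target.

```python
def build_arff_dict(columns, rows, relation="myRel"):
    """Build a dict in the layout passed to arff.dump(obj, fp).

    Hypothetical helper, not part of liac-arff.
    columns: list of column names; the last one is assumed to be the
             categorical target.
    rows:    list of rows, each a list with one value per column.
    """
    # Nominal values must be strings, so convert the last value of each row.
    data = [list(row[:-1]) + [str(row[-1])] for row in rows]

    # Collect the distinct class labels in first-seen order.
    labels = []
    seen = set()
    for row in data:
        if row[-1] not in seen:
            seen.add(row[-1])
            labels.append(row[-1])

    # Assumption: all non-target columns are numeric.
    attributes = [(name, 'NUMERIC') for name in columns[:-1]]
    attributes.append((columns[-1], labels))

    return {
        'relation': relation,
        'description': '',
        'attributes': attributes,
        'data': data,
    }
```

With a DataFrame you could call build_arff_dict(df.columns.tolist(), df.values.tolist()) and pass the result to arff.dump(arff_dic, f) as above. Note that df.values on an all-numeric frame with mixed int and float columns upcasts everything to float, so integer class labels may come out as '0.0' rather than '0'.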