
Converting Pandas DataFrame to Orange Table

I notice that this is an issue on GitHub already. Does anyone have any code that converts a Pandas DataFrame to an Orange Table?

Explicitly, I have the following table.

       user  hotel  star_rating  user  home_continent  gender
0         1     39          4.0     1               2  female
1         1     44          3.0     1               2  female
2         2     63          4.5     2               3  female
3         2      2          2.0     2               3  female
4         3     26          4.0     3               1    male
5         3     37          5.0     3               1    male
6         3     63          4.5     3               1    male
hlin117 asked Oct 12 '14 00:10


1 Answer

To convert a pandas DataFrame to an Orange Table, you need to construct a domain, which specifies the type of each column.

For continuous variables, you only need to provide the name of the variable. For discrete variables, you also need to provide a list of all possible values.
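
As a rough sketch of that rule, the helper below decides, for each column, whether a list of values is needed and builds it. It is plain Python and does not call Orange. The name `column_specs` and the rule "all floats means continuous, anything else means discrete" are assumptions made for this illustration, chosen to match how the columns are declared in this answer.

```python
def column_specs(columns):
    """Return one (name, kind, values) tuple per column.

    `columns` maps a column name to a list of cell values.
    Assumption for this sketch: a column holding only floats is
    continuous; every other column is discrete.
    """
    specs = []
    for name, data in columns.items():
        if all(isinstance(v, float) for v in data):
            # Continuous: the name alone is enough.
            specs.append((name, "continuous", None))
        else:
            # Discrete: list every distinct value, as strings, in sorted order.
            values = [str(v) for v in sorted(set(data))]
            specs.append((name, "discrete", values))
    return specs


# Three of the columns from the question, copied by hand.
columns = {
    "hotel": [39, 44, 63, 2, 26, 37, 63],
    "star_rating": [4.0, 3.0, 4.5, 2.0, 4.0, 5.0, 4.5],
    "gender": ["female", "female", "female", "female", "male", "male", "male"],
}
specs = column_specs(columns)
for spec in specs:
    print(spec)
```

Each "discrete" entry carries the list that would be passed as `values`, and each "continuous" entry carries only a name.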

The following code will construct a domain for your DataFrame and convert it to an Orange Table:

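# Written against the Orange 2.x API (Orange.feature, Orange.data).
import numpy as np
from Orange.feature import Discrete, Continuous
from Orange.data import Domain, Table

# Discrete features need the full list of possible values, given as strings.
domain = Domain([
    Discrete('user', values=[str(v) for v in np.unique(df.user)]),
    Discrete('hotel', values=[str(v) for v in np.unique(df.hotel)]),
    Continuous('star_rating'),
    # 'user' appears twice in the DataFrame, so it is declared twice here.
    Discrete('user', values=[str(v) for v in np.unique(df.user)]),
    Discrete('home_continent', values=[str(v) for v in np.unique(df.home_continent)]),
    Discrete('gender', values=['male', 'female'])], False)  # False: no class variable

# df.values replaces df.as_matrix(), which was removed in pandas 1.0.
# Each cell is passed as a string so that it is matched against `values`.
table = Table(domain, [list(map(str, row)) for row in df.values])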

The map(str, row) step is needed so that Orange knows the data contains the values of discrete features (and not the indices of those values in the values list).
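
To make the difference between a value and an index concrete, here is a toy model in plain Python. It is not Orange's implementation: `resolve` is a hypothetical function written only for this illustration, and it assumes that an integer cell is read as a position in the values list while a string cell is read as the value itself.

```python
# Possible values of the 'hotel' feature, as strings in sorted order.
hotel_values = ["2", "26", "37", "39", "44", "63"]


def resolve(cell, values):
    """Toy lookup: an int is a position in `values`, a str is a label."""
    if isinstance(cell, int):
        # Treated as an index into the list of values.
        return values[cell]
    if cell not in values:
        raise ValueError("unknown value: %r" % (cell,))
    # Treated as the value itself.
    return cell


as_index = resolve(2, hotel_values)    # position 2 in the list
as_value = resolve("2", hotel_values)  # the hotel labelled "2"
print(as_index, as_value)
```

In this model the raw integer 2 ends up meaning hotel "37", while the string "2" means hotel "2". Converting every cell to a string first avoids that kind of mix-up.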

astaric answered Sep 20 '22 07:09
