
Using scikit-learn in Julia through PyCall

I'm trying to use Scikit-learn in Julia through PyCall.

As a start, I'm trying to read the iris data into a Julia data structure.

This is the code in Python:

from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()

X = iris.data
y = iris.target

The PyCall documentation says Python methods are called in Julia like, for example:

my_dna[:find]("ACT")

as opposed to:

my_dna.find("ACT")

in Python.
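To make the comparison concrete, here is a small Python-only sketch. It assumes, per the PyCall documentation, that the bracket form is an attribute lookup by name followed by a call, which in Python corresponds roughly to getattr. The sequence "GGACTT" is a made-up value for illustration.

```python
my_dna = "GGACTT"  # made-up sequence, for illustration only

# Ordinary Python method call, as in my_dna.find("ACT").
direct = my_dna.find("ACT")

# The same call with the method looked up by name first, which is
# roughly what the my_dna[:find]("ACT") form expresses on the Julia side.
by_name = getattr(my_dna, "find")("ACT")

print(direct, by_name)  # both give the same index
```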

My attempt to import the iris data in Julia is:

using PyCall
@pyimport sklearn.datasets as datasets
@pyimport sklearn.naive_bayes as NB

iris = datasets.load_iris()

X = ...?
Y = ...?

The call iris = datasets.load_iris() works, and iris is then of type Dict{Any,Any}.

I'm not sure if this is correct. I also tried iris = datasets[:load_iris], but this results in:

ERROR: LoadError: MethodError: no method matching getindex(::Module, ::Symbol)

Going further, how would I read iris.data and iris.target into X and Y?

Mike asked May 22 '26 04:05


2 Answers

As you say, Julia tells you what type iris is:

julia v0.5> @pyimport sklearn.datasets as datasets

julia v0.5> @pyimport sklearn.naive_bayes as NB

julia v0.5> iris = datasets.load_iris()
Dict{Any,Any} with 5 entries:
  "feature_names" => Any["sepal length (cm)","sepal width (cm)","petal length (…
  "target_names"  => PyObject array(['setosa', 'versicolor', 'virginica'], …
  "data"          => [5.1 3.5 1.4 0.2; 4.9 3.0 1.4 0.2; … ; 6.2 3.4 5.4 2.3; 5.…
  "target"        => [0,0,0,0,0,0,0,0,0,0  …  2,2,2,2,2,2,2,2,2,2]
  "DESCR"         => "Iris Plants Database\n====================\n\nNotes\n----…

It also tells you what the keys in the dictionary are. So now you just use Julia's syntax for accessing values in a dictionary (result snipped):

julia v0.5> X = iris["data"]
150×4 Array{Float64,2}:
 5.1  3.5  1.4  0.2
 4.9  3.0  1.4  0.2
 4.7  3.2  1.3  0.2

julia v0.5> Y = iris["target"]
150-element Array{Int64,1}:
 0
 0

Note that I did not know the answer to this question. I just let Julia guide me as to what to do.
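As background on why a dictionary comes back: on the Python side, load_iris returns a Bunch, which is a dict subclass that also exposes its keys as attributes. That is presumably why PyCall converts it to a Julia Dict, and why Python's iris.data corresponds to iris["data"] here. The sketch below is a hypothetical stand-in (MiniBunch is not scikit-learn's class, and the values are toy data), meant only to show the idea:

```python
class MiniBunch(dict):
    """Hypothetical stand-in for a Bunch: a dict whose keys are also attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails,
        # so regular dict methods keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


# Toy values standing in for the real iris arrays.
iris = MiniBunch(data=[[5.1, 3.5, 1.4, 0.2]], target=[0])

print(iris.data is iris["data"])  # attribute and key access reach the same object
print(isinstance(iris, dict))     # underneath, it is still a dict
```

Once the object has been converted to a plain Julia Dict, only the key-based form remains available, which matches the iris["data"] access shown above.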

Finally, as @ChrisRackauckas suggested, there is already a Julia package that wraps scikit-learn: https://github.com/cstjean/ScikitLearn.jl

David P. Sanders answered May 24 '26 17:05


Since the syntax has changed, I'd like to add the current PyCall syntax (version 1.91.4 at the time of writing) to David's answer.

The Python code

from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()

X = iris.data
y = iris.target

becomes in Julia:

using PyCall
datasets = pyimport("sklearn.datasets")
GaussianNB = pyimport("sklearn.naive_bayes").GaussianNB  # bind the class, not the module
iris = datasets.load_iris()
X = iris["data"]
y = iris["target"]

wueli answered May 24 '26 18:05