
Julia much slower than Java

I'm new to Julia and I've written a simple function that calculates the RMSE (root mean square error). ratings is a matrix of ratings in which each row is [user, film, rating]. There are 15 million ratings. The rmse() function takes 12.0 s, while my Java implementation takes 0.064 s, about 188x faster. Why is the Julia implementation so slow? In Java I'm working with an array of Rating objects; with a multidimensional int array it would be even faster.

ratings = readdlm("ratings.dat", Int32)

function predict(user, film)
    return 3.462
end

function rmse()
    total = 0.0
    for i in 1:size(ratings, 1)
        r = ratings[i,:]
        diff = predict(r[1], r[2]) - r[3]
        total += diff * diff
    end
    return sqrt(total / size(ratings)[1])
end

EDIT: After avoiding the global variable, it finishes in 1.99 s (31x slower than Java). After also removing the r = ratings[i,:] line, it takes 0.856 s (13x slower).

asked Jun 22 '13 14:06 by fhucho


2 Answers

A few suggestions:

  • Don't use globals. The compiler cannot assume the type of a non-constant global, so code that reads one is slow. Instead, pass ratings in as an argument.
  • The r = ratings[i,:] line makes a copy of the row on every iteration, which is slow. Instead, index the matrix directly: predict(ratings[i,1], ratings[i,2]) - ratings[i,3].
  • square() may be faster than x*x -- try it.
  • If you're using the bleeding-edge Julia from source, check out the brand new NumericExtensions.jl package, which has insanely optimized functions for many common numerical operations. (see the julia-dev list)
  • Julia has to compile the code the first time it executes it. The right way to benchmark in Julia is to do the timing several times and ignore the first time through.
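
A minimal sketch of how the first, second, and last suggestions might be tried out together. It assumes a recent Julia (1.x) and uses a small random matrix as a stand-in for ratings.dat; the names rmse_original and rmse_arg are made up for illustration.

```julia
# Constant predictor from the question.
predict(user, film) = 3.462

# Stand-in data (assumption): 100_000 random rows instead of the real file.
ratings = rand(Int32(1):Int32(5), 100_000, 3)

# The question's version: reads the non-constant global `ratings`
# and copies one row per iteration.
function rmse_original()
    total = 0.0
    for i in 1:size(ratings, 1)
        r = ratings[i, :]
        diff = predict(r[1], r[2]) - r[3]
        total += diff * diff
    end
    return sqrt(total / size(ratings, 1))
end

# Suggested version: the matrix is an argument and is indexed directly,
# so no temporary row is created.
function rmse_arg(r)
    total = 0.0
    for i in 1:size(r, 1)
        diff = predict(r[i, 1], r[i, 2]) - r[i, 3]
        total += diff * diff
    end
    return sqrt(total / size(r, 1))
end

# The first call of each function includes compilation, so run once
# before timing and only look at the later runs.
rmse_original(); rmse_arg(ratings)

t_original = @elapsed rmse_original()
t_arg = @elapsed rmse_arg(ratings)
println("original: ", t_original, " s, argument + direct indexing: ", t_arg, " s")
```

Both functions compute the same value; only where the data comes from and how the row is accessed differ, so any gap in the timings should come from those two changes.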
answered Oct 19 '22 23:10 by Harlan


For me the following code runs in 0.024 seconds (and I doubt my laptop is a lot faster than your machine). I initialized ratings with the commented-out line, since I didn't have the file you referred to.

function predict(user, film)
    return 3.462
end

function rmse(r)
    total = 0.0
    for i = 1:size(r,1)
        diff = predict(r[i,1],r[i,2]) - r[i,3]
        total += diff * diff
    end
    return sqrt(total / size(r,1))
end

# ratings = rand(1:20, 5000000, 3)
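
One way the function above could be exercised, sketched for a recent Julia (1.x). The data follows the commented-out line but with fewer rows to keep the run short, and the vectorized cross-check (the name reference) is an addition for illustration, not part of the original answer.

```julia
predict(user, film) = 3.462

function rmse(r)
    total = 0.0
    for i = 1:size(r, 1)
        diff = predict(r[i, 1], r[i, 2]) - r[i, 3]
        total += diff * diff
    end
    return sqrt(total / size(r, 1))
end

# Same kind of test data as the commented-out line, but smaller (assumption).
ratings = rand(1:20, 500_000, 3)

rmse(ratings)                      # warm-up call, includes compilation
elapsed = @elapsed rmse(ratings)   # timed call
println("rmse took ", elapsed, " s")

# Cross-check against a vectorized formulation of the same quantity.
reference = sqrt(sum(abs2, 3.462 .- ratings[:, 3]) / size(ratings, 1))
println(isapprox(rmse(ratings), reference))
```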
answered Oct 19 '22 22:10 by tholy