I'm new to Julia and I've written a simple function that calculates RMSE (root mean square error). `ratings` is a matrix of ratings; each row is `[user, film, rating]`. There are 15 million ratings. The `rmse()` method takes 12.0 s, but the Java implementation is about 188x faster: 0.064 s. Why is the Julia implementation that slow? In Java, I'm working with an array of `Rating` objects; if it were a multidimensional `int` array, it would be even faster.
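```julia
using DelimitedFiles  # readdlm lives in this standard library in Julia 1.x

ratings = readdlm("ratings.dat", Int32)  # N×3 matrix, one [user film rating] per row

function predict(user, film)
    return 3.462  # constant prediction
end

function rmse()
    total = 0.0
    for i in 1:size(ratings, 1)
        r = ratings[i,:]  # row i: [user, film, rating]
        diff = predict(r[1], r[2]) - r[3]
        total += diff * diff
    end
    return sqrt(total / size(ratings)[1])
end
```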
EDIT: After avoiding the global variable, it finishes in 1.99 s (31x slower than Java). After removing the `r = ratings[i,:]` line, it's 0.856 s (13x slower).
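If you'd rather keep a named row variable than index `ratings[i,1]` directly, a view avoids the copy that `ratings[i,:]` makes. This is a minimal sketch, assuming Julia 1.x, where the `@view` macro is in Base. `rmse_view` is a hypothetical name, and the matrix is passed in as an argument rather than read from a global:

```julia
# Constant predictor, as in the question.
predict(user, film) = 3.462

# Hypothetical variant: r is a view into row i, so no new array is built per row.
function rmse_view(ratings::AbstractMatrix)
    total = 0.0
    for i in 1:size(ratings, 1)
        r = @view ratings[i, :]  # refers into ratings; no copy of the row
        diff = predict(r[1], r[2]) - r[3]
        total += diff * diff
    end
    return sqrt(total / size(ratings, 1))
end

# Usage with a tiny made-up matrix (two ratings).
small = Int32[1 1 3; 2 5 4]
println(rmse_view(small))
```

Whether this matches the speed of plain indexing depends on the Julia version. On recent versions a view that does not escape the loop usually does not allocate, but it is worth measuring rather than assuming.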
A few suggestions:

- Don't use global variables: pass `ratings` in as an argument.
- The `r = ratings[i,:]` line makes a copy, which is slow. Instead, use `predict(r[i,1], r[i,2]) - r[i,3]`.
- `square()` may be faster than `x*x`; try it.
- Look at the NumericExtensions.jl package, which has insanely optimized functions for many common numerical operations (see the julia-dev list).

For me the following code runs in 0.024 seconds (and I doubt my laptop is a lot faster than your machine). I initialized `ratings` with the commented-out line, since I didn't have the file you referred to.
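```julia
function predict(user, film)
    return 3.462  # constant prediction, as in the question
end

function rmse(r)  # r is passed in as an argument, not read from a global
    total = 0.0
    for i = 1:size(r,1)
        diff = predict(r[i,1], r[i,2]) - r[i,3]  # index directly; no row copy
        total += diff * diff
    end
    return sqrt(total / size(r,1))
end

# ratings = rand(1:20, 5000000, 3)
```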
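To try the squaring suggestion yourself, here is a hedged timing sketch. Current Julia Base has no function named `square`, so this substitutes `abs2`, the Base function that returns `x*x` for a real `x`. The names `rmse_mul` and `rmse_abs2` are hypothetical. The data is random, like the answer's commented-out line, but uses 1,000,000 rows instead of 5,000,000 to keep memory use modest:

```julia
# Constant predictor, as in the question.
predict(user, film) = 3.462

# Explicit product, as in the answer.
function rmse_mul(r::AbstractMatrix)
    total = 0.0
    for i in 1:size(r, 1)
        diff = predict(r[i, 1], r[i, 2]) - r[i, 3]
        total += diff * diff
    end
    return sqrt(total / size(r, 1))
end

# Same loop using Base's abs2 (abs2(x) == x*x for real x).
function rmse_abs2(r::AbstractMatrix)
    total = 0.0
    for i in 1:size(r, 1)
        diff = predict(r[i, 1], r[i, 2]) - r[i, 3]
        total += abs2(diff)
    end
    return sqrt(total / size(r, 1))
end

ratings = rand(1:20, 1_000_000, 3)  # random stand-in for ratings.dat

# Call each once first so JIT compilation time is excluded from the timings.
rmse_mul(ratings); rmse_abs2(ratings)
t_mul  = @elapsed rmse_mul(ratings)
t_abs2 = @elapsed rmse_abs2(ratings)
println("x*x:  ", t_mul, " s")
println("abs2: ", t_abs2, " s")
```

A single `@elapsed` call is noisy. The BenchmarkTools.jl package's `@btime` macro repeats the measurement and gives more reliable numbers.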