Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Memory-efficient sortperm on a column of a matrix

Tags:

julia

I have a large-ish matrix and I'd like to apply sortperm to each column of that matrix. The naive thing to do is

order = sortperm(X[:,j])

which makes a copy. That seems like a shame, so I thought I'd try a SubArray:

order = sortperm(sub(X,1:n,j))

but that was even slower. For a laugh I tried

order = sortperm(1:n,by=i->X[i,j])

but of course that was terrible. What is the fastest way to do this?

Here is some benchmark code:

getperm1(X,n,j) = sortperm(X[:,j])
getperm2(X,n,j) = sortperm(sub(X,1:n,j))
getperm3(X,n) = mapslices(sortperm, X, 1)
n = 1000000
X = rand(n, 10)
for f in [getperm1, getperm2]
    println(f)
    for it in 1:5
        gc()
        @time f(X,n,5)
    end
end
for f in [getperm3]
    println(f)
    for it in 1:5
        gc()
        @time getperm3(X,n)
    end
end

results:

getperm1
elapsed time: 0.258576164 seconds (23247944 bytes allocated)
elapsed time: 0.141448346 seconds (16000208 bytes allocated)
elapsed time: 0.137306078 seconds (16000208 bytes allocated)
elapsed time: 0.137385171 seconds (16000208 bytes allocated)
elapsed time: 0.139137529 seconds (16000208 bytes allocated)
getperm2
elapsed time: 0.433251141 seconds (11832620 bytes allocated)
elapsed time: 0.33970986 seconds (8000624 bytes allocated)
elapsed time: 0.339840795 seconds (8000624 bytes allocated)
elapsed time: 0.342436716 seconds (8000624 bytes allocated)
elapsed time: 0.342867431 seconds (8000624 bytes allocated)
getperm3
elapsed time: 1.766020534 seconds (257397404 bytes allocated, 1.55% gc time)
elapsed time: 1.43763525 seconds (240007488 bytes allocated, 1.85% gc time)
elapsed time: 1.41373546 seconds (240007488 bytes allocated, 1.82% gc time)
elapsed time: 1.42215519 seconds (240007488 bytes allocated, 1.83% gc time)
elapsed time: 1.419174037 seconds (240007488 bytes allocated, 1.83% gc time)

Where the mapslices version is 10x the getperm1 version, as you'd expect.

Its worth pointing out that, on my machine at least, the copy+sortperm option is not that much slower than just a sortperm on a vector of the same length, but no memory allocation is necessary so it'd be nice to avoid it.

like image 357
IainDunning Avatar asked Nov 09 '22 17:11

IainDunning


1 Answers

You can beat the performance of SubArray in a few very specific cases (like taking a continuous view of an Array) with pointer magic:

function colview(X::Matrix,j::Int)
    n = size(X,1)
    offset = 1+n*(j-1) # The linear start position
    checkbounds(X, offset+n-1)
    pointer_to_array(pointer(X, offset), (n,))
end

getperm4(X,n,j) = sortperm(colview(X,j))

The function colview will return a full-fledged Array that shares its data with the original X. Note that this is a terrible idea because the returned array is referencing data that Julia is only keeping track of through X. This means that if X goes out of scope before the column "view" data access will crash with a segfault.

With results:

getperm1
elapsed time: 0.317923176 seconds (15 MB allocated)
elapsed time: 0.252215996 seconds (15 MB allocated)
elapsed time: 0.215124686 seconds (15 MB allocated)
elapsed time: 0.210062109 seconds (15 MB allocated)
elapsed time: 0.213339974 seconds (15 MB allocated)
getperm2
elapsed time: 0.509172302 seconds (7 MB allocated)
elapsed time: 0.509961218 seconds (7 MB allocated)
elapsed time: 0.506399583 seconds (7 MB allocated)
elapsed time: 0.512562736 seconds (7 MB allocated)
elapsed time: 0.506199265 seconds (7 MB allocated)
getperm4
elapsed time: 0.225968056 seconds (7 MB allocated)
elapsed time: 0.220587707 seconds (7 MB allocated)
elapsed time: 0.219854355 seconds (7 MB allocated)
elapsed time: 0.226289377 seconds (7 MB allocated)
elapsed time: 0.220391515 seconds (7 MB allocated)

I've not looked into why the performance is worse with SubArray, but it may simply be from an extra pointer dereference on every memory access. It's very remarkable how little the allocation actually costs you in terms of time - getperm1's timings are more variable, but it still occasionally beats getperm4! I think that this is due to some extra pointer math in Array's internal implementation with shared data. There's also some crazy caching behavior… getperm1 gets significantly faster on repeated runs.

like image 151
mbauman Avatar answered Dec 16 '22 12:12

mbauman