Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient partial permutation sort in Julia

I am dealing with a problem that requires a partial permutation sort by magnitude in Julia. If x is a vector of dimension p, then what I need are the first k indices corresponding to the k components of x that would appear first in a partial sort by absolute value of x.

Refer to Julia's sorting functions here. Basically, I want a cross between sortperm and select!. When Julia 0.4 is released, I will be able to obtain the same answer by applying sortperm! (this function) to the vector of indices and choosing the first k of them. However, using sortperm! is not ideal here because it will sort the remaining p-k indices of x, which I do not need.

What would be the most memory-efficient way to do the partial permutation sort? I hacked a solution by looking at the sortperm source code. However, since I am not versed in the ordering modules that Julia uses there, I am not sure if my approach is intelligent.

One important detail: I can ignore repeats or ambiguities here. In other words, I do not care about the ordering by abs() of indices for two components 2 and -2. My actual code uses floating point values, so exact equality never occurs for practical purposes.

# initialize a vector for testing
x  = [-3,-2,4,1,0,-1]
x2 = copy(x)
k  = 3    # num components desired in partial sort
p  = 6    # num components in x, x2

# what are the indices that sort x by magnitude?
indices = sortperm(x, by = abs, rev = true)

# now perform partial sort on x2
select!(x2, k, by = abs, rev = true)

# check if first k components are sorted here
# should evaluate to "true"
isequal(x2[1:k], x[indices[1:k]])

# now try my partial permutation sort
# I only need indices2[1:k] at end of day!
indices2 = [1:p]
select!(indices2, 1:k, 1, p, Base.Perm(Base.ord(isless, abs, true, Base.Forward), x))

# same result? should evaluate to "true"
isequal(indices2[1:k], indices[1:k])

EDIT: With the suggested code, we can briefly compare performance on much larger vectors:

p = 10000; k = 100;    # asking for largest 1% of components
x  = randn(p); x2 = copy(x);

# run following code twice for proper timing results
@time {indices = sortperm(x, by = abs, rev = true); indices[1:k]};
@time {indices2 = [1:p]; select!(indices2, 1:k, 1, p, Base.Perm(Base.ord(isless, abs, true, Base.Forward), x))};
@time selectperm(x,k);

My output:

elapsed time: 0.048876901 seconds (19792096 bytes allocated)
elapsed time: 0.007016534 seconds (2203688 bytes allocated)
elapsed time: 0.004471847 seconds (1657808 bytes allocated)
like image 287
Kevin L. Keys Avatar asked Dec 10 '25 05:12

Kevin L. Keys


1 Answers

The following version appears to be relatively space-efficient because it uses only an integer array of the same length as the input array:

function selectperm (x,k)
    if k > 1 then
        kk = 1:k
    else
        kk = 1
    end
    z = collect(1:length(x))
    return select!(z,1:k,by = (i)->abs(x[i]), rev = true)
end    
x  = [-3,-2,4,1,0,-1]

k  = 3    # num components desired in partial sort
print (selectperm(x,k))

The output is:

[3,1,2]

... as expected.

I'm not sure if it uses less memory than the originally-proposed solution (though I suspect the memory usage is similar) but the code may be clearer and it does produce only the first k indices whereas the original solution produced all p indices.

(Edit)

selectperm() has been edited to deal with the BoundsError that occurs if k=1 in the call to select!().

like image 156
Simon Avatar answered Dec 12 '25 08:12

Simon



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!