Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Julia's most efficient way to choose longest array in array of arrays?

Tags:

arrays

max

julia

I have an array of arrays A which is an N-element Array{Array{Int64,1},1} of integers. I am trying to find the largest array in A using Julia.

For example:

A =  [[1, 2], [3, 4], [5, 6, 7], [1, 2, 5, 8]]

In Python I would simply do: max(A, key=len) but in Julia I don't know how to do it.

What I did is this:

L = []
for a in A
    push!(L, length(a))
end
A[findmax(L)[2]]

Thanks!

like image 967
Jarbou Avatar asked Jan 04 '18 22:01

Jarbou


3 Answers

@Colin has provided a compact, convenient answer. However, if speed matters (op asked for most efficient way) this should be close to optimum

function findlongest(A)
    idx = 0
    len = 0
    @inbounds for i in 1:length(A)
        l = length(A[i])
        l > len && (idx = i; len=l)
    end
    return A[idx]
end

Note that this implementation would (presumably) be a really bad idea in Python :)

Quick benchmark:

julia> using BenchmarkTools

julia> A = [[1,2], [1,2,3,4,5,6], [1,2,3]]
3-element Array{Array{Int64,1},1}:
 [1, 2]            
 [1, 2, 3, 4, 5, 6]
 [1, 2, 3] 

julia> @btime findlongest(A);
  26.880 ns (0 allocations: 0 bytes)

julia> @btime A[indmax(length.(A))];
  9.813 μs (25 allocations: 1.14 KiB)

That's a ~365 times speedup for this example.

EDIT: Better benchmark (suggested in the comments)

julia> @btime findlongest($A);
  9.813 ns (0 allocations: 0 bytes)

julia> @btime $A[indmax(length.($A))];
  41.813 ns (1 allocation: 112 bytes)

The $ signs avoid setup allocations and times. Speedup ~4.

Quick explanation

  • for loops are fast in Julia, so why not use them
  • avoid allocation (length.(A) allocates a new array of integers)
  • a && b is shortcut for "if a then b"
  • @inbounds avoids bound checks for A[i]
like image 133
carstenbauer Avatar answered Oct 12 '22 16:10

carstenbauer


UPDATE: For v1+ you'll need to replace indmax in this answer with argmax.

EDIT: Note, it is also worth checking out the other answer by @crstnbr

Consider the following example code:

julia> A = [[1,2], [1,2,3,4,5,6], [1,2,3]]
3-element Array{Array{Int64,1},1}:
 [1, 2]            
 [1, 2, 3, 4, 5, 6]
 [1, 2, 3]         

julia> length(A)
3

julia> length.(A)
3-element Array{Int64,1}:
 2
 6
 3

julia> indmax(length.(A))
2

julia> A[indmax(length.(A))]
6-element Array{Int64,1}:
 1
 2
 3
 4
 5
 6

The first call to length gets the length of the outer vector in A, which is not what we want. In the second call, I use the broadcasting operator . so that I instead get the length of each of the inner vectors. In the indmax line, I'm finding the index of largest value in length.(A), ie the index of the longest inner vector. If you instead want to return the longest inner vector, you can just index into A using the result of the indmax line.

like image 25
Colin T Bowers Avatar answered Oct 12 '22 16:10

Colin T Bowers


indmax is no longer defined in Julia (at least 1.3).

Use argmax instead.

>>> A = [[1,2], [1,2,3]]
2-element Array{Array{Int64,1},1}:
 [1, 2]   
 [1, 2, 3]

>>> length.(A)
2-element Array{Int64,1}:
 2
 3

>>> argmax(length.(A))
2

>>> A[argmax(length.(A))]
3-element Array{Int64,1}:
 1
 2
 3
like image 26
OrangeSherbet Avatar answered Oct 12 '22 16:10

OrangeSherbet