
The number of occurrences of elements in a vector [JULIA]

Tags:

vector

julia

I have a vector of 2500 values composed of repeated values and NaN values. I want to remove all the NaN values and compute the number of occurrences of each other value.

y
2500-element Array{Int64,1}:
8
43
NaN
46
NaN
8
8
3
46
NaN

For example: the number of occurrences of 8 is 3, the number of occurrences of 46 is 2, and the number of occurrences of 43 is 1.

vincet asked Aug 23 '16 12:08


3 Answers

To remove the NaN values you can use the filter function. From the Julia docs:

filter(function, collection)

Return a copy of collection, removing elements for which function is false.

x = filter(y->!isnan(y),y)   # returns a new array without the NaNs; y is unchanged
filter!(y->!isnan(y),y)      # removes the NaNs from y in place

Here the function we pass is the anonymous function y->!isnan(y), which filter applies to each element of the array y. The argument name is arbitrary: we could equally have written filter(z->!isnan(z),y), since the first argument of filter just defines an inline function. We can then either save the result as a new object (the first line) or use the in-place version, signaled by the !, to modify the existing object y (the second line).
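As a minimal sketch of the difference between the two forms, the snippet below uses a short sample vector modeled on the question's data. The sample is an assumption: it is stored as Float64 because NaN is a floating-point value and cannot be held in an Array{Int64,1}.

```julia
# Sample data modeled on the question (assumed Float64 so it can hold NaN).
y = [8.0, 43.0, NaN, 46.0, NaN, 8.0, 8.0, 3.0, 46.0, NaN]

# filter returns a new array and leaves y unchanged.
x = filter(v -> !isnan(v), y)
println(length(y), " -> ", length(x))   # prints "10 -> 7"

# filter! removes the NaN entries from y itself.
filter!(v -> !isnan(v), y)
println(y == x)                         # prints "true"
```

After the in-place call, y and x hold the same seven values, so either form can be passed on to the counting step.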

Then, either before or after this, depending on whether we want to include the NaNs in our count, we can use the countmap() function from StatsBase. From the StatsBase docs:

countmap(x)

Return a dictionary mapping each unique value in x to its number of occurrences.

using StatsBase
a = countmap(y)

You can then access specific elements of this dictionary, e.g. a[-1] will tell you how many occurrences there are of -1.

Or, if you wanted to then convert that dictionary to an Array, you could use:

b = hcat([[key, val] for (key, val) in a]...)'
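If a readable listing is more useful than a matrix, one possible alternative is to collect the dictionary into key => count pairs and sort them. The sketch below uses a hand-written Dict as a hypothetical stand-in for the result of countmap on the question's sample, so it runs without StatsBase.

```julia
# Hypothetical stand-in for the result of countmap on the question's sample.
a = Dict(8 => 3, 46 => 2, 43 => 1, 3 => 1)

# collect turns the Dict into a Vector of key => count pairs;
# sort by count (descending), breaking ties by key (ascending).
pairs_sorted = sort(collect(a), by = p -> (-p.second, p.first))

for (val, n) in pairs_sorted
    println("the number of occurrences of $val is $n")
end
```

A Dict has no defined order, so the explicit sort is what makes the listing reproducible.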

Note: Thanks to @JeffBezanon for comments on the correct method for filtering NaN values.

Michael Ohlrogge answered Nov 12 '22 21:11


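y = rand(1:10, 20)   # sample data: 20 random integers between 1 and 10
u = unique(y)        # the distinct values in y
# Map each distinct value to its number of occurrences (one pass over y per value).
d = Dict([(i, count(x -> x == i, y)) for i in u])
# get avoids a KeyError when 10 happens not to occur in the random sample.
println("count for 10 is $(get(d, 10, 0))")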
Felipe Lema answered Nov 12 '22 21:11


countmap is the best solution I've seen so far, but here's a written out version, which is only slightly slower. It only passes over the array once, so if you have many unique values, it is very efficient:

function countmemb1(y)
    d = Dict{Int, Int}()
    for val in y
        if isnan(val)
            continue
        end
        if val in keys(d)
            d[val] += 1
        else
            d[val] = 1
        end
    end
    return d
end

The solution in the accepted answer can be a bit faster if there are a very small number of unique values, but otherwise scales poorly.

Edit: Because I just couldn't leave well enough alone, here's a version that is more generic and also faster (countmap doesn't accept strings, sets or tuples, for example):

function countmemb(itr)
    d = Dict{eltype(itr), Int}()
    for val in itr
        if isa(val, Number) && isnan(val)
            continue
        end
        d[val] = get(d, val, 0) + 1
    end
    return d
end
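A short usage sketch for countmemb follows. The function is repeated so the snippet runs on its own, and the sample inputs are assumptions modeled on the question (Float64, since an integer array cannot hold NaN).

```julia
# countmemb as defined above, repeated so this sketch is self-contained.
function countmemb(itr)
    d = Dict{eltype(itr), Int}()
    for val in itr
        # Skip NaN entries; the isa check keeps isnan away from non-numbers.
        if isa(val, Number) && isnan(val)
            continue
        end
        d[val] = get(d, val, 0) + 1
    end
    return d
end

# Numeric input with NaNs: the keys are Float64, matching eltype(y).
y = [8.0, 43.0, NaN, 46.0, NaN, 8.0, 8.0, 3.0, 46.0, NaN]
counts = countmemb(y)
println(counts[8.0])    # prints 3

# Non-numeric input goes through the same code path.
words = countmemb(["a", "b", "a"])
println(words["a"])     # prints 2
```

Because the key type comes from eltype(itr), the counts for a Float64 vector are looked up with Float64 keys such as 8.0.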
DNF answered Nov 12 '22 22:11