I have a vector of 2500 values composed of repeated values and NaN values. I want to remove all the NaN values and compute the number of occurrences of each other value.
y
2500-element Array{Int64,1}:
8
43
NaN
46
NaN
8
8
3
46
NaN
For example: the number of occurrences of 8 is 3, of 46 is 2, and of 43 is 1.
To remove the NaN values you can use the filter function. From the Julia docs:
filter(function, collection)
Return a copy of collection, removing elements for which function is false.
x = filter(y->!isnan(y),y)
filter!(y->!isnan(y),y)
Thus, we use the condition !isnan(y) as our function to filter the array y. (We could equally have written filter(z->!isnan(z),y), using z or any other variable name, since the first argument of filter just defines an anonymous function.) We can then either save the result as a new object or use the in-place version, signaled by the !, to modify the existing object y.
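As a minimal sketch of the difference between the two forms, assuming the data is stored as a Float64 vector (NaN is a floating-point value, so an integer array cannot hold it) and using the sample values from the question:

```julia
# Sample data from the question, stored as Float64 so it can contain NaN.
y = [8.0, 43.0, NaN, 46.0, NaN, 8.0, 8.0, 3.0, 46.0, NaN]

# filter returns a new vector and leaves y untouched.
x = filter(v -> !isnan(v), y)
println(length(x), " values kept, ", length(y), " values in the original")

# filter! removes the NaN entries from y itself.
filter!(v -> !isnan(v), y)
println(y == x)
```

After the in-place call, y holds the same seven values as x.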
Then, either before or after this, depending on whether we want to include the NaNs in our count, we can use the countmap() function from StatsBase. From the Julia docs:
countmap(x)
Return a dictionary mapping each unique value in x to its number of occurrences.
using StatsBase
a = countmap(y)
You can then access specific elements of this dictionary, e.g. a[-1] will tell you how many occurrences there are of -1.
Or, if you wanted to then convert that dictionary to an Array, you could use:
b = hcat([[key, val] for (key, val) in a]...)'
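A Dict does not guarantee any iteration order, so the rows of an array built this way come out in no particular order. If sorted rows are wanted, one possible alternative is sketched below. The dictionary literal is stand-in data taken from the question's example, not real countmap output, and the names counts, ks and m are only illustrative:

```julia
# Stand-in for the dictionary returned by countmap (values from the question).
counts = Dict(8 => 3, 46 => 2, 43 => 1)

# Sort the keys so the row order is predictable.
ks = sort(collect(keys(counts)))

# Two columns: the value, then its number of occurrences.
m = hcat(ks, [counts[k] for k in ks])
println(m)
```

Each row of m pairs a value with its count, ordered by value.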
Note: Thanks to @JeffBezanon for comments on the correct method for filtering NaN values.
y=rand(1:10,20)
u=unique(y)
d=Dict([(i,count(x->x==i,y)) for i in u])
println("count for 10 is $(d[10])")
countmap is the best solution I've seen so far, but here's a written-out version, which is only slightly slower. It passes over the array only once, so it is very efficient if you have many unique values:
function countmemb1(y)
    d = Dict{Int, Int}()
    for val in y
        # skip NaN entries entirely
        if isnan(val)
            continue
        end
        # increment an existing count, or start a new one at 1
        if val in keys(d)
            d[val] += 1
        else
            d[val] = 1
        end
    end
    return d
end
The solution in the accepted answer can be a bit faster if there are a very small number of unique values, but otherwise scales poorly.
Edit: Because I just couldn't leave well enough alone, here's a version that is more generic and also faster (countmap doesn't accept strings, sets or tuples, for example):
function countmemb(itr)
    d = Dict{eltype(itr), Int}()
    for val in itr
        # only numbers can be NaN; skip those
        if isa(val, Number) && isnan(val)
            continue
        end
        # get returns 0 for a value not seen before
        d[val] = get(d, val, 0) + 1
    end
    return d
end
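To illustrate the "more generic" part, here is a short usage sketch. It repeats the function so the snippet runs on its own; the input collections and the result names (word_counts, tuple_counts, num_counts) are made up for the example:

```julia
function countmemb(itr)
    d = Dict{eltype(itr), Int}()
    for val in itr
        if isa(val, Number) && isnan(val)
            continue
        end
        d[val] = get(d, val, 0) + 1
    end
    return d
end

# Strings and tuples work because the key type comes from eltype(itr).
word_counts = countmemb(["a", "b", "a"])
tuple_counts = countmemb([(1, 2), (1, 2), (3, 4)])

# Floating-point data from the question: NaN entries are skipped.
num_counts = countmemb([8.0, 43.0, NaN, 46.0, NaN, 8.0, 8.0, 3.0, 46.0, NaN])

println(word_counts["a"])
println(tuple_counts[(1, 2)])
println(num_counts[8.0])
```

With these inputs, "a" and (1, 2) are each counted twice, 8.0 is counted three times, and NaN never appears as a key.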