 

Better way to count number of occurrences of unique items?

Tags:

julia

I have a very long DataArray of strings, and I would like to generate a DataFrame in which one column holds all the unique strings and the second holds the number of occurrences of each. Right now I'm doing something like

using DataFrames
df = DataFrame(B = ["a", "c", "c", "D", "E"])
uniqueB = unique(df.B)
println(uniqueB)
howMany = zeros(Int, length(uniqueB))   # integer counts, one per unique string
for i in 1:length(uniqueB)
    # rescan the whole column for each unique string
    howMany[i] = count(j -> j == uniqueB[i], df.B)
end
answer = DataFrame(Letters = uniqueB, howMany = howMany)
answer

but it seems like there should be a much easier way to do this, possibly in a single line. (I know I could also make this a bit faster, with somewhat more code, by looking each element up in the result as I go rather than rescanning the source for every unique value.) A possibly related question is here, but it doesn't look like hist is overloaded for non-numeric bins. Any thoughts?
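The faster variant mentioned in the question amounts to a single pass over the data with a dictionary, so the work is proportional to the number of elements rather than elements times unique values. A minimal sketch in plain Base Julia; `count_unique` is a hypothetical helper name, not a library function (StatsBase's `countmap` does essentially the same job if you already depend on that package):

```julia
# Hypothetical helper: count each distinct value of `v` in one pass.
function count_unique(v::AbstractVector{T}) where {T}
    counts = Dict{T,Int}()
    for x in v
        # get returns 0 the first time a value is seen
        counts[x] = get(counts, x, 0) + 1
    end
    return counts
end

B = ["a", "c", "c", "D", "E"]
counts = count_unique(B)

# Two parallel vectors, ready to become the two DataFrame columns
letters = collect(keys(counts))
howMany = [counts[k] for k in letters]
```

The two vectors can then be passed to `DataFrame(Letters = letters, howMany = howMany)`; dictionary iteration order is unspecified, so sort `letters` first if order matters.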

ARM asked Apr 02 '15 00:04



1 Answer

If you want a full frame, you can group by B and call nrow on each group:

julia> by(df, :B, nrow)
4x2 DataFrames.DataFrame
| Row | B   | x1 |
|-----|-----|----|
| 1   | "D" | 1  |
| 2   | "E" | 1  |
| 3   | "a" | 1  |
| 4   | "c" | 2  |
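The `by` call above comes from an older DataFrames.jl release; newer releases no longer provide `by`, and the same split-apply-combine step is written with `groupby` followed by `combine`. A sketch assuming DataFrames.jl 1.x, where `nrow => :count` names the output column (the name `count` is an arbitrary choice here):

```julia
using DataFrames

df = DataFrame(B = ["a", "c", "c", "D", "E"])

# Group rows by the values in column :B, then count the rows in each group
counts = combine(groupby(df, :B), nrow => :count)

# Group order can vary with version and grouping options; sort for a fixed order
sort!(counts, :B)
```

Sorting by `:B` puts uppercase strings before lowercase ones, which matches the row order shown in the older output above.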

Even outside the DataFrame context, though, you can always use DataStructures.counter rather than reimplementing it yourself:

julia> using DataStructures

julia> counter(df[:B])
DataStructures.Accumulator{ASCIIString,Int32}(Dict("D"=>1,"a"=>1,"c"=>2,"E"=>1))
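To get from a `counter` result to the two-column frame the question asks for, the keys and counts can be read back out of the `Accumulator`. A minimal sketch, assuming current DataStructures.jl and DataFrames.jl, where `keys` works on an `Accumulator` and indexing it with a key returns that key's count; the column names are taken from the question:

```julia
using DataStructures, DataFrames

B = ["a", "c", "c", "D", "E"]
acc = counter(B)   # Accumulator mapping each string to its count

# Sort the distinct keys so the row order is deterministic
letters = sort(collect(keys(acc)))
answer = DataFrame(Letters = letters, howMany = [acc[k] for k in letters])
```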
DSM answered Oct 14 '22 08:10
