
How to do prop.table() in Julia

I am trying to move from R to Julia.

I have a dataset with 2 price columns and 2 conditional columns telling me whether each price is "cheap" or "expensive".

I want to count how many "cheap" and "expensive" entries there are.

Using the DataStructures package I got this:

using DataStructures
counter(df.p_orellana)

Accumulator{Union{Missing, String}, Int64} with 3 entries:
  "expensive"   => 18
  missing  => 2
  "cheap" => 22

This would be the same as the table() function in R.

Is there any way to make these values proportions?

In R I would use the prop.table() function, but I am not sure how to do it in Julia.

I would like to have:

Accumulator{Union{Missing, String}, Int64} with 3 entries:
  "expensive"   => 0.4285
  missing  => 0.0476
  "cheap" => 0.5238

Thanks in advance!

Jorge Paredes asked Oct 14 '22 19:10


2 Answers

Use the FreqTables.jl package.

Here is an example:

julia> using FreqTables

julia> data = [fill("expensive", 18); fill(missing, 2); fill("cheap", 22)];

julia> freqtable(data)
3-element Named Vector{Int64}
Dim1      │
──────────┼───
cheap     │ 22
expensive │ 18
missing   │  2

julia> proptable(data)
3-element Named Vector{Float64}
Dim1      │
──────────┼─────────
cheap     │  0.52381
expensive │ 0.428571
missing   │ 0.047619

The results are shown in sorted order. If you would like a different order, additionally use the CategoricalArrays.jl package and set an appropriate ordering of levels:

julia> using CategoricalArrays

julia> cat_data = categorical(data, levels=["expensive", "cheap"]);

julia> freqtable(cat_data)
3-element Named Vector{Int64}
Dim1        │
────────────┼───
"expensive" │ 18
"cheap"     │ 22
missing     │  2

julia> proptable(cat_data)
3-element Named Vector{Float64}
Dim1        │
────────────┼─────────
"expensive" │ 0.428571
"cheap"     │  0.52381
missing     │ 0.047619
Bogumił Kamiński answered Oct 18 '22 13:10


Here is a base Julia approach.
The function tableprop can be put into ~/.julia/config/startup.jl so that it is loaded automatically.

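function tableprop(data::Vector)
  uniq = unique(data)
  # `===` also matches `missing`, whereas `==` would propagate it
  res = [sum(data .=== i) for i in uniq]
  try
    # works only if DataFrames has been loaded
    DataFrame(data=uniq, count=res, prop=res/sum(res))
  catch
    # fall back to a plain matrix when DataFrame is not defined
    hcat(uniq, res, res/sum(res))
  end
end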

julia> using DataFrames # just for pretty print

julia> tableprop(df)
3×3 DataFrame
 Row │ data       count  prop    
     │ String?    Int64  Float64 
─────┼───────────────────────────
   1 │ cheap          5      0.5
   2 │ expensive      3      0.3
   3 │ missing        2      0.2

Data

julia> df = ["cheap","expensive",missing,"cheap","cheap",
                     "expensive","expensive","cheap",missing,"cheap"]
10-element Vector{Union{Missing, String}}:
 "cheap"
 "expensive"
 missing
 "cheap"
 "cheap"
 "expensive"
 "expensive"
 "cheap"
 missing
 "cheap"
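
As a variation on the same base Julia idea, the counts can also be collected in a single pass with a Dict and then divided by the total. This is only a minimal sketch: the helper name `proportions` is hypothetical (it does not come from any package), and it returns an unordered Dict rather than a table.

```julia
# Hypothetical helper (not from any package): proportion of each distinct value.
function proportions(data)
    counts = Dict{eltype(data), Int}()
    for x in data
        # `missing` is a valid Dict key, so it is counted like any other value
        counts[x] = get(counts, x, 0) + 1
    end
    n = length(data)
    return Dict(k => v / n for (k, v) in counts)
end

# Same data as in the question: 18 expensive, 2 missing, 22 cheap
data = [fill("expensive", 18); fill(missing, 2); fill("cheap", 22)]
props = proportions(data)
```

Individual entries can then be looked up with `props["cheap"]` or `props[missing]`. Because a Dict has no guaranteed order, sort the keys yourself if a fixed display order matters.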
Andre Wildberg answered Oct 18 '22 14:10