
Julia: Visualization of categorical data on a grid

Sometimes I need to draw categorical values on a regular grid to show how they cover a certain area. In principle, plot() is a good fit for this, but the marker size has to be tuned by hand each time to create the illusion of solid coverage. Whenever the extent of the plot changes, the old size no longer fits and has to be adjusted again. Is there a way to set this size automatically?

using Plots
using CategoricalArrays
a = [1, 2, 3, 1, 2, 3, 1, 2, 3]
b = [1, 1, 1, 2, 2, 2, 3, 3, 3]
c = CategoricalArray(["X", "X", "Y", "Z", "Y", "Y", "Z", "Y", "Z"])
plot(a, b, group = c, seriestype = :scatter, aspect_ratio = 1, markersize=90, 
markershape=:square, markerstrokewidth=0.0, xlim = (0.5, 3.5), ylim = (0.5, 3.5))

The result is good in every respect, except that the marker size has to be readjusted each time so that the cells neither overlap nor leave gaps.

As an alternative, I considered heatmap(), but it behaves rather strangely with categorical data: it maps them onto a scale of its own with a continuous gradation of values. I haven't come across any example where heatmap() produces a map with a clean per-category legend like plot() does, so I'm not sure heatmap() is the right tool here.

a = b = [1, 2, 3]
c = CategoricalArray(["X" "X" "Y"; "Z" "Y" "Y"; "Z" "Y" "Z"])
heatmap(a, b, c)


Is there perhaps still some way to set the cell size of plot() automatically?

asked May 11 '21 18:05 by Anton Degterev



1 Answer

There are various ways to create such a plot with Plots.jl. Perhaps the most direct reading of what you want is to draw each cell as a shape. For that approach, you also need to know how to put several unconnected polygons into the same series. A solution based on shapes could look like this:

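using Plots
using CategoricalArrays

a = [1, 2, 3, 1, 2, 3, 1, 2, 3]
b = [1, 1, 1, 2, 2, 2, 3, 3, 3]
c = CategoricalArray(["X", "X", "Y", "Z", "Y", "Y", "Z", "Y", "Z"])

# Collect the (x, y) grid coordinates that belong to each category
groups = Dict(cat => NTuple{2,Int}[] for cat in levels(c))
for (ca, cb, cat) in zip(a, b, c)
    push!(groups[cat], (ca, cb))
end

# Cell width; 1 matches the grid spacing, so neighboring cells touch exactly
w = 1
# For each category, stack one closed square per cell; the NaN row separates the squares
shapes = map(collect(groups)) do (cat, vals)
    cat => mapreduce(vcat, vals) do (ca, cb)
        [ca cb] .+ [-.5 -.5; .5 -.5; .5 .5; -.5 .5; -.5 -.5; NaN NaN]*w
    end
end

# Draw one :shape series per category, sorted so the legend order is stable
p = plot(aspect_ratio=1)
for (cat, s) in sort(shapes; by=x->x[1])
    plot!(s[:,1], s[:,2], label=cat, seriestype=:shape, linewidth=0)
end
p  # evaluate the plot object so it is displayed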


Most of the code is simply moving the data around so we get a Vector of Pairs from the categorical value to a matrix specifying all of the vertices, like this for "X":

"X" =>
12×2 Matrix{Float64}:
   0.5    0.5
   1.5    0.5
   1.5    1.5
   0.5    1.5
   0.5    0.5
 NaN    NaN
   1.5    0.5
   2.5    0.5
   2.5    1.5
   1.5    1.5
   1.5    0.5
 NaN    NaN
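
To see where those rows come from, here is a minimal sketch of the vertex computation for a single cell. The helper name `cell_vertices` is hypothetical (it is not part of the code above or of Plots.jl); it only repackages the offset arithmetic from the answer so it can be inspected without plotting anything.

```julia
# Hypothetical helper (not from Plots.jl): vertices of one square cell of
# width w centered at (x, y). The square is closed by repeating its first
# corner, and a NaN row is appended so that stacked cells stay separate.
function cell_vertices(x, y; w = 1)
    offsets = [-0.5 -0.5; 0.5 -0.5; 0.5 0.5; -0.5 0.5; -0.5 -0.5; NaN NaN]
    return [x y] .+ offsets .* w
end

v = cell_vertices(1, 1)
size(v)        # (6, 2): five corner rows plus the NaN separator
v[1, :]        # [0.5, 0.5], the lower-left corner

# Stacking the two "X" cells gives a matrix of the same size as the one shown above.
x_cells = vcat(cell_vertices(1, 1), cell_vertices(2, 1))
size(x_cells)  # (12, 2)
```

Because the cell width `w` is expressed in data coordinates rather than in pixels, the squares keep tiling the grid when the plot size or axis limits change, which is the property the markersize approach lacks.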

A perhaps slightly simpler solution would be to "trick" Plots to display what we want using a heatmap, like this:

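using Plots
using CategoricalArrays

a = b = [1, 2, 3]
c = CategoricalArray(["X" "X" "Y"; "Z" "Y" "Y"; "Z" "Y" "Z"])
pal = palette(:default)
p = plot(aspect_ratio=1, size=(400,400))
# Fix the color limits to the palette length so each level code gets its own palette color
heatmap!(a,b,c, c=pal, colorbar=false, clims=(1,length(pal)))
# Add one empty :shape series per category, only to create the legend entries
for cat in sort(collect(Set(c)))
    plot!(
        [], [], seriestype=:shape,
        label=cat, color=pal[levelcode(cat)]
    )
end
p  # evaluate the plot object so it is displayed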
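
The legend loop above indexes the palette with `levelcode(cat)`, so it may help to look at those codes directly. The following is a small sketch using only CategoricalArrays; it shows the level-to-integer mapping and does not show how Plots converts the array internally, which I am assuming goes through the same codes.

```julia
using CategoricalArrays

c = CategoricalArray(["X" "X" "Y"; "Z" "Y" "Y"; "Z" "Y" "Z"])

lv = levels(c)          # the distinct categories, sorted by default
codes = levelcode.(c)   # integer matrix with the same shape as c

codes[1, 1]             # code of "X", the first level
codes[2, 1]             # code of "Z", the last level
```

With three levels the codes run from 1 to 3, which is why the color limits in the heatmap call are pinned to `(1, length(pal))`: code `k` then lands on the `k`-th palette color, matching the color picked for the legend entry.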


answered Oct 19 '22 01:10 by ahnlabb