I have a data.frame that looks similar to this example:
> head(dd)
# paths counts
#1 s 4735
#2 dt 4635
#3 so 2191
#4 sb 1949
#5 dt-dt 1310
#6 s-s 978
where the steps in a path are separated by "-". As you can see, some paths have only 1 step and others have more (up to 5 steps in the example).
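Because the steps are delimited by "-", the length of each path can be checked with base R's strsplit() and lengths(). Below is a minimal sketch on a few of the paths above. The vector name paths is used only for illustration, and lengths() assumes R >= 3.2.0:

```r
# A few paths taken from the example data above
paths <- c("s", "dt", "so", "sb", "dt-dt", "s-s", "dt-dt-dt-dt-dt")

# Split each path on the literal "-" and count how many steps it has
steps   <- strsplit(paths, "-", fixed = TRUE)
n_steps <- lengths(steps)

print(data.frame(paths, n_steps))
```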
Now I would like to visualize the data as a sunburst plot using the sunburstR package. I do it like this:
# devtools::install_github("timelyportfolio/sunburstR")
library(sunburstR)
sunburst(dd)
Unfortunately, this doesn't produce any output and I don't understand why. As another example, this works as expected:
sunburst(tail(dd, 8))
but this doesn't:
sunburst(tail(dd, 9))
I also noticed that
sunburst(dd[c(5, 1:4),])
produces a plot, but surprisingly the dt category is split into two chunks, although it should be displayed as a single chunk on the first (innermost) level.
Q: Can someone explain why this happens (some approaches work, some don't, and some work but display the data incorrectly), and what I need to do to visualize the whole data set (not just the sample data)?
Sample data
dd <- structure(list(paths = c("s", "dt", "so", "sb", "dt-dt", "s-s",
"so-dt", "dt-dt-dt", "sb-sb", "so-so", "s-s-s", "s-rd", "dt-dt-dt-dt",
"s-sb", "a", "so-dt-dt", "s-rd-rd", "r", "dt-s", "so-sb", "dt-sb",
"s-rd-rd-rd", "dt-rd", "dt-dt-dt-dt-dt", "so-dt-dt-dt"), counts = c(4735L,
4635L, 2191L, 1949L, 1310L, 978L, 558L, 455L, 324L, 281L, 266L,
231L, 208L, 200L, 200L, 196L, 156L, 150L, 142L, 129L, 123L, 114L,
113L, 113L, 100L)), .Names = c("paths", "counts"), class = "data.frame", row.names = c(NA, -25L))
dd contains sequences that are subsequences (more precisely, prefixes) of others:
tail(dd, 9)
# paths counts
# 17 s-rd-rd 156 # <-----
# 18 r 150
# 19 dt-s 142
# 20 so-sb 129
# 21 dt-sb 123
# 22 s-rd-rd-rd 114 # <-----
# 23 dt-rd 113
# 24 dt-dt-dt-dt-dt 113
# 25 so-dt-dt-dt 100
E.g. s-rd-rd is part of s-rd-rd-rd, and sunburst seems to choke on that.
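One way to find the offending rows is to flag every path that is a step-wise prefix of another path. Below is a minimal base-R sketch. The helper is_prefix_of_other() is a hypothetical name and is not part of sunburstR. It also assumes R >= 3.3.0 for startsWith():

```r
# The paths from tail(dd, 9) shown above
paths <- c("s-rd-rd", "r", "dt-s", "so-sb", "dt-sb",
           "s-rd-rd-rd", "dt-rd", "dt-dt-dt-dt-dt", "so-dt-dt-dt")

# TRUE for each path that is a complete leading part (whole steps only)
# of at least one other path in the vector
is_prefix_of_other <- function(paths) {
  vapply(paths, function(p) {
    # Appending "-" means only whole steps match: "dt-s" is NOT flagged
    # just because "dt-sb" starts with the same characters
    any(startsWith(paths, paste0(p, "-")))
  }, logical(1), USE.NAMES = FALSE)
}

paths[is_prefix_of_other(paths)]
```

On this subset only s-rd-rd is flagged. On the full dd many more rows are flagged, including s, dt, so and sb, which is consistent with sunburst(dd) producing no output.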
In the package author's example you'll notice an additional "-end" appended to each path to prevent such cases. This is also mentioned in the tips here:
each line should be a complete path from root to leaf - don't include counts for intermediate steps. For example, include "home-search-end" and "home-search-product-end" but not "home-search" - the latter is computed by the partition layout, by adding up the counts of all the sequences with that prefix.
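The summation the tip describes can be reproduced by hand. This is plain R, not sunburstR's internal code. The count of an inner node such as s-rd is the sum of the counts of all complete paths that start with that prefix. The sketch below uses a hypothetical small data frame dd_small, built from rows of the "-end" example below, and groups by the first two steps:

```r
# A few complete (leaf-terminated) paths with their counts
dd_small <- data.frame(
  paths  = c("s-rd-rd-end", "s-rd-rd-rd-end", "dt-s-end", "dt-sb-end"),
  counts = c(156L, 114L, 142L, 123L),
  stringsAsFactors = FALSE
)

# Prefix made of the first two steps of each path
prefix2 <- vapply(strsplit(dd_small$paths, "-", fixed = TRUE),
                  function(p) paste(head(p, 2), collapse = "-"),
                  character(1))

# Inner-node totals: sum the counts of all paths sharing each prefix
inner <- sapply(split(dd_small$counts, prefix2), sum)
inner
```

Here s-rd ends up with 156 + 114 = 270. That is why an explicit "s-rd" row is not needed (and, per the tip, should not be included).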
This seems to do the trick here, too:
transform(tail(dd, 9), paths=paste0(paths, "-end"))
# paths counts
# 17 s-rd-rd-end 156
# 18 r-end 150
# 19 dt-s-end 142
# 20 so-sb-end 129
# 21 dt-sb-end 123
# 22 s-rd-rd-rd-end 114
# 23 dt-rd-end 113
# 24 dt-dt-dt-dt-dt-end 113
# 25 so-dt-dt-dt-end 100
sunburst(transform(tail(dd, 9), paths=paste0(paths, "-end")))
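Applying the same transformation to the whole data set should address the last part of the question. The sketch below rebuilds dd from the sample data and appends "-end" to every path. It only draws the plot if sunburstR is installed:

```r
# Sample data from the question
dd <- data.frame(
  paths = c("s", "dt", "so", "sb", "dt-dt", "s-s", "so-dt", "dt-dt-dt",
            "sb-sb", "so-so", "s-s-s", "s-rd", "dt-dt-dt-dt", "s-sb", "a",
            "so-dt-dt", "s-rd-rd", "r", "dt-s", "so-sb", "dt-sb",
            "s-rd-rd-rd", "dt-rd", "dt-dt-dt-dt-dt", "so-dt-dt-dt"),
  counts = c(4735L, 4635L, 2191L, 1949L, 1310L, 978L, 558L, 455L, 324L,
             281L, 266L, 231L, 208L, 200L, 200L, 196L, 156L, 150L, 142L,
             129L, 123L, 114L, 113L, 113L, 100L),
  stringsAsFactors = FALSE
)

# Terminate every path with an explicit "end" leaf
dd_end <- transform(dd, paths = paste0(paths, "-end"))

if (requireNamespace("sunburstR", quietly = TRUE)) {
  widget <- sunburstR::sunburst(dd_end)
  if (interactive()) print(widget)  # renders in the viewer or browser
}
```

Every path now ends in its own "end" leaf, so no row is a prefix of another. Each intermediate step (for example dt) should therefore become a single inner node whose size is the sum of its descendants.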