 

Understanding sunburstR behaviour

I have a data.frame that looks similar to this example:

> head(dd)
#  paths counts
#1     s   4735
#2    dt   4635
#3    so   2191
#4    sb   1949
#5 dt-dt   1310
#6   s-s    978

where the steps in a path are separated by "-". As you can see, some paths consist of a single step, while others have more than one (up to 5 steps in the sample data).
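The path format can be sketched in base R by splitting each path on the literal "-" and counting its steps, which is the depth (ring) the path reaches in a sunburst. The names `dd6`, `steps` and `depth` are illustrative only, and the data are the six rows shown by `head(dd)`:

```r
# First six rows of the question's data (paths kept as character, not factor)
dd6 <- data.frame(
  paths  = c("s", "dt", "so", "sb", "dt-dt", "s-s"),
  counts = c(4735L, 4635L, 2191L, 1949L, 1310L, 978L),
  stringsAsFactors = FALSE
)

# Split each path into its individual steps on the literal "-" separator
steps <- strsplit(dd6$paths, "-", fixed = TRUE)

# Number of steps per path, i.e. how many rings deep it goes
depth <- lengths(steps)
depth
# [1] 1 1 1 1 2 2
```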

Now I would like to visualize the data as a sunburst plot using the sunburstR package. I do it like this:

# devtools::install_github("timelyportfolio/sunburstR")
library(sunburstR)
sunburst(dd)

Unfortunately, this doesn't produce any output and I don't understand why. As another example, this works as expected:

sunburst(tail(dd, 8))

but this doesn't:

sunburst(tail(dd, 9))

I also noticed that

sunburst(dd[c(5, 1:4),])

produces a plot, but surprisingly, the dt category is split into two chunks, although it should be displayed as a single chunk on the first (innermost) level.

Q: Can someone explain to me why this happens (some approaches work, some don't, and some work but display the data incorrectly) and what I need to do to visualize the whole data set (more than just the sample data)?

Sample data

dd <- structure(list(paths = c("s", "dt", "so", "sb", "dt-dt", "s-s", 
"so-dt", "dt-dt-dt", "sb-sb", "so-so", "s-s-s", "s-rd", "dt-dt-dt-dt", 
"s-sb", "a", "so-dt-dt", "s-rd-rd", "r", "dt-s", "so-sb", "dt-sb", 
"s-rd-rd-rd", "dt-rd", "dt-dt-dt-dt-dt", "so-dt-dt-dt"), counts = c(4735L, 
4635L, 2191L, 1949L, 1310L, 978L, 558L, 455L, 324L, 281L, 266L, 
231L, 208L, 200L, 200L, 196L, 156L, 150L, 142L, 129L, 123L, 114L, 
113L, 113L, 100L)), .Names = c("paths", "counts"), class = "data.frame", row.names = c(NA, -25L))
asked May 30 '16 at 08:05 by talat


1 Answer

dd contains paths that are prefixes of other paths:

tail(dd, 9)
#             paths counts
# 17        s-rd-rd    156 # <-----
# 18              r    150
# 19           dt-s    142
# 20          so-sb    129
# 21          dt-sb    123
# 22     s-rd-rd-rd    114 # <-----
# 23          dt-rd    113
# 24 dt-dt-dt-dt-dt    113
# 25    so-dt-dt-dt    100
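The two marked rows are not the only conflict in the full data set. A small base-R check can list every path that is a proper prefix of a longer path. This is a sketch, and `is_prefix_path()` is a hypothetical helper, not part of sunburstR. Matching on `"<path>-"` rather than just `"<path>"` avoids false hits such as `dt-s` vs `dt-sb`:

```r
# All 25 paths from the question's sample data
paths <- c("s", "dt", "so", "sb", "dt-dt", "s-s",
           "so-dt", "dt-dt-dt", "sb-sb", "so-so", "s-s-s", "s-rd", "dt-dt-dt-dt",
           "s-sb", "a", "so-dt-dt", "s-rd-rd", "r", "dt-s", "so-sb", "dt-sb",
           "s-rd-rd-rd", "dt-rd", "dt-dt-dt-dt-dt", "so-dt-dt-dt")

# Hypothetical helper: TRUE where some other path starts with "<path>-"
is_prefix_path <- function(p) {
  vapply(p, function(x) any(startsWith(p, paste0(x, "-"))), logical(1),
         USE.NAMES = FALSE)
}

# 12 of the 25 paths are prefixes of others (e.g. "s", "dt", "s-rd-rd")
paths[is_prefix_path(paths)]

# The subsets from the question: the last 8 rows have no conflict ...
any(is_prefix_path(tail(paths, 8)))              # FALSE
# ... while the last 9 rows add exactly one
tail(paths, 9)[is_prefix_path(tail(paths, 9))]   # "s-rd-rd"
```

This matches the observation that `sunburst(tail(dd, 8))` draws while `sunburst(tail(dd, 9))` does not.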

E.g. s-rd-rd is a prefix of s-rd-rd-rd, and sunburst seems to choke on that. In the package author's example you'll notice an additional -end step at the end of each path, which prevents such cases. This is also mentioned in the tips here:

each line should be a complete path from root to leaf - don't include counts for intermediate steps. For example, include "home-search-end" and "home-search-product-end" but not "home-search" - the latter is computed by the partition layout, by adding up the counts of all the sequences with that prefix.
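The summing rule in the last sentence of that tip can be mimicked in plain R. This is only an illustration, not sunburstR's code (per the tip, the partition layout computes these totals for you), and `node_total()` is a hypothetical helper. It uses the three `s-rd` rows from `dd` with `-end` appended:

```r
# The three s-rd rows from dd, written as complete paths ending in "-end"
sub_end <- data.frame(
  paths  = c("s-rd-end", "s-rd-rd-end", "s-rd-rd-rd-end"),
  counts = c(231L, 156L, 114L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: size of the intermediate node reached by `prefix`,
# i.e. the sum of counts of all complete paths that start with "<prefix>-"
node_total <- function(paths, counts, prefix) {
  sum(counts[startsWith(paths, paste0(prefix, "-"))])
}

node_total(sub_end$paths, sub_end$counts, "s-rd")     # 501 = 231 + 156 + 114
node_total(sub_end$paths, sub_end$counts, "s-rd-rd")  # 270 = 156 + 114
```

Because the intermediate totals are derived this way, the input should only list leaves; a separate row for an intermediate step has no well-defined place in the tree.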

Appending "-end" to every path seems to do the trick here, too:

transform(tail(dd, 9), paths=paste0(paths, "-end"))
#                 paths counts
# 17        s-rd-rd-end    156
# 18              r-end    150
# 19           dt-s-end    142
# 20          so-sb-end    129
# 21          dt-sb-end    123
# 22     s-rd-rd-rd-end    114
# 23          dt-rd-end    113
# 24 dt-dt-dt-dt-dt-end    113
# 25    so-dt-dt-dt-end    100

sunburst(transform(tail(dd, 9), paths=paste0(paths, "-end")))
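The same transformation should also cover the whole data set asked about in the question. The sketch below applies it to all 25 rows and checks that no prefix conflict is left before plotting. It assumes no real step is literally named `end`, and it only calls `sunburstR::sunburst()` when the package is installed and the session is interactive:

```r
# Full sample data from the question (same values as the structure() call)
dd <- data.frame(
  paths = c("s", "dt", "so", "sb", "dt-dt", "s-s",
            "so-dt", "dt-dt-dt", "sb-sb", "so-so", "s-s-s", "s-rd", "dt-dt-dt-dt",
            "s-sb", "a", "so-dt-dt", "s-rd-rd", "r", "dt-s", "so-sb", "dt-sb",
            "s-rd-rd-rd", "dt-rd", "dt-dt-dt-dt-dt", "so-dt-dt-dt"),
  counts = c(4735L, 4635L, 2191L, 1949L, 1310L, 978L, 558L, 455L, 324L, 281L,
             266L, 231L, 208L, 200L, 200L, 196L, 156L, 150L, 142L, 129L, 123L,
             114L, 113L, 113L, 100L),
  stringsAsFactors = FALSE
)

# Make every row a complete root-to-leaf path by appending an "end" step
dd_end <- transform(dd, paths = paste0(paths, "-end"))

# Sanity check: no path may be a proper prefix of another one
conflicts <- vapply(dd_end$paths,
                    function(x) any(startsWith(dd_end$paths, paste0(x, "-"))),
                    logical(1), USE.NAMES = FALSE)
stopifnot(!any(conflicts))

# Draw the full plot only in an interactive session with sunburstR installed
if (interactive() && requireNamespace("sunburstR", quietly = TRUE)) {
  print(sunburstR::sunburst(dd_end))
}
```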


answered Sep 29 '22 at 13:09 by lukeA