I am trying to visualize my data flow with a Sankey Diagram in R.
I found this blog post linking to an R script that produces a Sankey Diagram; unfortunately, it's quite raw and somewhat limited (see below for sample code and data).
Does anyone know of other scripts—or maybe even a package—that is more developed? My end goal is to visualize both data flow and percentages by relative size of diagram components, like in these examples of Sankey Diagrams.
I posted a somewhat similar question on the r-help list, but after two weeks without any responses I'm trying my luck here on stackoverflow.
Thanks, Eric
PS. I'm aware of the Parallel Sets Plot, but that is not what I'm looking for.
# thanks to, https://tonybreyal.wordpress.com/2011/11/24/source_https-sourcing-an-r-script-from-github/ sourc.https <- function(url, ...) { # install and load the RCurl package if (match('RCurl', nomatch=0, installed.packages()[,1])==0) { install.packages(c("RCurl"), dependencies = TRUE) require(RCurl) } else require(RCurl) # parse and evaluate each .R script sapply(c(url, ...), function(u) { eval(parse(text = getURL(u, followlocation = TRUE, cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))), envir = .GlobalEnv) } ) } # from https://gist.github.com/1423501 sourc.https("https://raw.github.com/gist/1423501/55b3c6f11e4918cb6264492528b1ad01c429e581/Sankey.R") # My example (there is another example inside Sankey.R): inputs = c(6, 144) losses = c(6,47,14,7, 7, 35, 34) unit = "n =" labels = c("Transfers", "Referrals\n", "Unable to Engage", "Consultation only", "Did not complete the intake", "Did not engage in Treatment", "Discontinued Mid-Treatment", "Completed Treatment", "Active in \nTreatment") SankeyR(inputs,losses,unit,labels) # Clean up my mess rm("inputs", "labels", "losses", "SankeyR", "sourc.https", "unit")
Sankey Diagram produced with the above code,
A sankey diagram is a visualization used to depict a flow from one set of values to another. The things being connected are called nodes and the connections are called links.
Alluvial Diagrams are mainly applied in multi-dimensional data analysis cases. Emphasis mainly focuses on the frequencies and proportions between different dimensions and how they relate to each other. Sankey Diagrams are applied in cases that require quantity visualization between various stages of the entire process.
This plot can be created through the networkD3
package. It allows you to create interactive sankey diagrams. Here you can find an example. I also added a screenshot so you have an idea what it looks like.
# Load package library(networkD3) # Load energy projection data # Load energy projection data URL <- paste0( "https://cdn.rawgit.com/christophergandrud/networkD3/", "master/JSONdata/energy.json") Energy <- jsonlite::fromJSON(URL) # Plot sankeyNetwork(Links = Energy$links, Nodes = Energy$nodes, Source = "source", Target = "target", Value = "value", NodeID = "name", units = "TWh", fontSize = 12, nodeWidth = 30)
I have created a package (riverplot) that has a slightly different, but overlapping functionality compared to the Sankey function, and can produce plots like this one:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With