Riverplot package in R - output plot covered in gridlines or outlines

I've made a Sankey diagram in R with the riverplot package (v0.5). The output looks OK at a small size in RStudio, but when it is exported or zoomed in, the colours have dark outlines or gridlines.

The Riverplot image linked here shows the problem

I think it may be because the outlines of the shapes are not matching the transparency I want to use for the fill?

I possibly need to find a way to get rid of outlines altogether (rather than make them semi-transparent), as I think they're also the reason why flows with a value of zero still show up as thin lines.

my code is here:

#loading packages
library(readr)
library("riverplot", lib.loc="C:/Program Files/R/R-3.3.2/library")
library(RColorBrewer)

#loading data
Cambs_flows <- read_csv("~/RProjects/Cambs_flows4.csv")

#defining the edges
edges = rep(Cambs_flows, col.names = c("N1","N2","Value"))
edges    <- data.frame(edges)
edges$ID <- 1:25

#defining the nodes
nodes <- data.frame(ID = c("Cambridge","S Cambs","Rest of E","Rest of UK","Abroad","to Cambridge","to S Cambs","to Rest of E","to Rest of UK","to Abroad"))
nodes$x = c(1,1,1,1,1,2,2,2,2,2)
nodes$y = c(1,2,3,4,5,1,2,3,4,5)

#picking colours
palette = paste0(brewer.pal(5, "Set1"), "90")

#plot styles
styles = lapply(nodes$y, function(n) {
  list(col = palette[n], lty = 0, textcol = "black")
})

#matching nodes to names
names(styles) = nodes$ID

#defining the river
r <- makeRiver( nodes, edges,
                node_labels = c("Cambridge","S Cambs","Rest of E","Rest of UK","Abroad","to Cambridge","to S Cambs","to Rest of E","to Rest of UK","to Abroad"),
                node_styles = styles)

#Plotting
plot( r, plot_area = 0.9)

And my data is here

dput(Cambs_flows)
structure(list(N1 = c("Cambridge", "Cambridge", "Cambridge", 
"Cambridge", "Cambridge", "S Cambs", "S Cambs", "S Cambs", "S Cambs", 
"S Cambs", "Rest of E", "Rest of E", "Rest of E", "Rest of E", 
"Rest of E", "Rest of UK", "Rest of UK", "Rest of UK", "Rest of UK", 
"Rest of UK", "Abroad", "Abroad", "Abroad", "Abroad", "Abroad"
), N2 = c("to Cambridge", "to S Cambs", "to Rest of E", "to Rest of UK", 
"to Abroad", "to Cambridge", "to S Cambs", "to Rest of E", "to Rest of UK", 
"to Abroad", "to Cambridge", "to S Cambs", "to Rest of E", "to Rest of UK", 
"to Abroad", "to Cambridge", "to S Cambs", "to Rest of E", "to Rest of UK", 
"to Abroad", "to Cambridge", "to S Cambs", "to Rest of E", "to Rest of UK", 
"to Abroad"), Value = c(0L, 1616L, 2779L, 13500L, 5670L, 2593L, 
0L, 2975L, 4742L, 1641L, 2555L, 3433L, 0L, 0L, 0L, 6981L, 3802L, 
0L, 0L, 0L, 5670L, 1641L, 0L, 0L, 0L)), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -25L), .Names = c("N1", "N2", 
"Value"), spec = structure(list(cols = structure(list(N1 = structure(list(), class = c("collector_character", 
"collector")), N2 = structure(list(), class = c("collector_character", 
"collector")), Value = structure(list(), class = c("collector_integer", 
"collector"))), .Names = c("N1", "N2", "Value")), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
String asked Dec 11 '16 17:12


2 Answers

The culprit is a line in riverplot::curveseg. We can hack this function to fix it, but there is also a very simple workaround that does not require changing the function. In fact, the simple solution is probably preferable in many cases, but first I explain how to hack the function, so that we understand why the workaround also works. Scroll to the end of this answer if you only want the simple solution.

UPDATE: The change suggested below has now been implemented in riverplot version 0.6

To edit the function, you can use

trace(curveseg, edit = TRUE)

Then find the line near the end of the function that reads

polygon(c(xx[i], xx[i + 1], xx[i + 1], xx[i]), c(yy[i], 
      yy[i + 1], yy[i + 1] + w, yy[i] + w), col = grad[i], 
      border = grad[i])

We can see here that the package authors chose not to pass the lty parameter to polygon (UPDATE: see this answer for an explanation of why the package author did it this way). Change this line by adding lty = 0 (or, if you prefer, border = NA) and it works as intended for the OP's case. (But note that this may not work well if you wish to render a PDF - see here.)

polygon(c(xx[i], xx[i + 1], xx[i + 1], xx[i]), c(yy[i], 
      yy[i + 1], yy[i + 1] + w, yy[i] + w), col = grad[i], 
      border = grad[i], lty=0)


As a side note, this also explains the somewhat odd behaviour reported in the comments: "if you run it twice, the second time the plot looks OK, although export it and the lines come back". When lty is not specified in a call to polygon, the default value it uses is lty = par("lty"). Initially, par("lty") is a solid line, but it gets set to 0 during a call to riverplot:::draw.nodes the first time the riverplot function runs, which suppresses the lines when riverplot is run a second time. If you then export the image, however, opening a new device resets par("lty") to its default value.
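The reset described above can be checked without riverplot at all. This is only a minimal sketch using base graphics; pdf(NULL) is used here as a throwaway device so that no file is written, and the variable names are just for illustration:

```r
# Open a throwaway device (file = NULL means nothing is written to disk)
pdf(NULL)
lty_default <- par("lty")      # a fresh device starts with a solid line type

par(lty = 0)                   # what draw.nodes effectively leaves behind
lty_changed <- par("lty")      # no longer solid on this device
invisible(dev.off())

# Opening a new device (as an export does) starts from the defaults again
pdf(NULL)
lty_new_device <- par("lty")
invisible(dev.off())

print(c(default = lty_default, changed = lty_changed, new_device = lty_new_device))
```

The point is that par() settings belong to the device, so anything set on the screen device is not carried over to the device opened for export.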

An alternative way to update the function with this edit is to use assignInNamespace to overwrite the package function with your own version. Like this:

curveseg.new = function (x0, x1, y0, y1, width = 1, nsteps = 50, col = "#ffcc0066", 
          grad = NULL, lty = 1, form = c("sin", "line")) 
{
  w <- width
  if (!is.null(grad)) {
    grad <- colorRampPaletteAlpha(grad)(nsteps)
  }
  else {
    grad <- rep(col, nsteps)
  }
  form <- match.arg(form, c("sin", "line"))
  if (form == "sin") {
    xx <- seq(-pi/2, pi/2, length.out = nsteps)
    yy <- y0 + (y1 - y0) * (sin(xx) + 1)/2
    xx <- seq(x0, x1, length.out = nsteps)
  }
  if (form == "line") {
    xx <- seq(x0, x1, length.out = nsteps)
    yy <- seq(y0, y1, length.out = nsteps)
  }
  for (i in 1:(nsteps - 1)) {
    polygon(c(xx[i], xx[i + 1], xx[i + 1], xx[i]), 
            c(yy[i], yy[i + 1], yy[i + 1] + w, yy[i] + w), 
            col = grad[i], border = grad[i], lty=0)
    lines(c(xx[i], xx[i + 1]), c(yy[i], yy[i + 1]), lty = lty)
    lines(c(xx[i], xx[i + 1]), c(yy[i] + w, yy[i + 1] + w), lty = lty)
  }
}

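# Give the replacement the package namespace as its environment, so that
# helpers called inside it (e.g. colorRampPaletteAlpha) are found even if
# they are not exported
environment(curveseg.new) <- asNamespace("riverplot")
assignInNamespace("curveseg", curveseg.new, ns = "riverplot")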

Now for the simple solution, which does not require changes to the function:

Just add the line par(lty = 0) before you plot!
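Because a new device starts with the default line type again, the par(lty = 0) call has to come after the device is opened when exporting. A minimal sketch of that ordering follows; plot_without_outlines and demo are hypothetical names made up for this example, and the two polygons merely stand in for what the old curveseg draws (fill and border in the same semi-transparent colour, lty left at its default):

```r
# Hypothetical helper: open a PNG device, blank the line type, then draw
plot_without_outlines <- function(draw, file, ...) {
  png(file, ...)
  on.exit(dev.off(), add = TRUE)  # always close the device, even on error
  par(lty = 0)                    # must come after png(): new devices start solid
  draw()
  invisible(file)
}

# Stand-in for the riverplot call: adjacent polygons whose borders rely on
# the default lty, as in riverplot 0.5
demo <- function() {
  cc <- "#E41A1C90"
  plot.new()
  polygon(c(0.2, 0.4, 0.4, 0.2), c(0.2, 0.2, 0.4, 0.4), col = cc, border = cc)
  polygon(c(0.4, 0.6, 0.6, 0.4), c(0.2, 0.2, 0.4, 0.4), col = cc, border = cc)
}

out <- plot_without_outlines(demo, tempfile(fileext = ".png"))
```

With the objects from the question, the call would presumably look like plot_without_outlines(function() plot(r, plot_area = 0.9), "river.png").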

dww answered Nov 20 '22 13:11


I am the author of the package. I am currently looking for a satisfactory solution to include in the next version of the package.

The problem is with how R renders PDFs as compared to bitmaps. In the original version of the package, I did pass lty=0 to polygon() (you can still see it in the commented source code). However, polygons without borders look good only in PNG graphics. In PDF output, thin white lines appear between the polygons. Take a look:

cc <- "#E41A1C90"
plot.new()
rect(0.2, 0.2, 0.4, 0.4, col=cc, border=NA)
rect(0.4, 0.2, 0.6, 0.4, col=cc, border=NA)
dev.copy2pdf(file="riverplot.pdf")

In X or on PNG, the output is correct. However, if rendered as PDF, you will see a thin white line between the rectangles.

When you render a riverplot graphic as PDF without borders, as in the example above, this looks really bad.

I therefore forced borders to be drawn, but forgot to check the effect on transparency. When no transparency is used, this looks OK: the borders overlap with the polygons as well as with each other, but you cannot see it. The PDF is now acceptable. However, it messes up the figure if you have transparency.
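A rough way to see why the overlap becomes visible with transparency, assuming the device blends layers with ordinary "over" alpha compositing (this is an assumption for illustration, not something taken from the riverplot source): the "90" suffix used in the question's palette is an alpha of about 0.56, and wherever a border lies on top of a fill of the same colour, two such layers stack to an opacity of about 0.81.

```r
# Alpha channel of a colour such as "#E41A1C90" (hex 90 out of FF)
alpha <- strtoi("90", base = 16L) / 255

# Opacity where two layers of that colour overlap (border drawn over fill),
# under standard "over" compositing: 1 - (1 - a)^2
overlap <- 1 - (1 - alpha)^2

print(round(c(single_layer = alpha, overlapping = overlap), 2))
```

With fully opaque colours both values are 1, which is why the overlap cannot be seen when no transparency is used.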

EDIT:

I have now uploaded version 0.6 of riverplot to CRAN. Besides some new stuff (you can now add riverplot to any part of an existing drawing), by default it uses lty=0 again. However, there is now an option called "fix.pdf" which you can set to TRUE in order to draw the borders around the segments again.

Bottom line, and solutions for now:

  1. Use riverplot 0.6.
  2. If you want to render a PDF, don't use transparency and use fix.pdf=TRUE.
  3. If you want to use both transparency and PDF, help me solve the issue.
January answered Nov 20 '22 12:11