Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Setting hex bins in ggplot2 to same size

I'm trying to make a hexbin representation of data in several categories. The problem is, facetting these bins seems to make all of them different sizes.

set.seed(1) #Create data
bindata <- data.frame(x=rnorm(100), y=rnorm(100))
fac_probs <- dnorm(seq(-3, 3, length.out=26))
fac_probs <- fac_probs/sum(fac_probs)
bindata$factor <- sample(letters, 100, replace=TRUE, prob=fac_probs)

library(ggplot2) #Actual plotting
library(hexbin)

ggplot(bindata, aes(x=x, y=y)) +
  geom_hex() +
  facet_wrap(~factor)

enter image description here

Is it possible to set something to make all these bins physically the same size?

like image 211
sebastian-c Avatar asked Jan 24 '13 06:01

sebastian-c


1 Answers

As Julius says, the problem is that hexGrob doesn't get the information about the bin sizes, and guesses it from the differences it finds within the facet.

Obviously, it would make sense to hand dx and dy to a hexGrob -- not having the width and height of a hexagon is like specifying a circle by center without giving the radius.

Workaround:

workaround

The resolution strategy works, if the facet contains two adjacent haxagons that differ in both x and y. So, as a workaround, I'll construct manually a data.frame containing the x and y center coordinates of the cells, and the factor for facetting and the counts:

In addition to the libraries specified in the question, I'll need

library (reshape2)

and also bindata$factor actually needs to be a factor:

bindata$factor <- as.factor (bindata$factor)

Now, calculate the basic hexagon grid

h <- hexbin (bindata, xbins = 5, IDs = TRUE, 
             xbnds = range (bindata$x), 
             ybnds = range (bindata$y))

Next, we need to calculate the counts depending on bindata$factor

counts <- hexTapply (h, bindata$factor, table)
counts <- t (simplify2array (counts))
counts <- melt (counts)
colnames (counts)  <- c ("ID", "factor", "counts")

As we have the cell IDs, we can merge this data.frame with the proper coordinates:

hexdf <- data.frame (hcell2xy (h),  ID = h@cell)
hexdf <- merge (counts, hexdf)

Here's what the data.frame looks like:

> head (hexdf)
  ID factor counts          x         y
1  3      e      0 -0.3681728 -1.914359
2  3      s      0 -0.3681728 -1.914359
3  3      y      0 -0.3681728 -1.914359
4  3      r      0 -0.3681728 -1.914359
5  3      p      0 -0.3681728 -1.914359
6  3      o      0 -0.3681728 -1.914359

ggplotting (use the command below) this yields the correct bin sizes, but the figure has a bit weird appearance: 0 count hexagons are drawn, but only where some other facet has this bin populated. To suppres the drawing, we can set the counts there to NA and make the na.value completely transparent (it defaults to grey50):

hexdf$counts [hexdf$counts == 0] <- NA

ggplot(hexdf, aes(x=x, y=y, fill = counts)) +
  geom_hex(stat="identity") +
  facet_wrap(~factor) +
  coord_equal () +
  scale_fill_continuous (low = "grey80", high = "#000040", na.value = "#00000000")

yields the figure at the top of the post.

This strategy works as long as the binwidths are correct without facetting. If the binwidths are set very small, the resolution may still yield too large dx and dy. In that case, we can supply hexGrob with two adjacent bins (but differing in both x and y) with NA counts for each facet.

dummy <- hgridcent (xbins = 5, 
                    xbnds = range (bindata$x),  
                    ybnds = range (bindata$y),  
                    shape = 1)

dummy <- data.frame (ID = 0,
                     factor = rep (levels (bindata$factor), each = 2),
                     counts = NA,
                     x = rep (dummy$x [1] + c (0, dummy$dx/2), 
                              nlevels (bindata$factor)),
                     y = rep (dummy$y [1] + c (0, dummy$dy  ), 
                              nlevels (bindata$factor)))

An additional advantage of this approach is that we can delete all the rows with 0 counts already in counts, in this case reducing the size of hexdf by roughly 3/4 (122 rows instead of 520):

counts <- counts [counts$counts > 0 ,]
hexdf <- data.frame (hcell2xy (h),  ID = h@cell)
hexdf <- merge (counts, hexdf)
hexdf <- rbind (hexdf, dummy)

The plot looks exactly the same as above, but you can visualize the difference with na.value not being fully transparent.


more about the problem

The problem is not unique to facetting but occurs always if too few bins are occupied, so that no "diagonally" adjacent bins are populated.

Here's a series of more minimal data that shows the problem:

First, I trace hexBin so I get all center coordinates of the same hexagonal grid that ggplot2:::hexBin and the object returned by hexbin:

trace (ggplot2:::hexBin, exit = quote ({trace.grid <<- as.data.frame (hgridcent (xbins = xbins, xbnds = xbnds, ybnds = ybnds, shape = ybins/xbins) [1:2]); trace.h <<- hb}))

Set up a very small data set:

df <- data.frame (x = 3 : 1, y = 1 : 3)

And plot:

p <- ggplot(df, aes(x=x, y=y)) +  geom_hex(binwidth=c(1, 1)) +          
     coord_fixed (xlim = c (0, 4), ylim = c (0,4))

p # needed for the tracing to occur
p + geom_point (data = trace.grid, size = 4) + 
    geom_point (data = df, col = "red") # data pts

str (trace.h)

Formal class 'hexbin' [package "hexbin"] with 16 slots
  ..@ cell  : int [1:3] 3 5 7
  ..@ count : int [1:3] 1 1 1
  ..@ xcm   : num [1:3] 3 2 1
  ..@ ycm   : num [1:3] 1 2 3
  ..@ xbins : num 2
  ..@ shape : num 1
  ..@ xbnds : num [1:2] 1 3
  ..@ ybnds : num [1:2] 1 3
  ..@ dimen : num [1:2] 4 3
  ..@ n     : int 3
  ..@ ncells: int 3
  ..@ call  : language hexbin(x = x, y = y, xbins = xbins, shape = ybins/xbins, xbnds = xbnds, ybnds = ybnds)
  ..@ xlab  : chr "x"
  ..@ ylab  : chr "y"
  ..@ cID   : NULL
  ..@ cAtt  : int(0) 

I repeat the plot, leaving out data point 2:

p <- ggplot(df [-2,], aes(x=x, y=y)) +  geom_hex(binwidth=c(1, 1)) +          coord_fixed (xlim = c (0, 4), ylim = c (0,4))
p
p + geom_point (data = trace.grid, size = 4) + geom_point (data = df, col = "red")
str (trace.h)

Formal class 'hexbin' [package "hexbin"] with 16 slots
  ..@ cell  : int [1:2] 3 7
  ..@ count : int [1:2] 1 1
  ..@ xcm   : num [1:2] 3 1
  ..@ ycm   : num [1:2] 1 3
  ..@ xbins : num 2
  ..@ shape : num 1
  ..@ xbnds : num [1:2] 1 3
  ..@ ybnds : num [1:2] 1 3
  ..@ dimen : num [1:2] 4 3
  ..@ n     : int 2
  ..@ ncells: int 2
  ..@ call  : language hexbin(x = x, y = y, xbins = xbins, shape = ybins/xbins, xbnds = xbnds, ybnds = ybnds)
  ..@ xlab  : chr "x"
  ..@ ylab  : chr "y"
  ..@ cID   : NULL
  ..@ cAtt  : int(0) 

everything finehexagon plotting messed up

  • note that the results from hexbin are on the same grid (cell numbers did not change, just cell 5 is not populated any more and thus not listed), grid dimensions and ranges did not change. But the plotted hexagons did change dramatically.

  • Also notice that hgridcent forgets to return the center coordinates of the first cell (lower left).

Though it gets populated:

df <- data.frame (x = 1 : 3, y = 1 : 3)

p <- ggplot(df, aes(x=x, y=y)) +  geom_hex(binwidth=c(0.5, 0.8)) +          
     coord_fixed (xlim = c (0, 4), ylim = c (0,4))

p # needed for the tracing to occur
p + geom_point (data = trace.grid, size = 4) + 
    geom_point (data = df, col = "red") + # data pts
    geom_point (data = as.data.frame (hcell2xy (trace.h)), shape = 1, size = 6)

all messed up

Here, the rendering of the hexagons cannot possibly be correct - they do not belong to one hexagonal grid.

like image 163
cbeleites unhappy with SX Avatar answered Oct 01 '22 09:10

cbeleites unhappy with SX