
Building a binary sparkline plot in R with ggplot2 barplot

Tags: r, ggplot2

I have a year's worth of data that looks like this:

datetime, key, value
1/1/15, 7k Steps, 1
1/1/15, Ate Poorly, 1
1/1/15, Audiobook, 1
1/1/15, Befriend, 1
1/1/15, Called Mom, 1
1/1/15, Code, 1
1/1/15, Create, 1
1/1/15, Critical, 1
1/1/15, Emailed Friend, 1
1/2/15, 10k Steps, 1
1/2/15, Ate Poorly, 1
1/2/15, Audiobook, 1
1/2/15, Befriend, 1
1/2/15, Called Mom, 1
1/2/15, Create, 1
1/2/15, Emailed Friend, 1
1/2/15, Exercise, 1
1/2/15, Friend Contact, 1
1/2/15, Great Day, 1
1/2/15, Write, 1
1/3/15, 7k Steps, 1
1/3/15, Ate Poorly, 1
1/3/15, Befriend, 1
1/3/15, Create, 1
1/3/15, Emailed Friend, 1
1/3/15, Friend Contact, 1
1/3/15, Great Day, 1
1/3/15, Happiness, 1
1/3/15, Health, 1
1/3/15, Videogame, 1
1/3/15, Walked With Michelle, 1
1/3/15, Write, 1
1/4/15, 7k Steps, 1
1/4/15, Ate Poorly, 1
1/4/15, Audiobook, 1
1/4/15, Great Day, 1
1/4/15, Happiness, 1
1/4/15, Health, 1
1/4/15, Impatient, 1
1/4/15, Love, 1
1/4/15, Movie With Michelle, 1
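
To make the sample reproducible without lifedata_2015.csv, the rows above can be read straight from a string. This is a minimal sketch that assumes these rows stand in for the full file. `strip.white = TRUE` removes the space after each comma, so keys read as "7k Steps" rather than " 7k Steps".

```r
# Rebuild `d` from the sample rows shown above (stand-in for lifedata_2015.csv)
# strip.white = TRUE trims the space that follows each comma in the data rows
d <- read.csv(text = "datetime, key, value
1/1/15, 7k Steps, 1
1/1/15, Ate Poorly, 1
1/1/15, Audiobook, 1
1/1/15, Befriend, 1
1/1/15, Called Mom, 1
1/1/15, Code, 1
1/1/15, Create, 1
1/1/15, Critical, 1
1/1/15, Emailed Friend, 1
1/2/15, 10k Steps, 1
1/2/15, Ate Poorly, 1
1/2/15, Audiobook, 1
1/2/15, Befriend, 1
1/2/15, Called Mom, 1
1/2/15, Create, 1
1/2/15, Emailed Friend, 1
1/2/15, Exercise, 1
1/2/15, Friend Contact, 1
1/2/15, Great Day, 1
1/2/15, Write, 1
1/3/15, 7k Steps, 1
1/3/15, Ate Poorly, 1
1/3/15, Befriend, 1
1/3/15, Create, 1
1/3/15, Emailed Friend, 1
1/3/15, Friend Contact, 1
1/3/15, Great Day, 1
1/3/15, Happiness, 1
1/3/15, Health, 1
1/3/15, Videogame, 1
1/3/15, Walked With Michelle, 1
1/3/15, Write, 1
1/4/15, 7k Steps, 1
1/4/15, Ate Poorly, 1
1/4/15, Audiobook, 1
1/4/15, Great Day, 1
1/4/15, Happiness, 1
1/4/15, Health, 1
1/4/15, Impatient, 1
1/4/15, Love, 1
1/4/15, Movie With Michelle, 1", strip.white = TRUE)

str(d)  # 41 rows: datetime (still text), key, value
```

The datetime column is left as text, just as read.csv() returns it from the file, so the date conversion in the code below still applies unchanged.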

I want to create a plot that displays one row for each key with bars for each day that has a 1 for that key. Here's an example of the output I desire:

[Image: example of the desired output]

That's one I had painfully rendered using Python and Matplotlib.
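
Put another way, the desired figure is a picture of a key-by-day matrix of 0s and 1s. As a rough illustration (a sketch, not the asker's code), base R's table() builds that matrix from long data like the sample. The six rows below are copied from the sample, and `sample_rows` and `presence` are illustrative names.

```r
# Six rows copied from the sample; `sample_rows` and `presence` are illustrative names
sample_rows <- data.frame(
  datetime = as.Date(c("2015-01-01", "2015-01-02", "2015-01-04",
                       "2015-01-01", "2015-01-02", "2015-01-03")),
  key = rep(c("Audiobook", "Create"), each = 3)
)

# Cross-tabulate: one row per key, one column per logged day, 1 where the key occurred
presence <- table(sample_rows$key, sample_rows$datetime)
presence
```

Days with no logged event do not become columns at all, so a full-year axis needs the dates supplied separately.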

I'm looking for the best and simplest way to render a plot like this in R with, perhaps, ggplot2. I had planned to build one ggplot2 bar plot per key in a loop. Here's an example of my code:

library(ggplot2)
library(reshape)    # cast()
library(gridExtra)  # grid.arrange()

# 2015 Lifedata processing
# strip.white = TRUE trims the space that follows each comma in the CSV
d <- read.csv("lifedata_2015.csv", strip.white = TRUE)
# Dates such as 1/1/15 have two-digit years, so the format needs %y rather than %Y
d$datetime <- as.Date(d$datetime, "%m/%d/%y")

# Create a new data frame with a subset of keys
r <- d[d$key %in% c("Read", "Audiobook"), ]
# Put 1s in all values
r$value <- 1

# Generate a data frame with every day of 2015, a value of 1 and a key of "alldates"
mydates <- data.frame(datetime = seq(as.Date("2015/1/1"), as.Date("2015/12/31"), "days"),
                      key = "alldates", value = 1)

# Combine the two data frames, one after the other
n <- rbind(r, mydates)

# Transform into a wide data frame based on datetime and key, with mean as the value
w <- cast(n, datetime ~ key, mean)

# Days without an event come out as NaN; turn them into 0
w[is.na(w)] <- 0

# Build one bar plot per key; lapply() gives each plot its own copy of `name`,
# which a for loop would not (aes() evaluates .data[[name]] only when the plot is built)
plt <- lapply(c("Read", "Audiobook"), function(name) {
  ggplot(data = w, aes(x = datetime, y = .data[[name]])) +
    geom_col(width = 1)
})

svg("~/Desktop/tagplot.svg")
grid.arrange(grobs = plt, ncol = 1, top = "Read")
dev.off()

This technique didn't seem to work.

What is a better way to plot event data like I have above in the example?
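
For comparison, the one-bar-plot-per-key plan can be written without a loop or grid.arrange() by faceting: facet_grid() draws one panel per key on a shared date axis. This is a minimal sketch of that swapped-in technique, assuming long data with one row per event. The six rows are copied from the sample, and `events` is an illustrative name.

```r
library(ggplot2)

# Six rows copied from the sample; `events` is an illustrative name
events <- data.frame(
  datetime = as.Date(c("2015-01-01", "2015-01-02", "2015-01-04",
                       "2015-01-02", "2015-01-03", "2015-01-04")),
  key = rep(c("Audiobook", "Great Day"), each = 3),
  value = 1
)

# facet_grid() replaces the loop: one panel (row) per key, sharing the date axis
p <- ggplot(events, aes(x = datetime, y = value)) +
  geom_col(width = 1, fill = "grey30") +   # one-day-wide bar per event
  facet_grid(key ~ .) +
  scale_y_continuous(breaks = NULL) +      # the bar height carries no information
  labs(x = NULL, y = NULL) +
  theme_minimal() +
  theme(strip.text.y = element_text(angle = 0))

p
```

Because the data stay in long form, no cast() step or padding with an "alldates" key is needed for this version.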

asked Jan 17 '16 03:01 by Mike Shea


2 Answers

Here is an alternative approach, heavily borrowing from @TylerRinker's answer. As far as I can tell, his answer only shows something if that activity was performed two days in a row.

Setup

library(dplyr)
library(ggplot2)

First, we borrow these pieces from Tyler, which build nice labels: each key with its count and the percentage of days on which it occurs.

d <- d %>%
  mutate(datetime = as.Date(datetime, "%m/%d/%y"))

key <- d %>%
  group_by(key) %>%
  summarize(n = length(datetime), perc = n/length(unique(d$datetime))) %>%
  arrange(perc) %>%
  mutate(
    new = paste0(key, " - ", n, " (", round(100*perc), "%)"),
    new = factor(new, levels = new)
  )
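
To see what these labels contain, the same summary can be run on a few rows of the sample. This is a self-contained sketch; it uses the names `events` and `key_labels` (illustrative, chosen so it does not overwrite `d` or `key`) and rounds the percentage to a whole number.

```r
library(dplyr)

# Eight rows from the sample: Ate Poorly (all four days), Audiobook (three), Code (one)
events <- data.frame(
  datetime = as.Date(c("2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04",
                       "2015-01-01", "2015-01-02", "2015-01-04",
                       "2015-01-01")),
  key = c(rep("Ate Poorly", 4), rep("Audiobook", 3), "Code")
)

key_labels <- events %>%
  group_by(key) %>%
  # n = days on which the key occurs; perc = share of all logged days
  summarize(n = length(datetime), perc = n / length(unique(events$datetime))) %>%
  arrange(perc) %>%
  mutate(
    new = paste0(key, " - ", n, " (", round(100 * perc), "%)"),
    # Levels follow the sorted order, so the rarest key becomes the bottom row
    new = factor(new, levels = new)
  )

levels(key_labels$new)
# "Code - 1 (25%)"  "Audiobook - 3 (75%)"  "Ate Poorly - 4 (100%)"
```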

Instead of geom_line, we use geom_tile to get a filled rectangle for each day with a value of 1; missing days remain empty. We use geom_hline to draw white separators between the rows.

Plot code

left_join(d, key, by = "key") %>%
  ggplot(aes(datetime, y = new)) +
  geom_tile(show.legend = FALSE, fill = 'grey50') +
  # one separator per row boundary; key$new is a factor, while d$key may be character
  geom_hline(yintercept = seq(0.5, nlevels(key$new) + 0.5),
             color = 'white', linewidth = 2) +   # linewidth replaced size in ggplot2 3.4.0
  theme_classic() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b", expand = c(0, 0)) +
  ylab(NULL) +
  xlab(NULL)

Result

[Image: resulting plot]
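
The geom_tile() idea can be checked on a few rows of the sample: each (key, day) pair that occurs becomes exactly one tile, and days without an event are simply left blank. This is a minimal sketch; `events` and `n_keys` are illustrative names, and `linewidth` assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Nine rows copied from the sample; `events` is an illustrative name
events <- data.frame(
  datetime = as.Date(c("2015-01-01", "2015-01-02", "2015-01-04",
                       "2015-01-01", "2015-01-02", "2015-01-03",
                       "2015-01-02", "2015-01-03", "2015-01-04")),
  key = rep(c("Audiobook", "Create", "Great Day"), each = 3)
)

n_keys <- length(unique(events$key))

p <- ggplot(events, aes(x = datetime, y = key)) +
  # One grey tile per observed (day, key) pair; days without an event draw nothing
  geom_tile(fill = "grey50") +
  # White rules at 0.5, 1.5, ... separate the discrete rows (linewidth needs ggplot2 >= 3.4.0)
  geom_hline(yintercept = seq(0.5, n_keys + 0.5), colour = "white", linewidth = 2) +
  theme_classic() +
  labs(x = NULL, y = NULL)

# Built layer data: one row per tile, one row per separator line
nrow(layer_data(p, 1))  # 9
nrow(layer_data(p, 2))  # 4
```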

answered Nov 20 '22 13:11 by Axeman


Here's a decent start, but some of the smaller details will need to be worked out:

library(ggplot2)
library(tidyr)
library(dplyr)

d <- d %>%
    mutate(datetime = as.Date(datetime, "%m/%d/%y")) 


key <- d %>%
    group_by(key) %>%
    summarize(
        n = length(datetime),
        perc = n/length(unique(d$datetime))
    ) %>%
    arrange(perc) %>%
    mutate(
        new = paste0(key, " - ", n, " (", round(100*perc), "%)"),
        new = factor(new, levels = new)
    ) 

left_join(d, key, by = "key") %>% 
    ggplot(aes(datetime, y = new)) +
        geom_line(linewidth = 6, alpha = .3) +  # linewidth replaced size in ggplot2 3.4.0
        theme_minimal() + 
        scale_x_date(date_breaks = "1 month", date_labels = "%b", expand = c(0, 0)) +
        ylab(NULL) + 
        xlab(NULL)

[Image: resulting plot]

answered Nov 20 '22 13:11 by Tyler Rinker