
Plotting large number of time series using ggplot. Is it possible to speed up?

I am working with thousands of meteorological time series (sample data can be downloaded from https://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt).

Plotting these data using ggplot2 on my Linux Mint PC (64-bit, 8 GB RAM, dual-core 2.6 GHz) takes a long time. Is there a way to speed it up, or a better way to plot these data? Thank you very much in advance for any suggestions!

This is the code I'm using for now:

##############################################################################
#### load required libraries
library(RCurl)
library(reshape2)
library(dplyr)
library(ggplot2)

##############################################################################
#### Read data from URL
dataURL <- "https://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt"
tmp <- getURL(dataURL)
df <- tbl_df(read.table(text = tmp, header = TRUE))
df

##############################################################################
#### Plot time series using ggplot2
# Melt the data by date first
df_melt <- melt(df, id = "date")
str(df_melt)

df_plot <- ggplot(data = df_melt, aes(x = date, y = value, color = variable)) +
  geom_point() +
  scale_colour_discrete("Station #") +
  xlab("Date") +
  ylab("Daily Precipitation [mm]") +
  ggtitle("Daily precipitation from 1915 to 2011") +
  theme(plot.title = element_text(size = 16, face = "bold", vjust = 2)) + # Change size & distance of the title
  theme(axis.text.x = element_text(angle = 0, size = 12, vjust = 0.5)) + # Change size of tick text
  theme(axis.text.y = element_text(angle = 0, size = 12, vjust = 0.5)) +
  theme( # Move x- & y-axis labels away from the axes
    axis.title.x = element_text(size = 14, color = "black", vjust = -0.35),
    axis.title.y = element_text(size = 14, color = "black", vjust = 0.35)) +
  theme(legend.title = element_text(colour = "chocolate", size = 14, face = "bold")) + # Change Legend text size
  guides(colour = guide_legend(ncol = 2, override.aes = list(size = 4))) # Two-column legend with larger symbols
df_plot
Tung asked Aug 12 '14 20:08



1 Answer

Part of your question asks for a "better way to plot these data".

In that spirit, you seem to have two problems. First, you expect to plot >35,000 points along the x-axis, which, as some of the comments point out, will result in pixel overlap on anything but an extremely large, high-resolution monitor. Second, and more important IMO, you are trying to plot 69 time series (stations) on the same plot. In this type of situation a heatmap might be a better approach.

library(data.table)
library(ggplot2)
library(reshape2)          # for melt(...)
library(RColorBrewer)      # for brewer.pal(...)
url <-  "http://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt"
dt  <- fread(url)
dt[,Year:=year(as.Date(date))]

dt.melt  <- melt(dt[,-1,with=FALSE],id="Year",variable.name="Station")  # drop the date column, keep Year as id
dt.agg   <- dt.melt[,list(y=sum(value)),by=list(Year,Station)]
dt.agg[,Station:=factor(Station,levels=rev(levels(Station)))]
ggplot(dt.agg,aes(x=Year,y=Station)) + 
  geom_tile(aes(fill=y)) +
  scale_fill_gradientn("Annual\nPrecip. [mm]",
                       colours=rev(brewer.pal(9,"Spectral")))+
  scale_x_continuous(expand=c(0,0))+
  coord_fixed()
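
To get a feel for how much the annual aggregation shrinks the data handed to ggplot, here is a minimal sketch that does not need the download. It assumes a synthetic stand-in table: the date range (1915 to 2011) and the 69 stations come from the question and this answer, but the column names `S1`..`S69`, the random values, and the object names `dt_long` and `dt_annual` are made up for illustration.

```r
library(data.table)

# Synthetic stand-in for the real file (assumption: one column per station,
# one row per day; values are random, not real precipitation).
set.seed(1)
dates <- seq(as.Date("1915-01-01"), as.Date("2011-12-31"), by = "day")
n_stations <- 69
dt <- data.table(date = dates)
for (j in seq_len(n_stations)) {
  set(dt, j = paste0("S", j), value = rexp(length(dates), rate = 0.5))
}

# Long format: one row per (date, station) pair, as in the question's melt step
dt_long <- melt(dt, id.vars = "date",
                variable.name = "Station", value.name = "precip")

# Annual totals per station, as in the heatmap code above
dt_annual <- dt_long[, .(total = sum(precip)),
                     by = .(Year = year(date), Station)]

nrow(dt_long)    # rows a point-per-observation plot would have to draw
nrow(dt_annual)  # tiles the heatmap has to draw
```

The long table has one row per day per station, while the aggregated table has one row per year per station, so the heatmap draws a few thousand tiles instead of millions of points.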

Note the use of data.table. Your dataset is fairly large (because of all the columns; 35,000 rows is not all that large). In this situation data.table will speed up processing substantially, especially fread(...), which is much faster than the text import functions in base R.
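
If you want to check the import speed on your own machine, a rough sketch along these lines should work. It writes a hypothetical space-separated file of about the same shape (35,000 rows by 69 numeric columns, an assumption based on the numbers above, with auto-generated column names `V1`..`V69`) to a temporary location and times both readers; the actual timings will depend on your hardware and package versions.

```r
library(data.table)

# Hypothetical stand-in file with roughly the shape of the real data
set.seed(1)
n_rows <- 35000
n_cols <- 69
m <- matrix(round(rexp(n_rows * n_cols), 2), nrow = n_rows)
tmp_file <- tempfile(fileext = ".txt")
fwrite(as.data.table(m), tmp_file, sep = " ")

# Time the base R reader and fread on the same file
t_base  <- system.time(df_base <- read.table(tmp_file, header = TRUE))
t_fread <- system.time(dt_fast <- fread(tmp_file))

# Elapsed seconds for each reader
c(read.table = t_base[["elapsed"]], fread = t_fread[["elapsed"]])
```

Both calls should return tables with the same dimensions, so the comparison is like for like.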

jlhoward answered Sep 24 '22 00:09