I am working with thousands of meteorological time series (sample data can be downloaded from https://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt).
Plotting these data with ggplot2 on my Linux Mint PC (64-bit, 8 GB RAM, dual-core 2.6 GHz) takes a long time. Is there a way to speed it up, or a better way to plot these data? Thank you very much in advance for any suggestions!
This is the code I'm using for now:
##############################################################################
#### load required libraries
library(RCurl)
library(reshape2)
library(dplyr)
library(ggplot2)
##############################################################################
#### Read data from URL
dataURL <- "https://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt"
tmp <- getURL(dataURL)
df <- as_tibble(read.table(text = tmp, header = TRUE)) # tbl_df() is deprecated in current dplyr
df
##############################################################################
#### Plot time series using ggplot2
# Melt the data by date first
df_melt <- melt(df, id = "date")
str(df_melt)
df_plot <- ggplot(data = df_melt, aes(x = date, y = value, color = variable)) +
geom_point() +
scale_colour_discrete("Station #") +
xlab("Date") +
ylab("Daily Precipitation [mm]") +
ggtitle("Daily precipitation from 1915 to 2011") +
theme(plot.title = element_text(size = 16, face = "bold", vjust = 2)) + # Change size & distance of the title
theme(axis.text.x = element_text(angle = 0, size = 12, vjust = 0.5)) + # Change size of tick text
theme(axis.text.y = element_text(angle = 0, size = 12, vjust = 0.5)) +
theme( # Move x- & y-axis labels away from the axes
axis.title.x = element_text(size = 14, color = "black", vjust = -0.35),
axis.title.y = element_text(size = 14, color = "black", vjust = 0.35)) +
theme(legend.title = element_text(colour = "chocolate", size = 14, face = "bold")) + # Change Legend text size
guides(colour = guide_legend(ncol = 2, override.aes = list(size = 4))) # Two-column legend with larger symbols
df_plot
Part of your question asks for a "better way to plot these data".
In that spirit, you seem to have two problems. First, you expect to plot more than 35,000 points along the x-axis, which, as some of the comments point out, will result in pixel overlap on anything but an extremely large, high-resolution monitor. Second, and more important IMO, you are trying to plot 69 time series (stations) on the same plot. In this type of situation a heatmap might be a better approach.
library(data.table)
library(ggplot2)
library(reshape2) # for melt(...)
library(RColorBrewer) # for brewer.pal(...)
url <- "http://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt"
dt <- fread(url)
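dt[,Year:=year(as.Date(date))] # add a Year column derived from the date
# drop the date column, then reshape to long format: one row per (Year, Station, value)
dt.melt <- melt(dt[,-1,with=FALSE],id="Year",variable.name="Station")
# annual total per station
dt.agg <- dt.melt[,list(y=sum(value)),by=list(Year,Station)]
# reverse the factor levels so the first station appears at the top of the y-axis
dt.agg[,Station:=factor(Station,levels=rev(levels(Station)))]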
ggplot(dt.agg,aes(x=Year,y=Station)) +
geom_tile(aes(fill=y)) +
scale_fill_gradientn("Annual\nPrecip. [mm]",
colours=rev(brewer.pal(9,"Spectral")))+
scale_x_continuous(expand=c(0,0))+
coord_fixed()
Note the use of data.tables. Your dataset is fairly large (because of all the columns; 35,000 rows is not all that large). In this situation data.tables will speed up processing substantially, especially fread(...), which is much faster than the text import functions in base R.
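If you want to try the reshape-and-aggregate step without downloading the file, here is a minimal sketch on synthetic data. The column names (date, st1 to st4), the date range, and the random values are made up for illustration and are not taken from the real dataset; the point is only to show how daily values collapse to one row per station and year before plotting.

```r
library(data.table)

# Synthetic stand-in for the real file (assumption: one date column plus
# one numeric column per station; names st1..st4 are hypothetical).
set.seed(1)
dates    <- seq(as.Date("2001-01-01"), as.Date("2003-12-31"), by = "day")
stations <- paste0("st", 1:4)
dt <- data.table(date = dates)
dt[, (stations) := lapply(stations, function(s) rexp(length(dates), rate = 0.2))]

# Wide -> long: one row per (date, station).
dt_long <- melt(dt, id.vars = "date",
                variable.name = "Station", value.name = "precip")

# Collapse the daily values to one total per station and year.
dt_annual <- dt_long[, .(total = sum(precip)),
                     by = .(Year = year(date), Station)]

nrow(dt_long)    # number of rows before aggregation (days x stations)
nrow(dt_annual)  # number of rows after aggregation (years x stations)
```

With the real data the same idea takes you from roughly 35,000 x 69 points down to one tile per station and year, which is what keeps the heatmap fast to draw.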