As seen in the sample dataset below, the dates fall between 2015-01-01 and 2015-08-01. However, when I use the date_breaks
function, the month labels on the x-axis are shifted. Could you suggest a solution?
dfn <- read.table(header=TRUE, text='
supp p_date length
OJ 2015-01-01 13.23
OJ 2015-03-01 22.70
OJ 2015-08-01 26.06
VC 2015-01-01 7.98
VC 2015-03-01 16.77
VC 2015-08-01 26.14
')
dfn$p_date <- as.Date(dfn$p_date, "%Y-%m-%d")
library(ggplot2)
library(scales)
ggplot(dfn, aes(as.POSIXct(p_date), length, colour = factor(supp))) +
geom_line(size=1.3) +
labs(colour="Lines :", x = "", y = "") +
guides(colour = guide_legend(override.aes = list(size=5))) +
scale_x_datetime(breaks = date_breaks("1 months"),labels = date_format("%m/%y"))
Here is my sessionInfo
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=Greek_Greece.1253 LC_CTYPE=Greek_Greece.1253 LC_MONETARY=Greek_Greece.1253 LC_NUMERIC=C LC_TIME=Greek_Greece.1253
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] scales_0.3.0 ggplot2_1.0.1
loaded via a namespace (and not attached):
[1] labeling_0.3 MASS_7.3-44 colorspace_1.2-6 magrittr_1.5 plyr_1.8.3 tools_3.2.2 gtable_0.1.2 reshape2_1.4.1
[9] Rcpp_0.12.1 stringi_0.5-5 grid_3.2.2 stringr_1.0.0 digest_0.6.8 proto_0.3-10 munsell_0.4.2
I noticed similar (but not identical) issues with the date format, which appear to come down to how time zones are handled by the formatter returned from date_format().
Your base plot
library(ggplot2)
library(scales)
p <- ggplot(dfn, aes(as.POSIXct(p_date), length, colour = factor(supp))) +
geom_point(size=3) +
geom_line(size=1.3) +
labs(colour="Lines :", x = "", y = "") +
guides(colour = guide_legend(override.aes = list(size=5))) +
theme(axis.text.x=element_text(size=15))
p + scale_x_datetime(breaks = date_breaks("1 month"),
labels = date_format("%m/%Y"))
This produces a plot in which March is labelled twice on the x-axis. That appears to be due to the time zone information used within the format function:
date_format()(date_breaks()(as.POSIXct(dfn$p_date[c(1,3)])))
#[1] "2015-01-01" "2015-02-01" "2015-03-01" "2015-03-31" "2015-04-30" "2015-05-31"
#[7] "2015-06-30" "2015-07-31" "2015-08-31"
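The shift can be reproduced without ggplot2. The following is a minimal sketch, assuming the breaks sit at local midnight while the labels are formatted in UTC, which is what the output above suggests. Europe/London is only an illustrative zone, chosen because it is one hour ahead of UTC on 1 April 2015.

```r
# Assumed zone for illustration: Europe/London (UTC+1 on 1 April 2015)
local_midnight <- as.POSIXct("2015-04-01 00:00:00", tz = "Europe/London")

# Formatted in its own time zone, the break falls on the first of the month
label_local <- format(local_midnight, "%Y-%m-%d")

# Formatted in UTC, the same instant is 23:00 on the previous day
label_utc <- format(local_midnight, "%Y-%m-%d", tz = "UTC")

print(c(local = label_local, utc = label_utc))
```

The local label is "2015-04-01" and the UTC label is "2015-03-31", so a "%m/%Y" label built from the UTC form reads as March rather than April.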
You can write your own format function so that it does not include any timezone information.
my_format <- function (format = "%Y-%m-%d") {
function(x) format(x, format)
}
And plot
p + scale_x_datetime(breaks = date_breaks("1 month"),
labels = my_format("%m/%Y"))
This format behaves more like I would expect:
my_format()(date_breaks()(as.POSIXct(dfn$p_date[c(1,3)])))
#[1] "2015-01-01" "2015-02-01" "2015-03-01" "2015-04-01" "2015-05-01" "2015-06-01"
#[7] "2015-07-01" "2015-08-01" "2015-09-01"
This is not really an explanation, as my understanding of the POSIX functions is minimal, but it does suggest a workaround.
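A variant of the same workaround is to name the time zone explicitly instead of relying on a default. The sketch below uses base R only. my_format_tz is a hypothetical helper name, and Europe/London is just an assumed zone; pass whichever zone the data should be displayed in. As far as I know, date_format() in scales 0.3.0 and later also accepts a tz argument, so supplying the zone there may achieve the same thing.

```r
# Hypothetical helper: like my_format(), but with an explicit time zone
my_format_tz <- function(format = "%Y-%m-%d", tz) {
  function(x) format(x, format, tz = tz)
}

# Monthly breaks at local midnight in the assumed zone
brks <- seq(as.POSIXct("2015-01-01", tz = "Europe/London"),
            as.POSIXct("2015-08-01", tz = "Europe/London"),
            by = "month")

# Labels formatted in the same zone as the breaks
labels_local <- my_format_tz("%m/%Y", tz = "Europe/London")(brks)

# Labels formatted in UTC, for comparison
labels_utc <- my_format_tz("%m/%Y", tz = "UTC")(brks)

print(labels_local)
print(labels_utc)
```

With the zone matched to the breaks, each month from 01/2015 to 08/2015 appears once. With UTC, "03/2015" appears twice and the later months are shifted back by one, which is the same pattern as in the plot.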
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: i686-pc-linux-gnu (32-bit)
Running under: Ubuntu 14.04.3 LTS
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8
[4] LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] datasets utils stats graphics grDevices grid methods base
other attached packages:
[1] scales_0.3.0 data.table_1.9.6 Hmisc_3.17-0 Formula_1.2-1 survival_2.38-3
[6] lattice_0.20-33 MASS_7.3-44 gridExtra_2.1.0 ggplot2_1.0.1.9003
loaded via a namespace (and not attached):
[1] Rcpp_0.12.1 digest_0.6.8 chron_2.3-47 plyr_1.8.3 gtable_0.1.2
[6] acepack_1.3-3.3 latticeExtra_0.6-26 rpart_4.1-10 labeling_0.3 proto_0.3-10
[11] splines_3.2.2 RColorBrewer_1.1-2 tools_3.2.2 foreign_0.8-66 munsell_0.4.2
[16] colorspace_1.2-6 cluster_2.0.3 nnet_7.3-11