I have a numeric variable, DATE, that represents dates where the last two characters are MONTH and the first one or two characters are DAY. I would like to split the column into separate columns for MONTH and DAY.
I can do this with the following R code, although I was hoping for a simpler regex solution.
my.data <- read.table(text = '
ID DATE VARX
A111 104 0
A111 204 1
A111 1004 4
A111 2004 4
B111 3004 2
C111 3004 3
C111 105 4
C111 1005 4
', header = TRUE, stringsAsFactors = FALSE)
# remove the last two characters of a string
my.data$DAY <- ifelse(nchar(my.data$DATE) == 3,
substr(my.data$DATE, nchar(my.data$DATE) - (nchar(my.data$DATE)-1), nchar(my.data$DATE) - (nchar(my.data$DATE)-1)),
substr(my.data$DATE, nchar(my.data$DATE) - (nchar(my.data$DATE)-1), nchar(my.data$DATE) - (nchar(my.data$DATE)-2)))
# keep the last two characters of a string
my.data$MONTH <- substr(my.data$DATE, (nchar(my.data$DATE)-1), nchar(my.data$DATE))
ID DATE VARX DAY MONTH
1 A111 104 0 1 04
2 A111 204 1 2 04
3 A111 1004 4 10 04
4 A111 2004 4 20 04
5 B111 3004 2 30 04
6 C111 3004 3 30 04
7 C111 105 4 1 05
8 C111 1005 4 10 05
Thank you for any suggestions.
Here are a few alternatives. The first is the most concise. The first two only use base R.
1) numeric manipulation
transform(my.data, MONTH = DATE %% 100, DAY = DATE %/% 100)
giving:
ID DATE VARX MONTH DAY
1 A111 104 0 4 1
2 A111 204 1 4 2
3 A111 1004 4 4 10
4 A111 2004 4 4 20
5 B111 3004 2 4 30
6 C111 3004 3 4 30
7 C111 105 4 5 1
8 C111 1005 4 5 10
2) sub This gives the same result as in (1).
spl <- function(x, replace) as.numeric(sub("(.*)(..)", replace, x))
transform(my.data, MONTH = spl(DATE, "\\2"), DAY = spl(DATE, "\\1"))
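To see what the pattern in (2) captures before as.numeric is applied, here is a small sketch on a few DATE values from the question. The names x, day_chr and month_chr are only illustrative. It relies on sub coercing the numeric input to character, and on the greedy (.*) giving back exactly the last two characters to (..).

```r
# A few DATE values copied from the question's data.
x <- c(104, 1004, 3004, 105)

# sub() converts the numeric vector to character before matching.
# Group 1 is everything before the last two characters (the day),
# group 2 is the last two characters (the month).
day_chr   <- sub("(.*)(..)", "\\1", x)
month_chr <- sub("(.*)(..)", "\\2", x)

day_chr    # "1"  "10" "30" "1"
month_chr  # "04" "04" "04" "05"
```

The character results keep the leading zero of the month, which as.numeric in spl then drops.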
3) strapply applies as.numeric to the part of the match in parentheses and returns it. This gives the same result as in (1).
library(gsubfn)
spl <- function(x, rx) strapply(x, rx, as.numeric, simplify = TRUE)
transform(my.data, MONTH = spl(DATE, ".*(..)"), DAY = spl(DATE, "(.*).."))
Note They all return numeric columns, which seems preferable, but if you want character columns instead, wrap the results in as.character(...) or an appropriate sprintf in (1), omit as.numeric in (2), or replace as.numeric in (3) with c.
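As a rough sketch of the character variant of (1), assuming a zero-padded two-character MONTH is wanted as in the question's expected output, something along these lines should work. The data frame dat is a hypothetical cut-down stand-in for my.data, and the "%02d" format is my choice rather than something stated in the answer.

```r
# Hypothetical stand-in for my.data, using DATE values from the question.
dat <- data.frame(DATE = c(104, 1004, 3004, 105))

# Same arithmetic as in (1), but the results are stored as character.
# "%02d" pads the month to two digits, e.g. 4 becomes "04".
dat$MONTH <- sprintf("%02d", dat$DATE %% 100)
dat$DAY   <- as.character(dat$DATE %/% 100)

dat$MONTH  # "04" "04" "04" "05"
dat$DAY    # "1"  "10" "30" "1"
```

Plain assignment is used here instead of transform so that the new columns stay character regardless of the stringsAsFactors default in older R versions.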
Update Added 2 and 3 and made some improvements.