Basically, I'm trying to keep a vector named dates of special Dates that come up a lot in my analysis, say New Year's 2016 and July 4 2015. I want to be able to extract from this by name instead of index for robustness, e.g., dates["nyd"] to get New Year's and dates["ind"] to get July 4.
I thought this would be simple:
dates <- as.Date(c(ind = "2015-07-04", nyd = "2016-01-01"))
But as.Date has stripped the names:
dates
# [1] "2015-07-04" "2016-01-01"
It's not like Date vectors can't be named (which would be strange, given they're basically specifically-interpreted integers):
setNames(dates, c("ind", "nyd"))
# ind nyd
# "2015-07-04" "2016-01-01"
And unfortunately there's no way to declare a Date vector directly (as far as I know?), especially without knowing the underlying integer values of the dates.
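As a purely illustrative sketch of the point about underlying values: a named Date vector can be assembled with structure() when the day counts since 1970-01-01 are already known. The numbers 16620 and 16801 below were worked out by hand for the two dates in question, and needing them up front is the inconvenience described above.

```r
# Day counts since 1970-01-01, computed by hand for this illustration
dates_direct <- structure(c(ind = 16620, nyd = 16801), class = "Date")

# Extraction by name works as usual
dates_direct["nyd"]
```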
Exploring this, it seems this is standard practice for the as* class of functions:
as.integer(c(a = "123", b = "436"))
# [1] 123 436
as(c(a = 1, b = 2), "character")
# [1] "1" "2"
Is there a reason why this is the case? The loss of names isn't mentioned in ?as or any of the other help pages I've seen.
More generally, is there a way (using something other than as*) to ensure the names of an object are not lost in a conversion?
Of course one approach is to write custom functions like as.Date.named or create a custom class as.named with associated methods, but it would be surprising to me if there weren't something like this already in place, as it seems like this should be a pretty common operation.
In case it matters, I'm on R 3.2.2.
Indeed there is a discrepancy between the different as.Date methods, and here is why (or rather "how"):
First, your example:
> as.Date(c(ind = "2015-07-04", nyd = "2016-01-01"))
[1] "2015-07-04" "2016-01-01"
Here we use the method as.Date.character:
> as.Date.character
function (x, format = "", ...)
{
    charToDate <- function(x) {
        xx <- x[1L]
        if (is.na(xx)) {
            j <- 1L
            while (is.na(xx) && (j <- j + 1L) <= length(x)) xx <- x[j]
            if (is.na(xx))
                f <- "%Y-%m-%d"
        }
        if (is.na(xx) || !is.na(strptime(xx, f <- "%Y-%m-%d",
            tz = "GMT")) || !is.na(strptime(xx, f <- "%Y/%m/%d",
            tz = "GMT")))
            return(strptime(x, f))
        stop("character string is not in a standard unambiguous format")
    }
    res <- if (missing(format))
        charToDate(x)
    else strptime(x, format, tz = "GMT")
    as.Date(res)
}
<bytecode: 0x19d3dff8>
<environment: namespace:base>
Whether the format is given or not, your vector is passed to strptime, which converts it to class POSIXlt, and then it is passed to as.Date again, but this time with method as.Date.POSIXlt, which is:
> as.Date.POSIXlt
function (x, ...)
.Internal(POSIXlt2Date(x))
<bytecode: 0x19d2df50>
<environment: namespace:base>
meaning that ultimately the function used to convert to class Date is the C function called by POSIXlt2Date (a quick look at file names.c shows that the function is do_POSIXlt2D from file datetime.c). For reference, here it is:
SEXP attribute_hidden do_POSIXlt2D(SEXP call, SEXP op, SEXP args, SEXP env)
{
    SEXP x, ans, klass;
    R_xlen_t n = 0, nlen[9];
    stm tm;

    checkArity(op, args);
    PROTECT(x = duplicate(CAR(args)));
    if(!isVectorList(x) || LENGTH(x) < 9)
        error(_("invalid '%s' argument"), "x");

    for(int i = 3; i < 6; i++)
        if((nlen[i] = XLENGTH(VECTOR_ELT(x, i))) > n) n = nlen[i];
    if((nlen[8] = XLENGTH(VECTOR_ELT(x, 8))) > n) n = nlen[8];
    if(n > 0) {
        for(int i = 3; i < 6; i++)
            if(nlen[i] == 0)
                error(_("zero-length component in non-empty \"POSIXlt\" structure"));
        if(nlen[8] == 0)
            error(_("zero-length component in non-empty \"POSIXlt\" structure"));
    }
    /* coerce relevant fields to integer */
    for(int i = 3; i < 6; i++)
        SET_VECTOR_ELT(x, i, coerceVector(VECTOR_ELT(x, i), INTSXP));

    PROTECT(ans = allocVector(REALSXP, n));
    for(R_xlen_t i = 0; i < n; i++) {
        tm.tm_sec = tm.tm_min = tm.tm_hour = 0;
        tm.tm_mday = INTEGER(VECTOR_ELT(x, 3))[i%nlen[3]];
        tm.tm_mon = INTEGER(VECTOR_ELT(x, 4))[i%nlen[4]];
        tm.tm_year = INTEGER(VECTOR_ELT(x, 5))[i%nlen[5]];
        /* mktime ignores tm.tm_wday and tm.tm_yday */
        tm.tm_isdst = 0;
        if(tm.tm_mday == NA_INTEGER || tm.tm_mon == NA_INTEGER ||
           tm.tm_year == NA_INTEGER || validate_tm(&tm) < 0)
            REAL(ans)[i] = NA_REAL;
        else {
            /* -1 must be error as seconds were zeroed */
            double tmp = mktime00(&tm);
            REAL(ans)[i] = (tmp == -1) ? NA_REAL : tmp/86400;
        }
    }

    PROTECT(klass = mkString("Date"));
    classgets(ans, klass);
    UNPROTECT(3);
    return ans;
}
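Unfortunately my understanding of C is too limited to know for certain why the attributes are lost here. My guess would be that it happens either during the coerceVector operation or when each element of the POSIXlt list is individually coerced to integers (if that's what happens at lines 1268-70). It is also worth noting that, in the code above, the result ans is freshly created with allocVector and the only attribute set on it afterwards is the class, so nothing in this function copies names onto the result.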
But let's have a look at the other as.Date methods, starting with the main offender, as.Date.POSIXct:
> as.Date.POSIXct
function (x, tz = "UTC", ...)
{
    if (tz == "UTC") {
        z <- floor(unclass(x)/86400)
        attr(z, "tzone") <- NULL
        structure(z, class = "Date")
    }
    else as.Date(as.POSIXlt(x, tz = tz))
}
<bytecode: 0x19c268bc>
<environment: namespace:base>
With this one, if no timezone is given, or if the timezone is "UTC", the function just manipulates the numeric values underlying the POSIXct object to get the data that can be resolved to a Date object, and so the attributes are not lost. If any other timezone is given, however, the object is converted to POSIXlt and passed on to the same POSIXlt2Date internal, which loses the attributes! And indeed:
> as.Date(c(a = as.POSIXct("2016-01-01")), tz="UTC")
a
"2015-12-31"
> as.Date(c(a = as.POSIXct("2016-01-01")), tz="CET")
[1] "2016-01-01"
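To see why the "UTC" branch keeps names, here is a minimal sketch that repeats its three steps by hand. The POSIXct value is built directly from a second count (1451649600, i.e. noon on 2016-01-01 UTC, an illustrative value chosen here) so that the result does not depend on the local timezone.

```r
# A POSIXct value is a number of seconds since 1970-01-01 UTC plus a class;
# constructed by hand here to avoid any dependence on the local timezone
x <- structure(c(a = 1451649600),
               class = c("POSIXct", "POSIXt"), tzone = "UTC")

# The same steps as the tz == "UTC" branch of as.Date.POSIXct shown above
z <- floor(unclass(x) / 86400)  # whole days; arithmetic and floor() keep names
attr(z, "tzone") <- NULL        # only the tzone attribute is dropped
d <- structure(z, class = "Date")

names(d)
```

Since unclass(), division and floor() all leave the names attribute alone, the name "a" is still present on the final Date.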
And finally, as @Roland mentioned, as.Date.numeric does keep the attributes:
> as.Date.numeric
function (x, origin, ...)
{
    if (missing(origin))
        stop("'origin' must be supplied")
    as.Date(origin, ...) + x
}
<bytecode: 0x568943d4>
<environment: namespace:base>
origin is converted to Date via as.Date.character and then the numeric vector is added, thus keeping the attributes because of this:
> c(a=1) + 2
a
3
So naturally:
> c(a=16814) + as.Date("1970-01-01")
a
"2016-01-14"
Until this discrepancy is taken care of, the only ways to keep your attributes, I think, are either to convert first to POSIXct (but beware of timezone issues) or to numeric, or to copy the names over from your original vector:
> before <- c(ind = "2015-07-04", nyd = "2016-01-01")
> after <- as.Date(before)
> names(after) <- names(before)
> after
ind nyd
"2015-07-04" "2016-01-01"
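The copy-the-names approach generalises to any conversion function. Below is a minimal sketch; convert_keep_names is a hypothetical helper written for this illustration, not something that exists in base R, and it assumes the conversion returns one element per input element.

```r
# Hypothetical helper (not part of base R): apply a conversion function f
# to x, then copy the names of the input onto the result
convert_keep_names <- function(x, f, ...) {
  out <- f(x, ...)
  # Only copy names when the result lines up element-for-element with x
  if (length(out) == length(x)) names(out) <- names(x)
  out
}

dates <- convert_keep_names(c(ind = "2015-07-04", nyd = "2016-01-01"), as.Date)
dates["nyd"]

ints <- convert_keep_names(c(a = "123", b = "436"), as.integer)
ints["b"]
```

The length check is a design choice to avoid attaching misaligned names when a conversion changes the number of elements.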
This isn't a full answer to the question, but as a way round the problem, no one has mentioned the mode function.
vec <- c(a = "1", b = "2")
mode(vec) <- "integer"
vec
# returns:
# a b
# 1 2
I'm not sure how you'd apply this to dates though:
vec <- c(a = "2010-01-01")
mode(vec) <- "POSIXlt"
gives something, but it doesn't seem quite right.
You could also use
sapply(vec, as.whatever)
which will preserve names. However, I think this will be slower as you lose the advantage of a vectorised function.
Thirdly, there is:
structure(as.whatever(vec), names = names(vec))
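For dates specifically, the last two options behave differently, which the following sketch illustrates with as.Date standing in for as.whatever. The input vector is made up for the example.

```r
vec <- c(a = "2010-01-01", b = "2010-01-02")

# sapply() keeps the names, but simplifying the result drops the Date class,
# leaving a plain numeric vector of day counts
via_sapply <- sapply(vec, as.Date)
class(via_sapply)

# structure() keeps the Date class and re-attaches the names
via_structure <- structure(as.Date(vec), names = names(vec))
class(via_structure)
via_structure["b"]
```

So for classed results such as Date, the structure() form is likely the safer of the two.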