Why do `as` methods remove vector names, and is there a way around it?

Basically, I'm trying to keep a vector named dates of special Dates that come up a lot in my analysis, say New Year's 2016 and July 4 2015. I want to be able to extract from this by name instead of index for robustness, e.g., dates["nyd"] to get New Year's and dates["ind"] to get July 4.

I thought this would be simple:

dates <- as.Date(c(ind = "2015-07-04", nyd = "2016-01-01"))

But as.Date has stripped the names:

dates
# [1] "2015-07-04" "2016-01-01"

It's not like Date vectors can't be named (which would be strange, given they're basically specifically-interpreted integers):

setNames(dates, c("ind", "nyd"))
#          ind          nyd 
# "2015-07-04" "2016-01-01" 

And unfortunately there's no way to declare a Date vector directly (as far as I know?), especially without knowing the underlying integer values of the dates.

Exploring this, it seems this is standard practice for the as* class of functions:

as.integer(c(a = "123", b = "436"))
# [1] 123 436

as(c(a = 1, b = 2), "character")
# [1] "1" "2"

Is there a reason why this is the case? The loss of names isn't mentioned in ?as or any of the other help pages I've seen.

More generally, is there a way (using something other than as*) to ensure the names of an object are not lost in a conversion?

Of course one approach is to write custom functions like as.Date.named or create a custom class as.named with associated methods, but it would be surprising to me if there wasn't something like this already in place, as it seems like this should be a pretty common operation.

In case it matters, I'm on R 3.2.2.

MichaelChirico asked Jan 07 '16 05:01



2 Answers

Indeed there is a discrepancy in the different as.Date methods and here is why (or rather "how"):

First, your example:

> as.Date(c(ind = "2015-07-04", nyd = "2016-01-01"))
[1] "2015-07-04" "2016-01-01"

Here we use method as.Date.character:

> as.Date.character
function (x, format = "", ...) 
{
    charToDate <- function(x) {
        xx <- x[1L]
        if (is.na(xx)) {
            j <- 1L
            while (is.na(xx) && (j <- j + 1L) <= length(x)) xx <- x[j]
            if (is.na(xx)) 
                f <- "%Y-%m-%d"
        }
        if (is.na(xx) || !is.na(strptime(xx, f <- "%Y-%m-%d", 
            tz = "GMT")) || !is.na(strptime(xx, f <- "%Y/%m/%d", 
            tz = "GMT"))) 
            return(strptime(x, f))
        stop("character string is not in a standard unambiguous format")
    }
    res <- if (missing(format)) 
        charToDate(x)
    else strptime(x, format, tz = "GMT")
    as.Date(res)
}
<bytecode: 0x19d3dff8>
<environment: namespace:base>

Whether the format is given or not, your vector is passed to strptime which converts it to class POSIXlt, and then it is passed to as.Date again but this time with method as.Date.POSIXlt which is:

> as.Date.POSIXlt
function (x, ...) 
.Internal(POSIXlt2Date(x))
<bytecode: 0x19d2df50>
<environment: namespace:base>

meaning that ultimately the function used to convert to class Date is the C function called by POSIXlt2Date (a quick look at file names.c shows that the function is do_POSIXlt2D from file datetime.c). For reference, here it is:

SEXP attribute_hidden do_POSIXlt2D(SEXP call, SEXP op, SEXP args, SEXP env)
{
    SEXP x, ans, klass;
    R_xlen_t n = 0, nlen[9];
    stm tm;

    checkArity(op, args);
    PROTECT(x = duplicate(CAR(args)));
    if(!isVectorList(x) || LENGTH(x) < 9)
        error(_("invalid '%s' argument"), "x");

    for(int i = 3; i < 6; i++)
        if((nlen[i] = XLENGTH(VECTOR_ELT(x, i))) > n) n = nlen[i];
    if((nlen[8] = XLENGTH(VECTOR_ELT(x, 8))) > n) n = nlen[8];
    if(n > 0) {
        for(int i = 3; i < 6; i++)
            if(nlen[i] == 0)
                error(_("zero-length component in non-empty \"POSIXlt\" structure"));
        if(nlen[8] == 0)
            error(_("zero-length component in non-empty \"POSIXlt\" structure"));
    }
    /* coerce relevant fields to integer */
    for(int i = 3; i < 6; i++)
        SET_VECTOR_ELT(x, i, coerceVector(VECTOR_ELT(x, i), INTSXP));

    PROTECT(ans = allocVector(REALSXP, n));
    for(R_xlen_t i = 0; i < n; i++) {
        tm.tm_sec = tm.tm_min = tm.tm_hour = 0;
        tm.tm_mday  = INTEGER(VECTOR_ELT(x, 3))[i%nlen[3]];
        tm.tm_mon   = INTEGER(VECTOR_ELT(x, 4))[i%nlen[4]];
        tm.tm_year  = INTEGER(VECTOR_ELT(x, 5))[i%nlen[5]];
        /* mktime ignores tm.tm_wday and tm.tm_yday */
        tm.tm_isdst = 0;
        if(tm.tm_mday == NA_INTEGER || tm.tm_mon == NA_INTEGER ||
           tm.tm_year == NA_INTEGER || validate_tm(&tm) < 0)
            REAL(ans)[i] = NA_REAL;
        else {
            /* -1 must be error as seconds were zeroed */
            double tmp = mktime00(&tm);
            REAL(ans)[i] = (tmp == -1) ? NA_REAL : tmp/86400;
        }
    }

    PROTECT(klass = mkString("Date"));
    classgets(ans, klass);
    UNPROTECT(3);
    return ans;
}

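Unfortunately my understanding of C is too limited to say for certain why the attributes are lost here. My guess would be that it happens either during the coerceVector operation or when each element of the POSIXlt list is individually coerced to integer (if that's what the loop under the "coerce relevant fields to integer" comment does). Note also that ans is freshly allocated with allocVector and only its class is set afterwards, so nothing in this function copies names from the input onto the result.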

But let's have a look at the other as.Date methods, starting with the main offender, as.Date.POSIXct:

> as.Date.POSIXct
function (x, tz = "UTC", ...) 
{
    if (tz == "UTC") {
        z <- floor(unclass(x)/86400)
        attr(z, "tzone") <- NULL
        structure(z, class = "Date")
    }
    else as.Date(as.POSIXlt(x, tz = tz))
}
<bytecode: 0x19c268bc>
<environment: namespace:base>

With this one, if no timezone is given, or if the timezone is "UTC", the function just manipulates the underlying numeric values of the POSIXct vector to extract the part that resolves to a Date, and so the names are kept. If any other timezone is given, the input is first converted to a POSIXlt object and then passed on to the same POSIXlt2Date internal, which eventually loses the names! And indeed:

> as.Date(c(a = as.POSIXct("2016-01-01")), tz="UTC")
           a 
"2015-12-31" 

> as.Date(c(a = as.POSIXct("2016-01-01")), tz="CET")
[1] "2016-01-01"
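
To see why the UTC branch keeps the names, its arithmetic can be reproduced by hand. This is only a sketch: the variable names x, z and d are mine, and the explicit tz = "UTC" on the input is an assumption made so that the result does not depend on the local timezone.

```r
# A named POSIXct value; the timezone is fixed to UTC on purpose.
x <- as.POSIXct("2016-01-01 12:00:00", tz = "UTC")
names(x) <- "a"

# Same steps as the UTC branch of as.Date.POSIXct shown above:
# seconds since the epoch -> whole days since the epoch.
z <- floor(unclass(x) / 86400)
attr(z, "tzone") <- NULL           # drop the timezone, leave other attributes alone
d <- structure(z, class = "Date")  # re-class the named number as a Date

names(d)  # the names attribute was never touched by the arithmetic
```

Nothing in these steps builds a new vector from scratch, which is the difference from the POSIXlt route.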

And finally, as @Roland mentioned, as.Date.numeric does keep the attributes:

> as.Date.numeric
function (x, origin, ...) 
{
    if (missing(origin)) 
        stop("'origin' must be supplied")
    as.Date(origin, ...) + x
}
<bytecode: 0x568943d4>
<environment: namespace:base>

origin is converted to Date via as.Date.character and then the numeric vector is added to it, thus keeping the attributes because of this:

> c(a=1) + 2
a 
3 

So naturally:

> c(a=16814) + as.Date("1970-01-01")
           a 
"2016-01-14"
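
Applied to the dates from the question, this could look like the sketch below. The variable name days is mine, and the day counts are simply the two dates expressed as days since 1970-01-01, which is exactly the information the question hoped not to need.

```r
# Day counts since 1970-01-01, carrying the names from the question.
days <- c(ind = 16620, nyd = 16801)

# as.Date.numeric adds the numeric vector to the origin date,
# and the addition keeps the names of the numeric operand.
dates <- as.Date(days, origin = "1970-01-01")

dates["nyd"]  # extraction by name works
```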

Until this discrepancy is taken care of, the only ways to keep your attributes, I think, are to convert to POSIXct first (but beware of timezone issues), to convert to numeric first, or to copy the names of your original vector onto the result:

> before <- c(ind = "2015-07-04", nyd = "2016-01-01")
> after <- as.Date(before)
> names(after) <- names(before)
> after
         ind          nyd 
"2015-07-04" "2016-01-01" 
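
If this comes up often, the copy step can be wrapped in a small function. The helper below is hypothetical (with_names is my own name, not something in base R) and is only a sketch: it assumes the conversion function returns one element per input element.

```r
# Hypothetical helper, not part of base R: apply a conversion function f
# to x and copy the names of x onto the result.
with_names <- function(x, f, ...) {
  out <- f(x, ...)
  # Fail loudly if f changed the length, since the names would no longer line up.
  stopifnot(length(out) == length(x))
  names(out) <- names(x)
  out
}

dates <- with_names(c(ind = "2015-07-04", nyd = "2016-01-01"), as.Date)
dates["nyd"]

ints <- with_names(c(a = "123", b = "436"), as.integer)
```

Because the names are reassigned after the conversion, this works the same way whether or not a particular as.* method happens to keep them.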
plannapus answered Oct 08 '22 14:10


This isn't a full answer to the question, but as a way round the problem, no-one has mentioned the mode function.

vec <- c(a = "1", b = "2")
mode(vec) <- "integer"
vec
# returns:
# a b 
# 1 2 

I'm not sure how you'd apply this to dates though:

vec <- c(a = "2010-01-01")
mode(vec) <- "POSIXlt"

gives something, but it doesn't seem quite right.


You could also use

sapply(vec, as.whatever)

which will preserve names. However, I think this will be slower as you lose the advantage of a vectorised function.
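
A sketch of the element-wise approach, with one caveat: sapply simplifies its result into a plain vector, which drops the Date class, so for dates the class has to be put back afterwards. The variable names are mine, and vapply is swapped in for the integer case because it also checks the type of each result.

```r
vec <- c(a = "1", b = "2")
# vapply keeps the names of its input and checks that each result is one integer.
ints <- vapply(vec, as.integer, integer(1))

dvec <- c(a = "2010-01-01")
# sapply keeps the names, but returns plain day counts rather than a Date.
raw <- sapply(dvec, as.Date)
# Put the class back on the named numeric vector.
dts <- structure(raw, class = "Date")
```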


Thirdly, there is:

structure(as.whatever(vec), names = names(vec))
CJB answered Oct 08 '22 16:10