Dynamically add column to xts object

Tags: r, xts

Adding a column to an xts object is straightforward if you know the name of the column ahead of time. For example, to add a column named "b":

library(xts)
n <- 5
x <- merge(xts(order.by = as.Date('2015-1-1') + 1:n), a = rnorm(n))
x$b <- rnorm(n)

Adding a dynamically-named column (i.e., a column whose name is known only at runtime) is harder:

new.col.name <- 'c' # known only at runtime
x[, new.col.name] <- rnorm(n) # this generates an error

One approach is to add a column with a temporary name and then rename it:

stopifnot(!('tmp' %in% names(x)))
x$tmp <- rnorm(n)
names(x)[names(x) == 'tmp'] <- new.col.name

Is there a better way to do this? (Also, does assigning to the names of an xts object copy the object? For example, would the above approach work well if n were very large?)

asked Oct 09 '15 at 13:10 by banbh


1 Answer

The easiest and clearest approach is to merge the original object with the new column(s), after converting the new column(s) to a matrix (so you can set the column name).

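set.seed(21)
newData <- rnorm(n)
# build a one-column matrix whose column name is chosen at runtime, then merge
x1 <- merge(x, matrix(newData, ncol=1, dimnames=list(NULL, new.col.name)))
# another way to do the same thing: give newData dimensions and a column name
dim(newData) <- c(nrow(x), 1)
colnames(newData) <- new.col.name
x2 <- merge(x, newData)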
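The same merge-with-a-matrix idea can be wrapped in a small helper that also handles several dynamically named columns at once. This is a sketch, not part of xts: `add_named_cols` is a hypothetical name, and it assumes `values` holds the data for the new columns in column-major order (one block of `nrow(x)` values per new column).

```r
library(xts)

# Hypothetical helper (not an xts function): append one or more columns whose
# names are only known at runtime, by merging a matrix with those colnames.
add_named_cols <- function(x, values, col.names) {
  m <- matrix(values, nrow = nrow(x), ncol = length(col.names),
              dimnames = list(NULL, col.names))
  merge(x, m)
}

n <- 5
x <- xts(matrix(rnorm(n), ncol = 1, dimnames = list(NULL, "a")),
         order.by = as.Date("2015-01-01") + 1:n)

x <- add_named_cols(x, rnorm(n), "c")              # one new column
x <- add_named_cols(x, rnorm(2 * n), c("d", "e"))  # two new columns at once
colnames(x)
```

Because every new column goes through a single `merge` call, there is no temporary name to clean up and no separate rename step.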

To answer your second question: yes, assigning names (and colnames) on an xts object creates a copy. You can confirm this with tracemem and the output of gc.

> R -q  # new R session
R> x <- xts::.xts(1:1e6, 1:1e6)
R> tracemem(x)
[1] "<0x2892400>"
R> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  259260 13.9     592000 31.7   350000 18.7
Vcells 1445207 11.1    4403055 33.6  3445276 26.3
R> colnames(x) <- "hi"
tracemem[0x2892400 -> 0x24c1ad0]: 
tracemem[0x24c1ad0 -> 0x2c62d30]: colnames<- 
tracemem[0x2c62d30 -> 0x3033660]: dimnames<-.xts dimnames<- colnames<- 
tracemem[0x3033660 -> 0x3403f90]: dimnames<-.xts dimnames<- colnames<- 
tracemem[0x3403f90 -> 0x37d48c0]: colnames<- dimnames<-.xts dimnames<- colnames<- 
tracemem[0x37d48c0 -> 0x3033660]: dimnames<-.xts dimnames<- colnames<- 
R> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  259696 13.9     592000 31.7   350000 18.7
Vcells 1445750 11.1    4403055 33.6  3949359 30.2
R> print(object.size(x), units="Mb")
7.6 Mb

The colnames<- call causes about 4MB of extra memory to be used (the Vcells "max used (Mb)" value rose from 26.3 to 30.2). The entire xts object is about 8MB: half is the coredata and the other half is the index. So the extra 4MB is the cost of copying the coredata.
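To check the coredata/index split yourself, you can measure each part separately. This is a sketch: it uses `.index()` (the xts accessor for the raw index) and counts 2^20 bytes per Mb, matching `print(object.size(x), units="Mb")`. The exact split may differ between xts versions, for example if the index is stored as double rather than integer.

```r
library(xts)

x <- xts::.xts(1:1e6, 1:1e6)

# object.size() returns bytes; unclass() drops the "object_size" class
to_mb <- function(obj) unclass(object.size(obj)) / 2^20

data.mb  <- to_mb(coredata(x))  # the data values only
index.mb <- to_mb(.index(x))    # the raw time index only
total.mb <- to_mb(x)            # the whole xts object
round(c(data = data.mb, index = index.mb, total = total.mb), 1)
```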

If you want to avoid the copy, you can set the dimnames attribute directly. Be careful, though: this bypasses the checks that colnames<- would normally perform, so a mistake it would have caught can slip through.

> R -q  # new R session
R> x <- xts::.xts(1:1e6, 1:1e6)
R> tracemem(x)
[1] "<0x2cc5330>"
R> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  256397 13.7     592000 31.7   350000 18.7
Vcells 1440915 11.0    4397699 33.6  3441761 26.3
R> attr(x, 'dimnames') <- list(NULL, "hi")
tracemem[0x2cc5330 -> 0x28f4a00]: 
R> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  256403 13.7     592000 31.7   350000 18.7
Vcells 1440916 11.0    4397699 33.6  3441761 26.3
R> print(object.size(x), units="Mb")
7.6 Mb
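When you bypass colnames<-, validating the new names is up to you. Below is a minimal sketch of the kind of guard you might run first. The specific checks (character names, exactly one per column) are an illustrative assumption, not a copy of the validation xts performs internally.

```r
library(xts)

x <- xts::.xts(matrix(1:10, ncol = 2), 1:5)
new.names <- c("hi", "there")

# Illustrative guard (an assumption, not xts's own checks):
# names must be character and there must be exactly one per column
stopifnot(is.character(new.names), length(new.names) == ncol(x))

# Set the attribute directly; xts keeps the row part of dimnames as NULL
attr(x, "dimnames") <- list(NULL, new.names)
colnames(x)
```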
answered Oct 20 '22 at 23:10 by Joshua Ulrich