When using apply on a data.frame, the arguments are (implicitly) converted to character. An example:
df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
class(df$t2[1])
## [1] "POSIXct" "POSIXt" (correct)
but:
apply(df, 1, function(y) class(y["t2"]))
## [1] "character" "character" "character" "character" "character" "character"
## [7] "character" "character" "character" "character"
Is there any way to avoid this conversion, or do I always have to convert back through as.POSIXlt(y["t2"])?
edit
My df has 2 timestamps (say, t2 and t3) and some other fields (say, v1, v2). For each row with a given t2, I want to find k (e.g. 3) rows with t3 closest to, but lower than, t2 (and the same v1), and return a statistic over v2 from these rows (e.g. an average). I wrote a function f(t2, v1, df) and just wanted to apply it to all rows using apply(df, 1, function(y) f(y["t2"], y["v1"], df)). Is there any better way to do such things in R?
Let's wrap up multiple comments into an explanation.
apply converts a data.frame to a matrix. This means that the least restrictive class will be used for all columns, and the least restrictive class in this case is character. On top of that, you are passing 1 to apply's MARGIN argument. This applies the function by row and makes you even worse off, as you're really mixing classes together now. In this scenario you're using apply, which is designed for matrices and data.frames, on a vector. This is not the right tool for the job. Instead, use lapply or sapply, as rmk points out, to grab the classes of the single t2 column, as seen below:
df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
sapply(df[, "t2"], class)
lapply(df[, "t2"], class)
## [[1]]
## [1] "POSIXct" "POSIXt"
##
## [[2]]
## [1] "POSIXct" "POSIXt"
##
## [[3]]
## [1] "POSIXct" "POSIXt"
##
## .
## .
## .
##
## [[9]]
## [1] "POSIXct" "POSIXt"
##
## [[10]]
## [1] "POSIXct" "POSIXt"
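The coercion to a matrix can also be checked directly. The following is a minimal sketch on the same example data; it relies on the documented behaviour that apply coerces a data.frame with as.matrix before iterating, and the variable name m is chosen here only for illustration.

```r
df <- data.frame(v = 1:10, t = 1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))

# as.matrix() is the coercion apply() performs on a data.frame
m <- as.matrix(df)

# A matrix holds a single type, so the date-time column forces
# every column (including v and t) to character
typeof(m)

# The column stored in the data.frame itself keeps its class
class(df$t2)

# With only the integer columns selected, no character coercion is needed
typeof(as.matrix(df[, c("v", "t")]))
```

So the classes are lost at the moment the data.frame becomes a matrix, before the function passed to apply ever sees a row.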
In general, you choose the member of the apply family that fits the job. I often use lapply or a for loop to act on specific columns, or subset the columns I want using indexing ([, ]) and then proceed with apply. The answer to this problem really boils down to determining what you want to accomplish, asking whether apply is the most appropriate tool, and proceeding from there.
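If the work really has to happen row by row, one option that avoids the matrix coercion is to loop over row indices rather than over the data.frame itself. This is only a sketch on the example data, and row_classes is a name made up for the illustration.

```r
df <- data.frame(v = 1:10, t = 1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))

# Loop over row numbers; df[i, "t2"] subsets the original column,
# so the value is never pushed through as.matrix()
row_classes <- lapply(seq_len(nrow(df)), function(i) class(df[i, "t2"]))

# Class of the t2 value in the first row
row_classes[[1]]
```

Because each value is pulled from its own column, the function receives a POSIXct value and no conversion back from character is needed.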
May I offer this blog post as an excellent tutorial on what the different functions in the apply family do.
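For the task described in the question's edit, the same index-based idea might look like the sketch below. Everything in it is an assumption made for illustration: the data in df2 is randomly generated, f is only a stand-in for the asker's function, and the choice to return NA when no earlier row exists (and to average fewer than k rows when fewer are available) is one possible convention.

```r
# Hypothetical data with the column names used in the question
set.seed(1)
origin <- as.POSIXct("2013-08-13", tz = "UTC")
df2 <- data.frame(
  t2 = origin + sample(1:1000, 20),
  t3 = origin + sample(1:1000, 20),
  v1 = sample(c("a", "b"), 20, replace = TRUE),
  v2 = rnorm(20)
)

# Stand-in for f(t2, v1, df): mean of v2 over the (up to) k rows that share
# v1 and whose t3 is closest to, but strictly lower than, t2
f <- function(t2, v1, df, k = 3) {
  cand <- df[df$v1 == v1 & df$t3 < t2, ]
  if (nrow(cand) == 0) return(NA_real_)        # no earlier row with this v1
  cand <- cand[order(cand$t3, decreasing = TRUE), ]  # latest t3 first
  mean(head(cand$v2, k))
}

# Iterate over row indices so t2 is passed as POSIXct, not character
df2$stat <- vapply(
  seq_len(nrow(df2)),
  function(i) f(df2$t2[i], df2$v1[i], df2),
  numeric(1)
)
```

vapply is used here instead of sapply so that the result is guaranteed to be a numeric vector with one value per row.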
Try:
sapply(df, class)
## $v
## [1] "integer"
##
## $t
## [1] "integer"
##
## $t2
## [1] "POSIXct" "POSIXt"