
How to avoid implicit character conversion when using apply on dataframe

Tags: dataframe, r, apply

When using apply on a data.frame, the arguments are (implicitly) converted to character. An example:

df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
class(df$t2[1])
## [1] "POSIXct" "POSIXt" (correct)

but:

 apply(df, 1, function(y) class(y["t2"]))
 ## [1] "character" "character" "character" "character" "character" "character"
 ## [7] "character" "character" "character" "character"

Is there any way to avoid this conversion? Or do I always have to convert back through as.POSIXlt(y["t2"])?

edit
My df has 2 timestamps (say, t2 and t3) and some other fields (say, v1, v2). For each row with a given t2, I want to find the k (e.g. 3) rows with t3 closest to, but lower than, t2 (and the same v1), and return a statistic over v2 from these rows (e.g. an average). I wrote a function f(t2, v1, df) and just wanted to apply it to all rows using apply(df, 1, function(y) f(y["t2"], y["v1"], df)). Is there any better way to do such things in R?

ang mo asked Aug 13 '13 16:08


2 Answers

Let's wrap up multiple comments into an explanation.

  1. apply converts a data.frame to a matrix before doing anything else. A matrix can hold only one type, so every column is coerced to the least restrictive class that can represent all of them. In this case that is character.
  2. You're supplying 1 to apply's MARGIN argument, which applies the function by row. That makes you even worse off, because each row reaches your function as a single vector that mixes all the columns together. apply is designed for matrices (and data.frames that convert cleanly to one), but here you only want to work on the single vector t2. It is not the right tool for the job.
  3. In this case I'd use lapply or sapply, as rmk points out, to grab the classes of the single t2 column as seen below:

Code:

df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))

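# sapply simplifies the result to a 2 x 10 character matrix
sapply(df[, "t2"], class)
# lapply returns a list with one class vector per element (output shown below)
lapply(df[, "t2"], class)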

## [[1]]
## [1] "POSIXct" "POSIXt" 
## 
## [[2]]
## [1] "POSIXct" "POSIXt" 
## 
## [[3]]
## [1] "POSIXct" "POSIXt" 
## 
## .
## .
## . 
## 
## [[9]]
## [1] "POSIXct" "POSIXt" 
## 
## [[10]]
## [1] "POSIXct" "POSIXt" 
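
The coercion described in point 1 can also be observed directly. The following is a minimal sketch that reuses the question's toy data; the variable name m is only illustrative. It calls as.matrix() by hand, which is the conversion apply performs on a data.frame before it starts iterating.

```r
# Same toy data as in the question
df <- data.frame(v = 1:10, t = 1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))

# Inside the data.frame every column keeps its own class
sapply(df, class)

# A matrix holds a single type, so mixing integer and date-time columns
# forces everything to character
m <- as.matrix(df)
is.character(m)            # TRUE
class(m[1, "t2"])          # "character"

# With only the numeric columns the matrix stays numeric, so apply() is
# safe on a subset such as df[, c("v", "t")]
is.numeric(as.matrix(df[, c("v", "t")]))   # TRUE
```

This is why subsetting to columns of one type before calling apply avoids the problem, while any date-time column in the mix triggers the conversion.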

In general, you choose the member of the apply family that fits the job. I often use lapply or a for loop to act on specific columns, or I subset the columns I want using indexing ([, ]) and then proceed with apply. The answer to this problem really boils down to determining what you want to accomplish, asking whether apply is the most appropriate tool, and proceeding from there.
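
For the row-wise task described in the question's edit, one option is to iterate over row numbers rather than over the rows themselves, so each value is pulled out of its own column and keeps its class. The sketch below is only an illustration under stated assumptions: the data is made up to match the column names in the edit (t2, t3, v1, v2), and f is a hypothetical stand-in for the asker's function, not their actual implementation.

```r
# Hypothetical data shaped like the question's edit: two timestamps (t2, t3),
# a grouping field (v1) and a value field (v2)
origin <- as.POSIXct("2013-08-13", tz = "UTC")
df <- data.frame(
  v1 = rep(c("a", "b"), each = 5),
  v2 = 1:10,
  t3 = origin + 1:10,
  t2 = origin + 2:11,
  stringsAsFactors = FALSE
)

# Illustrative stand-in for the asker's f(): mean of v2 over the k rows in the
# same v1 group whose t3 is closest to, but lower than, the given t2
f <- function(t2, v1, df, k = 3) {
  cand <- df[df$v1 == v1 & df$t3 < t2, ]
  cand <- cand[order(cand$t3, decreasing = TRUE), ]  # closest t3 first
  mean(head(cand$v2, k))
}

# Iterate over row numbers instead of apply(df, 1, ...): df$t2[i] is still
# POSIXct, so nothing has to be converted back from character
res <- sapply(seq_len(nrow(df)), function(i) f(df$t2[i], df$v1[i], df))
res
```

Because the date-time comparison happens on the original columns, there is no need for as.POSIXlt(y["t2"]) inside the function. For large data this row-by-row filtering may be slow, and a grouped or join-based approach could be worth considering.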

May I offer this blog post as an excellent tutorial on what the different functions in the apply family do.

Tyler Rinker answered Nov 09 '22 12:11


Try:

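# sapply iterates over columns, so y is a whole column here, not a row;
# y["t2"] looks up a name on an unnamed vector and yields an NA of the
# column's class, so this gives the same result as sapply(df, class)
sapply(df, function(y) class(y["t2"]))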

$v
[1] "integer"

$t
[1] "integer"

$t2
[1] "POSIXct" "POSIXt"
harkmug answered Nov 09 '22 11:11