I have a data.frame:
df<-data.frame(x=c(1,2,3),y=c('b','a','c'))
If I type:
as.character(df[1,])
I get:
"1" "2"
Or if I type:
paste(df[1,],collapse=':')
I get:
"1:2"
But, if I type:
apply(df,1,as.character) or apply(df,1,paste,collapse=':')
I get:
[1,] "1" "2" "3"
[2,] "b" "a" "c"
and
"1:b" "2:a" "3:c"
I assumed that apply would coerce each row of df to a vector and then call the function on it, e.g. as.character() or paste().
However, this does not seem to be the case. Can someone please explain what apply is doing here, and why it gives a different answer from the following:
paste(df[1,],collapse=":") then paste(df[2,],collapse=":") then paste(df[3,],collapse=":")
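The issue is that the string column is stored as a factor. In R versions before 4.0.0, data.frame() defaults to stringsAsFactors = TRUE, so y becomes a factor with levels a, b, c. df[1,] is still a one-row data.frame (a list of columns), and when as.character() or paste() is applied to that list, the factor element is converted through its underlying integer code (2 for 'b') rather than its label. To avoid this behavior, use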
df <- data.frame(x=c(1,2,3),y=c('b','a','c'), stringsAsFactors = FALSE)
paste(df[1,],collapse=":")
#[1] "1:b"
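To see where the 2 comes from, here is a small sketch that recreates the old default explicitly, so it should behave the same on any R version. The names row1, row1_chr, codes_row and labels_row are only for illustration.

```r
# Force the pre-4.0 behaviour explicitly: y becomes a factor.
df <- data.frame(x = c(1, 2, 3), y = c("b", "a", "c"), stringsAsFactors = TRUE)

levels(df$y)      # levels are sorted: "a" "b" "c"
as.integer(df$y)  # underlying codes: 2 1 3, so 'b' in row 1 is stored as 2

row1 <- df[1, ]   # still a one-row data.frame, i.e. a list of columns

# paste() on the list sees the factor's integer code, not its label.
codes_row <- paste(row1, collapse = ":")

# Converting each column to character first keeps the label.
row1_chr <- vapply(row1, as.character, character(1))
labels_row <- paste(row1_chr, collapse = ":")

print(codes_row)
print(labels_row)
```

If changing how the data.frame is built is not an option, converting per column as in the last step is one possible workaround.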
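apply() behaves differently because it first converts the data.frame to a matrix with as.matrix(). A matrix can hold only one type, so when any column is non-numeric, the factor is converted to its character labels and the numeric column is converted to character as well. Each row passed to the function is therefore already a character vector such as "1" "b".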
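The matrix step can be checked directly. The sketch below assumes the same df with a factor column; m, by_row and by_hand are illustrative names, and the manual loop only approximates what apply() does internally.

```r
df <- data.frame(x = c(1, 2, 3), y = c("b", "a", "c"), stringsAsFactors = TRUE)

# apply() works on this matrix, not on the data.frame itself.
m <- as.matrix(df)
print(typeof(m))  # every cell is now a string
print(m)

# Each row handed to paste() is a plain character vector of labels.
by_row <- apply(df, 1, paste, collapse = ":")

# Doing the same by hand on the matrix rows gives the same strings.
by_hand <- vapply(
  seq_len(nrow(m)),
  function(i) paste(m[i, ], collapse = ":"),
  character(1)
)

print(by_row)
print(by_hand)
```

Because the factor has already been replaced by its labels in m, the result does not depend on stringsAsFactors, unlike paste(df[1,], collapse = ":").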