If you use apply over the rows of a data.frame with character and numeric columns, apply calls as.matrix internally, which converts the whole data.frame to character. But if the numbers in a numeric column have different lengths, as.matrix pads them with leading spaces to match the width of the "longest" number.
An example:
    df <- data.frame(id1=c(rep("a",3)),id2=c(100,90,8), stringsAsFactors = FALSE)
    df
    ##   id1 id2
    ## 1   a 100
    ## 2   a  90
    ## 3   a   8
    as.matrix(df)
    ##      id1 id2
    ## [1,] "a" "100"
    ## [2,] "a" " 90"
    ## [3,] "a" "  8"
I would have expected the result to be:
         id1 id2
    [1,] "a" "100"
    [2,] "a" "90"
    [3,] "a" "8"
Why the extra spaces?
They can create unexpected results when using apply on a data.frame:
    myfunc <- function(row){
      paste(row[1], row[2], sep = "")
    }
    > apply(df, 1, myfunc)
    [1] "a100" "a 90" "a  8"
Looping, by contrast, gives the expected result:
    > for (i in 1:nrow(df)){
        print(myfunc(df[i,]))
      }
    [1] "a100"
    [1] "a90"
    [1] "a8"
and
    > paste(df[,1], df[,2], sep = "")
    [1] "a100" "a90"  "a8"
Are there any situations where the extra spaces added by as.matrix are useful?
This is because of the way non-numeric data are converted in the as.matrix.data.frame method. There is a simple work-around, shown below.
?as.matrix notes that conversion is done via format(), and it is here that the additional spaces are added. Specifically, ?as.matrix has this in the Details section:
‘as.matrix’ is a generic function. The method for data frames will return a character matrix if there is only atomic columns and any non-(numeric/logical/complex) column, applying ‘as.vector’ to factors and ‘format’ to other non-character columns. Otherwise, the usual coercion hierarchy (logical < integer < double < complex) will be used, e.g., all-logical data frames will be coerced to a logical matrix, mixed logical-integer will give a integer matrix, etc.
?format also notes that
Character strings are padded with blanks to the display width of the widest.
Consider this example, which illustrates the behaviour:
    > format(df[,2])
    [1] "100" " 90" "  8"
    > nchar(format(df[,2]))
    [1] 3 3 3
format doesn't have to work this way, as it has a trim argument:
trim: logical; if ‘FALSE’, logical, numeric and complex values are right-justified to a common width: if ‘TRUE’ the leading blanks for justification are suppressed.
e.g.
    > format(df[,2], trim = TRUE)
    [1] "100" "90"  "8"
but there is no way to pass this argument along to the as.matrix.data.frame method.
A way to work around this is to apply format() yourself, manually, via sapply. There you can pass in trim = TRUE:
    > sapply(df, format, trim = TRUE)
         id1 id2
    [1,] "a" "100"
    [2,] "a" "90"
    [3,] "a" "8"
or, using vapply, we can state what we expect to be returned (here character vectors of length 3 [nrow(df)]):
    > vapply(df, format, FUN.VALUE = character(nrow(df)), trim = TRUE)
         id1 id2
    [1,] "a" "100"
    [2,] "a" "90"
    [3,] "a" "8"
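To tie this back to the original apply problem: once you have built the trimmed character matrix yourself, you can hand that to apply instead of the data frame. A minimal sketch, reusing df and myfunc from the question (it assumes df has more than one row, so that vapply returns a matrix rather than a vector):

```r
df <- data.frame(id1 = rep("a", 3), id2 = c(100, 90, 8),
                 stringsAsFactors = FALSE)

myfunc <- function(row) paste(row[1], row[2], sep = "")

# Build the character matrix ourselves so that format() gets trim = TRUE
m <- vapply(df, format, FUN.VALUE = character(nrow(df)), trim = TRUE)

# apply() now receives a matrix, so it does not need to convert a data frame
res <- apply(m, 1, myfunc)
print(res)
```

Since m is already a character matrix, the padding step in as.matrix.data.frame never runs.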
It does seem a little strange. The manual (?as.matrix) explains that format is called for the conversion to a character matrix:
The method for data frames will return a character matrix if there is only atomic columns and any non-(numeric/logical/complex) column, applying as.vector to factors and format to other non-character columns.
And you can see that if you call format directly, it does what as.matrix does:
    format(df$id2)
    [1] "100" " 90" "  8"
What you need to do is pass the trim argument:
    format(df$id2,trim=TRUE)
    [1] "100" "90"  "8"
But, unfortunately, the as.matrix.data.frame function doesn't allow you to do that.
    else if (non.numeric) {
        for (j in pseq) {
            if (is.character(X[[j]])) next
            xj <- X[[j]]
            miss <- is.na(xj)
            xj <- if (length(levels(xj))) as.vector(xj) else format(xj)
            # This could have ... as an argument
            # else format(xj,...)
            is.na(xj) <- miss
            X[[j]] <- xj
        }
    }
So, you could modify as.matrix.data.frame to pass ... on to format. I think it would be a nice feature addition to include this in base.
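Short of patching base R, the same idea can be sketched as a small wrapper. The name as_matrix_fmt is hypothetical (it is not part of base R); it pre-formats the columns that as.matrix would otherwise send to format(), forwarding any extra arguments, and then calls as.matrix. This assumes atomic columns and a data frame with at least one character or factor column (an all-numeric data frame would come back as character here, unlike with plain as.matrix):

```r
# Hypothetical helper, not part of base R: mirrors the loop shown above
# but forwards ... to format()
as_matrix_fmt <- function(x, ...) {
  for (j in seq_along(x)) {
    xj <- x[[j]]
    # Character columns and factors are left for as.matrix() to handle
    if (is.character(xj) || is.factor(xj)) next
    miss <- is.na(xj)
    xj <- format(xj, ...)
    is.na(xj) <- miss  # keep missing values as NA, not the string "NA"
    x[[j]] <- xj
  }
  as.matrix(x)
}

df <- data.frame(id1 = rep("a", 3), id2 = c(100, 90, 8),
                 stringsAsFactors = FALSE)
m <- as_matrix_fmt(df, trim = TRUE)
print(m)
```

Because every column is already character (or a factor) by the time as.matrix sees it, there is nothing left for it to pad.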
But a quick solution would be simply to convert every column with as.character:
    as.matrix(data.frame(lapply(df,as.character)))
         id1 id2
    [1,] "a" "100"
    [2,] "a" "90"
    [3,] "a" "8"
    # As mentioned in the comments, this also works:
    sapply(df,as.character)
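One thing to keep in mind when choosing between as.character and format(..., trim = TRUE): they agree for the whole numbers in this example, but they can differ once decimals are involved, because format gives all elements a common number of decimal places while as.character converts each element on its own. A small illustration (the vector x is made up for this purpose):

```r
x <- c(1.5, 2, 10)

# format() uses a common number of decimal places for all elements
f <- format(x, trim = TRUE)

# as.character() converts each element independently
a <- as.character(x)

print(f)
print(a)
```

So if you want "2" rather than "2.0", as.character is the closer fit; if you want aligned decimals without the padding, format with trim = TRUE is.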