
Converting matrix to dataframe: works in one case, not another

Below is a snippet of the output of a sample session. In it, I create a matrix with the matrix() function and simply convert it to a dataframe with the as.data.frame() function. In the second section, I also create a matrix, but through a different process (the one I actually want to use). Even though str() gives me analogous output, I get an error when converting it to a dataframe. Any ideas?

EDIT: At the end, I added a line where I (re)cast the matrix to a matrix and then convert that to a data frame. It works, but judging by the str() output of test_mx (the one that fails to convert), I shouldn't have to recast it. So I know how to fix it, but I don't understand why I need that extra step.

R version 2.15.2 (2012-10-26) -- "Trick or Treat"

> library(reshape)
> ## This works
> ## ==========
> tmx = matrix(1:12*0.1, ncol=4)
> rownames(tmx) = c("A", "B", "C")
> colnames(tmx) = 0:3
> tmx
    0   1   2   3
A 0.1 0.4 0.7 1.0
B 0.2 0.5 0.8 1.1
C 0.3 0.6 0.9 1.2
> 
> str(tmx)
 num [1:3, 1:4] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:3] "A" "B" "C"
  ..$ : chr [1:4] "0" "1" "2" "3"
> as.data.frame(tmx)
    0   1   2   3
A 0.1 0.4 0.7 1.0
B 0.2 0.5 0.8 1.1
C 0.3 0.6 0.9 1.2
> 
> 
> 
> ## This does not
> ## =============
> t = 0:3
> thesd = 0.1
> dat = data.frame(
+     a1 = sin(2*pi*t/length(t)) + rnorm(t, sd=thesd),
+     b1 = sin(2*pi*t/length(t) - pi) + rnorm(t, sd=thesd),
+     c1 = sin(2*pi*t/length(t) - pi/2) + rnorm(t, sd=thesd),
+     t  = t
+ )
> 
> test_mx = cast(melt(dat, id.vars="t"), variable ~ t)
> tmp_rownames = as.character(test_mx[,1])
> test_mx = test_mx[,-1]
> tmp_colnames = colnames(test_mx)
> test_mx = as.matrix(test_mx)
> rownames(test_mx) = tmp_rownames
> colnames(test_mx) = tmp_colnames
> 
> str(test_mx)
 num [1:3, 1:4] 0.06211 -0.00596 -1.09718 1.1555 -0.96443 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:3] "a1" "b1" "c1"
  ..$ : chr [1:4] "0" "1" "2" "3"
> as.data.frame(test_mx)
Error in data.frame(rrownames(x), unx, check.names = FALSE) : 
  arguments imply differing number of rows: 0, 3
> 
> ## But this does work
> as.data.frame(as.matrix(test_mx))
             0           1           2           3
a1 -0.16166693  0.97479282  0.01471777 -1.01517539
b1 -0.01012797 -0.97745698 -0.12667287  0.96542412
c1 -1.07217297  0.06783235  1.12068282 -0.02012263

> ## why?
asked Jul 25 '13 20:07 by mpettis


2 Answers

While @agstudy's answer solves your problem and gets you up to date with the most recent packages, it doesn't explain why this happens.

To understand why, step back to your line test_mx = cast(melt(dat, id.vars="t"), variable ~ t). I'll create two objects here so we can do some comparisons:

test_mx <- test_mx_cast <- cast(melt(dat, id.vars="t"), variable ~ t)
class(test_mx)
# [1] "cast_df"    "data.frame"
class(test_mx_cast)
# [1] "cast_df"    "data.frame"

Hmm. What is this cast_df class? It turns out that the "reshape" package has defined several new methods. See, for example, methods(as.data.frame) or methods(as.matrix):

> methods(as.matrix)
[1] as.matrix.cast_df     as.matrix.cast_matrix as.matrix.data.frame  as.matrix.default    
[5] as.matrix.dist*       as.matrix.noquote     as.matrix.POSIXlt     as.matrix.raster*    

   Non-visible functions are asterisked
> methods(as.data.frame)
 [1] as.data.frame.aovproj*        as.data.frame.array           as.data.frame.AsIs           
 [4] as.data.frame.cast_df         as.data.frame.cast_matrix     as.data.frame.character      
 [7] as.data.frame.complex         as.data.frame.data.frame      as.data.frame.Date           
[10] as.data.frame.default         as.data.frame.difftime        as.data.frame.factor         
[13] as.data.frame.ftable*         as.data.frame.function*       as.data.frame.idf*           
[16] as.data.frame.integer         as.data.frame.list            as.data.frame.logical        
[19] as.data.frame.logLik*         as.data.frame.matrix          as.data.frame.model.matrix   
[22] as.data.frame.numeric         as.data.frame.numeric_version as.data.frame.ordered        
[25] as.data.frame.POSIXct         as.data.frame.POSIXlt         as.data.frame.raw            
[28] as.data.frame.table           as.data.frame.ts              as.data.frame.vector         

   Non-visible functions are asterisked

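Notice, above, the first and second methods for as.matrix (as.matrix.cast_df and as.matrix.cast_matrix) and the fourth and fifth methods for as.data.frame (as.data.frame.cast_df and as.data.frame.cast_matrix).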
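To see the mechanism in isolation, here is a minimal sketch of S3 dispatch in base R. The generic describe() and the class demo_df are hypothetical names invented for illustration; they stand in for as.data.frame() and cast_df. For an object with a class attribute, R tries each class in that vector from left to right and then falls back to the default method; an object without one, such as a plain matrix, dispatches on its implicit class.

```r
# Hypothetical generic standing in for as.data.frame()
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "default method"
describe.matrix  <- function(x, ...) "matrix method"
describe.demo_df <- function(x, ...) "demo_df method"

plain_mx <- matrix(1:4, nrow = 2)
# Hypothetical class vector mimicking c("cast_df", "data.frame")
tagged_df <- structure(data.frame(a = 1:2), class = c("demo_df", "data.frame"))

describe(plain_mx)   # "matrix method": a plain matrix dispatches on its implicit class
describe(tagged_df)  # "demo_df method": the first class with a method wins
describe(1:3)        # "default method": no integer or numeric method is defined
```

This is why a package that adds its own class in front of "matrix" or "data.frame" silently changes what as.data.frame() and as.matrix() do.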

What does this mean? Well, after you created test_mx, you wrote several lines to convert your data.frame to a matrix. This was mostly because you wanted to make sure that your first column ended up as row names, rather than having your entire matrix coerced to a character matrix.

tmp_rownames = as.character(test_mx[,1])
test_mx = test_mx[,-1]
tmp_colnames = colnames(test_mx)
test_mx = as.matrix(test_mx)
rownames(test_mx) = tmp_rownames
colnames(test_mx) = tmp_colnames
test_mx
#               0           1            2          3
# a1 -0.079811371  0.82820704 -0.193860367 -1.1269632
# b1 -0.009402418 -1.19348155 -0.004519269  0.8921427
# c1 -0.784163111 -0.01340952  0.966208235  0.0135557

Because "reshape" has already defined a customized as.matrix method, you didn't actually need to do that!

as.matrix(test_mx_cast)
#               0           1            2          3
# a1 -0.079811371  0.82820704 -0.193860367 -1.1269632
# b1 -0.009402418 -1.19348155 -0.004519269  0.8921427
# c1 -0.784163111 -0.01340952  0.966208235  0.0135557

But that still doesn't exactly answer everything. To understand further, compare the two matrices now:

> test_mx_cast_matrix <- as.matrix(test_mx_cast)
> class(test_mx)
[1] "cast_matrix" "matrix"     
> class(test_mx_cast_matrix)
[1] "cast_matrix" "matrix"     
> str(test_mx)
 num [1:3, 1:4] -0.0798 -0.0094 -0.7842 0.8282 -1.1935 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:3] "a1" "b1" "c1"
  ..$ : chr [1:4] "0" "1" "2" "3"
> str(test_mx_cast_matrix)
 num [1:3, 1:4] -0.0798 -0.0094 -0.7842 0.8282 -1.1935 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:3] "a1" "b1" "c1"
  ..$ : chr [1:4] "0" "1" "2" "3"
 - attr(*, "idvars")= chr "variable"
 - attr(*, "rdimnames")=List of 2
  ..$ :'data.frame':    3 obs. of  1 variable:
  .. ..$ variable: Factor w/ 3 levels "a1","b1","c1": 1 2 3
  ..$ :'data.frame':    4 obs. of  1 variable:
  .. ..$ t: int [1:4] 0 1 2 3

Hmmm. When we use as.matrix directly, all of the attributes that the "reshape" package adds ("idvars" and "rdimnames") are retained. When we do the process manually, the result still claims to be the same class, but those custom attributes have been stripped.
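One way to spot such differences without reading str() output line by line is to compare attribute names directly. The sketch below uses a hypothetical demo_matrix class and an "idvars" attribute that only mimic the reshape objects: both copies report the same class, yet one has lost an attribute.

```r
# Small matrix tagged the way reshape tags its results (hypothetical class name)
full <- matrix(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), nrow = 3,
               dimnames = list(c("a1", "b1", "c1"), c("0", "1")))
attr(full, "idvars") <- "variable"
class(full) <- c("demo_matrix", "matrix")

# A copy that keeps the class but loses the extra attribute
stripped <- full
attr(stripped, "idvars") <- NULL

identical(class(full), class(stripped))                        # same class vector
setdiff(names(attributes(full)), names(attributes(stripped)))  # the missing attribute
```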

So what?

Well, since R thinks that test_mx is a cast_matrix, when you call as.data.frame, it actually calls as.data.frame.cast_matrix, not as.data.frame.matrix.

As the definition of as.data.frame.cast_matrix shows, those attributes are essential for recreating your data.frame, hence your error. Here are the guts of the function:

> as.data.frame.cast_matrix
function (x, row.names, optional, ...) 
{
    unx <- unclass(x)
    colnames(unx) <- rownames(rcolnames(x))
    r.df <- data.frame(rrownames(x), unx, check.names = FALSE)
    class(r.df) <- c("cast_df", "data.frame")
    attr(r.df, "idvars") <- attr(x, "idvars")
    attr(r.df, "rdimnames") <- attr(x, "rdimnames")
    rownames(r.df) <- 1:nrow(r.df)
    r.df
}
<environment: namespace:reshape>
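The failure can be reproduced without reshape. The sketch below defines a hypothetical class demo_matrix whose as.data.frame method, like as.data.frame.cast_matrix, builds its result from a custom attribute (here a hypothetical "row_labels"). When that attribute is missing, as.character(NULL) yields a zero-length vector, and data.frame() refuses to combine a 0-row column with a 3-row matrix. It is an analogy for what happens inside reshape, not reshape's actual code.

```r
# Hypothetical method that, like as.data.frame.cast_matrix, depends on an attribute
as.data.frame.demo_matrix <- function(x, row.names = NULL, optional = FALSE, ...) {
  labels <- as.character(attr(x, "row_labels"))  # character(0) if the attribute is gone
  data.frame(label = labels, unclass(x), check.names = FALSE)
}

m <- matrix(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), nrow = 3,
            dimnames = list(c("a1", "b1", "c1"), c("0", "1")))
with_attr    <- structure(m, row_labels = c("a1", "b1", "c1"),
                          class = c("demo_matrix", "matrix"))
without_attr <- structure(m, class = c("demo_matrix", "matrix"))

ok  <- as.data.frame(with_attr)  # dispatches to as.data.frame.demo_matrix and works
err <- tryCatch(as.data.frame(without_attr), error = conditionMessage)
err  # error: the label column has 0 rows while the matrix has 3
```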

So, you now have three options:

  1. Upgrade to "reshape2" -- Good advice, but keep in mind that there are still a good number of people who haven't bothered to make the switch.

  2. Use "reshape" correctly, which requires looking a bit more at the str, classes and attributes of the objects it creates. Using it "correctly" here would have been to use as.data.frame(test_mx_cast_matrix).

  3. Specify the method you want to use. This is a fairly safe choice when you don't know whether packages are redefining methods; when a package creates new classes, it is worth checking whether it has also created new methods for them. Compare:

    > as.data.frame(test_mx)        ## Calls `as.data.frame.cast_matrix` ERROR!
    Error in data.frame(rrownames(x), unx, check.names = FALSE) : 
      arguments imply differing number of rows: 0, 3
    > as.data.frame.matrix(test_mx) ## Specifies the `as.data.frame` method. WORKS!
                  0           1            2          3
    a1 -0.079811371  0.82820704 -0.193860367 -1.1269632
    b1 -0.009402418 -1.19348155 -0.004519269  0.8921427
    c1 -0.784163111 -0.01340952  0.966208235  0.0135557
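Both ways around the broken dispatch can be checked on a hypothetical demo_matrix class (a stand-in for a cast_matrix that has lost its attributes): call as.data.frame.matrix() by name, as in option 3, or remove the class attribute with unclass() so that as.data.frame() dispatches on the implicit "matrix" class. The unclass() route is an addition here, not something this answer proposes.

```r
# Hypothetical class whose as.data.frame method needs a "row_labels" attribute
as.data.frame.demo_matrix <- function(x, row.names = NULL, optional = FALSE, ...) {
  labels <- as.character(attr(x, "row_labels"))
  data.frame(label = labels, unclass(x), check.names = FALSE)
}

m <- matrix(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), nrow = 3,
            dimnames = list(c("a1", "b1", "c1"), c("0", "1")))
broken <- structure(m, class = c("demo_matrix", "matrix"))  # attribute missing

# Generic call: dispatches to the class method and fails
failed <- inherits(try(as.data.frame(broken), silent = TRUE), "try-error")

# Option A: call the matrix method explicitly, bypassing dispatch
df_a <- as.data.frame.matrix(broken)

# Option B: drop the class attribute so dispatch falls back to the matrix method
df_b <- as.data.frame(unclass(broken))
```

Option B has the advantage of not hard-coding a method name, at the cost of discarding whatever else the class was meant to provide.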
    

Sigh. The end....

answered Oct 02 '22 19:10 by A5C1D2H2I1M1N2O1R2T1


You should use reshape2, not reshape, as the latter is obsolete.

Also replace cast with dcast or acast.

as.data.frame(test_mx)
            0           1            2           3
1 -0.08120468  0.97593052 -0.006127179 -1.15107784
2 -0.04165681 -1.02810193  0.004637454  0.99042403
3 -0.87862063  0.07346341  1.019113669 -0.01769976
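The dcast()/acast() route needs the reshape2 package. As a swapped-in base R alternative (not what this answer proposes), tapply() can pivot long data into a plain matrix that carries no package-specific class, so as.data.frame() goes straight to as.data.frame.matrix. The noise-free data below is an assumption that keeps the sketch deterministic.

```r
# Long-format data shaped like melt(dat, id.vars = "t") output, minus the noise
tt <- 0:3
long <- data.frame(
  variable = rep(c("a1", "b1", "c1"), each = length(tt)),
  t        = rep(tt, times = 3),
  value    = c(sin(2 * pi * tt / length(tt)),
               sin(2 * pi * tt / length(tt) - pi),
               sin(2 * pi * tt / length(tt) - pi / 2))
)

# Pivot: rows are variables, columns are time points (one value per cell)
wide_mx <- tapply(long$value, list(long$variable, long$t), sum)

oldClass(wide_mx)       # NULL: no package-specific class attached
as.data.frame(wide_mx)  # dispatches to as.data.frame.matrix
```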
answered Oct 02 '22 17:10 by agstudy