
Converting a data frame to a matrix with plyr daply

Tags: dataframe, r, plyr

I'm trying to use the daply function in the plyr package, but I cannot get it to produce the output I expect. Even though the variable that makes up the matrix is numeric, the elements of the resulting matrix are lists rather than the values themselves. Here is a small subset of the data for the sake of example:

   Month Vehicle Samples
1 Oct-10   31057     256
2 Oct-10   31059     316
3 Oct-10   31060     348
4 Nov-10   31057     267
5 Nov-10   31059     293
6 Nov-10   31060     250
7 Dec-10   31057     159
8 Dec-10   31059     268
9 Dec-10   31060     206

And I would like to be able to visualize the data in a matrix format, which would look something like this:

  Month
Vehicle Oct-10 Nov-10 Dec-10
  31057    256    267    159
  31059    316    293    268
  31060    348    250    206

Here are two alternative calls that I have tried (the latter because my original data frame has more columns than I show here):

daply(DF, .(Vehicle, Month), identity)
daply(DF,.(Vehicle,Month), colwise(identity,.(Samples)))

However, what I get instead is rather abstruse:

       Month
Vehicle Oct-10 Nov-10 Dec-10
  31057 List,3 List,3 List,3
  31059 List,3 List,3 List,3
  31060 List,3 List,3 List,3

I used the str function on the output as some commenters have suggested, and here is an excerpt:

List of 9
 $ :'data.frame':       1 obs. of  3 variables:
  ..$ Month  : Ord.factor w/ 3 levels "Oct-10"<"Nov-10"<..: 1
  ..$ Vehicle: Factor w/ 3 levels "31057","31059",..: 1
  ..$ Samples: int 256
 $ :'data.frame':       1 obs. of  3 variables:
  ..$ Month  : Ord.factor w/ 3 levels "Oct-10"<"Nov-10"<..: 1
  ..$ Vehicle: Factor w/ 3 levels "31057","31059",..: 2
  ..$ Samples: int 316

What am I missing? Also, is there a way to do this simply with the base packages? Thanks!

Below is the dput output of the data frame if you'd like to reproduce this:

structure(list(Month = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 
3L, 3L), .Label = c("Oct-10", "Nov-10", "Dec-10"), class = c("ordered", 
"factor")), Vehicle = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L), .Label = c("31057", "31059", "31060"), class = "factor"), 
    Samples = c(256L, 316L, 348L, 267L, 293L, 250L, 159L, 268L, 
    206L)), .Names = c("Month", "Vehicle", "Samples"), class = "data.frame", row.names = c(NA, 
9L))
JD Margulici asked Aug 10 '11 04:08



2 Answers

The identity function isn't what you want here. From the help page: "All plyr functions use the same split-apply-combine strategy: they split the input into simpler pieces, apply .fun to each piece, and then combine the pieces into a single data structure." The simpler pieces in this case are the subsets of the original data frame, one for each unique Vehicle/Month combination. The identity function just returns each subset unchanged, and these subsets are then used to fill the resulting matrix.

That is, each element of the matrix you got is a data frame (which is a type of list) containing the rows for that Month/Vehicle combination.

> try1 <- daply(DF, .(Vehicle, Month), identity)
> try1[1,1]
[[1]]
   Month Vehicle Samples
1 Oct-10   31057     256

You instead want to use a function that just gets the Samples portion of that data frame, like this:

daply(DF, .(Vehicle, Month), function(x) x$Samples)

which results in

       Month
Vehicle Oct-10 Nov-10 Dec-10
  31057    256    267    159
  31059    316    293    268
  31060    348    250    206
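
To make the difference between the two calls concrete, here is a rough base-R analogue of the split-apply-combine steps. It is only a sketch of the idea, not how plyr is implemented internally; the DF construction below is an assumption that re-creates the question's data by hand, and the names pieces, first_piece, values and result are made up for illustration.

```r
# Rebuild the example data from the question (values copied from the table).
DF <- data.frame(
  Month   = factor(rep(c("Oct-10", "Nov-10", "Dec-10"), each = 3),
                   levels = c("Oct-10", "Nov-10", "Dec-10"), ordered = TRUE),
  Vehicle = factor(rep(c("31057", "31059", "31060"), times = 3)),
  Samples = c(256L, 316L, 348L, 267L, 293L, 250L, 159L, 268L, 206L)
)

# Split step: one small data frame per Vehicle/Month combination.
# Vehicle varies fastest in the resulting list.
pieces <- split(DF, list(DF$Vehicle, DF$Month))

# Returning a piece unchanged (what identity does) leaves a data frame,
# and a data frame is a list of columns, hence the list-valued cells.
first_piece <- identity(pieces[[1]])
is.list(first_piece)

# Returning only the Samples value gives one integer per piece,
# which can be combined into an ordinary matrix (filled column by column).
values <- vapply(pieces, function(x) x$Samples, integer(1))
result <- matrix(values, nrow = nlevels(DF$Vehicle),
                 dimnames = list(Vehicle = levels(DF$Vehicle),
                                 Month = levels(DF$Month)))
result
```

Note that vapply with integer(1) only works because each Vehicle/Month combination has exactly one row here. With several rows per combination, the function would need to reduce them to a single value first, for example with sum or mean.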

A few alternative ways of doing this are with cast from the reshape package (which returns a data frame)

cast(DF, Vehicle~Month, value="Samples")

with dcast and acast from reshape2, the revised version of reshape (dcast returns a data frame, acast a matrix; current reshape2 releases spell the argument value.var)

dcast(DF, Vehicle~Month, value.var="Samples")
acast(DF, Vehicle~Month, value.var="Samples")

with xtabs from the stats package

xtabs(Samples ~ Vehicle + Month, DF)

or by hand, which isn't hard at all using matrix indexing; almost all the code is just setting up the matrix.

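with(DF, {
  # NA-filled matrix with one row per Vehicle level and one column per Month level
  out <- matrix(nrow=nlevels(Vehicle), ncol=nlevels(Month),
                dimnames=list(Vehicle=levels(Vehicle), Month=levels(Month)))
  # cbind() on factors yields their integer codes, used here as (row, column) pairs
  out[cbind(Vehicle, Month)] <- Samples
  out
})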

The reshape function in the stats package can also be used to do this, but the syntax is difficult and I haven't used it once since learning cast and melt from the reshape package.
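
Since the question asks about base packages specifically, here is a small sketch of two further base-R routes: tapply, and the stats reshape function mentioned above. The DF construction is an assumption that re-creates the question's data by hand, and the names m and wide are chosen only for illustration.

```r
# Rebuild the example data from the question (values copied from the table).
DF <- data.frame(
  Month   = factor(rep(c("Oct-10", "Nov-10", "Dec-10"), each = 3),
                   levels = c("Oct-10", "Nov-10", "Dec-10"), ordered = TRUE),
  Vehicle = factor(rep(c("31057", "31059", "31060"), times = 3)),
  Samples = c(256L, 316L, 348L, 267L, 293L, 250L, 159L, 268L, 206L)
)

# tapply applies a function to Samples within each Vehicle/Month cell and
# returns a matrix. With one row per cell, sum just returns that row's value.
m <- tapply(DF$Samples, DF[c("Vehicle", "Month")], sum)
m

# reshape() converts long to wide and returns a data frame with one
# Samples column per Month.
wide <- reshape(DF, idvar = "Vehicle", timevar = "Month", direction = "wide")
wide
```

One difference worth knowing: for a Vehicle/Month combination with no rows, tapply leaves NA in the cell, whereas xtabs reports 0.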

Aaron left Stack Overflow answered Nov 03 '22 01:11


If we take the OP at their word(s) in the title, then they may be looking for data.matrix(), a standard function in the base package, which is always available in R.

data.matrix() works by converting any factors to their numeric coding before converting the data frame to a matrix. Consider the following data frame:

dat <- data.frame(A = 1:10, B = factor(sample(c("X","Y"), 10, replace = TRUE)))

If we convert via as.matrix() we get a character matrix:

> head(as.matrix(dat))
     A    B  
[1,] " 1" "X"
[2,] " 2" "X"
[3,] " 3" "Y"
[4,] " 4" "Y"
[5,] " 5" "Y"
[6,] " 6" "Y"

or, if we convert via matrix(), we get a list with dimensions (a list array, as mentioned in the Value section of ?daply, by the way):

> head(matrix(dat))
     [,1]      
[1,] Integer,10
[2,] factor,10 
> str(matrix(dat))
List of 2
 $ : int [1:10] 1 2 3 4 5 6 7 8 9 10
 $ : Factor w/ 2 levels "X","Y": 1 1 2 2 2 2 1 2 2 1
 - attr(*, "dim")= int [1:2] 2 1

data.matrix(), however, does the intended thing:

> mat <- data.matrix(dat)
> head(mat)
     A B
[1,] 1 1
[2,] 2 1
[3,] 3 2
[4,] 4 2
[5,] 5 2
[6,] 6 2
> str(mat)
 int [1:10, 1:2] 1 2 3 4 5 6 7 8 9 10 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "A" "B"
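
As a hedged illustration of what this means for the data in the question: data.matrix() keeps one row per observation and stores factor codes, so it does not by itself produce the Vehicle-by-Month layout the question asks for. The DF construction below is an assumption that re-creates the question's data by hand, and vehicle_id is a name made up for this sketch.

```r
# Rebuild the example data from the question (values copied from the table).
DF <- data.frame(
  Month   = factor(rep(c("Oct-10", "Nov-10", "Dec-10"), each = 3),
                   levels = c("Oct-10", "Nov-10", "Dec-10"), ordered = TRUE),
  Vehicle = factor(rep(c("31057", "31059", "31060"), times = 3)),
  Samples = c(256L, 316L, 348L, 267L, 293L, 250L, 159L, 268L, 206L)
)

mat <- data.matrix(DF)
dim(mat)      # still one row per observation, one column per variable
mat[1:3, ]

# Vehicle is a factor, so the matrix holds its integer codes,
# not the labels 31057, 31059 and 31060.
unique(mat[, "Vehicle"])

# To recover the numeric labels, go through character first.
vehicle_id <- as.numeric(as.character(DF$Vehicle))
vehicle_id[1:3]
```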
Gavin Simpson answered Nov 03 '22 01:11