
reshape2 melt warning message

Tags: r, melt, reshape2

I'm using melt and encounter the following warning message:
attributes are not identical across measure variables; they will be dropped

After looking around, I found people saying this happens because the variables are of different classes; however, that is not the case with my dataset.

Here is the dataset:

test <- structure(list(
  park = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    .Label = c("miss", "piro", "sacn", "slbe"), class = "factor"),
  a1.one = structure(c(3L, 1L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 3L),
    .Label = c("agriculture", "beaver", "development", "flooding",
      "forest_pathogen", "harvest_00_20", "harvest_30_60", "harvest_70_90",
      "none"), class = "factor"),
  a2.one = structure(c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L),
    .Label = c("development", "forest_pathogen", "harvest_00_20",
      "harvest_30_60", "harvest_70_90", "none"), class = "factor"),
  a3.one = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
    .Label = c("forest_pathogen", "harvest_00_20", "none"), class = "factor"),
  a1.two = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
    .Label = c("agriculture", "beaver", "development", "flooding",
      "forest_pathogen", "harvest_00_20", "harvest_30_60", "harvest_70_90",
      "none"), class = "factor"),
  a2.two = structure(c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L),
    .Label = c("development", "forest_pathogen", "harvest_00_20",
      "harvest_30_60", "harvest_70_90", "none"), class = "factor"),
  a3.two = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
    .Label = c("forest_pathogen", "harvest_00_20", "none"), class = "factor")),
  .Names = c("park", "a1.one", "a2.one", "a3.one", "a1.two", "a2.two", "a3.two"),
  row.names = c(NA, 10L), class = "data.frame")

And here is the structure:

str(test)
'data.frame':   10 obs. of  7 variables:
 $ park  : Factor w/ 4 levels "miss","piro",..: 1 1 1 1 1 1 1 1 1 1
 $ a1.one: Factor w/ 9 levels "agriculture",..: 3 1 3 3 3 3 1 3 3 3
 $ a2.one: Factor w/ 6 levels "development",..: 6 6 6 6 6 6 6 6 6 6
 $ a3.one: Factor w/ 3 levels "forest_pathogen",..: 3 3 3 3 3 3 3 3 3 3
 $ a1.two: Factor w/ 9 levels "agriculture",..: 3 3 3 3 3 3 3 3 3 3
 $ a2.two: Factor w/ 6 levels "development",..: 6 6 6 6 6 6 6 6 6 6
 $ a3.two: Factor w/ 3 levels "forest_pathogen",..: 3 3 3 3 3 3 3 3 3 3

Is it because the number of levels is different for each variable? If so, can I just ignore the warning message in this case?

To generate the warning message:

library(reshape2)
test.m <- melt(test, id.vars = c('park'))
Warning message:
attributes are not identical across measure variables; they will be dropped

Thanks.

cherrytree asked Sep 05 '14 15:09



1 Answer

An explanation:

When you melt, you are combining multiple columns into one. In this case, you are combining factor columns, each of which has a levels attribute. These levels are not the same across columns because your factors are actually different. melt just coerces each factor to character and drops their attributes when creating the value column in the result.
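To see the mismatch that the warning refers to, you can compare the attributes of the columns you are about to melt. The following is a minimal base-R sketch; the data frame `df` and the helper names `measure_cols`, `attrs` and `same_attrs` are made up for illustration and are not part of reshape2.

```r
# Hypothetical data frame with two factor columns whose levels differ
df <- data.frame(
  id = 1:3,
  x  = factor(c("a", "b", "c")),
  y  = factor(c("z", "y", "x"))
)

# Everything except the id column will be melted
measure_cols <- setdiff(names(df), "id")

# Collect the attributes of each measure column (for factors: levels and class)
attrs <- lapply(df[measure_cols], attributes)

# TRUE only if every column's attributes match those of the first column
same_attrs <- all(vapply(attrs, identical, logical(1), attrs[[1]]))
same_attrs
```

When `same_attrs` is `FALSE`, as it is here, the columns carry different `levels` attributes, which is the situation the explanation above describes.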

In this case the warning doesn't matter, but you need to be very careful when combining columns that are not of the same "type", where "type" does not mean just vector type, but generically the nature of things it refers to. For example, I would not want to melt a column containing speeds in MPH with one containing weights in LBs.

One way to confirm that it is okay to combine your factor columns is to ask yourself whether any possible value in one column would be a reasonable value to have in every other column. If that is the case, then likely the correct thing to do would be to ensure that every factor column has all the possible levels that it could accept (in the same order). If you do this, you will not get a warning when you melt the table.
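One possible way to give every factor column the same levels, assuming that any value really is meaningful in every column, is to rebuild each column over the union of all observed levels. This is only a sketch in base R; `df`, `measure_cols` and `all_levels` are illustrative names, and you may prefer to spell out the full level set by hand if some valid values never appear in the data.

```r
# Hypothetical data frame with two factor columns whose levels differ
df <- data.frame(
  id = 1:3,
  x  = factor(c("a", "b", "c")),
  y  = factor(c("z", "y", "x"))
)

measure_cols <- setdiff(names(df), "id")

# Union of all levels seen in any measure column, in order of first appearance
all_levels <- unique(unlist(lapply(df[measure_cols], levels)))

# Re-create every measure column as a factor over the shared level set
df[measure_cols] <- lapply(df[measure_cols], factor, levels = all_levels)

# Both columns now have the same levels in the same order
identical(levels(df$x), levels(df$y))
```

The values themselves are unchanged; only the set of allowed levels is widened so that the columns agree.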

An illustration:

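library(reshape2)
# stringsAsFactors=TRUE is needed on R >= 4.0.0, where data.frame() no longer
# converts strings to factors by default
DF <- data.frame(id=1:3, x=letters[1:3], y=rev(letters)[1:3], stringsAsFactors=TRUE)
str(DF)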

The levels for x and y are not the same:

'data.frame':  3 obs. of  3 variables:
 $ id: int  1 2 3
 $ x : Factor w/ 3 levels "a","b","c": 1 2 3
 $ y : Factor w/ 3 levels "x","y","z": 3 2 1

Here we melt and look at the column that x and y were melted into (value):

melt(DF, id.vars="id")$value 

We get a character vector and a warning:

[1] "a" "b" "c" "z" "y" "x"
Warning message:
attributes are not identical across measure variables; they will be dropped

If however we reset the factors to have the same levels and only then melt:

DF[2:3] <- lapply(DF[2:3], factor, levels=letters)
melt(DF, id.vars="id", factorsAsStrings=FALSE)$value

We get the correct factor and no warnings:

[1] a b c z y x
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

The default behavior of melt is to convert factors to character, dropping their levels, even when the levels are identical, which is why we use factorsAsStrings=FALSE above. Had we not used that setting, we would have gotten a character vector, but no warning. I would argue the default behavior should be to keep the result as a factor, but that is not how melt works.
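Since the default result is a character vector anyway, another option, if you do not need a factor in the output, is to convert the measure columns to character yourself before melting. Plain character vectors carry no levels attribute, so there should be nothing left for melt to find mismatched. The sketch below uses a made-up data frame `df`, and the melt call is guarded so that it only runs when reshape2 is installed.

```r
# Hypothetical data frame with two factor columns whose levels differ
df <- data.frame(
  id = 1:2,
  x  = factor(c("a", "b")),
  y  = factor(c("b", "c"))
)

measure_cols <- setdiff(names(df), "id")

# Convert the factor columns to plain character vectors (no levels attribute)
df[measure_cols] <- lapply(df[measure_cols], as.character)

# Only call melt() if reshape2 is available
if (requireNamespace("reshape2", quietly = TRUE)) {
  long <- reshape2::melt(df, id.vars = "id")
  str(long$value)
}
```

This makes the coercion explicit in your own code rather than relying on melt to do it with a warning.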

BrodieG answered Sep 21 '22 10:09
