 

how spread() in tidyr handles factor levels

Tags: r, tidyr, spread

I was manipulating my data and found that I did something wrong at some point in the process. When I explored the issue, the problem came down to the following behavior of spread() in the tidyr package.

Here's a demonstrative example. Let us say we have a data frame like the following.

> d <- data.frame(factor1 = rep(LETTERS[1:3], each = 3),
+   factor2 = rep(paste0("level", c(1, 2, 10)), 3),
+   num = 1:9
+ )  
> d
  factor1 factor2 num
1       A  level1   1
2       A  level2   2
3       A level10   3
4       B  level1   4
5       B  level2   5
6       B level10   6
7       C  level1   7
8       C  level2   8
9       C level10   9

What I wanted to do was convert this long-format data frame into wide format, and I thought spread() was the way to go. The result, however, was not what I expected.

> spread(d, factor2, num)
  factor1 level1 level2 level10
1       A      1      3       2
2       B      4      6       5
3       C      7      9       8

If factor1 is "A" and factor2 is "level2", the value should be 2, but the wide-format result shows 3. Apparently the num values are arranged in the alphabetical order of factor2 (level1, level10, level2) when they are placed into the wide format, while the factor2 column labels retain the order in which they appear in the original data frame (level1, level2, level10).
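The two orderings mentioned above can be inspected directly. This is a minimal base R sketch: `stringsAsFactors = TRUE` is set explicitly on the assumption that the data frame was built under the pre-4.0 default (strings converted to factors), and `appearance_order` and `level_order` are names introduced here only for illustration.

```r
# Rebuild the example data; stringsAsFactors = TRUE is an assumption that
# mirrors the default behaviour of data.frame() before R 4.0.
d <- data.frame(
  factor1 = rep(LETTERS[1:3], each = 3),
  factor2 = rep(paste0("level", c(1, 2, 10)), 3),
  num = 1:9,
  stringsAsFactors = TRUE
)

# Order in which the factor2 values first appear in the data
appearance_order <- unique(as.character(d$factor2))
print(appearance_order)

# Order of the factor levels, which factor() sorts by default
level_order <- levels(d$factor2)
print(level_order)
```

If the two vectors differ, any code that mixes one ordering for the column names with the other for the values could produce the kind of mismatch shown above.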

Could anyone explain why this happens (and/or where I can find relevant information)?

Akira Murakami asked Oct 06 '14 17:10


1 Answer

Using the data provided, I got a different result:

> packageVersion("tidyr")
[1] ‘0.1’
> spread(d, factor2, num)
  factor1 level1 level10 level2
1       A      1       3      2
2       B      4       6      5
3       C      7       9      8
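To confirm which output has the right values, the wide table can be rebuilt without tidyr and indexed by name rather than by position. The following is a sketch using base R's `tapply()`; `wide_check` is an illustrative name, and this is only a cross-check, not a description of how spread() works internally.

```r
# Same example data as in the question
d <- data.frame(
  factor1 = rep(LETTERS[1:3], each = 3),
  factor2 = rep(paste0("level", c(1, 2, 10)), 3),
  num = 1:9
)

# Cross-tabulate num by factor1 (rows) and factor2 (columns).
# Each cell holds exactly one value, so sum() just returns that value.
wide_check <- tapply(d$num, list(d$factor1, d$factor2), sum)
print(wide_check)

# Look cells up by row and column name, independent of column order
print(wide_check["A", "level2"])   # 2
print(wide_check["A", "level10"])  # 3
```

Looking values up by name makes the check independent of whatever order the columns end up in.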
KFB answered Nov 06 '22 17:11