How to use the spread function properly in tidyr

How do I change the following table from:

Type    Name    Answer     n
TypeA   Apple   Yes        5
TypeA   Apple   No        10
TypeA   Apple   DK         8
TypeA   Apple   NA        20
TypeA   Orange  Yes        6
TypeA   Orange  No        11
TypeA   Orange  DK         8
TypeA   Orange  NA        23

Change to:

Type    Name    Yes   No   DK   NA  
TypeA   Apple   5     10   8    20
TypeA   Orange  6     11   8    23

I used the following code to get the first table:

df_1 <- 
  df %>% 
  group_by(Type, Name, Answer) %>% 
  tally()  

Then I tried to use the spread command to get to the second table:

df_2 <- spread(df_1, Answer)

but I got the following error message:

Error: All columns must be named
asked Jan 08 '16 19:01 by ayk




2 Answers

Following on from the comment by ayk, here is an example. It looks to me like a data_frame with a factor or character key column containing NA values cannot be spread unless you either remove the NAs or convert the data to another class. This is specific to data_frame (note the dplyr class with the underscore in the name): in my example, NA values spread without error in a plain data.frame. For example, take a slightly modified version of the example above.

Here is the data frame:

library(dplyr)
library(tidyr)
df_1 <- data_frame(Type = c("TypeA", "TypeA", "TypeB", "TypeB"),
                   Answer = c("Yes", "No", NA, "No"),
                   n = 1:4)
df_1

This gives a data_frame that looks like this:

Source: local data frame [4 x 3]

   Type Answer     n
  (chr)  (chr) (int)
1 TypeA    Yes     1
2 TypeA     No     2
3 TypeB     NA     3
4 TypeB     No     4

Then, when we try to tidy it, we get an error message:

df_1 %>% spread(key=Answer, value=n)
Error: All columns must be named

But if we remove the NAs, then it 'works':

df_1 %>%
    filter(!is.na(Answer)) %>%
    spread(key=Answer, value=n)
Source: local data frame [2 x 3]

   Type    No   Yes
  (chr) (int) (int)
1 TypeA     2     1
2 TypeB     4    NA

However, removing the NAs may not give you the desired result, since you might want them included in your tidied table. You could modify the data directly to change the NAs to a more descriptive value. Alternatively, you could convert your data to a data.frame, and then it spreads just fine:

as.data.frame(df_1) %>% spread(key=Answer, value=n)
   Type No Yes NA
1 TypeA  2   1 NA
2 TypeB  4  NA  3
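The other option mentioned above, relabelling the NAs with a more descriptive value before spreading, might look like the sketch below. It assumes a current dplyr/tidyr, where `tibble()` replaces the now-deprecated `data_frame()`; the label "Missing" is an arbitrary, illustrative choice.

```r
library(dplyr)
library(tidyr)

df_1 <- tibble(Type = c("TypeA", "TypeA", "TypeB", "TypeB"),
               Answer = c("Yes", "No", NA, "No"),
               n = 1:4)

# Replace missing answers with an explicit label so every key value
# becomes a proper, non-missing column name ("Missing" is illustrative)
df_labelled <- df_1 %>%
  mutate(Answer = ifelse(is.na(Answer), "Missing", Answer))

# Key = column whose values become column names, value = cell contents
df_wide <- df_labelled %>%
  spread(key = Answer, value = n)
df_wide
```

This keeps the TypeB count of 3 in its own "Missing" column instead of dropping it, and works whether or not the input is a tibble.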
answered Sep 28 '22 08:09 by Nicholas G Reich


I think only tidyr is needed to get from df_1 to df_2 (magrittr is loaded here only for the %>% pipe).

library(magrittr)
# read.csv() treats the string "NA" as a missing value by default (na.strings = "NA")
df_1 <- read.csv(text="Type,Name,Answer,n\nTypeA,Apple,Yes,5\nTypeA,Apple,No,10\nTypeA,Apple,DK,8\nTypeA,Apple,NA,20\nTypeA,Orange,Yes,6\nTypeA,Orange,No,11\nTypeA,Orange,DK,8\nTypeA,Orange,NA,23", stringsAsFactors=FALSE)

df_2 <- df_1 %>% 
  tidyr::spread(key=Answer, value=n)

Output:

   Type   Name DK No Yes NA
1 TypeA  Apple  8 10   5 20
2 TypeA Orange  8 11   6 23
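In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A sketch of the same reshape with it, assuming a current tidyr, is below. Note that by default pivot_wider() orders the new columns by first appearance in the data rather than alphabetically, and it returns a tibble.

```r
library(tidyr)

df_1 <- read.csv(text = "Type,Name,Answer,n
TypeA,Apple,Yes,5
TypeA,Apple,No,10
TypeA,Apple,DK,8
TypeA,Apple,NA,20
TypeA,Orange,Yes,6
TypeA,Orange,No,11
TypeA,Orange,DK,8
TypeA,Orange,NA,23", stringsAsFactors = FALSE)

# names_from: column whose values become the new column names
# values_from: column whose values fill the new columns
df_2 <- pivot_wider(df_1, names_from = Answer, values_from = n)
df_2
```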
answered Sep 28 '22 07:09 by wibeasley