How do I change the following table from:
Type Name Answer n
TypeA Apple Yes 5
TypeA Apple No 10
TypeA Apple DK 8
TypeA Apple NA 20
TypeA Orange Yes 6
TypeA Orange No 11
TypeA Orange DK 8
TypeA Orange NA 23
to:
Type Name Yes No DK NA
TypeA Apple 5 10 8 20
TypeA Orange 6 11 8 23
I used the following code to get the first table:
df_1 <-
df %>%
group_by(Type, Name, Answer) %>%
tally()
Then I tried to use spread() to get the second table:
df_2 <- spread(df_1, Answer)
but I got the following error message:
Error: All columns must be named
The spread() function from the tidyr package spreads a key-value pair across multiple columns. Its main arguments are data (the data frame), key (the column whose values become the new column names), and value (the column whose values fill those new columns).
To use spread(), pass it the data frame, then the name of the key column, and then the name of the value column. The result is one tidy column per distinct key.
In other words, spread() reshapes data from long format to wide format. It is the inverse of gather().
gather() goes the other way: it takes multiple columns and collapses them into key-value pairs, duplicating all other columns as needed, so that a shared attribute ends up in a single variable.
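To make the key/value description concrete, here is a minimal sketch of a spread() and gather() round trip. The data frame `long` and its column names are made up for illustration and are not from the question.

```r
library(tidyr)

# Hypothetical long-format data: one row per (id, key) pair
long <- data.frame(
  id    = c(1, 1, 2, 2),
  key   = c("a", "b", "a", "b"),
  value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# spread(): each distinct value of `key` becomes a column filled from `value`
wide <- spread(long, key = key, value = value)

# gather(): collapse columns a and b back into key-value pairs
long_again <- gather(wide, key = "key", value = "value", a, b)
```

Note that spread() needs both the key and the value column, so a call that names only one of them is incomplete.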
Following up on the comment from ayk, here is an example. It looks to me like a data_frame whose key column (factor or character) contains NA values cannot be spread unless you either remove those rows or convert the data. This is specific to a data_frame (the dplyr class, note the underscore in the name); the same example works when the NA values are in a plain data.frame. Below is a slightly modified version of the example above.
Here is the data frame:
library(dplyr)
library(tidyr)
df_1 <- data_frame(Type = c("TypeA", "TypeA", "TypeB", "TypeB"),
Answer = c("Yes", "No", NA, "No"),
n = 1:4)
df_1
This gives a data_frame that looks like this:
Source: local data frame [4 x 3]
Type Answer n
(chr) (chr) (int)
1 TypeA Yes 1
2 TypeA No 2
3 TypeB NA 3
4 TypeB No 4
Then, when we try to tidy it, we get an error message:
df_1 %>% spread(key=Answer, value=n)
Error: All columns must be named
But if we remove the NAs, it 'works':
df_1 %>%
filter(!is.na(Answer)) %>%
spread(key=Answer, value=n)
Source: local data frame [2 x 3]
Type No Yes
(chr) (int) (int)
1 TypeA 2 1
2 TypeB 4 NA
However, removing the NAs may not give you the desired result, since you might want them included in your tidied table. You could modify the data directly to change the NAs to a more descriptive value. Alternatively, you could convert your data to a data.frame, and then it spreads just fine:
as.data.frame(df_1) %>% spread(key=Answer, value=n)
Type No Yes NA
1 TypeA 2 1 NA
2 TypeB 4 NA 3
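For the other option mentioned above (recoding the NAs to a more descriptive value before spreading), a minimal sketch could look like this. The label "Missing" is just a placeholder I picked, and I'm using tibble() in place of the older data_frame() constructor; adjust both to your setup.

```r
library(dplyr)
library(tidyr)

df_1 <- tibble(
  Type   = c("TypeA", "TypeA", "TypeB", "TypeB"),
  Answer = c("Yes", "No", NA, "No"),
  n      = 1:4
)

df_wide <- df_1 %>%
  # Recode missing answers to an explicit label ("Missing" is an arbitrary choice)
  mutate(Answer = replace_na(Answer, "Missing")) %>%
  spread(key = Answer, value = n)
```

Because the key column no longer contains NA, every resulting column gets a proper name, and the TypeB count of 3 ends up in the Missing column instead of being dropped.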
I think only tidyr is needed to get from df_1 to df_2.
library(magrittr)  # provides the %>% pipe
# Note: read.csv() reads the string "NA" as a missing value by default
df_1 <- read.csv(text="Type,Name,Answer,n\nTypeA,Apple,Yes,5\nTypeA,Apple,No,10\nTypeA,Apple,DK,8\nTypeA,Apple,NA,20\nTypeA,Orange,Yes,6\nTypeA,Orange,No,11\nTypeA,Orange,DK,8\nTypeA,Orange,NA,23", stringsAsFactors=FALSE)
df_2 <- df_1 %>%
  tidyr::spread(key=Answer, value=n)
Output:
Type Name DK No Yes NA
1 TypeA Apple 8 10 5 20
2 TypeA Orange 8 11 6 23
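As a side note, spread() has been superseded by pivot_wider() since tidyr 1.0.0. A rough equivalent for the question's data might look like the sketch below. It assumes "NA" is meant as a literal answer label (a string) and not a missing value, so I build the data frame by hand instead of going through read.csv().

```r
library(tidyr)  # pivot_wider() requires tidyr >= 1.0.0

df_1 <- data.frame(
  Type   = "TypeA",
  Name   = rep(c("Apple", "Orange"), each = 4),
  # "NA" here is a literal label (assumption), not a missing value
  Answer = rep(c("Yes", "No", "DK", "NA"), times = 2),
  n      = c(5, 10, 8, 20, 6, 11, 8, 23),
  stringsAsFactors = FALSE
)

# names_from plays the role of spread()'s key, values_from the role of value
df_2 <- pivot_wider(df_1, names_from = Answer, values_from = n)
```

One difference from spread(): pivot_wider() keeps the new columns in order of first appearance, so here they come out as Yes, No, DK, NA, which matches the layout asked for in the question.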