Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it possible to use spread on multiple columns in tidyr similar to dcast? [duplicate]

Tags:

r

tidyr

reshape2

I have the following dummy data:

library(dplyr) library(tidyr) library(reshape2) dt <- expand.grid(Year = 1990:2014, Product=LETTERS[1:8], Country = paste0(LETTERS, "I")) %>%   select(Product, Country, Year) dt$value <- rnorm(nrow(dt)) 

I pick two product-country combinations

sdt <- dt %>% filter((Product == "A" & Country == "AI") | (Product == "B" & Country =="EI")) 

and I want to see the values side by side for each combination. I can do this with dcast:

sdt %>% dcast(Year ~ Product + Country) 

Is it possible to do this with spread from the package tidyr?

like image 307
mpiktas Avatar asked Jul 24 '14 09:07

mpiktas


People also ask

How do you use Tidyr spread?

To use spread() , pass it the name of a data frame, then the name of the key column in the data frame, and then the name of the value column. Pass the column names as they are; do not use quotes. To tidy table2 , you would pass spread() the key column and then the value column.

What is the opposite of gather in R?

The function spread() does the reverse of gather(). It takes two columns (key and value) and spreads into multiple columns.


2 Answers

One option would be to create a new 'Prod_Count' by joining the 'Product' and 'Country' columns by paste, remove those columns with the select and reshape from 'long' to 'wide' using spread from tidyr.

 library(dplyr)  library(tidyr)  sdt %>%  mutate(Prod_Count=paste(Product, Country, sep="_")) %>%  select(-Product, -Country)%>%   spread(Prod_Count, value)%>%  head(2)  #  Year      A_AI       B_EI  #1 1990 0.7878674  0.2486044  #2 1991 0.2343285 -1.1694878 

Or we can avoid a couple of steps by using unite from tidyr (from @beetroot's comment) and reshape as before.

 sdt%>%   unite(Prod_Count, Product,Country) %>%  spread(Prod_Count, value)%>%   head(2)  #   Year      A_AI       B_EI  # 1 1990 0.7878674  0.2486044  # 2 1991 0.2343285 -1.1694878 
like image 109
akrun Avatar answered Sep 17 '22 12:09

akrun


With the new function pivot_wider() introduced in tidyr version 1.0.0, this can be accomplished with one function call.

pivot_wider() (counterpart: pivot_longer()) works similar to spread(). However, it offers additional functionality such as using multiple key/name columns (and/or multiple value columns). To this end, the argument names_from—that indicates from which column(s) the names of the new variables are taken—may take more than one column name (here Product and Country).

library("tidyr")  sdt %>%      pivot_wider(id_cols = Year,                 names_from = c(Product, Country)) %>%      head(2) #> # A tibble: 2 x 3 #>     Year   A_AI    B_EI #>    <int>  <dbl>   <dbl> #>  1  1990 -2.08  -0.113  #>  2  1991 -1.02  -0.0546 

See also: https://tidyr.tidyverse.org/articles/pivot.html

like image 40
hplieninger Avatar answered Sep 18 '22 12:09

hplieninger