My data currently looks like this, with the column "Number_Code based on each different Side_Effect:
Session_ID Side_Effect Number_Code
1 anxious 1
1 dizzy 2
1 relaxed 3
3 dizzy 2
7 nauseous 4
7 anxious 1
I know I can do:
mutate(rn = str_c('side_effect_', row_number())) %>%
pivot_wider(names_from = rn, values_from = Side_Effect)
In order to create new column names and put each side effect into a new column like this:
session Number_Code side_effect1 side effect_2 side_effect_3
1 1 anxious NA NA
1 2 NA dizzy NA
1 3 NA NA relaxed
3 2 dizzy NA NA
7 4 nauseous NA NA
7 1 NA anxious NA
But I need to widen the data based on both "Side_Effect" and "Number_Code", and have them in alternating columns like this:
session side_effect1 number_code1 side effect_2 number_code2 side_effect_3 number_code3
1 anxious 1 dizzy 2 relaxed 3
3 dizzy 2 NA NA NA NA
7 nauseous 4 anxious 1 NA NA
I saw another post where they widened the data based on two variables, but all of the columns for the second one were after all of the columns of the first one. Is there a way to get them to alternate like this? Thank you!!
pivot_wider() "widens" data, increasing the number of columns and decreasing the number of rows.
The pivot_wider() function from the tidyr package in R can be used to pivot a data frame from a long format to a wide format.
We cannot group columns by simply inserting a Pivot Table. For instance, in this method, I will use the PivotTable and PivotChart Wizard to create the Pivot table first and then group it into columns. Follow the below steps to create the expected Pivot Table.
We can specify both columns to pivot_wider () ’s values_from argument. pivot_wider () pivots variable, then creates columns by combining the value of measure with each column name specified in values_from.
That’s because both the pivot tables are sharing the same pivot cache. To understand better, when Excel creates a pivot table, it makes a copy of the entire source data, and creates a temporary pivot cache in the memory. This duplicated cache is now stored with the Excel file, doubling its size.
Pivot Table grouping is quite flexible. It allows you to group several different types of Fields. You can create many groups and you can group previously existing groups (create groups of groups). Despite its flexibility, Pivot Table grouping has some restrictions. Note the following 2 limitations: You can't add Calculated Items to grouped Fields.
The pivot_wider
can take multiple value_from
columns, so after creating the sequence by group, use pivot_wider
with values_from
specifying the columns of interest
library(dplyr)
library(tidyr)
df1 %>%
group_by(Session_ID) %>%
mutate(rn = row_number()) %>%
ungroup %>%
pivot_wider(names_from = rn, values_from = c(Side_Effect, Number_Code))
# A tibble: 3 x 7
# Session_ID Side_Effect_1 Side_Effect_2 Side_Effect_3 Number_Code_1 Number_Code_2 Number_Code_3
# <int> <chr> <chr> <chr> <int> <int> <int>
#1 1 anxious dizzy relaxed 1 2 3
#2 3 dizzy <NA> <NA> 2 NA NA
#3 7 nauseous anxious <NA> 4 1 NA
If we need to reorder the column order, then we can select
based on the numeric part and order
df1 %>%
group_by(Session_ID) %>%
mutate(rn = row_number()) %>%
ungroup %>%
pivot_wider(names_from = rn, values_from = c(Side_Effect, Number_Code)) %>%
select(Session_ID, names(.)[-1][order(readr::parse_number(names(.)[-1]))] )
# A tibble: 3 x 7
# Session_ID Side_Effect_1 Number_Code_1 Side_Effect_2 Number_Code_2 Side_Effect_3 Number_Code_3
# <int> <chr> <int> <chr> <int> <chr> <int>
#1 1 anxious 1 dizzy 2 relaxed 3
#2 3 dizzy 2 <NA> NA <NA> NA
#3 7 nauseous 4 anxious 1 <NA> NA
df1 <- structure(list(Session_ID = c(1L, 1L, 1L, 3L, 7L, 7L),
Side_Effect = c("anxious",
"dizzy", "relaxed", "dizzy", "nauseous", "anxious"), Number_Code = c(1L,
2L, 3L, 2L, 4L, 1L)), class = "data.frame", row.names = c(NA,
-6L))
I think this is best achieved via the pivot_*_spec()
interface which allows the building of a specification data frame. This data frame determines both the names and the variable order of the pivoted data.
library(tidyr)
library(dplyr)
df <- df %>%
group_by(Session_ID) %>%
mutate(row_id = factor(row_number(), labels = c("first", "next", "last")[1:max(row_number())])) %>%
ungroup()
spec <- df %>%
build_wider_spec(names_from = row_id, values_from = c(Side_Effect, Number_Code))
spec
# A tibble: 6 x 3
.name .value row_id
<chr> <chr> <fct>
1 Side_Effect_first Side_Effect first
2 Side_Effect_next Side_Effect next
3 Side_Effect_last Side_Effect last
4 Number_Code_first Number_Code first
5 Number_Code_next Number_Code next
6 Number_Code_last Number_Code last
Because the column order of the pivot is determined by the specification data row order, arrange()
can be used to flexibly control the final order of the pivot (where factors can be used, as in the data above, to fine tune the order of text variable names). Some examples:
# Alternating by row id
spec %>%
arrange(row_id) %>%
pivot_wider_spec(df, .)
# A tibble: 3 x 7
Session_ID Side_Effect_first Number_Code_first Side_Effect_next Number_Code_next Side_Effect_last Number_Code_last
<int> <chr> <int> <chr> <int> <chr> <int>
1 1 anxious 1 dizzy 2 relaxed 3
2 3 dizzy 2 NA NA NA NA
3 7 nauseous 4 anxious 1 NA NA
# Alternate by row_id and .value in ascending order
spec %>%
arrange(row_id, .value) %>%
pivot_wider_spec(df, .)
# A tibble: 3 x 7
Session_ID Number_Code_first Side_Effect_first Number_Code_next Side_Effect_next Number_Code_last Side_Effect_last
<int> <int> <chr> <int> <chr> <int> <chr>
1 1 1 anxious 2 dizzy 3 relaxed
2 3 2 dizzy NA NA NA NA
3 7 4 nauseous 1 anxious NA NA
# .value ascending row_id descending
spec %>%
arrange(.value, desc(row_id)) %>%
pivot_wider_spec(df, .)
# A tibble: 3 x 7
Session_ID Number_Code_last Number_Code_next Number_Code_first Side_Effect_last Side_Effect_next Side_Effect_first
<int> <int> <int> <int> <chr> <chr> <chr>
1 1 3 2 1 relaxed dizzy anxious
2 3 NA NA 2 NA NA dizzy
3 7 NA 1 4 NA anxious nauseous
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With