R: harder pivot_longer arguments

Question

I have a dataframe with these names:


df <- tribble( ~ state, ~ county, ~ kfr_asian_pooled_p25, ~ kfr_asian_pooled_p75, 
               ~ kfr_black_pooled_p25, ~ kfr_black_pooled_p75, 
               ~ kfr_pooled_pooled_p25, ~ kfr_pooled_pooled_p75, ~ kfr_white_pooled_p25, 
               ~ kfr_white_pooled_p75, ~ kids_count, ~ kfr_asian_pooled_p25_se, 
               ~ kfr_asian_pooled_p75_se, ~ kfr_black_pooled_p25_se, 
               ~ kfr_black_pooled_p75_se, ~ kfr_white_pooled_p25_se, 
               ~ kfr_white_pooled_p75_se,
               1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)

Of course the dataset is giant, but what I want to achieve is to pivot longer using its maximum potential. I mean grabbing several variables at the same time and pivot them and their names.

df <- tribble(~ state, ~ county, ~ kids_count,   ~ race, ~ percentile,   ~ se, ~ value,
                     1,       2,            3,  "asian",        "p25",   TRUE,   value,
                     2,       3,            4,  "black",        "p25",   TRUE,   value,
                     3,       4,            5,  "white",        "p25",   TRUE,   value,
                     1,       2,            3,  "asian",        "p75",   TRUE,   value,
                     2,       3,            4,  "black",        "p75",   TRUE,   value,
                     3,       4,            5,  "white",        "p75",   TRUE,   value,
                     1,       2,            3,  "asian",        "p25",  FALSE,   value,
                     2,       3,            4,  "black",        "p25",  FALSE,   value,
                     3,       4,            5,  "white",        "p25",  FALSE,   value,
                     1,       2,            3,  "asian",        "p75",  FALSE,   value,
                     2,       3,            4,  "black",        "p75",  FALSE,   value,
                     3,       4,            5,  "white",        "p75",  FALSE,   value)

I would really appreciate any help!

langtang · Accepted Answer

You can do something like this, if I understand what you want

(First, a quick explanation of the pivot_longer() options that I use)

cols selects the columns you want to pivot
names_to with multiple elements allows us to pivot into multiple columns
names_prefix allows us to remove a common prefix across the columns that we want to pivot
name_sep allows us to indicate how the multiple names in names_to are broken up in the original columns

pivot_longer(
  df,
  cols = starts_with("kfr"),
  names_to=c("race", "pooled", "percentile", "se"), 
  names_prefix="kfr_", 
  names_sep="_"
) %>% 
  select(!pooled) %>% 
  mutate(se=!is.na(se))


# A tibble: 14 x 7
   state county kids_count race   percentile se    value
   <dbl>  <dbl>      <dbl> <chr>  <chr>      <lgl> <dbl>
 1     1      2         11 asian  p25        FALSE     3
 2     1      2         11 asian  p75        FALSE     4
 3     1      2         11 black  p25        FALSE     5
 4     1      2         11 black  p75        FALSE     6
 5     1      2         11 pooled p25        FALSE     7
 6     1      2         11 pooled p75        FALSE     8
 7     1      2         11 white  p25        FALSE     9
 8     1      2         11 white  p75        FALSE    10
 9     1      2         11 asian  p25        TRUE     12
10     1      2         11 asian  p75        TRUE     13
11     1      2         11 black  p25        TRUE     14
12     1      2         11 black  p75        TRUE     15
13     1      2         11 white  p25        TRUE     16
14     1      2         11 white  p75        TRUE     17

TrainingPizza · Answer

It looks like you are trying to pivot_longer just the percentile columns and then split the names to create separate columns if I'm understanding correctly. The tidyselect functions such as matches are very useful for selecting similarly named columns. I’ve also used str_split from the stringr package to help create the new character columns. You could always reorganize the columns with select or relocate if the order is important to you.

library(tidyverse)
library(stringr)

df_long <-
  df %>%
  pivot_longer(cols = matches("[p25]|[p75]"), names_to = "percentile") %>%
  mutate(se = grepl("se", percentile),
         race = stringr::str_split(percentile, "_", simplify = TRUE)[,2],
         percentile = stringr::str_split(percentile, "_", simplify = TRUE)[,4])

df_long
#> # A tibble: 14 x 7
#>    state county kids_count percentile value se    race  
#>    <dbl>  <dbl>      <dbl> <chr>      <dbl> <lgl> <chr> 
#>  1     1      2         11 p25            3 FALSE asian 
#>  2     1      2         11 p75            4 FALSE asian 
#>  3     1      2         11 p25            5 FALSE black 
#>  4     1      2         11 p75            6 FALSE black 
#>  5     1      2         11 p25            7 FALSE pooled
#>  6     1      2         11 p75            8 FALSE pooled
#>  7     1      2         11 p25            9 FALSE white 
#>  8     1      2         11 p75           10 FALSE white 
#>  9     1      2         11 p25           12 TRUE  asian 
#> 10     1      2         11 p75           13 TRUE  asian 
#> 11     1      2         11 p25           14 TRUE  black 
#> 12     1      2         11 p75           15 TRUE  black 
#> 13     1      2         11 p25           16 TRUE  white 
#> 14     1      2         11 p75           17 TRUE  white

^{Created on 2022-02-18 by the reprex package (v2.0.1)}

R: harder pivot_longer arguments

Tags:

r

pivot

tidyr

data-wrangling

Paula

2 Answers

langtang

TrainingPizza

Recent Activity

Donate For Us

R: harder pivot_longer arguments

Tags:

r

pivot

tidyr

data-wrangling

Paula

2 Answers

langtang

TrainingPizza

Related questions

Recent Activity

Donate For Us