I have a dataframe with these names:
df <- tribble( ~ state, ~ county, ~ kfr_asian_pooled_p25, ~ kfr_asian_pooled_p75,
~ kfr_black_pooled_p25, ~ kfr_black_pooled_p75,
~ kfr_pooled_pooled_p25, ~ kfr_pooled_pooled_p75, ~ kfr_white_pooled_p25,
~ kfr_white_pooled_p75, ~ kids_count, ~ kfr_asian_pooled_p25_se,
~ kfr_asian_pooled_p75_se, ~ kfr_black_pooled_p25_se,
~ kfr_black_pooled_p75_se, ~ kfr_white_pooled_p25_se,
~ kfr_white_pooled_p75_se,
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
Of course the dataset is giant, but what I want to achieve is to pivot longer using its maximum potential. I mean grabbing several variables at the same time and pivot them and their names.
df <- tribble(~ state, ~ county, ~ kids_count, ~ race, ~ percentile, ~ se, ~ value,
1, 2, 3, "asian", "p25", TRUE, value,
2, 3, 4, "black", "p25", TRUE, value,
3, 4, 5, "white", "p25", TRUE, value,
1, 2, 3, "asian", "p75", TRUE, value,
2, 3, 4, "black", "p75", TRUE, value,
3, 4, 5, "white", "p75", TRUE, value,
1, 2, 3, "asian", "p25", FALSE, value,
2, 3, 4, "black", "p25", FALSE, value,
3, 4, 5, "white", "p25", FALSE, value,
1, 2, 3, "asian", "p75", FALSE, value,
2, 3, 4, "black", "p75", FALSE, value,
3, 4, 5, "white", "p75", FALSE, value)
I would really appreciate any help!
You can do something like this, if I understand what you want
(First, a quick explanation of the pivot_longer() options that I use)
cols selects the columns you want to pivotnames_to with multiple elements allows us to pivot into multiple columnsnames_prefix allows us to remove a common prefix across the columns that we want to pivotname_sep allows us to indicate how the multiple names in names_to are broken up in the original columnspivot_longer(
df,
cols = starts_with("kfr"),
names_to=c("race", "pooled", "percentile", "se"),
names_prefix="kfr_",
names_sep="_"
) %>%
select(!pooled) %>%
mutate(se=!is.na(se))
# A tibble: 14 x 7
state county kids_count race percentile se value
<dbl> <dbl> <dbl> <chr> <chr> <lgl> <dbl>
1 1 2 11 asian p25 FALSE 3
2 1 2 11 asian p75 FALSE 4
3 1 2 11 black p25 FALSE 5
4 1 2 11 black p75 FALSE 6
5 1 2 11 pooled p25 FALSE 7
6 1 2 11 pooled p75 FALSE 8
7 1 2 11 white p25 FALSE 9
8 1 2 11 white p75 FALSE 10
9 1 2 11 asian p25 TRUE 12
10 1 2 11 asian p75 TRUE 13
11 1 2 11 black p25 TRUE 14
12 1 2 11 black p75 TRUE 15
13 1 2 11 white p25 TRUE 16
14 1 2 11 white p75 TRUE 17
It looks like you are trying to pivot_longer just the percentile columns and then split the names to create separate columns if I'm understanding correctly. The tidyselect functions such as matches are very useful for selecting similarly named columns. I’ve also used str_split from the stringr package to help create the new character columns. You could always reorganize the columns with select or relocate if the order is important to you.
library(tidyverse)
library(stringr)
df_long <-
df %>%
pivot_longer(cols = matches("[p25]|[p75]"), names_to = "percentile") %>%
mutate(se = grepl("se", percentile),
race = stringr::str_split(percentile, "_", simplify = TRUE)[,2],
percentile = stringr::str_split(percentile, "_", simplify = TRUE)[,4])
df_long
#> # A tibble: 14 x 7
#> state county kids_count percentile value se race
#> <dbl> <dbl> <dbl> <chr> <dbl> <lgl> <chr>
#> 1 1 2 11 p25 3 FALSE asian
#> 2 1 2 11 p75 4 FALSE asian
#> 3 1 2 11 p25 5 FALSE black
#> 4 1 2 11 p75 6 FALSE black
#> 5 1 2 11 p25 7 FALSE pooled
#> 6 1 2 11 p75 8 FALSE pooled
#> 7 1 2 11 p25 9 FALSE white
#> 8 1 2 11 p75 10 FALSE white
#> 9 1 2 11 p25 12 TRUE asian
#> 10 1 2 11 p75 13 TRUE asian
#> 11 1 2 11 p25 14 TRUE black
#> 12 1 2 11 p75 15 TRUE black
#> 13 1 2 11 p25 16 TRUE white
#> 14 1 2 11 p75 17 TRUE white
Created on 2022-02-18 by the reprex package (v2.0.1)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With