I have been working with a data set something like this:
df <- tribble(
~id, ~price, ~day,
"1", 10, '3',
"1", 5, '1',
"2", 7, '4',
"2", 6, '2',
"2", 3, '4',
"3", 4, '1',
"4", 5, '1',
"4", 6, '1',
"5", 1, '2',
"5", 9, '3',
)
However, the real data has almost 50 unique values in the day. For the analysis, I would like to see the median prices for each day and for each id. Here is the desired data (the values are not correct):
df <- tribble(
~id, ~day_1, ~day_2, ~day_3, ~day_4,
"1", 1, 1, 1, 1,
"2", 1, 1, 1, 1,
"3", 1, 1, 1, 1,
"4", 1, 1, 1, 1,
"5", 1, 1, 1, 1,
)
To do this, I've tried to code it as I've shown below. But, I was not able to cut day variable since it is the string variable.
df %>%
mutate(date_day = cut(day)) %>%
select(-day) %>%
pivot_wider(names_from = date_day, values_from = median(price)) %>%
adorn_percentages()
What is the way to do it? Thanks!
In data.table, we can use dcast and specify the fun.aggregate to get the median value of 'price'
library(data.table)
dcast(setDT(df), id ~ paste0('day_', day), value.var = 'price', median)
# id day_1 day_2 day_3 day_4
#1: 1 5.0 NA 10 NA
#2: 2 NA 6 NA 5
#3: 3 4.0 NA NA NA
#4: 4 5.5 NA NA NA
#5: 5 NA 1 9 NA
With pivot_wider, there is values_fn which does the similar option in dcast so we can directly make use of it
library(tidyr)
library(stringr)
df %>%
pivot_wider(id_cols = id, names_from = day, values_from = price,
values_fn =list(price = median),
names_repair = ~ c('id', str_c('day', .[-1])))
# A tibble: 5 x 5
# id day3 day1 day4 day2
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 1 10 5 NA NA
#2 2 NA NA 5 6
#3 3 NA 4 NA NA
#4 4 NA 5.5 NA NA
#5 5 9 NA NA 1
With pivot functions, the columns are ordered as in the same order of occurence of values or else have to reorder before doing the pivoting
Or make use of rename_at after the pivot_wider
df %>%
pivot_wider(id_cols = id, names_from = day, values_from = price,
values_fn =list(price = median)) %>%
rename_at(-1, ~ str_c('day_', .))
# A tibble: 5 x 5
# id day_3 day_1 day_4 day_2
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 1 10 5 NA NA
#2 2 NA NA 5 6
#3 3 NA 4 NA NA
#4 4 NA 5.5 NA NA
#5 5 9 NA NA 1
I'm not sure I understand correctly your question. But based on your example data and the expected output, maybe you are looking for something like this ?
library(dplyr)
library(tidyr)
df %>% group_by(id, day) %>%
summarise(Med_price = median(price)) %>% arrange(day) %>%
mutate(day = paste("day_",day, sep = "")) %>%
pivot_wider(names_from = day, values_from = Med_price)
# A tibble: 5 x 5
# Groups: id [5]
id day_1 day_2 day_3 day_4
<chr> <dbl> <dbl> <dbl> <dbl>
1 1 5 NA 10 NA
2 3 4 NA NA NA
3 4 5.5 NA NA NA
4 2 NA 6 NA 5
5 5 NA 1 9 NA
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With