Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use cut function for the string variables?

Tags:

r

dplyr

I have been working with a data set something like this:

df <- tribble(
  ~id,  ~price,  ~day,        
  "1",    10,     '3',        
  "1",     5,     '1',         
  "2",     7,     '4',
  "2",     6,     '2', 
  "2",     3,     '4',
  "3",     4,     '1',
  "4",     5,     '1',
  "4",     6,     '1',
  "5",     1,     '2',
  "5",     9,     '3',
)

However, the real data has almost 50 unique values in the day. For the analysis, I would like to see the median prices for each day and for each id. Here is the desired data (the values are not correct):

df <- tribble(
  ~id,  ~day_1,   ~day_2,  ~day_3,  ~day_4,    
  "1",    1,          1,      1,       1, 
  "2",    1,          1,      1,       1,      
  "3",    1,          1,      1,       1,
  "4",    1,          1,      1,       1,
  "5",    1,          1,      1,       1,
)

To do this, I've tried to code it as I've shown below. But, I was not able to cut day variable since it is the string variable.

df %>% 
  mutate(date_day = cut(day)) %>% 
  select(-day) %>% 
  pivot_wider(names_from = date_day, values_from = median(price)) %>% 
  adorn_percentages()

What is the way to do it? Thanks!

like image 218
datazang Avatar asked Nov 18 '25 08:11

datazang


2 Answers

In data.table, we can use dcast and specify the fun.aggregate to get the median value of 'price'

library(data.table)
dcast(setDT(df), id ~ paste0('day_', day), value.var = 'price', median)
#    id day_1 day_2 day_3 day_4
#1:  1   5.0    NA    10    NA
#2:  2    NA     6    NA     5
#3:  3   4.0    NA    NA    NA
#4:  4   5.5    NA    NA    NA
#5:  5    NA     1     9    NA

With pivot_wider, there is values_fn which does the similar option in dcast so we can directly make use of it

library(tidyr)
library(stringr)
df %>%
   pivot_wider(id_cols = id, names_from = day, values_from = price, 
   values_fn =list(price = median),
     names_repair = ~ c('id', str_c('day', .[-1])))
# A tibble: 5 x 5
#  id     day3  day1  day4  day2
#  <chr> <dbl> <dbl> <dbl> <dbl>
#1 1        10   5      NA    NA
#2 2        NA  NA       5     6
#3 3        NA   4      NA    NA
#4 4        NA   5.5    NA    NA
#5 5         9  NA      NA     1

With pivot functions, the columns are ordered as in the same order of occurence of values or else have to reorder before doing the pivoting

Or make use of rename_at after the pivot_wider

df %>%
   pivot_wider(id_cols = id, names_from = day, values_from = price, 
        values_fn =list(price = median)) %>%
  rename_at(-1, ~ str_c('day_', .))
# A tibble: 5 x 5
#  id    day_3 day_1 day_4 day_2
#  <chr> <dbl> <dbl> <dbl> <dbl>
#1 1        10   5      NA    NA
#2 2        NA  NA       5     6
#3 3        NA   4      NA    NA
#4 4        NA   5.5    NA    NA
#5 5         9  NA      NA     1
like image 91
akrun Avatar answered Nov 19 '25 23:11

akrun


I'm not sure I understand correctly your question. But based on your example data and the expected output, maybe you are looking for something like this ?

library(dplyr)
library(tidyr)

df %>% group_by(id, day) %>%
        summarise(Med_price = median(price)) %>% arrange(day) %>%
        mutate(day = paste("day_",day, sep = "")) %>% 
        pivot_wider(names_from = day, values_from = Med_price)

# A tibble: 5 x 5
# Groups:   id [5]
  id    day_1 day_2 day_3 day_4
  <chr> <dbl> <dbl> <dbl> <dbl>
1 1       5      NA    10    NA
2 3       4      NA    NA    NA
3 4       5.5    NA    NA    NA
4 2      NA       6    NA     5
5 5      NA       1     9    NA
like image 39
dc37 Avatar answered Nov 20 '25 00:11

dc37