I have the data set shown below:
library(tibble)

df <- tribble(
  ~id, ~price, ~number_of_book,
  "1", 10, 3,
  "1", 5, 1,
  "2", 7, 4,
  "2", 6, 2,
  "2", 3, 4,
  "3", 4, 1,
  "4", 5, 1,
  "4", 6, 1,
  "5", 1, 2,
  "5", 9, 3,
)
As the data set shows, id "1" has 3 books that cost 10 dollars each and 1 book that costs 5 dollars. I want to see the share (%) of the number of books in each price bin, for each id. Here is my desired data set:
# column names containing a hyphen must be wrapped in backticks
desired <- tribble(
  ~id, ~less_than_three, ~`three-five`, ~`five-six`, ~more_than_six,
  "1", "0%", "25%", "0%", "75%",
  "2", "0%", "40%", "20%", "40%",
  "3", "0%", "100%", "0%", "0%",
  "4", "0%", "50%", "50%", "0%",
  "5", "40%", "0%", "0%", "60%",
)
I binned the prices first. To do this, I ran the code below:
out <- cut(df$price, breaks = c(0, 3, 5, 6, 10),
           labels = c("<3", "3-5", "5-6", ">6"))
out <- table(out) / sum(table(out))
Unfortunately, I could not get any further because I lack the coding knowledge. Could you help me get the desired data set?
We can use cut to get the intervals, then use tidyr to transform the data to wide format, and finally use janitor to compute the row-wise proportions.
library(dplyr)
library(tidyr)
library(janitor)
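df %>%
  # assign each price to an interval: (0,3], (3,5], (5,6], (6,Inf]
  mutate(interval = cut(price, c(0, 3, 5, 6, Inf))) %>%
  select(-price) %>%
  # one column per interval, holding the number of books
  pivot_wider(names_from = interval, values_from = number_of_book) %>%
  # convert the counts to row-wise proportions
  adorn_percentages()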
#>  id (6,Inf] (3,5] (5,6] (0,3]
#>   1    0.75  0.25    NA    NA
#>   2    0.40    NA   0.2   0.4
#>   3      NA  1.00    NA    NA
#>   4      NA  0.50   0.5    NA
#>   5    0.60    NA    NA   0.4
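Note that cut uses right-closed intervals by default, so a price of exactly 3 lands in (0,3], whereas the desired table in the question counts it in the three-to-five bin. If you need the exact layout from the question (zeros instead of NA, percent strings, columns in bin order), here is a minimal base R sketch. The bin edges (price < 3, 3 to 5, above 5 up to 6, above 6) are my assumption, inferred from the desired table, and the names bins, tab, shares and result are only illustrative.

```r
# Same data as in the question, built with base R so the sketch is self-contained
df <- data.frame(
  id = c("1", "1", "2", "2", "2", "3", "4", "4", "5", "5"),
  price = c(10, 5, 7, 6, 3, 4, 5, 6, 1, 9),
  number_of_book = c(3, 1, 4, 2, 4, 1, 1, 1, 2, 3)
)

# Assumed bin edges, inferred from the desired table:
# price < 3, 3 <= price <= 5, 5 < price <= 6, price > 6
bins <- c("less_than_three", "three_five", "five_six", "more_than_six")
df$bin <- factor(
  ifelse(df$price < 3, bins[1],
  ifelse(df$price <= 5, bins[2],
  ifelse(df$price <= 6, bins[3], bins[4]))),
  levels = bins
)

# Sum the number of books per id and bin; empty combinations become 0
tab <- xtabs(number_of_book ~ id + bin, data = df)

# Row-wise shares: each id's counts divided by that id's total
shares <- prop.table(tab, margin = 1)

# Convert to a data frame of rounded percentages, then to percent strings
result <- as.data.frame.matrix(round(100 * shares))
result[] <- lapply(result, function(x) paste0(x, "%"))

print(result)
```

For id "2", for example, this gives 0%, 40%, 20% and 40%, matching the desired table. Because xtabs sums number_of_book within each id and bin, it also copes with an id that has several prices in the same bin, a case where pivot_wider without values_fn would produce list columns.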