I'm struggling with some data wrangling stuff which I feel should be easy to solve (with lapply or something) but I just can't get it to work (and I'm evidently rusty as hell with this stuff). I'm working with a web-dataset structured as follows:
df <- data.frame("ID" = c(1, 1, 1, 2, 3, 3),
"URL_visit" = c(1, 2, 3, 1, 1, 2), # e.g. customer ID #1 has visited 3 pages
"URL_name" = c("home", "login", "product_page", "home", "home", "product_page"),
"duration" = c(14, 40, 233, 8, 76, 561),
"home" = c(1, 0, 0, 1, 1, 0),
"login" = c(0, 1, 0, 0, 0, 0),
"product_page" = c(0, 0, 1, 0, 0, 1)
)
So basically a customer ID field, a number for each event per customer, the URL they visited in that event, followed by a column for each URL with an indicator (1/0) whether the customer visited that particular URL in that event.
My goal is to have a piece of code that replaces the 1/0 indicators with the duration if a match (1) is found, and retains 0 if not. In other words:
Either replacing all the current (0/1) values in the URL-columns with duration where applicable, or making a new set of columns (e.g., "home_duration") works for me.
A manual example solution is:
df %<>% dplyr::mutate(home_duration = if_else(home == 1, duration, 0))
But of course my aim is to automate this, and conduct it for the whole set of URL-columns (passing a vector with URL column names).
Help is much appreciated! Thanks! :)
You can try pivoting it to long, doing your transformation, and then pivoting it back wide again.
library(dplyr)
library(tidyr)
url_col_names <- c("home", "login", "product_page")
df %>%
pivot_longer(url_col_names, names_to = "url", values_to = "url_duration") %>%
mutate(url_duration = url_duration * duration) %>%
pivot_wider(names_from = "url", values_from = "url_duration")
# A tibble: 6 x 7
ID URL_visit URL_name duration home login product_page
<dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
1 1 1 home 14 14 0 0
2 1 2 login 40 0 40 0
3 1 3 product_page 233 0 0 233
4 2 1 home 8 8 0 0
5 3 1 home 76 76 0 0
6 3 2 product_page 561 0 0 561
Another way, probably more simple, is to do this.
df %>%
mutate(across(any_of(url_col_names), ~ . * duration))
ID URL_visit URL_name duration home login product_page
1 1 1 home 14 14 0 0
2 1 2 login 40 0 40 0
3 1 3 product_page 233 0 0 233
4 2 1 home 8 8 0 0
5 3 1 home 76 76 0 0
6 3 2 product_page 561 0 0 561
On another note, I imagine you created those indicator variables? If you are just hoping to replace them, then you actually might not need to create them to begin with. You can just pivot_wider()
from the start.
This would assume that your ID
and URL_visit
columns form a unique row combination.
df2 <- df[, 1:4]
df2 %>%
pivot_wider(names_from = "URL_name", values_from = "duration", values_fill = 0)
A simple multiplication should do the trick (this is equivalent to @Adam 's tidyverse solution above but in base R)
url_col_names <- c('home','login','product_page')
df$duration * df[,url_col_names] -> df[,url_col_names]
To rename the columns, you can do:
names(df)[names(df) %in% url_col_names] <- paste0(url_col_names, '_', 'duration')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With