
Merge rows together by condition in R

Tags: merge, r

I have a data.frame that looks like this:

IsLead    ID Path    LogTime                    PathCode    Conversion
0         198822     2015-06-19 01:57:11.000    J           ConvA
0         198822     2015-06-19 01:58:33.000    F           ConvA
1         198822     2015-06-19 02:07:01.000    H           ConvA
0         253547     2015-06-20 07:52:33.000    A           ConvD
1         253547     2015-06-20 07:52:33.000    H           ConvD
2         351754     2015-06-20 07:52:33.000    J           
2         351754     2015-06-20 07:52:33.000    A           

IsLead indicates whether a row belongs to a path that converts: 0 is an interaction along a converting path, 1 is the actual conversion point, and 2 means the path does not convert.

ID Path identifies the unique path. Every path that contains a 0 must also contain a 1, and every path that contains a 2 contains only 2s.

LogTime indicates the time for the interaction.

PathCode indicates the type of interaction. H marks the interaction where a conversion happens, so a row with IsLead 1 always has PathCode H and marks the end of that ID Path.

Conversion indicates the conversion point at which the conversion happened.

The rows are sorted so that each ID Path can be followed in order, and the paths are not interleaved with each other.
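To make the example reproducible, here is a minimal sketch that rebuilds the sample rows and checks the rules described above. The column name `ID.Path` (with a dot instead of a space) and the helper names `has0`, `has1`, `has2`, `all2` and `rules_ok` are assumptions made for illustration.

```r
# Sample rows retyped from the table above; ID.Path is an assumed column name
df <- data.frame(
  IsLead     = c(0, 0, 1, 0, 1, 2, 2),
  ID.Path    = c(198822, 198822, 198822, 253547, 253547, 351754, 351754),
  LogTime    = c("2015-06-19 01:57:11.000", "2015-06-19 01:58:33.000",
                 "2015-06-19 02:07:01.000", "2015-06-20 07:52:33.000",
                 "2015-06-20 07:52:33.000", "2015-06-20 07:52:33.000",
                 "2015-06-20 07:52:33.000"),
  PathCode   = c("J", "F", "H", "A", "H", "J", "A"),
  Conversion = c("ConvA", "ConvA", "ConvA", "ConvD", "ConvD", "", ""),
  stringsAsFactors = FALSE
)

# Per-path flags: which IsLead values occur in each ID.Path
has0 <- tapply(df$IsLead == 0, df$ID.Path, any)
has1 <- tapply(df$IsLead == 1, df$ID.Path, any)
has2 <- tapply(df$IsLead == 2, df$ID.Path, any)
all2 <- tapply(df$IsLead == 2, df$ID.Path, all)

# Every path with a 0 contains a 1, every path with a 2 contains only 2s,
# and every IsLead 1 row has PathCode H
rules_ok <- all(has1[has0]) &&
  all(all2[has2]) &&
  all(df$PathCode[df$IsLead == 1] == "H")
```

`rules_ok` is a single logical value that is TRUE when the sample rows satisfy all three rules.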

I would like to alter my data.frame so it looks like this:

ID Path    Lead    Path    Conversion
198822     1       JFH     ConvA
253547     1       AH      ConvD
351754     0       JA      

For each ID Path, the PathCode values have been concatenated in the correct order. Lead is 1 for each path with a conversion and 0 if there is no conversion.

If possible, I would prefer the Path column WITHOUT the "H", so that Path in this case would be "JF", "A", "JA".

KhalidN asked Nov 20 '25 14:11


1 Answer

Here's a possible data.table solution. I'm assuming the data is already sorted; if not, you can add order(LogTime) to the i expression.

library(data.table)
setDT(df)[, .(Lead = +all(Conversion != ''), 
              Path = gsub('H', "", paste(PathCode, collapse = ""), fixed = TRUE)), 
          by = .(ID.Path, Conversion)]

#    ID.Path Conversion Lead Path
# 1:  198822      ConvA    1   JF
# 2:  253547      ConvD    1    A
# 3:  351754               0   JA
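If the rows are not already sorted, the data.table call can sort them in `i` first. The following is a sketch under some assumptions: the sample rows are retyped from the question with the first path deliberately shuffled, LogTime is kept as character (timestamps in this format sort chronologically as text), and `res` is just an illustrative name. Rows that share the same LogTime keep their original relative order, so ties still rely on the input order.

```r
library(data.table)

# Sample rows from the question; the three rows of path 198822 are shuffled
df <- data.frame(
  IsLead     = c(1, 0, 0, 0, 1, 2, 2),
  ID.Path    = c(198822, 198822, 198822, 253547, 253547, 351754, 351754),
  LogTime    = c("2015-06-19 02:07:01.000", "2015-06-19 01:57:11.000",
                 "2015-06-19 01:58:33.000", "2015-06-20 07:52:33.000",
                 "2015-06-20 07:52:33.000", "2015-06-20 07:52:33.000",
                 "2015-06-20 07:52:33.000"),
  PathCode   = c("H", "J", "F", "A", "H", "J", "A"),
  Conversion = c("ConvA", "ConvA", "ConvA", "ConvD", "ConvD", "", ""),
  stringsAsFactors = FALSE
)

# order(LogTime) in i sorts the rows before the grouped summary in j runs
res <- setDT(df)[order(LogTime),
                 .(Lead = +all(Conversion != ''),
                   Path = gsub('H', "", paste(PathCode, collapse = ""), fixed = TRUE)),
                 by = .(ID.Path, Conversion)]
```

Sorting by `order(ID.Path, LogTime)` instead would also keep each path's rows together, in case paths overlap in time.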

Or similarly with dplyr

library(dplyr)
df %>%
  group_by(ID.Path, Conversion) %>%
  summarise(Lead = +all(Conversion != ''),
            Path = paste(PathCode, collapse = "")) %>%
  mutate(Path = gsub('H', "", Path, fixed = TRUE))

# Source: local data frame [3 x 4]
# Groups: ID.Path
# 
#   ID.Path Conversion Lead Path
# 1  198822      ConvA    1   JF
# 2  253547      ConvD    1    A
# 3  351754               0   JA
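For comparison, a similar result can be sketched in base R without extra packages. This is only an illustration, not a replacement for the two solutions above: it assumes the rows are already sorted within each path, and it derives Lead from IsLead rather than from Conversion, which matches the question's description but is a different choice. The names `paths` and `res` are made up for the example.

```r
# Sample rows retyped from the question
df <- data.frame(
  IsLead     = c(0, 0, 1, 0, 1, 2, 2),
  ID.Path    = c(198822, 198822, 198822, 253547, 253547, 351754, 351754),
  PathCode   = c("J", "F", "H", "A", "H", "J", "A"),
  Conversion = c("ConvA", "ConvA", "ConvA", "ConvD", "ConvD", "", ""),
  stringsAsFactors = FALSE
)

# split() keeps the row order within each path
paths <- split(df, df$ID.Path)

# Build one summary row per path, dropping "H" before concatenating
res <- do.call(rbind, lapply(paths, function(p) {
  data.frame(
    ID.Path    = p$ID.Path[1],
    Lead       = as.integer(any(p$IsLead == 1)),
    Path       = paste(p$PathCode[p$PathCode != "H"], collapse = ""),
    Conversion = p$Conversion[1],
    stringsAsFactors = FALSE
  )
}))
rownames(res) <- NULL
```

Because the grouping is on ID.Path alone, a path whose rows carried different Conversion values would still produce a single row here, using the first value.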
David Arenburg answered Nov 22 '25 04:11


