Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Conditional Insertion of rows based on consecutive values in a column in R

Tags:

dataframe

r

dplyr

I have a dataframe where I need to insert rows between tow rows, if the value in a column changes from "A" to "B".

Event   Price   Type    Date    Time

A       100      Sell   27-01-2018 12:00
C       200      Buy    27-01-2018 12:15
C       300      Buy    27-01-2018 12:30
D       350      Sell   27-01-2018 12:31
A       320      Buy    27-01-2018 12:32
B       321      Sell   27-01-2018 12:32
B       220      Buy    27-01-2018 12:34
L       550      Buy    27-01-2018 12:35
A       320      Buy    27-01-2018 12:32
B       320      Sell   27-01-2018 12:32

I want to insert a new row if the event "B", follows the event "A". The new row needs to be inserted between the two rows having all values equal to the row where "B" is the Event, except that the Event will be "Z".

Expected data frame

Event   Price   Type    Date    Time

A       100      Sell   27-01-2018 12:00
C       200      Buy    27-01-2018 12:15
C       300      Buy    27-01-2018 12:30
D       350      Sell   27-01-2018 12:31
A       320      Buy    27-01-2018 12:32
Z       321      Sell   27-01-2018 12:32
B       321      Sell   27-01-2018 12:32
B       220      Buy    27-01-2018 12:34
L       550      Buy    27-01-2018 12:35
A       320      Buy    27-01-2018 12:32
Z       320      Sell   27-01-2018 12:32
B       320      Sell   27-01-2018 12:32
like image 854
NinjaR Avatar asked Dec 19 '22 00:12

NinjaR


1 Answers

Here is an approach using tidyverse:

library(tidyverse)
df %>%
  mutate(lagE = lag(Event),  #create a lag Even column
         splt = ifelse(Event == "B" & lagE == "A", T, F),  #label the condition B after A
         cum = cumsum(splt)) %>% #create a column to split by
  {split(., .$cum)} %>% #split the data frame
  map(function(x){  #in each list data frame check if first element is B, if it is duplicate it and rename to Z, if not just return the data frame.
    if(x[1,1] == "B"){
      z <- rbind(x[1,], x)
      z[,1] <- as.character(z[,1])
      z[1,1] <- "Z" 
    } else {z <- x}
    z
  }) %>%
  bind_rows() %>% #put back to a data frame
  select(1:5) #remove helper columns

#output
   Event Price Type       Date  Time
1      A   100 Sell 27-01-2018 12:00
2      C   200  Buy 27-01-2018 12:15
3      C   300  Buy 27-01-2018 12:30
4      D   350 Sell 27-01-2018 12:31
5      A   320  Buy 27-01-2018 12:32
6      Z   321 Sell 27-01-2018 12:32
7      B   321 Sell 27-01-2018 12:32
8      B   220  Buy 27-01-2018 12:34
9      L   550  Buy 27-01-2018 12:35
10     A   320  Buy 27-01-2018 12:32
11     Z   320 Sell 27-01-2018 12:32
12     B   320 Sell 27-01-2018 12:32

The problem seems simple and I am sure someone will provide a more succinct solution.

like image 127
missuse Avatar answered May 08 '23 21:05

missuse