
Expand row given criteria

Tags: r, duplicates, row

I would like to insert a duplicate row if a column has a given value. I have the following dataset:

dataset <- data.frame(id=c("A","A","A","A","B","B","B","B"),
             date=c('2018-05-09 11:30','2018-10-28 01:15','2018-10-28 01:30','2018-12-08 14:15','2018-05-09 11:30','2018-10-28 01:15','2018-10-28 01:30','2018-12-08 14:15'),
             amount=c(10,20,22,14,12,24,26,10)
             )

    id  date                amount
1   A   2018-05-09 11:30    10
2   A   2018-10-28 01:15    20
3   A   2018-10-28 01:30    22
4   A   2018-12-08 14:15    14
5   B   2018-05-09 11:30    12
6   B   2018-10-28 01:15    24
7   B   2018-10-28 01:30    26
8   B   2018-12-08 14:15    10

And I wish to duplicate the rows that contain a given date, and divide the amount by 2. The dates to find are:

date_change <- c('2018-10-28 01:00','2018-10-28 01:15','2018-10-28 01:30','2018-10-28 01:45')

And I should get:

    id  date                amount
1   A   2018-05-09 11:30    10
2   A   2018-10-28 01:15    10
3   A   2018-10-28 01:15    10
4   A   2018-10-28 01:30    11
5   A   2018-10-28 01:30    11
6   A   2018-12-08 14:15    14
7   B   2018-05-09 11:30    12
8   B   2018-10-28 01:15    12
9   B   2018-10-28 01:15    12
10  B   2018-10-28 01:30    13
11  B   2018-10-28 01:30    13
12  B   2018-12-08 14:15    10

I tried expandRows from the splitstackshape package, but the result contains only the replicated rows, not the rest of the dataset:

library(splitstackshape)
fixed <- expandRows(dataset[dataset$date %in% date_change,], 2, count.is.col = FALSE)

Asked by Paulos, Jan 01 '23 15:01



1 Answer

In base R, first use %in% to find the rows whose date appears in date_change. Halve amount on those rows, then replicate them with rep:

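# TRUE for rows whose date is one of the dates to duplicate
i <- dataset$date %in% date_change
# Halve amount on matching rows, then repeat each row i+1 times (once if FALSE, twice if TRUE)
within(dataset, amount[i] <- amount[i]/2)[rep(seq_len(nrow(dataset)), i+1),]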
#    id             date amount
#1    A 2018-05-09 11:30     10
#2    A 2018-10-28 01:15     10
#2.1  A 2018-10-28 01:15     10
#3    A 2018-10-28 01:30     11
#3.1  A 2018-10-28 01:30     11
#4    A 2018-12-08 14:15     14
#5    B 2018-05-09 11:30     12
#6    B 2018-10-28 01:15     12
#6.1  B 2018-10-28 01:15     12
#7    B 2018-10-28 01:30     13
#7.1  B 2018-10-28 01:30     13
#8    B 2018-12-08 14:15     10
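
The one-liner above can be unpacked into separate steps, which makes it easier to check each stage. This is only a sketch using the data from the question; the names halved, idx and result are introduced here for illustration, and resetting the row names is an optional extra that the original answer does not do.

```r
# Data and target dates copied from the question
dataset <- data.frame(
  id = c("A", "A", "A", "A", "B", "B", "B", "B"),
  date = c("2018-05-09 11:30", "2018-10-28 01:15", "2018-10-28 01:30", "2018-12-08 14:15",
           "2018-05-09 11:30", "2018-10-28 01:15", "2018-10-28 01:30", "2018-12-08 14:15"),
  amount = c(10, 20, 22, 14, 12, 24, 26, 10)
)
date_change <- c("2018-10-28 01:00", "2018-10-28 01:15",
                 "2018-10-28 01:30", "2018-10-28 01:45")

# Step 1: logical vector marking the rows to duplicate
i <- dataset$date %in% date_change

# Step 2: halve amount on the marked rows (work on a copy)
halved <- dataset
halved$amount[i] <- halved$amount[i] / 2

# Step 3: repeat each row index once (i is FALSE) or twice (i is TRUE)
idx <- rep(seq_len(nrow(halved)), times = i + 1)
result <- halved[idx, ]

# Optional: replace row names such as "2.1" with 1..nrow(result)
rownames(result) <- NULL

# Sanity check: splitting a row in two must not change the total amount
stopifnot(sum(result$amount) == sum(dataset$amount))
```

Halving before replicating means each duplicated pair sums back to the original amount, which is what the final check relies on.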

When you change your line

fixed <- expandRows(dataset[dataset$date %in% date_change,], 2, count.is.col = FALSE)

to

fixed <- splitstackshape::expandRows(dataset, dataset$date %in% date_change+1, count.is.col = FALSE)

the result keeps every row and duplicates only the matching ones. The amount of the matching rows still needs to be divided by 2, though.
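
One way to complete the splitstackshape route is to halve amount before expanding. The following is a minimal sketch, assuming the splitstackshape package is installed and reusing the data from the question; the name halved is introduced here for illustration.

```r
library(splitstackshape)

# Data and target dates copied from the question
dataset <- data.frame(
  id = c("A", "A", "A", "A", "B", "B", "B", "B"),
  date = c("2018-05-09 11:30", "2018-10-28 01:15", "2018-10-28 01:30", "2018-12-08 14:15",
           "2018-05-09 11:30", "2018-10-28 01:15", "2018-10-28 01:30", "2018-12-08 14:15"),
  amount = c(10, 20, 22, 14, 12, 24, 26, 10)
)
date_change <- c("2018-10-28 01:00", "2018-10-28 01:15",
                 "2018-10-28 01:30", "2018-10-28 01:45")

# Mark the rows to duplicate and halve their amount on a copy
i <- dataset$date %in% date_change
halved <- dataset
halved$amount[i] <- halved$amount[i] / 2

# Pass a per-row count vector: 1 for unmatched rows, 2 for matched rows
fixed <- expandRows(halved, i + 1, count.is.col = FALSE)
```

With count.is.col = FALSE, the second argument is taken as the counts themselves, not as the name of a column in the data.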

Answered by GKi, Jan 08 '23 02:01
