
Create duplicate rows based on conditions in R

I have a data.table that looks like this:

library(data.table)

dt <- data.table(ID=c("A","A","B","B"),Amount1=c(100,200,300,400),
                 Amount2=c(1500,1500,2400,2400),Dupl=c(1,0,1,0))

   ID Amount1 Amount2 Dupl
1:  A     100    1500    1
2:  A     200    1500    0
3:  B     300    2400    1
4:  B     400    2400    0

I need to duplicate each row that has a 1 in the Dupl column. In the duplicated row, the Amount1 value should be replaced with the Amount2 value, and Dupl should be set to 2. The result should look like this:

   ID Amount1 Amount2 Dupl
1:  A     100    1500    1
2:  A    1500    1500    2
3:  A     200    1500    0
4:  B     300    2400    1
5:  B    2400    2400    2
6:  B     400    2400    0

Any help is much appreciated! Kind regards,

Tim

Tim_Utrecht asked Mar 10 '15 10:03





2 Answers

You could try

rbind(dt,dt[Dupl==1][,c('Amount1', 'Dupl') := list(Amount2, 2)])
akrun answered Oct 17 '22 03:10



Using dplyr:

library("data.table")
library("dplyr")

#data
dt <- data.table(ID = c("A", "A", "B", "B"),
                 Amount1 = c(100, 200, 300, 400),
                 Amount2 = c(1500, 1500, 2400, 2400),
                 Dupl = c(1, 0, 1, 0))
#result
rbind(dt,
      dt %>% 
        filter(Dupl == 1) %>% 
        mutate(Dupl = 2,
               Amount1 = Amount2))

#    ID Amount1 Amount2 Dupl
# 1:  A     100    1500    1
# 2:  A     200    1500    0
# 3:  B     300    2400    1
# 4:  B     400    2400    0
# 5:  A    1500    1500    2
# 6:  B    2400    2400    2
zx8754 answered Oct 17 '22 02:10
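
Both answers append the copies after the original rows, whereas the question asks for each copy to sit directly below its source row. A minimal sketch of one way to get that order, assuming a temporary helper column is acceptable (row_id, dupes and res are hypothetical names, not part of either answer):

```r
library(data.table)

dt <- data.table(ID = c("A", "A", "B", "B"),
                 Amount1 = c(100, 200, 300, 400),
                 Amount2 = c(1500, 1500, 2400, 2400),
                 Dupl = c(1, 0, 1, 0))

# Record each row's original position (row_id is a hypothetical helper column).
dt[, row_id := .I]

# Copy the flagged rows, overwrite Amount1 with Amount2 and set Dupl to 2.
# dt[Dupl == 1] returns a new table, so := here does not touch dt itself.
dupes <- dt[Dupl == 1][, c("Amount1", "Dupl") := list(Amount2, 2)]

# Stack originals and copies, sort by original position (original before copy),
# then drop the helper column from the result.
res <- rbind(dt, dupes)[order(row_id, Dupl)][, row_id := NULL]

# Remove the helper column from the input table as well.
dt[, row_id := NULL]

print(res)
```

Sorting on Dupl within row_id only works here because an original row carries Dupl 1 and its copy Dupl 2; with other flag values a separate "is copy" column would be the safer sort key.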
