I found it a bit tricky to sort some rows in my data frame in R. The data has the following structure.
df <-
ID TIME DVID AMT DV PERIOD
1 0 0 50 NA 1
1 0.5 1 0 10 1
1 0.5 2 0 15 1
2 600 0 100 NA 2
2 600 1 0 2.5 2
2 600 2 0 4 2
2 602 1 0 20 2
2 602 2 0 35 2
3 800 0 50 NA 3
3 800 1 0 5 3
3 800 2 0 10 3
For PERIOD >= 2
I want to swap rows for DVID to be DVID 1, 2, then 0 as shown in the output data frame below dfout
. The data has to be ordered by ID, TIME, PERIOD as well.
I tried using the orderBy
but could reach my goal. An example of using orderBy
is this.
library(doBy)
dfout <- orderBy(~ID+TIME+PERIOD, data=df)
The output should be this:
dfout <-
ID TIME DVID AMT DV PERIOD
1 0 0 50 NA 1
1 0.5 1 0 10 1
1 0.5 2 0 15 1
2 600 1 0 2.5 2
2 600 2 0 4 2
2 600 0 100 NA 2
2 602 1 0 20 2
2 602 2 0 35 2
3 800 1 0 5 3
3 800 2 0 10 3
3 800 0 50 NA 3
I would appreciate if somebody share a trick on how to do that in R.
We can do this without any group by and by doing that it should be faster.
library(dplyr)
df %>%
arrange(ID, PERIOD, TIME, replace(DVID, PERIOD > 1 & DVID==0, 3))
# ID TIME DVID AMT DV PERIOD
#1 1 0.0 0 50 NA 1
#2 1 0.5 1 0 10.0 1
#3 1 0.5 2 0 15.0 1
#4 2 600.0 1 0 2.5 2
#5 2 600.0 2 0 4.0 2
#6 2 600.0 0 100 NA 2
#7 2 602.0 1 0 20.0 2
#8 2 602.0 2 0 35.0 2
#9 3 800.0 1 0 5.0 3
#10 3 800.0 2 0 10.0 3
#11 3 800.0 0 50 NA 3
Or using data.table
library(data.table)
setDT(df)[PERIOD > 1 & DVID==0, DVID1:= 3][order(ID, TIME, PERIOD,
pmax(DVID, DVID1, na.rm=TRUE))][, DVID1 := NULL][]
Or using base R
df[order(df$ID, df$PERIOD, df$TIME, with(df, replace(DVID,PERIOD>1 & DVID ==0, 3))),]
# ID TIME DVID AMT DV PERIOD
#1 1 0.0 0 50 NA 1
#2 1 0.5 1 0 10.0 1
#3 1 0.5 2 0 15.0 1
#5 2 600.0 1 0 2.5 2
#6 2 600.0 2 0 4.0 2
#4 2 600.0 0 100 NA 2
#7 2 602.0 1 0 20.0 2
#8 2 602.0 2 0 35.0 2
#10 3 800.0 1 0 5.0 3
#11 3 800.0 2 0 10.0 3
#9 3 800.0 0 50 NA 3
Here is a working option by specifying the 0
to be 3
when Period
is larger than 1 and then order by the DVID
group by PERIOD
, ID
and TIME
using data.table
package:
library(data.table)
setDT(df)
df[,.SD[order(ifelse(PERIOD > 1 & DVID == 0, 3, DVID))], .(PERIOD, ID, TIME)]
# PERIOD ID TIME DVID AMT DV
# 1: 1 1 0.0 0 50 NA
# 2: 1 1 0.5 1 0 10.0
# 3: 1 1 0.5 2 0 15.0
# 4: 2 2 600.0 1 0 2.5
# 5: 2 2 600.0 2 0 4.0
# 6: 2 2 600.0 0 100 NA
# 7: 2 2 602.0 1 0 20.0
# 8: 2 2 602.0 2 0 35.0
# 9: 3 3 800.0 1 0 5.0
#10: 3 3 800.0 2 0 10.0
#11: 3 3 800.0 0 50 NA
Another way to avoid the ifelse
statement:
df[,.SD[order(((PERIOD > 1)*2 + DVID)%%3)], .(PERIOD, ID, TIME)]
# PERIOD ID TIME DVID AMT DV
# 1: 1 1 0.0 0 50 NA
# 2: 1 1 0.5 1 0 10.0
# 3: 1 1 0.5 2 0 15.0
# 4: 2 2 600.0 1 0 2.5
# 5: 2 2 600.0 2 0 4.0
# 6: 2 2 600.0 0 100 NA
# 7: 2 2 602.0 1 0 20.0
# 8: 2 2 602.0 2 0 35.0
# 9: 3 3 800.0 1 0 5.0
# 10: 3 3 800.0 2 0 10.0
# 11: 3 3 800.0 0 50 NA
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With