How do I order an R data frame based on request id and previous request id?

Tags:

r

I have an R data frame which looks like:

User |request_id |previous_request_id
-------------------------------------
A    |9          |5
A    |3          |1
A    |5          |NA
A    |1          |9
B    |2          |8
B    |8          |7
B    |7          |NA
B    |4          |2

Each row corresponds to a request a particular user made. Each row has a user ID, a request ID and the ID of their previous request. Where there is no previous request the previous_request_id field is NA.

For each user I want to order each request by using the previous request id, with:

The order being 1 if the previous_request_id is NA
The order being 2 if the previous_request_id is equal to a request_id with an order of 1
The order being 3 if the previous_request_id is equal to a request_id with an order of 2
etc.

The result of the above rules applied to the first table should look like:

User |request_id |previous_request_id |Order
---------------------------------------------
A    |9          |5                   |2
A    |3          |1                   |4
A    |5          |NA                  |1
A    |1          |9                   |3
B    |2          |8                   |3
B    |8          |7                   |2
B    |7          |NA                  |1
B    |4          |2                   |4

Is there a way to do this within R? I believe a graph database package may be the way to do this, but so far I haven't been able to find anything in my research (centred on the Cypher language of Neo4j).

Any help here would be greatly appreciated!

asked Jun 03 '15 10:06

shancrane

1 Answer

There are many ways to do this, but here's what I came up with...

df <- read.delim(text="User|request_id|previous_request_id
A|9|5
A|3|1
A|5|NA
A|1|9
B|2|8
B|8|7
B|7|NA
B|4|2", sep="|")

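df$order <- rep(NA, nrow(df))
# Requests with no previous request come first
df$order[is.na(df$previous_request_id)] <- 1
# match() finds the row of each previous request; rows whose previous request
# has order k get order k + 1 (this assumes request_id is unique across users)
df$order[which(df$order[match(df$previous_request_id, df$request_id)] == 1)] <- 2
df$order[which(df$order[match(df$previous_request_id, df$request_id)] == 2)] <- 3
df$order[which(df$order[match(df$previous_request_id, df$request_id)] == 3)] <- 4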

But notice that we are repeating the same code (almost) over and over. We can create a loop to shorten the code up a bit...

max_user_len <- max(table(df$User))
df$order <- rep(NA, nrow(df))
df$order[is.na(df$previous_request_id)] <- 1
# A chain of n requests needs n - 1 passes after the first request is labelled
for (x in seq_len(max_user_len - 1)) {
  prev_order <- df$order[match(df$previous_request_id, df$request_id)]
  df$order[which(prev_order == x)] <- x + 1
}
df$order
# [1] 2 4 1 3 3 2 1 4
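
The code above matches request IDs across the whole data frame, so it relies on request_id being unique across users. If that might not hold, one possible variation is to walk each user's chain separately. The sketch below is only an illustration: chain_order is a hypothetical helper name, not part of the original answer or of any package, and it assumes each user has at least one request with an NA previous_request_id to start from.

```r
# Hypothetical helper: number one user's requests by following the chain
# that starts at the row(s) whose previous_request_id is NA.
chain_order <- function(request_id, previous_request_id) {
  ord <- rep(NA_integer_, length(request_id))
  current <- which(is.na(previous_request_id))  # starting request(s)
  step <- 1L
  # The step limit guards against looping forever on malformed data
  while (length(current) > 0 && step <= length(request_id)) {
    ord[current] <- step
    # Next: rows whose previous request is one of the rows just numbered
    current <- which(previous_request_id %in% request_id[current])
    step <- step + 1L
  }
  ord
}

# Same data as in the question
df <- data.frame(
  User = rep(c("A", "B"), each = 4),
  request_id = c(9, 3, 5, 1, 2, 8, 7, 4),
  previous_request_id = c(5, 1, NA, 9, 8, 7, NA, 2)
)

# Apply the helper to each user's rows independently
df$order <- NA_integer_
for (rows in split(seq_len(nrow(df)), df$User)) {
  df$order[rows] <- chain_order(df$request_id[rows],
                                df$previous_request_id[rows])
}

df$order
# [1] 2 4 1 3 3 2 1 4

# Sort the data frame by user, then by position in the chain
sorted <- df[order(df$User, df$order), ]
```

Rows that cannot be reached from a starting request keep an order of NA, which makes broken chains easy to spot.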

answered Oct 28 '22 16:10

cory
