 

How do I order an R data frame based on request id and previous request id?

Tags:

r

I have an R data frame which looks like:

User |request_id |previous_request_id
-------------------------------------
A    |9          |5
A    |3          |1
A    |5          |NA
A    |1          |9
B    |2          |8
B    |8          |7
B    |7          |NA
B    |4          |2

Each row corresponds to a request a particular user made. Each row has a user ID, a request ID and the ID of their previous request. Where there is no previous request the previous_request_id field is NA.

For each user, I want to number the requests in sequence by following the previous_request_id links, with:

  • The order being 1 if the previous_request_id is NA
  • The order being 2 if the previous_request_id is equal to a request_id with an order of 1
  • The order being 3 if the previous_request_id is equal to a request_id with an order of 2
  • etc.
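Put another way, the rules define the order recursively. Writing prev(r) for the previous_request_id of request r (shorthand used only here):

```latex
\mathrm{order}(r) =
\begin{cases}
1 & \text{if } \mathrm{prev}(r) = \mathrm{NA},\\
1 + \mathrm{order}(p) & \text{otherwise, where } p \text{ is the same user's request with } \mathrm{request\_id}(p) = \mathrm{prev}(r).
\end{cases}
```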

The result of the above rules applied to the first table should look like:

User |request_id |previous_request_id |Order
---------------------------------------------
A    |9          |5                   |2
A    |3          |1                   |4
A    |5          |NA                  |1
A    |1          |9                   |3
B    |2          |8                   |3
B    |8          |7                   |2
B    |7          |NA                  |1
B    |4          |2                   |4

Is there a way to do this within R? I believe a graph database package may be the way to do this, but so far I haven't been able to find anything in my research (which has centred on Neo4j's Cypher query language).

Any help here would be greatly appreciated!

asked Jun 03 '15 10:06 by shancrane




1 Answer

There are many ways to do this, but here's what I came up with...

df <- read.delim(text="User|request_id|previous_request_id
A|9|5
A|3|1
A|5|NA
A|1|9
B|2|8
B|8|7
B|7|NA
B|4|2", sep="|")

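df$order <- rep(NA, nrow(df))                  # no order assigned yet
df$order[is.na(df$previous_request_id)] <- 1   # a request with no predecessor comes first
# match() finds the row of each request's predecessor; a request gets order k+1
# once its predecessor has order k
df$order[df$order[match(df$previous_request_id, df$request_id)] == 1] <- 2
df$order[df$order[match(df$previous_request_id, df$request_id)] == 2] <- 3
df$order[df$order[match(df$previous_request_id, df$request_id)] == 3] <- 4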
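The unrolled lines work even though the logical index df$order[match(...)] == k contains NA (for a first request, or for a row whose predecessor has no order yet). R's rule for subscripted assignment is that an NA index selects nothing when the replacement value has length one; with a longer replacement value it is an error. A tiny illustration with made-up values:

```r
x <- c(NA, NA, 1)          # made-up orders: two rows not yet numbered
idx <- c(TRUE, NA, FALSE)  # logical index containing an NA, like the comparison above
x[idx] <- 2                # the NA position is skipped because 2 has length one
x
# [1]  2 NA  1
```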

Notice, though, that we are repeating almost the same line over and over. A loop shortens the code; since no chain can be longer than the largest number of requests any one user has, that many passes are enough:

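max_user_len <- max(table(df$User))            # longest chain any user can have
df$order <- rep(NA, nrow(df))
df$order[is.na(df$previous_request_id)] <- 1
prev_row <- match(df$previous_request_id, df$request_id)  # row of each predecessor
for (x in 1:max_user_len) {
  df$order[df$order[prev_row] == x] <- x + 1   # a plain loop updates df directly, no <<- needed
}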
> df$order
[1] 2 4 1 3 3 2 1 4
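One caveat: match() above looks up previous_request_id among all users' requests at once, so it relies on request IDs being unique across users (as they are in this example). If IDs can repeat between users, one option is to match on the user and the ID together. Below is a minimal sketch of that idea; the function name order_requests is hypothetical. It stops early once a pass numbers nothing new, and rows that never link back to a first request (for example, a broken or circular chain) are left NA.

```r
# Hypothetical helper: number each user's requests by following
# previous_request_id, matching only within the same user.
order_requests <- function(df) {
  # Key every request by user, so equal IDs under different users never collide
  key <- paste(df$User, df$request_id)
  prev_key <- ifelse(is.na(df$previous_request_id), NA,
                     paste(df$User, df$previous_request_id))
  prev_row <- match(prev_key, key)             # row of each predecessor (or NA)

  ord <- rep(NA_integer_, nrow(df))
  ord[is.na(df$previous_request_id)] <- 1L     # first requests
  # Each pass numbers the rows whose predecessor was numbered in an earlier pass
  for (k in seq_len(nrow(df))) {
    ready <- is.na(ord) & !is.na(prev_row) & !is.na(ord[prev_row])
    if (!any(ready)) break                     # nothing left that can be numbered
    ord[ready] <- ord[prev_row[ready]] + 1L
  }
  ord
}

requests <- data.frame(
  User = c("A", "A", "A", "A", "B", "B", "B", "B"),
  request_id = c(9, 3, 5, 1, 2, 8, 7, 4),
  previous_request_id = c(5, 1, NA, 9, 8, 7, NA, 2)
)
requests$order <- order_requests(requests)
requests$order
# [1] 2 4 1 3 3 2 1 4
```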
answered Oct 28 '22 16:10 by cory