I have a data frame that contains some information about people and projects, like this:
person_id | project_id | action | time
--------------------------------------
1 | 1 | w | 1
1 | 2 | w | 2
1 | 3 | w | 2
1 | 3 | r | 3
1 | 3 | w | 4
1 | 4 | w | 4
2 | 2 | r | 2
2 | 2 | w | 3
I'd like to augment this data with two more fields, "first_time" and "first_time_project", which identify, respectively, the first time any action by that person was seen and the first time that person saw any action on that project. In the end, the data should look like this:
person_id | project_id | action | time | first_time | first_time_project
------------------------------------------------------------------------
1 | 1 | w | 1 | 1 | 1
1 | 2 | w | 2 | 1 | 2
1 | 3 | w | 2 | 1 | 2
1 | 3 | r | 3 | 1 | 2
1 | 3 | w | 4 | 1 | 2
1 | 4 | w | 4 | 1 | 4
2 | 2 | r | 2 | 2 | 2
2 | 2 | w | 3 | 2 | 2
My naive way of doing this is to write a couple of loops:
for (pid in unique(data$person_id)) {
  # earliest time for this person across all projects
  data[data$person_id==pid, "first_time"] = min(data[data$person_id==pid, "time"])
  for (projid in unique(data[data$person_id==pid, "project_id"])) {
    # earliest time for this person on this particular project
    data[data$person_id==pid & data$project_id==projid, "first_time_project"] = min(data[data$person_id==pid & data$project_id==projid, "time"])
  }
}
Now, it doesn't take a genius to see that this is going to be glacially slow with the nested loops. However, I can't figure out a better way to handle this in R. I'm essentially emulating SQL's GROUP BY. I know that by() might be able to help, but I can't figure out how to group on more than one column.
Any hints on how to take my code from glacially slow to something a bit faster? I'd be happy with a snail right now.
Try ave:
transform(data,
  first_time = ave(time, person_id, FUN = min),
  first_time_project = ave(time, person_id, project_id, drop = TRUE, FUN = min)
)
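For anyone who wants to check this against the question's table, here is a minimal, self-contained sketch. The data frame is rebuilt by hand from the sample rows above, and the name result is only an illustration, not something from the original post. ave() returns a vector the same length as its input, with each element replaced by FUN applied to its group, so no merge back into the data is needed.

```r
# Sample data retyped from the table in the question
data <- data.frame(
  person_id  = c(1, 1, 1, 1, 1, 1, 2, 2),
  project_id = c(1, 2, 3, 3, 3, 4, 2, 2),
  action     = c("w", "w", "w", "r", "w", "w", "r", "w"),
  time       = c(1, 2, 2, 3, 4, 4, 2, 3)
)

# 'result' is a hypothetical name used only for this sketch
result <- transform(data,
  # minimum time within each person
  first_time = ave(time, person_id, FUN = min),
  # minimum time within each (person, project) pair;
  # drop = TRUE is passed on to interaction() so that
  # combinations that never occur are not handed to min()
  first_time_project = ave(time, person_id, project_id, drop = TRUE, FUN = min)
)

print(result)
```

On this sample the two new columns should match the desired output in the question. Because the grouping is done once per column rather than by re-filtering the data frame for every person and project, this should scale much better than the nested loops, though I have not timed it on large data.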