Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Returning first row of group

Tags:

r

aggregate

plyr

I have a dataframe consisting of an ID, that is the same for each element in a group, two datetimes and the time interval between these two. One of the datetime objects is my relevant time marker. Now I like to get a subset of the dataframe that consists of the earliest entry for each group. The entries (especially the time interval) need to stay untouched.

My first approach was to sort the frame according to 1. ID and 2. relevant datetime. However, I wasn't able to return the first entry for each new group.

I then have been looking at the aggregate() as well as ddply() function but I could not find an option in both that just returns the first entry without applying an aggregation function to the time interval value.

Is there an (easy) way to accomplish this?

ADDITION: Maybe I was unclear by adding my aggregate() and ddply() notes. I do not necessarily need to aggregate. Given the fact that the dataframe is sorted in a way that the first row of each new group is the row I am looking for, it would suffice to just return a subset with each row that has a different ID than the one before (which is the start-row of each new group).

Example data:

structure(list(ID = c(1454L, 1322L, 1454L, 1454L, 1855L, 1669L, 
1727L, 1727L, 1488L), Line = structure(c(2L, 1L, 3L, 1L, 1L, 
1L, 1L, 1L, 1L), .Label = c("A", "B", "C"), class = "factor"), 
    Start = structure(c(1357038060, 1357221074, 1357369644, 1357834170, 
    1357913412, 1358151763, 1358691675, 1358789411, 1359538400
    ), class = c("POSIXct", "POSIXt"), tzone = ""), End = structure(c(1357110430, 
    1357365312, 1357564413, 1358230679, 1357978810, 1358674600, 
    1358853933, 1359531923, 1359568151), class = c("POSIXct", 
    "POSIXt"), tzone = ""), Interval = c(1206.16666666667, 2403.96666666667, 
    3246.15, 6608.48333333333, 1089.96666666667, 8713.95, 2704.3, 
    12375.2, 495.85)), .Names = c("ID", "Line", "Start", "End", 
"Interval"), row.names = c(NA, -9L), class = "data.frame")
like image 780
fr3d-5 Avatar asked Oct 18 '13 13:10

fr3d-5


1 Answers

As you don't provide any data, here is an example using base R with a sample data frame :

df <- data.frame(group=c("a", "b"), value=1:8)
## Order the data frame with the variable of interest
df <- df[order(df$value),]
## Aggregate
aggregate(df, list(df$group), FUN=head, 1)

EDIT : As Ananda suggests in his comment, the following call to aggregate is better :

aggregate(.~group, df, FUN=head, 1)

If you prefer to use plyr, you can replace aggregate with ddply :

ddply(df, "group", head, 1)
like image 101
juba Avatar answered Sep 19 '22 15:09

juba