Collapsing data frame by selecting one row per group

I'm trying to collapse a data frame by removing all but one row from each group of rows that share the same value in a particular column. In other words, I want to keep only the first row from each group.

For example, I'd like to convert this

    > d = data.frame(x=c(1,1,2,4),y=c(10,11,12,13),z=c(20,19,18,17))
    > d
      x  y  z
    1 1 10 20
    2 1 11 19
    3 2 12 18
    4 4 13 17

Into this:

      x  y  z
    1 1 11 19
    2 2 12 18
    3 4 13 17

I'm using aggregate to do this currently, but the performance is unacceptable with more data:

    > d.ordered = d[order(-d$y),]
    > aggregate(d.ordered,by=list(key=d.ordered$x),FUN=function(x){x[1]})

I've tried split/unsplit with the same function argument as here, but unsplit complains about duplicate row numbers.

Is rle a possibility? Is there an R idiom to convert rle's length vector into the indices of the rows that start each run, which I can then use to pluck those rows out of the data frame?
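The rle idea can be sketched as follows. This is a minimal illustration, not a benchmarked solution: it assumes the data frame is already sorted so that rows with equal x are adjacent, because rle only detects consecutive runs. The names r, ends and starts are illustrative.

```r
# Example data from the question; rows with equal x are already adjacent.
d <- data.frame(x = c(1, 1, 2, 4), y = c(10, 11, 12, 13), z = c(20, 19, 18, 17))

# Run lengths of consecutive equal values in x.
r <- rle(d$x)

# The cumulative sum of the run lengths is the index of the last row of each run.
ends <- cumsum(r$lengths)

# Each run starts one row after the previous run ends; the first run starts at row 1.
starts <- c(1, head(ends, -1) + 1)

first_rows <- d[starts, ]  # first row of each run
last_rows  <- d[ends, ]    # last row of each run
```

Indexing with starts gives the first row of each group and indexing with ends gives the last, so the same two vectors cover either reading of "one row per group".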

asked Apr 13 '10 02:04

jkebinger

1 Answer

Maybe duplicated() can help:

    R> d[ !duplicated(d$x), ]
      x  y  z
    1 1 10 20
    3 2 12 18
    4 4 13 17
    R>

Edit: Shucks, never mind. This picks the first row in each block of repetitions, but you wanted the last. So here is another attempt using plyr:

    R> ddply(d, "x", function(z) tail(z,1))
      x  y  z
    1 1 11 19
    2 2 12 18
    3 4 13 17
    R>

Here plyr does the hard work of finding unique subsets, looping over them and applying the supplied function -- which simply returns the last set of observations in a block z using tail(z, 1).
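A base R variant may also be worth trying. It is offered here as a sketch, not as part of the original answer. duplicated() accepts a fromLast argument that scans from the end, so negating it keeps the last row of each group without plyr. The second form mirrors the question's own approach of sorting by y in descending order first. The names last_per_x and max_y_per_x are illustrative.

```r
d <- data.frame(x = c(1, 1, 2, 4), y = c(10, 11, 12, 13), z = c(20, 19, 18, 17))

# Keep the last row for each value of x (duplicates are detected scanning from the end).
last_per_x <- d[!duplicated(d$x, fromLast = TRUE), ]

# Keep the row with the largest y for each x: sort by y descending,
# keep the first occurrence of each x, then restore the order by x.
d.ordered <- d[order(-d$y), ]
max_y_per_x <- d.ordered[!duplicated(d.ordered$x), ]
max_y_per_x <- max_y_per_x[order(max_y_per_x$x), ]
```

Both results contain the same values as the ddply output for this example, although the row names are the original row numbers rather than 1 to 3. Whether this is faster than aggregate on a large data frame would need to be measured.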

answered Nov 22 '22 05:11

Dirk Eddelbuettel
