R subset unique observation keeping last entry

Tags:

r

I have a data frame that looks something like this (with a lot more observations):

df <- structure(list(session_user_id = c("1803f6c3625c397afb4619804861f75268dfc567", 
"1924cb2ebdf29f052187b9a2d21673e4d314199b", "1924cb2ebdf29f052187b9a2d21673e4d314199b", 
"1924cb2ebdf29f052187b9a2d21673e4d314199b", "1924cb2ebdf29f052187b9a2d21673e4d314199b", 
"198b83b365fef0ed637576fe1bde786fc09817b2", "19fd8069c094fb0697508cc9646513596bea30c4", 
"19fd8069c094fb0697508cc9646513596bea30c4", "19fd8069c094fb0697508cc9646513596bea30c4", 
"19fd8069c094fb0697508cc9646513596bea30c4", "1a3d33c9cbb2aa41515e6ef76f123b2ea8ee2f13", 
"1b64c142b1540c43e3f813ccec09cb2dd7907c14", "1b7346d13f714c97725ba2e1c21b600535164291"
), raw_score = c(1, 1, 1, 1, 1, 0.2, NA, 1, 1, 1, 1, 0.2, 1), 
    submission_time = c(1389707078L, 1389694184L, 1389694188L, 
    1389694189L, 1389694194L, 1390115495L, 1389696939L, 1389696971L, 
    1389741306L, 1389985033L, 1389983862L, 1389854836L, 1389692240L
    )), .Names = c("session_user_id", "raw_score", "submission_time"
), row.names = 28:40, class = "data.frame")

I want to create a new data frame with only one observation per "session_user_id", keeping the one with the latest "submission_time".

The only idea I have in mind is to create a list of unique users, write a loop to find the max of submission_time for each user, and then write a second loop that gets the raw_score for that user and time.

Can somebody show me a better way of doing this in R?

Thanks!

asked Feb 11 '14 14:02

Ignacio

1 Answer

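You could first order your data.frame by submission_time in decreasing order and remove all duplicated session_user_id entries afterwards. Because duplicated() flags every occurrence after the first, the row that survives for each user is the one with the latest submission_time: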

## order by submission_time
df <- df[order(df$submission_time, decreasing=TRUE),]

## remove duplicated user_id
df <- df[!duplicated(df$session_user_id),]

#                            session_user_id raw_score submission_time
#33 198b83b365fef0ed637576fe1bde786fc09817b2       0.2      1390115495
#37 19fd8069c094fb0697508cc9646513596bea30c4       1.0      1389985033
#38 1a3d33c9cbb2aa41515e6ef76f123b2ea8ee2f13       1.0      1389983862
#39 1b64c142b1540c43e3f813ccec09cb2dd7907c14       0.2      1389854836
#28 1803f6c3625c397afb4619804861f75268dfc567       1.0      1389707078
#32 1924cb2ebdf29f052187b9a2d21673e4d314199b       1.0      1389694194
#40 1b7346d13f714c97725ba2e1c21b600535164291       1.0      1389692240
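
If you would rather keep the original row order, something along these lines might also work. This is only a minimal sketch on a small made-up data frame (the short ids and the values are assumptions for illustration, not the data from the question). It uses base R's ave() to compute the per-user maximum of submission_time and keeps the rows that match it. It assumes submission_time contains no NA values.

```r
# Small made-up data frame standing in for the question's df
# (ids shortened and values invented for illustration only).
df <- data.frame(
  session_user_id = c("a", "b", "b", "b", "c", "c"),
  raw_score       = c(1, 1, 0.5, 1, NA, 0.2),
  submission_time = c(100L, 50L, 70L, 60L, 30L, 90L),
  stringsAsFactors = FALSE
)

# For every row, compute the latest submission_time within its user group.
latest_time <- ave(df$submission_time, df$session_user_id, FUN = max)

# Keep only the rows whose submission_time equals their group's maximum;
# the remaining rows stay in their original order.
latest <- df[df$submission_time == latest_time, ]
latest
```

Note that, unlike the order()/duplicated() approach, this keeps every tied row if a user has several submissions with the same latest submission_time.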

answered Oct 04 '22 00:10

sgibb
