R subset unique observation keeping last entry

Tags:

r

I have a data frame that looks something like this (with many more observations):

df <- structure(list(session_user_id = c("1803f6c3625c397afb4619804861f75268dfc567", 
"1924cb2ebdf29f052187b9a2d21673e4d314199b", "1924cb2ebdf29f052187b9a2d21673e4d314199b", 
"1924cb2ebdf29f052187b9a2d21673e4d314199b", "1924cb2ebdf29f052187b9a2d21673e4d314199b", 
"198b83b365fef0ed637576fe1bde786fc09817b2", "19fd8069c094fb0697508cc9646513596bea30c4", 
"19fd8069c094fb0697508cc9646513596bea30c4", "19fd8069c094fb0697508cc9646513596bea30c4", 
"19fd8069c094fb0697508cc9646513596bea30c4", "1a3d33c9cbb2aa41515e6ef76f123b2ea8ee2f13", 
"1b64c142b1540c43e3f813ccec09cb2dd7907c14", "1b7346d13f714c97725ba2e1c21b600535164291"
), raw_score = c(1, 1, 1, 1, 1, 0.2, NA, 1, 1, 1, 1, 0.2, 1), 
    submission_time = c(1389707078L, 1389694184L, 1389694188L, 
    1389694189L, 1389694194L, 1390115495L, 1389696939L, 1389696971L, 
    1389741306L, 1389985033L, 1389983862L, 1389854836L, 1389692240L
    )), .Names = c("session_user_id", "raw_score", "submission_time"
), row.names = 28:40, class = "data.frame")

I want to create a new data frame with only one observation per "session_user_id", keeping the one with the latest "submission_time".

The only idea I have is to build a list of unique users, loop over it to find the maximum submission_time for each user, and then loop again to get the raw score for that user and time.

Can somebody show me a better way of doing this in R?

Thanks!

Ignacio asked Feb 11 '14 14:02


1 Answer

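You could first order your data.frame by submission_time in decreasing order and then remove all duplicated session_user_id entries. Because duplicated() marks every occurrence after the first, only the row with the latest submission_time survives for each user: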

## order by submission_time
df <- df[order(df$submission_time, decreasing=TRUE),]

## remove duplicated user_id
df <- df[!duplicated(df$session_user_id),]

#                            session_user_id raw_score submission_time
#33 198b83b365fef0ed637576fe1bde786fc09817b2       0.2      1390115495
#37 19fd8069c094fb0697508cc9646513596bea30c4       1.0      1389985033
#38 1a3d33c9cbb2aa41515e6ef76f123b2ea8ee2f13       1.0      1389983862
#39 1b64c142b1540c43e3f813ccec09cb2dd7907c14       0.2      1389854836
#28 1803f6c3625c397afb4619804861f75268dfc567       1.0      1389707078
#32 1924cb2ebdf29f052187b9a2d21673e4d314199b       1.0      1389694194
#40 1b7346d13f714c97725ba2e1c21b600535164291       1.0      1389692240
sgibb answered Oct 04 '22 00:10
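
If the original row order should be preserved, one possible variation (not part of the answer above) is to compute each user's latest submission_time with base R's ave() and keep the matching rows. The sketch below uses a small made-up data frame rather than the asker's data; the column names follow the question, while the values and the names latest_time and latest are illustrative assumptions.

```r
# Small illustrative data frame (hypothetical values, not the asker's data)
df <- data.frame(
  session_user_id = c("a", "b", "b", "c", "c", "c"),
  raw_score       = c(1, 0.5, 1, NA, 0.2, 1),
  submission_time = c(10L, 20L, 25L, 30L, 50L, 40L),
  stringsAsFactors = FALSE
)

# For every row, the latest submission_time within that row's user group
latest_time <- ave(df$submission_time, df$session_user_id, FUN = max)

# Keep only the rows whose time equals their group's maximum
latest <- df[df$submission_time == latest_time, ]

# If a user has two rows sharing the same latest time, keep the first one
latest <- latest[!duplicated(latest$session_user_id), ]

print(latest)
```

Unlike the sort-based approach, this leaves the rows in their original order and does not overwrite df; the final duplicated() step only matters when a user has tied timestamps.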