How to subset your dataframe to only keep the first duplicate? [duplicate]

Question

I have a dataframe with multiple variables, and I am interested in how to subset it so that it only includes the first duplicate.

    >head(occurrence)
    userId        occurrence  profile.birthday profile.gender postDate count
    1 100469891698         6               47         Female 583 days     0
    2 100469891698         6               47         Female  55 days     0
    3 100469891698         6               47         Female 481 days     0
    4 100469891698         6               47         Female 583 days     0
    5 100469891698         6               47         Female 583 days     0
    6 100469891698         6               47         Female 583 days     0

Here you can see the dataframe. The 'occurrence' column counts how many times the same userId has occurred. I have tried the following code to remove duplicates:

    occurrence <- occurrence[!duplicated(occurrence$userId),]

However, this removes what look like "random" duplicates. I want to keep, for each userId, the row that is oldest by postDate. For example, the first row should look something like this:

        userId        occurrence  profile.birthday profile.gender postDate count
    1 100469891698         6               47         Female 583 days     0

Thank you for your help!

Sandra Barão · Accepted Answer

Did you try ordering the data frame first, like this:

    occurrence <- occurrence[order(occurrence$userId, occurrence$postDate, decreasing=TRUE),]
    occurrenceClean <- occurrence[!duplicated(occurrence$userId),]
    occurrenceClean
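
For reference, here is a small self-contained sketch of the same idea on toy data. It assumes postDate is a difftime where a larger value means an older post; the second userId and all the values are made up for illustration. Note that `duplicated()` is not random: it flags every row after the first occurrence in the current row order, which is why sorting first decides which row survives.

```r
# Toy data; values are illustrative, not the real dataset.
occurrence <- data.frame(
  userId   = c(100469891698, 100469891698, 100469891698,
               200000000001, 200000000001),
  postDate = as.difftime(c(55, 583, 481, 12, 300), units = "days"),
  count    = 0
)

# Sort descending by userId, then by postDate, so the largest postDate
# (assumed here to be the oldest post) comes first within each user.
ord <- order(occurrence$userId, as.numeric(occurrence$postDate),
             decreasing = TRUE)
occurrence <- occurrence[ord, ]

# duplicated() marks every row after the first occurrence of a userId,
# so negating it keeps exactly one row per user: the first one after sorting.
occurrenceClean <- occurrence[!duplicated(occurrence$userId), ]
occurrenceClean
```

On this toy data the result has one row per userId, carrying the 583-day and 300-day posts respectively. If a smaller postDate meant an older post in your data, you would drop `decreasing = TRUE`.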

Tags: r, duplicates, subset

eagerstudent