
Delete entries with only one observation in a group

Tags:

r

I would like to remove rows where a given city has only one entry on a given day. For instance, I would like to remove the New York and San Francisco rows, since they each have only one observation (on 4-11 and 4-12, respectively).

day                          City                  age
4-10                        Miami                   30
4-10                        Miami                   23
4-11                        New York                24
4-12                        San Francisco           30

Note: the dataset is called DG.

I tried using a for loop to find the days and get an idea of the number of entries per city per day, but I'm not sure how to work with arrays in R.

countx = 0
D = unique(DG$day)
for (i in 1:length(D))
{
    for (j in 1:length(DG$age))
    {
      # 1 if row j falls on day D[i], otherwise 0
      if (DG$day[j] == D[i])
      {
        countx[j] = 1
      }
      else
      {
        countx[j] = 0
      }
    }
}
Binded <- cbind(countx, DG)
steppermotor asked Jul 17 '15 04:07




1 Answer

With your sample data

DG <- read.csv(text="day,City,age
4-10,Miami,30
4-10,Miami,23
4-11,New York,24
4-12,San Francisco,30")

you could use dplyr

library(dplyr)
DG %>% group_by(day,City) %>% filter(n()>1)

or base R

DG[ave(rep(1, nrow(DG)), DG$day, DG$City, FUN=length)>1,]

both return

   day  City age
1 4-10 Miami  30
2 4-10 Miami  23
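
To see why the base R version works, it can help to look at the intermediate group sizes that ave() produces before the subsetting step. A minimal sketch on the same sample data (the variable names `sizes` and `kept` are only illustrative):

```r
DG <- read.csv(text="day,City,age
4-10,Miami,30
4-10,Miami,23
4-11,New York,24
4-12,San Francisco,30")

# ave() applies FUN within each (day, City) group and returns a vector
# aligned with the original rows, so every row carries its group's size.
sizes <- ave(rep(1, nrow(DG)), DG$day, DG$City, FUN = length)
sizes
# [1] 2 2 1 1

# Keep only the rows whose group has more than one observation.
kept <- DG[sizes > 1, ]
```

Because `sizes` has one element per row, the comparison `sizes > 1` can be used directly as a logical row index.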

Or you could use data.table (as suggested by @Frank)

library(data.table)
setDT(DG)[,if (.N>1) .SD, by=.(City,day)]
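
If the goal is only to inspect how many entries each city has per day (which is what the loop in the question was working toward), one base R option is aggregate(). This is a sketch rather than the only way to do it; `counts` and the column name `n` are illustrative names.

```r
DG <- read.csv(text="day,City,age
4-10,Miami,30
4-10,Miami,23
4-11,New York,24
4-12,San Francisco,30")

# Count the rows in each (day, City) combination; length() of the age
# values in a group is simply the number of observations in that group.
counts <- aggregate(age ~ day + City, data = DG, FUN = length)
names(counts)[3] <- "n"  # the third column now holds counts, not ages
counts
```

Each row of `counts` is one day/City combination, so groups with `n` equal to 1 are the ones the filtering code above drops.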
MrFlick answered Oct 26 '22 13:10