I would like to remove rows where a city has only one entry on a given date. For instance, I would like to remove the New York and San Francisco rows, since they each have only one observation (on 4-11 and 4-12, respectively).
day City age
4-10 Miami 30
4-10 Miami 23
4-11 New York 24
4-12 San Francisco 30
Note: the dataset is called DG.
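I tried using a for loop to go through the days and get an idea of the number of entries per division per day, but I'm not sure how to work with arrays in R.
countx <- 0
D <- unique(DG$day)
for (i in seq_along(D))
{
  for (j in seq_along(DG$age))
  {
    # flag row j when its day matches the i-th unique day
    if (DG$day[j] == D[i])
    {
      countx[j] <- 1
    }
    else
    {
      countx[j] <- 0
    }
  }
}
# countx is overwritten on every pass of the outer loop,
# so it ends up reflecting only the last day in D
Binded <- cbind(countx, DG)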
With your sample data
DG <- read.csv(text="day,City,age
4-10,Miami,30
4-10,Miami,23
4-11,New York,24
4-12,San Francisco,30")
you could use dplyr
library(dplyr)
DG %>% group_by(day,City) %>% filter(n()>1)
or base R
DG[ave(rep(1, nrow(DG)), DG$day, DG$City, FUN=length)>1,]
both return
day City age
1 4-10 Miami 30
2 4-10 Miami 23
Or you could use data.table (as suggested by @Frank)
library(data.table)
setDT(DG)[,if (.N>1) .SD, by=.(City,day)]
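To see what the base R one-liner is doing, it may help to split it into two steps. The sketch below uses the sample data from above; the names group_size and kept are only illustrative and are not part of the original answer.

```r
# Sample data from the question
DG <- read.csv(text = "day,City,age
4-10,Miami,30
4-10,Miami,23
4-11,New York,24
4-12,San Francisco,30")

# Step 1: for every row, count how many rows share its (day, City) pair.
# ave() applies FUN within each group and returns one value per row.
group_size <- ave(rep(1, nrow(DG)), DG$day, DG$City, FUN = length)

# Step 2: keep only the rows whose group has more than one observation
kept <- DG[group_size > 1, ]
kept
```

Printing group_size before filtering is a quick way to check the per-day, per-city counts that the loop in the question was trying to build.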