I would like to extract specific rows from a dataframe into a new dataframe using R. I have two columns: City and Household. To detect moves, I want a new dataframe containing only the households that are not always in the same city.
For example, if a household appears 3 times and at least one of its cities differs from the others, I keep it. Otherwise, I delete all 3 rows of this household.
    City      Household
   Paris              A
   Paris              A
    Nice              A
  Limoge              B
  Limoge              B
Toulouse              C
   Paris              C
Here, I want to keep only Household A and Household C.
A dplyr solution: count the distinct cities within each household and keep only the households with more than one.
library(dplyr)
df <- data.frame(city = c("Paris", "Paris", "Nice", "Limoge", "Limoge", "Toulouse", "Paris"),
                 household = c(rep("A", 3), rep("B", 2), rep("C", 2)))
# Keep every row of a household that has more than one distinct city
new_df <- df %>%
  group_by(household) %>%
  filter(n_distinct(city) > 1)
new_df
Source: local data frame [5 x 2]
Groups: household
      city household
1    Paris         A
2    Paris         A
3     Nice         A
4 Toulouse         C
5    Paris         C
Edit: added the suggestions from @shadow and @davidarenburg in the comments.
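If you would rather not depend on dplyr, the same idea (count distinct cities per household, then keep the households with more than one) can be sketched in base R. This is only an illustrative alternative on the same example data; `counts` and `moved` are names made up for this sketch.

```r
# Same example data as in the dplyr answer
df <- data.frame(city = c("Paris", "Paris", "Nice", "Limoge", "Limoge", "Toulouse", "Paris"),
                 household = c(rep("A", 3), rep("B", 2), rep("C", 2)))

# Number of distinct cities for each household
counts <- tapply(df$city, df$household, function(x) length(unique(x)))

# Households seen in more than one city
moved <- names(counts)[counts > 1]

# Keep all rows belonging to those households
new_df <- df[df$household %in% moved, ]
new_df
```

On this data, `moved` contains households A and C, so `new_df` holds the same five rows as the dplyr result, returned as a plain data frame rather than a grouped one.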