I would like to extract specific rows from a data frame into a new data frame using R. I have two columns: City and Household. To detect moves, I want a new data frame containing only the households that are not always in the same city.
For example, if a household appears 3 times and at least one of its cities differs from the others, I keep all of its rows. Otherwise, I delete the 3 rows of this household.
City Household
Paris A
Paris A
Nice A
Limoge B
Limoge B
Toulouse C
Paris C
Here, I want to keep only Household A and Household C.
A dplyr solution: count the distinct cities for each household and keep only the households with more than one.
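library(dplyr)
# Example data from the question
df <- data.frame(city = c("Paris", "Paris", "Nice", "Limoge", "Limoge", "Toulouse", "Paris"),
                 household = c(rep("A", 3), rep("B", 2), rep("C", 2)))
# Keep every row of the households that have more than one distinct city
new_df <- df %>%
  group_by(household) %>%
  filter(n_distinct(city) > 1)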
Source: local data frame [5 x 2]
Groups: household
city household
1 Paris A
2 Paris A
3 Nice A
4 Toulouse C
5 Paris C
Edit: added @shadow's and @davidarenburg's suggestions from the comments
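If you would rather not load dplyr, the same "count distinct cities per household, keep counts above 1" idea can be sketched in base R. This is only an illustration and not part of the original answer: the names n_cities and moved_df are made up for the example, and it assumes the same df as above.

```r
# Same example data as in the dplyr answer
df <- data.frame(
  city = c("Paris", "Paris", "Nice", "Limoge", "Limoge", "Toulouse", "Paris"),
  household = c(rep("A", 3), rep("B", 2), rep("C", 2))
)

# For every row, count the distinct cities seen in that row's household.
# ave() applies the function per group and returns one value per row.
n_cities <- ave(
  seq_along(df$city),
  df$household,
  FUN = function(i) length(unique(df$city[i]))
)

# Keep only the rows of households with more than one distinct city
moved_df <- df[n_cities > 1, ]
```

Unlike the dplyr version, moved_df here is a plain data frame that keeps the original row names, so reset them with rownames(moved_df) <- NULL if that matters.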