
Delete rows in data frame if entry appears fewer than x times

I have the following data frame, call it df, which consists of three vectors: "Name", "Age", and "ZipCode".

df=      
  Name Age ZipCode
1  Joe  16   60559
2  Jim  20   60637
3  Bob  64   94127
4  Joe  23   94122
5  Bob  45   25462

I want to delete the entire row of df if the Name in it appears fewer than 2 times in the data frame as a whole (and flexibly 3, 4, or x times). Basically keep Bob and Joe in the data frame, but delete Jim. How can I do this?

I tried to turn it into a table:

> table(df$Name)

Bob Jim Joe 
 2   1   2 

But I don't know where to go from there.

asked Dec 17 '13 05:12 by Mon




1 Answer

You can use ave like this:

df[as.numeric(ave(df$Name, df$Name, FUN=length)) >= 2, ]
#   Name Age ZipCode
# 1  Joe  16   60559
# 3  Bob  64   94127
# 4  Joe  23   94122
# 5  Bob  45   25462

This answer assumes that df$Name is a character vector, not a factor vector.
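If Name happens to be stored as a factor, one possible workaround is to count over row positions rather than over the names themselves, since ave() would otherwise try to write the counts back into the factor and would most likely produce NAs. The following is only a sketch under that assumption; the data frame is rebuilt from the question with stringsAsFactors = TRUE to force the factor case, and the variable names counts and kept are illustrative.

```r
# Recreate the question's data, deliberately storing Name as a factor.
df <- data.frame(
  Name    = c("Joe", "Jim", "Bob", "Joe", "Bob"),
  Age     = c(16, 20, 64, 23, 45),
  ZipCode = c(60559, 60637, 94127, 94122, 25462),
  stringsAsFactors = TRUE
)

# Group the row positions (integers) by Name and take the group size,
# so the result is an integer vector whether Name is character or factor.
counts <- ave(seq_along(df$Name), df$Name, FUN = length)

# Keep only rows whose Name occurs at least twice (the Joe and Bob rows).
kept <- df[counts >= 2, ]
kept
```

Converting the column first with as.character(df$Name) and then using the original ave call should work as well; the version above just avoids changing the column type.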


You can also continue with table as follows:

x <- table(df$Name)
df[df$Name %in% names(x[x >= 2]), ]
#   Name Age ZipCode
# 1  Joe  16   60559
# 3  Bob  64   94127
# 4  Joe  23   94122
# 5  Bob  45   25462
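
Since the question asks for a flexible threshold (2, 3, 4, or x times), the table approach can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the function name keep_frequent and its arguments data, column, and min_count are hypothetical names chosen for illustration.

```r
# Hypothetical helper: keep rows whose value in `column` occurs
# at least `min_count` times in the whole data frame.
keep_frequent <- function(data, column, min_count = 2) {
  counts <- table(data[[column]])                   # occurrences per value
  frequent <- names(counts[counts >= min_count])    # values meeting the threshold
  data[data[[column]] %in% frequent, , drop = FALSE]
}

# The question's data frame, with Name as a character vector.
df <- data.frame(
  Name    = c("Joe", "Jim", "Bob", "Joe", "Bob"),
  Age     = c(16, 20, 64, 23, 45),
  ZipCode = c(60559, 60637, 94127, 94122, 25462),
  stringsAsFactors = FALSE
)

at_least_2 <- keep_frequent(df, "Name", 2)   # drops Jim
at_least_3 <- keep_frequent(df, "Name", 3)   # no name occurs 3 times
```

Note that table() ignores NA by default, so rows with a missing Name would be dropped by this sketch.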
answered Sep 30 '22 15:09 by A5C1D2H2I1M1N2O1R2T1