I have the following data frame, call it df, consisting of three vectors: "Name", "Age", and "ZipCode".
df=
  Name Age ZipCode
1  Joe  16   60559
2  Jim  20   60637
3  Bob  64   94127
4  Joe  23   94122
5  Bob  45   25462
I want to delete the entire row of df if the Name in it appears fewer than 2 times in the data frame as a whole (and flexibly 3, 4, or x times). Basically, keep Bob and Joe in the data frame, but delete Jim. How can I do this?
I tried to turn it into a table:
> table(df$Name)
Bob Jim Joe
  2   1   2
But I don't know where to go from there.
You can use ave like this:
df[as.numeric(ave(df$Name, df$Name, FUN=length)) >= 2, ]
# Name Age ZipCode
# 1 Joe 16 60559
# 3 Bob 64 94127
# 4 Joe 23 94122
# 5 Bob 45 25462
This answer assumes that df$Name is a character vector, not a factor vector.
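If df$Name might be a factor, one way to sidestep that assumption is to pass row positions to ave instead of the names themselves, so the counts come back as integers regardless of the column's type. This is only a sketch: it rebuilds df from the question's data (deliberately making Name a factor), and the names min_count, counts, and result are illustrative, not part of the original answer.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  Name = c("Joe", "Jim", "Bob", "Joe", "Bob"),
  Age = c(16, 20, 64, 23, 45),
  ZipCode = c(60559, 60637, 94127, 94122, 25462),
  stringsAsFactors = TRUE  # deliberately make Name a factor
)

min_count <- 2  # illustrative threshold; change to 3, 4, ...

# x is an integer vector of row positions, grouped by Name,
# so ave() returns the group size for every row
counts <- ave(seq_along(df$Name), df$Name, FUN = length)

# Keep only rows whose name occurs at least min_count times
result <- df[counts >= min_count, ]
```

Because the grouping variable is only used to split the row positions, the same call should behave the same way when Name is a character vector.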
You can also continue with table as follows:
x <- table(df$Name)
df[df$Name %in% names(x[x >= 2]), ]
# Name Age ZipCode
# 1 Joe 16 60559
# 3 Bob 64 94127
# 4 Joe 23 94122
# 5 Bob 45 25462
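Since the question asks for a flexible threshold (2, 3, 4, or x), the table approach could be wrapped in a small helper. The function name keep_frequent and its arguments are hypothetical, chosen here for illustration; this is a sketch, assuming the column is named by a string.

```r
# Hypothetical helper: keep rows whose value in `column`
# occurs at least `min_count` times in `data`
keep_frequent <- function(data, column, min_count = 2) {
  counts <- table(data[[column]])
  frequent <- names(counts[counts >= min_count])
  data[data[[column]] %in% frequent, , drop = FALSE]
}

# Example data from the question
df <- data.frame(
  Name = c("Joe", "Jim", "Bob", "Joe", "Bob"),
  Age = c(16, 20, 64, 23, 45),
  ZipCode = c(60559, 60637, 94127, 94122, 25462),
  stringsAsFactors = FALSE
)

kept_2 <- keep_frequent(df, "Name", min_count = 2)  # drops the Jim row
kept_3 <- keep_frequent(df, "Name", min_count = 3)  # no name occurs 3 times
```

With the example data, a threshold of 2 keeps the Joe and Bob rows, while a threshold of 3 leaves an empty data frame because no name appears three times.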