Delete rows in data frame if entry appears fewer than x times

Tags:

dataframe

r

duplicate-removal

delete-row

I have the following data frame, call it df, with three columns: "Name", "Age", and "ZipCode".

df =
  Name Age ZipCode
1  Joe  16   60559
2  Jim  20   60637
3  Bob  64   94127
4  Joe  23   94122
5  Bob  45   25462

I want to delete every row of df whose Name appears fewer than 2 times in the data frame as a whole (ideally with a flexible threshold: 3, 4, or x times). In other words, keep Bob and Joe in the data frame, but delete Jim. How can I do this?

I tried to turn it into a table:

> table(df$Name)

Bob Jim Joe 
 2   1   2

But I don't know where to go from there.

404

asked Dec 17 '13 05:12

Mon

1 Answer

You can use ave like this:

df[as.numeric(ave(df$Name, df$Name, FUN=length)) >= 2, ]
#   Name Age ZipCode
# 1  Joe  16   60559
# 3  Bob  64   94127
# 4  Joe  23   94122
# 5  Bob  45   25462

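This answer assumes that df$Name is a character vector, not a factor. ave returns a vector of the same type as its first argument, so for a character column the counts come back as strings, which is why as.numeric is needed.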
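If Name might be a factor, one way to sidestep the type issue is to give ave a numeric vector instead of the names themselves. The following is only a sketch: it rebuilds the question's data so it runs on its own, and min_n is a hypothetical name for the threshold, not something from the original answer.

```r
# Rebuild the example data from the question. stringsAsFactors = TRUE is set
# on purpose to show that this variant does not depend on the column type.
df <- data.frame(
  Name    = c("Joe", "Jim", "Bob", "Joe", "Bob"),
  Age     = c(16, 20, 64, 23, 45),
  ZipCode = c(60559, 60637, 94127, 94122, 25462),
  stringsAsFactors = TRUE
)

min_n <- 2  # hypothetical name: minimum number of appearances needed to keep a row

# ave() over an integer vector returns an integer vector, so every row gets
# the size of its Name group and no as.numeric() conversion is required.
counts <- ave(seq_along(df$Name), df$Name, FUN = length)

kept <- df[counts >= min_n, ]
kept
```

Changing min_n to 3, 4, or any other value adjusts the threshold without touching the rest of the code.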

You can also continue with table as follows:

x <- table(df$Name)
df[df$Name %in% names(x[x >= 2]), ]
#   Name Age ZipCode
# 1  Joe  16   60559
# 3  Bob  64   94127
# 4  Joe  23   94122
# 5  Bob  45   25462
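To cover the "flexibly 3, 4, or x times" part of the question, the table approach could be wrapped in a small function. This is a sketch under assumptions: keep_frequent and its arguments col and min_n are hypothetical names introduced here for illustration, not part of the original answer.

```r
# Hypothetical helper: keep only the rows whose value in column `col`
# occurs at least `min_n` times in `data`.
keep_frequent <- function(data, col, min_n = 2) {
  counts <- table(data[[col]])                 # frequency of each distinct value
  frequent <- names(counts)[counts >= min_n]   # values that meet the threshold
  data[data[[col]] %in% frequent, , drop = FALSE]
}

# Example data from the question, rebuilt so the snippet is self-contained.
df <- data.frame(
  Name    = c("Joe", "Jim", "Bob", "Joe", "Bob"),
  Age     = c(16, 20, 64, 23, 45),
  ZipCode = c(60559, 60637, 94127, 94122, 25462),
  stringsAsFactors = FALSE
)

keep_frequent(df, "Name", min_n = 2)  # drops the single Jim row
keep_frequent(df, "Name", min_n = 3)  # no name appears 3 times, so no rows remain
```

Because %in% compares factors by their labels, this version should behave the same whether the column is a character vector or a factor.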

141

answered Sep 30 '22 15:09

A5C1D2H2I1M1N2O1R2T1
