 

Using grep in R to delete rows from a data.frame

Tags: dataframe, r, row

I have a dataframe such as this one:

    # x = 1 is recycled to all 10 rows; stringsAsFactors = FALSE keeps z as character
    d <- data.frame(x = 1, y = 1:10,
                    z = c("apple", "pear", "banana", "A", "B", "C", "D", "E", "F", "G"),
                    stringsAsFactors = FALSE)

I'd like to delete some rows from this dataframe, depending on the content of column z:

    new_d <- d[-grep("D",d$z),]

This works fine; row 7 is now deleted:

    new_d
       x  y      z
    1  1  1  apple
    2  1  2   pear
    3  1  3 banana
    4  1  4      A
    5  1  5      B
    6  1  6      C
    8  1  8      E
    9  1  9      F
    10 1 10      G

However, when I use grep to search for content that is not present in column z, it seems to delete all content of the dataframe:

    new_d <- d[-grep("K",d$z),]
    new_d
    [1] x y z
    <0 rows> (or 0-length row.names)

I would like to search for and delete rows this way, or some other way, that still works when the character string I am searching for is not present. How should I go about this?

asked Jul 18 '12 at 14:07 by Annemarie


People also ask

How do I remove a row from a DataFrame in R?

To remove rows in R, use subsetting. There is no dedicated built-in function for removing a row from a data frame, but you can select the data frame without certain rows by giving those rows as negative indices. This way, you can remove unwanted rows from the data frame.
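
A minimal sketch of removing rows by negative index, using a small hypothetical data frame `df` (the names `df` and `df_without` are illustrative only):

```r
# Hypothetical example data frame
df <- data.frame(id = 1:5,
                 fruit = c("apple", "pear", "banana", "kiwi", "plum"),
                 stringsAsFactors = FALSE)

# Negative row indices drop rows 2 and 4; the empty column index keeps every column
df_without <- df[-c(2, 4), ]
print(df_without)
```

The negative index vector must not be empty: `df[-integer(0), ]` selects no rows at all, which is the pitfall the question above runs into.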

How do I remove a string from a DataFrame in R?

To remove a character string from a column of an R data frame, we can use the gsub function, which replaces every match with an empty string. For example, if we have a data frame called df with a character column x that contains the string ID in each value, the ID can be removed using the command gsub("ID","",as.
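
A short sketch of that gsub() approach on a hypothetical data frame `df` whose column `x` holds values such as `"ID101"` (the data are made up for illustration):

```r
# Hypothetical data frame whose column x has "ID" in every value
df <- data.frame(x = c("ID101", "ID102", "ID103"), stringsAsFactors = FALSE)

# gsub() replaces every match of "ID" with "" (an empty string)
df$x <- gsub("ID", "", df$x)
print(df$x)
```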

How do I remove a row from a data frame in R?

You can use the subset() function to remove rows with certain values from a data frame in R:

    # only keep rows where the col1 value is less than 10 and the col2 value is less than 8
    new_df <- subset(df, col1 < 10 & col2 < 8)
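
The same subset() idea also covers the pattern-based deletion from the question. This sketch assumes the question's data frame `d` (rebuilt here so the snippet runs on its own) and combines subset() with a negated grepl(); `without_D` and `without_K` are illustrative names:

```r
# The question's data frame d
d <- data.frame(x = 1, y = 1:10,
                z = c("apple", "pear", "banana", "A", "B", "C", "D", "E", "F", "G"),
                stringsAsFactors = FALSE)

# subset() evaluates the condition inside d, so z means d$z;
# !grepl() is TRUE for every row whose z does NOT contain the pattern
without_D <- subset(d, !grepl("D", z))  # row 7 is dropped
without_K <- subset(d, !grepl("K", z))  # nothing matches, so all 10 rows stay
```

Because subset() receives one logical value per row, a pattern that matches nothing simply keeps every row.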

How to use grepl function in R to subset rows?

The grepl function in R searches for matches to a pattern within each element of a character vector, such as a character column of an R data frame, and returns TRUE or FALSE for each element. To subset the rows of a data frame with grepl, call it on the column that contains the character values and use the resulting logical vector inside single square brackets.
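
A minimal sketch of grepl()-based row selection on the question's data frame `d` (rebuilt here so the snippet is self-contained); `has_a` and `has_a_any_case` are illustrative names:

```r
d <- data.frame(x = 1, y = 1:10,
                z = c("apple", "pear", "banana", "A", "B", "C", "D", "E", "F", "G"),
                stringsAsFactors = FALSE)

# grepl() gives one TRUE/FALSE per row; matching is case-sensitive by default
has_a <- d[grepl("a", d$z), ]  # apple, pear, banana (not "A")

# ignore.case = TRUE also matches the upper-case "A"
has_a_any_case <- d[grepl("a", d$z, ignore.case = TRUE), ]
```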

How to delete multiple data frames from your current R workspace?

You can delete several data frames from your current R workspace at once with rm(). You can also delete all objects of class “data.frame” in the workspace, or use the grepl() function to delete every object whose name contains the phrase “df”.
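
A hedged sketch of those three deletions with rm(). To avoid wiping anything real, it works on a scratch environment `ws` that stands in for the workspace (an assumption of this sketch); in the global environment you would drop the `envir = ws` arguments and call `ls()` instead of `ls(ws)`. All object names are hypothetical:

```r
# Scratch environment standing in for the workspace (assumption of this sketch)
ws <- new.env()
assign("df1", data.frame(a = 1:3), envir = ws)
assign("df2", data.frame(b = 4:6), envir = ws)
assign("scores", data.frame(s = c(90, 85)), envir = ws)
assign("df_labels", c("first", "second"), envir = ws)
assign("x", 42, envir = ws)

# 1. Remove several named data frames at once
#    (in the global workspace this is simply rm(df1, df2))
rm(list = c("df1", "df2"), envir = ws)

# 2. Remove every remaining object whose class is data.frame
objs <- ls(ws)
is_df <- vapply(objs, function(nm) is.data.frame(get(nm, envir = ws)), logical(1))
rm(list = objs[is_df], envir = ws)

# 3. Remove every remaining object whose name contains "df"
objs <- ls(ws)
rm(list = objs[grepl("df", objs)], envir = ws)

print(ls(ws))
```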

How do I add multiple rows to a R data frame?

You also have the option of using rbind to add multiple rows at once, or even to combine two R data frames. To add rows this way, the two data frames need the same columns: the same number of columns, with matching names.
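
A minimal sketch of combining two data frames with rbind(); `df_a`, `df_b`, and `combined` are hypothetical names:

```r
# Two hypothetical data frames with the same column names
df_a <- data.frame(id = 1:2, fruit = c("apple", "pear"), stringsAsFactors = FALSE)
df_b <- data.frame(id = 3:4, fruit = c("banana", "kiwi"), stringsAsFactors = FALSE)

# rbind() stacks the rows of df_b underneath the rows of df_a
combined <- rbind(df_a, df_b)
print(combined)
```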


2 Answers

You can use TRUE/FALSE subsetting instead of numeric.

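grepl is like grep, but it returns a logical vector with one value per element. Negating it with ! gives TRUE for every row that does not match, so when nothing matches, every row is kept.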

    d[!grepl("K",d$z),]
       x  y      z
    1  1  1  apple
    2  1  2   pear
    3  1  3 banana
    4  1  4      A
    5  1  5      B
    6  1  6      C
    7  1  7      D
    8  1  8      E
    9  1  9      F
    10 1 10      G
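
Because grepl() returns one TRUE or FALSE per row, the same expression also handles the case where the pattern is present. A sketch comparing both cases on the question's data frame `d` (rebuilt here so it runs on its own; `no_D` and `no_K` are illustrative names):

```r
d <- data.frame(x = 1, y = 1:10,
                z = c("apple", "pear", "banana", "A", "B", "C", "D", "E", "F", "G"),
                stringsAsFactors = FALSE)

# Pattern present: !grepl() is FALSE only for row 7 ("D"), so that row is dropped
no_D <- d[!grepl("D", d$z), ]
# Pattern absent: !grepl() is TRUE everywhere, so every row is kept
no_K <- d[!grepl("K", d$z), ]

print(rownames(no_D))
print(nrow(no_K))
```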
answered Sep 20 '22 at 15:09 by GSee


Here's your problem:

    > grep("K",c("apple","pear","banana","A","B","C","D","E","F","G"))
    integer(0)
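
To see why this empties the data frame: negating an empty grep() result is still an empty index vector, and indexing rows with an empty vector selects nothing. A minimal sketch reproducing that on the question's `d` (rebuilt here so the snippet runs on its own; `idx` is an illustrative name):

```r
d <- data.frame(x = 1, y = 1:10,
                z = c("apple", "pear", "banana", "A", "B", "C", "D", "E", "F", "G"),
                stringsAsFactors = FALSE)

idx <- grep("K", d$z)  # no element matches, so idx is integer(0)
print(length(idx))

# Negating an empty index vector leaves it empty ...
print(identical(-idx, integer(0)))

# ... and indexing rows with an empty vector selects no rows at all
print(nrow(d[-idx, ]))
```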

Try grepl() instead:

    d[!grepl("K",d$z),]

This works because the negated logical vector has an entry for every row:

    > grepl("K",d$z)
     [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    > !grepl("K",d$z)
     [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
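
If you would rather stay with grep(), its `invert` argument (a standard grep() argument, though neither answer uses it) returns the indices of the elements that do not match. When nothing matches, that is every row index, so no rows are lost. A sketch on the question's `d`, with illustrative variable names:

```r
d <- data.frame(x = 1, y = 1:10,
                z = c("apple", "pear", "banana", "A", "B", "C", "D", "E", "F", "G"),
                stringsAsFactors = FALSE)

# invert = TRUE returns the indices of the elements that do NOT match
keep_K <- grep("K", d$z, invert = TRUE)  # all of 1:10, since nothing matches
d_without_K <- d[keep_K, ]

d_without_D <- d[grep("D", d$z, invert = TRUE), ]  # row 7 is dropped
```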
answered Sep 22 '22 at 15:09 by Ari B. Friedman