Filtering observations in dplyr in combination with grepl

I am trying to work out how to filter some observations from a large dataset using dplyr and grepl. I am not wedded to grepl if another solution would be better.

Take this sample df:

    df1 <- data.frame(fruit = c("apple", "orange", "xapple", "xorange",
                                "applexx", "orangexx", "banxana", "appxxle"),
                      group = c("A", "B"))
    df1

    #     fruit group
    #1    apple     A
    #2   orange     B
    #3   xapple     A
    #4  xorange     B
    #5  applexx     A
    #6 orangexx     B
    #7  banxana     A
    #8  appxxle     B

I want to:

filter out those cases beginning with 'x'
filter out those cases ending with 'xx'

I have managed to work out how to get rid of everything that contains 'x' or 'xx', but not how to restrict that to values beginning with 'x' or ending with 'xx'. Here is how to get rid of everything with 'xx' anywhere inside (not just at the end):

    df1 %>% filter(!grepl("xx", fruit))

    #    fruit group
    #1   apple     A
    #2  orange     B
    #3  xapple     A
    #4 xorange     B
    #5 banxana     A

From my point of view this is wrong: it also removes 'appxxle', which I want to keep, and it keeps 'xapple' and 'xorange', which I want to drop.

I have never fully got to grips with regular expressions. I've been trying to modify code such as: grepl("^(?!x).*$", df1$fruit, perl = TRUE) to try and make it work within the filter command, but am not quite getting it.

Expected output:

    #      fruit group
    #1     apple     A
    #2    orange     B
    #3   banxana     A
    #4   appxxle     B

I'd like to do this inside dplyr if possible.

asked Sep 23 '14 15:09

jalapic

1 Answer

I didn't understand your second regex, but this more basic regex seems to do the trick:

    df1 %>% filter(!grepl("^x|xx$", fruit))

    #     fruit group
    # 1   apple     A
    # 2  orange     B
    # 3 banxana     A
    # 4 appxxle     B

And I assume you know this, but you don't have to use dplyr here at all:

    df1[!grepl("^x|xx$", df1$fruit), ]

    #     fruit group
    # 1   apple     A
    # 2  orange     B
    # 7 banxana     A
    # 8 appxxle     B

The regex looks for strings that start with x OR end with xx. The ^ and $ are regex anchors for the beginning and end of the string respectively, and | is the OR operator. We negate the result of grepl with ! so that we keep only the strings that don't match the pattern.
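Since the question says grepl is not a requirement and mentions a lookahead attempt, here is a rough sketch of two alternatives next to the two halves of the pattern above. It assumes the same df1 as in the question; the names starts_x, ends_xx, res_fixed and res_look are made up for illustration, and stringsAsFactors = FALSE is added only so the fixed-string helpers get character input on older R versions.

```r
library(dplyr)

df1 <- data.frame(
  fruit = c("apple", "orange", "xapple", "xorange",
            "applexx", "orangexx", "banxana", "appxxle"),
  group = c("A", "B"),
  stringsAsFactors = FALSE  # startsWith()/endsWith() need character input
)

# Each half of "^x|xx$" on its own
starts_x <- grepl("^x", df1$fruit)   # TRUE for xapple, xorange
ends_xx  <- grepl("xx$", df1$fruit)  # TRUE for applexx, orangexx

# Same filter with base R fixed-string helpers instead of a regex;
# multiple conditions passed to filter() are combined with AND
res_fixed <- df1 %>%
  filter(!startsWith(fruit, "x"), !endsWith(fruit, "xx"))

# Same filter with negative lookaheads, close to the pattern tried in
# the question; lookaheads require perl = TRUE
res_look <- df1 %>%
  filter(grepl("^(?!x)(?!.*xx$)", fruit, perl = TRUE))
```

Both res_fixed and res_look should keep the same four rows as the "^x|xx$" version. The fixed-string form avoids regular expressions altogether, which may be easier to read if the prefixes and suffixes are literal strings.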

answered Sep 21 '22 05:09

Chase

Tags: r, filter, dplyr, grepl