Extract substring in R using grepl

I have a table with a string column formatted like this:

abcdWorkstart.csv
abcdWorkcomplete.csv

I would like to extract the last word in each filename. I think the beginning of the pattern would be the word "Work" and the end would be ".csv". I wrote something using grepl, but it is not working.

grepl("Work{*}.csv", data$filename)

Basically, I want to extract whatever is between Work and .csv.

Desired outcome:

start
complete

asked Aug 28 '18 14:08

ajax2000

2 Answers

I think you need sub or gsub (substitute/extract) instead of grepl (which only reports whether a match exists). Note that when the pattern is not found, sub returns the entire string unmodified:

fn <- c('abcdWorkstart.csv', 'abcdWorkcomplete.csv', 'abcdNothing.csv')
out <- sub(".*Work(.*)\\.csv$", "\\1", fn)
out
# [1] "start"           "complete"        "abcdNothing.csv"

You can work around this by filtering out the unchanged ones:

out[ out != fn ]
# [1] "start"    "complete"

Or marking them invalid with NA (or something else):

out[ out == fn ] <- NA
out
# [1] "start"    "complete" NA
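
Applied to the table from the question, the same idea might look like the sketch below. It assumes a data frame called data with a character column filename, as in the question; the toy rows and the new column name stage are made up for illustration. Here grepl only tests whether the pattern matches, while sub does the extraction:

```r
# Toy stand-in for the asker's table; only the `filename` column
# name comes from the question, the rows are illustrative.
data <- data.frame(
  filename = c("abcdWorkstart.csv", "abcdWorkcomplete.csv", "abcdNothing.csv"),
  stringsAsFactors = FALSE
)

pattern <- ".*Work(.*)\\.csv$"

# grepl() gives TRUE/FALSE per element; sub() returns the captured group.
# `stage` is a hypothetical column name.
data$stage <- ifelse(
  grepl(pattern, data$filename),
  sub(pattern, "\\1", data$filename),
  NA_character_
)
data$stage
# [1] "start"    "complete" NA
```

This gives NA for non-matching rows directly, without comparing the result against the input afterwards.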

answered Oct 27 '22 17:10

r2evans

Here is an option using regmatches/regexpr from base R. A regex lookaround matches the characters that are not a . between the string 'Work' and '.csv', and regmatches extracts them. Elements with no match are dropped from the result:

regmatches(v1, regexpr("(?<=Work)[^.]+(?=[.]csv)", v1, perl = TRUE))
#[1] "start"    "complete"

data

v1 <- c('abcdWorkstart.csv', 'abcdWorkcomplete.csv', 'abcdNothing.csv')
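
Because regmatches drops the elements that do not match, the result can be shorter than the input. If the output needs to stay aligned with the input (for example, to assign it to a column), one possible sketch is to pre-fill a vector with NA and write the matches into the matching positions. The names m, hit and out are only illustrative:

```r
v1 <- c('abcdWorkstart.csv', 'abcdWorkcomplete.csv', 'abcdNothing.csv')

# regexpr() returns -1 for elements where the pattern does not match
m <- regexpr("(?<=Work)[^.]+(?=[.]csv)", v1, perl = TRUE)
hit <- as.integer(m) != -1L

# Start with all NA, then fill in only the positions that matched
out <- rep(NA_character_, length(v1))
out[hit] <- regmatches(v1, m)
out
# [1] "start"    "complete" NA
```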

answered Oct 27 '22 18:10

akrun

Tags: string, substring, dataframe, r