R regex to match beginning and end of string, ignoring middle




In R, how can I write a regex that matches a given beginning and a given end of a string, ignoring everything in between?

Specifically, how can I grep, from the following vector, the strings that begin with "./xl/worksheets" and end with ".xml"?

myfiles <- c("./_rels/.rels", "./xl/_rels/workbook.xml.rels", 
"./xl/workbook.xml", "./xl/worksheets/sheet4.xml", 
"./xl/worksheets/_rels/sheet1.xml.rels", "./xl/worksheets/sheet2.xml")

I succeed with

grep("^\\./xl/worksheets", myfiles) # returns 4 5 6
grep("\\.xml$", myfiles) # returns 3 4 6

And of course I can do this:

which(grepl("^\\./xl/worksheets", myfiles) &
  grepl("\\.xml$", myfiles)) # returns 4 6

But I can't figure out how to put a wildcard between the two patterns so that a single regex does both.

1 Answer

Adding the match-all pattern .* (any character, zero or more times) between the start and end patterns should work:

grep("^\\./xl/worksheets.*\\.xml$", myfiles) 
# [1] 4 6
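
If you want the matching strings rather than their positions, a minimal sketch along the same lines follows. It assumes the vector holds only the six paths visible in the question. The names pattern, idx, hits and fixed are just illustrative, and the startsWith()/endsWith() variant is a regex-free alternative that needs R 3.3.0 or later, not something from the question.

```r
# Assumption: myfiles contains only the six paths shown in the question.
myfiles <- c("./_rels/.rels", "./xl/_rels/workbook.xml.rels",
             "./xl/workbook.xml", "./xl/worksheets/sheet4.xml",
             "./xl/worksheets/_rels/sheet1.xml.rels",
             "./xl/worksheets/sheet2.xml")

# ^ anchors the prefix, $ anchors the suffix, .* absorbs whatever lies between.
pattern <- "^\\./xl/worksheets.*\\.xml$"

idx  <- grep(pattern, myfiles)                # positions of the matches
hits <- grep(pattern, myfiles, value = TRUE)  # the matching strings themselves

# Regex-free alternative: fixed prefix and suffix tests, no escaping needed.
fixed <- myfiles[startsWith(myfiles, "./xl/worksheets") &
                 endsWith(myfiles, ".xml")]

print(idx)
print(hits)
```

Note that .* also spans "/" characters, so files in subdirectories of ./xl/worksheets would match as long as they end in ".xml". The startsWith()/endsWith() version behaves the same way for literal prefixes and suffixes and sidesteps the backslash escaping.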
