
R Wildcard in the middle of an expression

Tags: regex, r

I want to use the pattern argument in R to find files in my directory that match "ReportName*.HTML". That is, I only want files with a certain name prefix and extension, with arbitrary characters in between.

Here's an example: I want to find all reports that begin with "2016 Operations" but end with the extension ".HTML". Currently I am trying:

files.control <- dir(path, pattern="^2016 Operations*.HTML$")

Why doesn't this work? I like the one line of code; it's so simple.

Pablo Boswell asked Mar 25 '16 19:03



1 Answer

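The "ReportName*.HTML" syntax is called a glob, not a regular expression, which is why it fails as the pattern argument of dir: in a regular expression, * means "zero or more of the preceding character" and . matches any single character. R supports globs directly through Sys.glob. The call below returns a character vector of the filenames in the current directory that start with ReportName and end with .HTML: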

Sys.glob("ReportName*.HTML")

The R function glob2rx will translate globs to regular expressions so this does the same thing:

dir(pattern = glob2rx("ReportName*.HTML"))
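Applied to the example from the question, both approaches can be sketched as follows. The scratch directory and the file names are made up for illustration; only dir, glob2rx and Sys.glob come from the answer itself.

```r
# Hypothetical demo: create a scratch directory with a few made-up report files.
path <- tempfile("reports_")
dir.create(path)
file.create(file.path(path, c(
  "2016 Operations East.HTML",
  "2016 Operations West.HTML",
  "2016 Operations East.pdf",
  "2015 Operations East.HTML"
)))

# glob2rx() turns the glob into an anchored regular expression for dir().
files.control <- dir(path, pattern = glob2rx("2016 Operations*.HTML"))

# Sys.glob() takes the glob directly; the directory is part of the pattern,
# so the results come back as full paths.
files.glob <- Sys.glob(file.path(path, "2016 Operations*.HTML"))

print(files.control)
print(basename(files.glob))
```

Note that dir(path, pattern = ...) returns bare file names unless full.names = TRUE is set, whereas Sys.glob returns whatever path prefix was included in the glob.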

We can discover the regular expression associated with a glob like this:

glob2rx("ReportName*.HTML")
## [1] "^ReportName.*\\.HTML$"
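To see concretely how the pattern from the question differs from the translated glob, here is a small comparison with grepl. The file names are invented for the demonstration.

```r
# Made-up file names for illustration.
files <- c("2016 Operations East.HTML",
           "2016 Operations.HTML",
           "2016 OperationsXHTML",
           "2016 Operations East.pdf")

# Pattern from the question: "s*" matches zero or more "s" characters and
# "." matches exactly one arbitrary character before "HTML".
question_rx <- "^2016 Operations*.HTML$"

# Pattern produced by glob2rx(): ".*" matches any run of characters and
# "\\." matches a literal dot.
glob_rx <- glob2rx("2016 Operations*.HTML")

print(grepl(question_rx, files))
## [1] FALSE  TRUE  TRUE FALSE
print(grepl(glob_rx, files))
## [1]  TRUE  TRUE FALSE FALSE
```

The question's pattern leaves room for only one arbitrary character between "Operations" and "HTML", so a name like "2016 Operations East.HTML" is rejected, while a name with no dot before "HTML" is wrongly accepted.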

For more information on regular expressions, see ?regex from within R, or the links near the bottom of this page: https://code.google.com/archive/p/gsubfn/

G. Grothendieck answered Sep 28 '22 14:09
