
R regex to match beginning and end of string, ignoring middle

Tags: regex, r

In R, how can I create the regex that matches beginning and end strings, ignoring everything between?

Specifically, how can I grep from the following vector the strings that begin with "./xl/worksheets" and end with ".xml"?

myfiles <- c("./_rels/.rels", "./xl/_rels/workbook.xml.rels", 
"./xl/workbook.xml", "./xl/worksheets/sheet4.xml", 
"./xl/worksheets/_rels/sheet1.xml.rels", "./xl/worksheets/sheet2.xml", 
"./xl/printerSettings/printerSettings11.bin")

I succeed with

grep("^\\./xl/worksheets", myfiles) # returns 4 5 6
grep("\\.xml$", myfiles) # returns 3 4 6

And of course I can do this:

which(grepl("^\\./xl/worksheets", myfiles) &
  grepl("\\.xml$", myfiles)) # returns 4 6

But I can't figure out how to put a wildcard between the two patterns.

J. Win. asked Jun 03 '18 16:06



1 Answer

Simply adding a match-all pattern .* between the start and end patterns should work:

grep("^\\./xl/worksheets.*\\.xml$", myfiles) 
# [1] 4 6
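
For reference, here is a minimal, self-contained sketch of the same idea, assuming the myfiles vector from the question. It also shows value = TRUE, which returns the matching strings rather than their indices, and a regex-free alternative using startsWith() and endsWith() (available in base R since 3.3.0). The variable names idx, hits, and idx_fixed are only illustrative.

```r
myfiles <- c("./_rels/.rels", "./xl/_rels/workbook.xml.rels",
             "./xl/workbook.xml", "./xl/worksheets/sheet4.xml",
             "./xl/worksheets/_rels/sheet1.xml.rels", "./xl/worksheets/sheet2.xml",
             "./xl/printerSettings/printerSettings11.bin")

# Anchor the prefix with ^ and the suffix with $; .* absorbs whatever lies between.
pattern <- "^\\./xl/worksheets.*\\.xml$"

# Positions of the matching elements.
idx <- grep(pattern, myfiles)
print(idx)   # [1] 4 6

# value = TRUE returns the matching strings instead of their positions.
hits <- grep(pattern, myfiles, value = TRUE)
print(hits)  # "./xl/worksheets/sheet4.xml" "./xl/worksheets/sheet2.xml"

# Regex-free alternative: test the literal prefix and suffix directly,
# which avoids having to escape the dots at all.
idx_fixed <- which(startsWith(myfiles, "./xl/worksheets") &
                   endsWith(myfiles, ".xml"))
print(idx_fixed)  # [1] 4 6
```

The literal-string version may be easier to read when neither end of the pattern needs real regex features; the single regex is handier when the result feeds straight into functions such as list.files(pattern = ...).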
Psidom answered Sep 28 '22 10:09