
How to use a variable as part of a regular expression in PowerShell

I want to use Select-String to extract the part of a file path that starts at a string value held in a variable. Let me explain this with an abstracted example.

Let's assume this path: /docs/reports/test reports/document1.docx

Using a regular expression I can get the required string like so: '^.*(?=\/test\s)'

https://regex101.com/r/6mBhLX/5

The resulting string is '/test reports/document1.docx'.

Now, for this to work I have to use the literal string 'test'. However, I would like to know how to use a variable that contains 'test', e.g. $myString.

I already looked at "How do you use a variable in a regular expression?", but I couldn't figure out how to adapt this for PowerShell.

colonel_claypoo asked May 07 '18 09:05


2 Answers

I suggest using $([regex]::escape($myString)) inside a double-quoted string literal:

$myString="[test]"
$pattern = "^.*(?=/$([regex]::escape($myString))\s)"

Or, if you do not want to worry about additional escaping, use regular concatenation with the + operator:

$pattern = '^.*(?=/' + [regex]::escape($myString) +'\s)'

The resulting $pattern will look like ^.*(?=/\[test]\s). Since the $myString variable holds a literal string, you need to escape any special regex metacharacters it may contain (with [regex]::escape()) so that the regex engine treats them as literal characters.
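To illustrate why the escaping matters, here is a minimal sketch that compares the unescaped and escaped patterns. The path containing '[test]' is a hypothetical input made up for this comparison; it does not come from the question.

```powershell
# Hypothetical input whose folder name contains regex metacharacters.
$myString = '[test]'
$s = '/docs/reports/[test] reports/document1.docx'

# Unescaped: [test] is read as a character class (one of t, e, s),
# so the lookahead does not find the literal text "/[test] " in this path.
$unescaped = '^.*(?=/' + $myString + '\s)'
$unescapedMatches = $s -match $unescaped      # False for this input

# Escaped: the opening bracket is backslash-escaped and matched literally.
$escaped = '^.*(?=/' + [regex]::Escape($myString) + '\s)'
$result = $s -replace $escaped                # /[test] reports/document1.docx

$unescapedMatches
$escaped
$result
```

Note that only the opening bracket is escaped; a closing bracket outside a character class is already literal, which is why the pattern reads \[test] rather than \[test\].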

In your case, you may use:

$s = '/docs/reports/test reports/document1.docx'
$myString="test"
$pattern = "^.*(?=/$([regex]::escape($myString))\s)"
$s -replace $pattern

Result: /test reports/document1.docx
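Since the question mentions Select-String, the same idea can probably be applied there as well. The following is only a sketch under one assumption: instead of removing the prefix with -replace, it matches from the variable's value to the end of the input. The variable name $tailPattern is made up for this example.

```powershell
$s = '/docs/reports/test reports/document1.docx'
$myString = 'test'

# Match "/<escaped value>" followed by whitespace, then everything up to the end.
$tailPattern = '/' + [regex]::Escape($myString) + '\s.*$'

# Select-String returns a MatchInfo object; the matched text is in .Matches.
$m = $s | Select-String -Pattern $tailPattern
$tail = $m.Matches[0].Value

$tail   # /test reports/document1.docx
```

Select-String is case-insensitive by default; add -CaseSensitive if the folder name must match exactly.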

Wiktor Stribiżew answered Oct 10 '22 02:10


Wiktor Stribiżew's helpful answer provides the crucial pointer:

Use [regex]::Escape() in order to escape a string for safe inclusion in a regex (regular expression) so that it is treated as a literal;
e.g., [regex]::Escape('$10?') yields \$10\? - the characters with special meaning to a regex were \-escaped.

However, I suggest using '...', i.e., building the regex from single-quoted strings:

$myString='test'
$regex = '^.*(?=/' + [regex]::escape($myString) + '\s)'

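Using the -f operator also works and is perhaps visually cleaner: $regex = '^.*(?=/{0}\s)' -f [regex]::Escape($myString). Note, however, that -f, unlike string concatenation with +, is culture-sensitive, so non-string operands such as numbers and dates can be formatted differently depending on the current culture; a string operand, as here, is inserted unchanged.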
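A small sketch of the -f variant follows. The second pattern, with a {0,100} quantifier, is a hypothetical addition that shows one thing to watch for: braces that belong to the regex itself have to be doubled so that -f does not read them as placeholders.

```powershell
$myString = 'test'

# {0} is replaced by the escaped value; the template stays single-quoted.
$regex = '^.*(?=/{0}\s)' -f [regex]::Escape($myString)

# Literal braces (here a regex quantifier) must be written as {{ and }}.
$regexWithQuantifier = '^.{{0,100}}(?=/{0}\s)' -f [regex]::Escape($myString)

$regex                 # ^.*(?=/test\s)
$regexWithQuantifier   # ^.{0,100}(?=/test\s)
```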

Using '...' strings in regex contexts in PowerShell is a good habit to form:

  • By avoiding "...", you avoid additional up-front interpretation (interpolation, a.k.a. expansion) of the string, which can have unexpected effects, given that $ has special meaning in both contexts: it starts a variable reference or subexpression during string expansion, and it is the end-of-input marker in regexes.

  • Using "..." can be especially tricky in the replacement string of the regex-based -replace operator, where tokens such as $1 refer to capture-group results, and if you used "$1", PowerShell would try to expand a $1 variable, which presumably doesn't exist, resulting in the empty string.
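The second point can be seen in a short sketch. It assumes default settings (no Set-StrictMode) and that no variable named $1 has been defined in the session; the capture-group pattern is a variation written for this example, not the pattern from the question.

```powershell
$s = '/docs/reports/test reports/document1.docx'
$myString = 'test'

# Capture everything from "/test " to the end of the input in group 1.
$regex = '^.*(/' + [regex]::Escape($myString) + '\s.*)$'

# Single quotes: $1 reaches the regex engine and refers to capture group 1.
$single = $s -replace $regex, '$1'

# Double quotes: PowerShell expands the (undefined) variable $1 first,
# so the replacement string is empty and the whole match is removed.
$double = $s -replace $regex, "$1"

$single   # /test reports/document1.docx
$double   # (empty string)
```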

mklement0 answered Oct 10 '22 03:10