
PowerShell regex -match only matching once

I'm trying to run through files named using alphabetical dates, and set file date-times accordingly. My code works fine, and I was ready to consider it complete, when I noticed this issue. My code should detect two dates and generate an error, but it doesn't. I've extracted the relevant code, and recreated the issue:

$str = "test20001231 20170415.txt"
$match = ($str -match "(?<=\b|\D)20\d{6,6}(?=\b|\D)")
"$match"
"$($Matches.length)"
"$($Matches[0].ToString())"

Gives this output:

True
1
20001231

My understanding of the regex code is that it should match everything that is an 8 digit number beginning with 20, wherever it is in the string, unless it is following or preceding another digit. So I am expecting $Matches.length to be 2.

I've tested the regex code in a number of places online, and it matches the two dates as I expect:
http://regexstorm.net/tester?p=%28%3f%3c%3d%5cb%7c%5cD%2920%5cd%7b6%2c6%7d%28%3f%3d%5cb%7c%5cD%29&i=test20001231+20170415.txt
http://www.phpliveregex.com/p/jLA

The issue applies to both PS and PS ISE. I've searched a lot (I think), and not turned up anything helpful. Any suggestions? Many thanks in advance, Dave

Davii asked Dec 18 '22 08:12



1 Answer

PowerShell's -match operator only ever looks for the first match (if any) per input string, because its purpose is to test for a (any) match, irrespective of whether there is more than one.

Note that a single -match expression can have multiple input strings, if the LHS is an array, in which case an array of the elements that match is returned; e.g.: 'foo', 'bar', 'baz' -match 'b' yields the array 'bar', 'baz'. However, for each array element only a single match is again tested for, and the automatic $Matches variable is not populated in this case - see bottom.
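
A minimal sketch of the array case, assuming PSv3+; the variable names ($filtered, $afterArrayMatch) and the 'seed' string are purely illustrative. It seeds $Matches with a scalar match first, so that you can see the array operation leaving it alone:

```powershell
# Scalar LHS: populates $Matches (illustrative seed value).
$null = 'seed' -match 'see'

# Array LHS: -match acts as a filter and returns the matching elements.
$filtered = 'foo', 'bar', 'baz' -match 'b'
$filtered                      # the elements that contain 'b'

# $Matches still reflects the earlier scalar match, not the array operation.
$afterArrayMatch = $Matches[0]
$afterArrayMatch
```

Because the result is an array rather than a Boolean in this case, using it in a conditional tests whether any element matched.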

All commands below assume PSv3+, but could be made to work in v2 too.

You need to use the .NET framework's [regex] class to get multiple matches:

PS> ([regex]::Matches('test20001231 20170415.txt', '(?<=\b|\D)20\d{6,6}(?=\b|\D)')).Value
20001231
20170415

[regex]::Matches() outputs a collection of [System.Text.RegularExpressions.Match] instances[1] whose .Value properties contain the matches.

Note how .Value is applied to the entire collection, which in PSv3+ automatically returns the property values of the collection members as an array.
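
If you need more than the matched text, you can also enumerate the collection explicitly. The following is only a sketch (the variable names are illustrative); it reads the .Index and .Value properties of each [System.Text.RegularExpressions.Match] instance:

```powershell
$str   = 'test20001231 20170415.txt'
$regex = '(?<=\b|\D)20\d{6,6}(?=\b|\D)'

# Emit one custom object per match, recording where it starts and what it matched.
$details = foreach ($m in [regex]::Matches($str, $regex)) {
    [pscustomobject]@{ Index = $m.Index; Value = $m.Value }
}
$details
```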

To get just the count of matches:

PS> ([regex]::Matches('test20001231 20170415.txt', '(?<=\b|\D)20\d{6,6}(?=\b|\D)')).Count
2
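
Applied to the scenario in the question, the count can drive the "more than one date" check. This is a sketch under assumptions: the variable names and the warning text are made up for illustration, and your actual error handling may differ:

```powershell
$str   = 'test20001231 20170415.txt'
$regex = '(?<=\b|\D)20\d{6,6}(?=\b|\D)'

# Collect ALL matches, not just the first one.
$found = [regex]::Matches($str, $regex)

if ($found.Count -gt 1) {
    # Hypothetical handling: report the ambiguity instead of guessing a date.
    Write-Warning "Ambiguous name '$str': found $($found.Count) dates ($($found.Value -join ', '))"
}
elseif ($found.Count -eq 1) {
    $dateString = $found[0].Value   # the single date, ready for further parsing
}
```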

Another option is to use Select-String -AllMatches, which outputs [Microsoft.PowerShell.Commands.MatchInfo] instances whose .Matches property contains each line's collection of [System.Text.RegularExpressions.Match] instances:

PS> ('test20001231 20170415.txt' |
    Select-String -AllMatches '(?<=\b|\D)20\d{6,6}(?=\b|\D)').Matches.Value
20001231
20170415

As above, substituting .Count for .Value outputs the number of matches.

Note that use of Select-String is a bit heavy-handed for use with a single input string, but it's the right tool to use for large input collections, such as a file's lines.
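
To illustrate the multi-input case, here is a rough sketch that pipes several strings (a hypothetical list of file names; the same approach should apply to lines read with Get-Content) and reports how many dates each one contains. Note that inputs without any match produce no output object at all:

```powershell
# Hypothetical sample names; 'notes.txt' contains no date.
$names = 'test20001231 20170415.txt', 'report20170415.txt', 'notes.txt'
$regex = '(?<=\b|\D)20\d{6,6}(?=\b|\D)'

$counts = $names | Select-String -AllMatches $regex | ForEach-Object {
    # .Line is the input string; .Matches holds all matches found in it.
    [pscustomobject]@{ Name = $_.Line; DateCount = $_.Matches.Count }
}
$counts
```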


Optional reading: The automatic $Matches variable:

The automatic $Matches variable is populated (as of PSv5.1):

  • only when you use the -match operator
  • and the LHS is a scalar
    • by contrast, with an array on the LHS, $Matches is neither populated nor reset.
  • and a match is found (-match returns $true)
    • if no match is found (-match returns $false), a preexisting $Matches value, if any, is left untouched.
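
The last point is easy to trip over, so here is a small sketch of it (variable names are illustrative): after a failed match, $Matches still holds the result of the previous successful one.

```powershell
# Successful match: $Matches is populated.
$null  = 'abc123' -match '\d+'
$first = $Matches[0]

# Failed match: -match returns $false and $Matches is left as it was.
$ok    = 'abcdef' -match '\d+'
$stale = $Matches[0]

"first: $first; ok: $ok; stale: $stale"
```

For that reason it is safest to consult $Matches only inside a conditional that checks the return value of -match.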

$Matches is a [hashtable] instance with the following entries:

  • key 0's entry is the entire match - this key is by definition always present.
  • key <n>'s entry is what the - unnamed - capture group with index <n> matched.
  • key <name>'s entry is what named capture group <name> matched.

The fact that $Matches (potentially) also contains capture-group values justifies its plural name - despite only relating to a single match of the given regex.
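
As a sketch of these entries, the following uses a made-up pattern that splits a date into two named groups (year, day) and one unnamed group (the month); the pattern and variable names are assumptions for illustration only:

```powershell
$null = 'test20001231.txt' -match '(?<year>20\d{2})(\d{2})(?<day>\d{2})'

$whole = $Matches[0]       # the entire match
$month = $Matches[1]       # the unnamed capture group
$year  = $Matches.year     # named group, accessed as a property
$day   = $Matches['day']   # named group, accessed by key

"whole: $whole; year: $year; month: $month; day: $day"
```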


[1] To inspect the type of a single object or the type of the elements of a collection, pipe to Get-Member: ([regex]::Matches('foo', 'o')) | Get-Member
To inspect the type of a collection itself, pass it to Get-Member -InputObject:
Get-Member -InputObject ([regex]::Matches('foo', 'o'))

mklement0 answered Dec 31 '22 09:12
