
Regex returning complete line instead of match

I am trying to regex out a date from a text file. This is the content:

Storage Manager Command Line Administrative Interface - Version 7, Release 1, Level 1.4 (c) Copyright by Corporation and other(s) 1990, 2015. All Rights Reserved.

Session established with server TSERVER: Windows Server Version 7, Release 1, Level 5.200
Server date/time: 11/22/2016 15:30:00 Last access: 11/22/2016 15:25:00

ANS8000I Server command.

I need to extract the date/time that follows "Server date/time:". I have written this regex:

/([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2})/

It works as expected on regex101 (see the example at https://regex101.com/r/MB7yB4/1). In PowerShell, however, it behaves differently:

$var -match "([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2})"

gives

Server date/time: 11/22/2016 16:30:00 Last access: 11/22/2016 15:37:19

and

$var -match "([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2})"

gives nothing.

I am not sure why PowerShell returns the whole line instead of just the matched date/time.
Thanks for any help!

mitch2k asked Nov 23 '16 10:11


2 Answers

When its left-hand operand is a single string, the -match operator returns a Boolean value indicating whether a match was found. It also populates the automatic $Matches variable with the match data (the whole match and the capture group values). You just need to access the whole match:

if($var -match '[0-9]{1,2}/[0-9]{1,2}/[0-9]{4} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}') { $matches[0] }

See Using -match and the $matches variable in PowerShell.

Note that there is no need to escape the / symbol in PowerShell regexes: this character is not special, and regex delimiters (the outer /.../ used in JavaScript or PHP regexes) are not used when defining a regular expression in PowerShell.
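A minimal sketch of the single-string case, assuming the relevant line is already held in $var as one string (the sample value is abbreviated from the question's input). With a scalar left-hand side, -match returns a Boolean and $Matches[0] holds only the matched text, not the whole line:

```powershell
# Sample input: a single string (not an array), abbreviated from the question.
$var = 'Server date/time: 11/22/2016 15:30:00 Last access: 11/22/2016 15:25:00'

# Single-quoted regex; the / needs no escaping in PowerShell.
$re = '[0-9]{1,2}/[0-9]{1,2}/[0-9]{4} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}'

# Scalar LHS: -match returns a Boolean and populates $Matches.
$found = $var -match $re
$found        # True
$Matches[0]   # 11/22/2016 15:30:00 (only the first match is recorded)
```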

Wiktor Stribiżew answered Oct 20 '22 10:10


To complement Wiktor Stribiżew's helpful answer, which contains many useful pointers and an effective solution, but doesn't explain the behavior of the -match operator with array input:

  • The behavior of the -match operator changes if the LHS is an array of strings: instead of a Boolean, the matching array elements are returned and the $Matches variable is not populated. Effectively, -match then performs array filtering.
    • You've probably read your file content into $var with just Get-Content, which returns the lines as a string array rather than a single string. In PSv3+, adding the -Raw switch reads the entire file as a single string.
    • Your regex matched (only) the 5th element of the input array (the 5th line from the file), so that element - the whole line - was returned.
  • As explained in Wiktor's answer, you need the entries of the automatically created $Matches hashtable to get at what the most recent successful -match operation (with a single-string LHS) captured: $Matches[0] contains what the regex matched as a whole, $Matches[1] what the first (unnamed) capture group captured ($Matches[2] for the 2nd one, ...), and $Matches['<name>'] for named capture groups, as demonstrated in LotPing's helpful answer. ($Matches.0 is just an alternative syntax for $Matches[0], for instance.)
  • It's better to use single-quoted strings ('...') to define regular expressions, so that PowerShell's own string interpolation that is applied to double-quoted strings ("...") doesn't get in the way.
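The difference between array and single-string input can be sketched as follows. The three lines are an abbreviated stand-in for what Get-Content (without -Raw) would return for the question's file, and the group name datetime is a hypothetical choice for illustration:

```powershell
# Stand-in for Get-Content output without -Raw: an array of lines.
$lines = @(
  'Storage Manager Command Line Administrative Interface - Version 7, Release 1, Level 1.4'
  'Server date/time: 11/22/2016 15:30:00 Last access: 11/22/2016 15:25:00'
  'ANS8000I Server command.'
)

# Named capture group (the name 'datetime' is just an example).
$re = '(?<datetime>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2})'

# Array LHS: -match acts as a filter and returns the whole matching
# element(s); $Matches is NOT updated by this call.
$filtered = $lines -match $re
$filtered   # the entire 'Server date/time: ...' line, not just the timestamp

# Scalar LHS: join the lines into one string first; -match then returns
# a Boolean and populates $Matches.
if (($lines -join "`n") -match $re) {
  $Matches[0]            # 11/22/2016 15:30:00
  $Matches['datetime']   # same value, via the named capture group
}
```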

When it comes to substring extraction using a regular expression, using -replace often allows a more concise solution:

$var -join "`n" -replace '(?s).*?(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2}).*', '$1'

The extra -join "`n" step is needed to reassemble the array of lines in $var into a single string to pass as input to -replace.
The explanation below shows how to use Get-Content -Raw to read the entire file as a single string to begin with.
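To see why the -join step matters, here is a minimal sketch (sample lines abbreviated from the question) comparing -replace on the array of lines with -replace on the joined string. On an array, -replace is applied to each element separately, so lines without a timestamp pass through unchanged:

```powershell
# Stand-in for Get-Content output without -Raw: an array of lines.
$lines = @(
  'Storage Manager Command Line Administrative Interface - Version 7, Release 1, Level 1.4'
  'Server date/time: 11/22/2016 15:30:00 Last access: 11/22/2016 15:25:00'
)
$re = '(?s).*?(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2}).*'

# Array LHS: each element is processed on its own, so the result is
# still a 2-element array; only the second line is reduced to the capture.
$perLine = $lines -replace $re, '$1'

# Joined into one string first: the entire input collapses to the capture.
$single = ($lines -join "`n") -replace $re, '$1'
$single   # 11/22/2016 15:30:00
```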

Explanation:

# Read the text file as a *single* string, using -Raw.
# Note: Without -Raw, you get an *array* of strings representing 
#       the individual lines.
$var = Get-Content -Raw file.txt

# Define the regex that matches the *entire* input,
# with a single capture group capturing the substring of interest.
# The regex:
#   - is prefixed with an inline-option expression, (?s), which ensures
#     that . also matches a newline.
#   - starts with .*? a non-greedy expression matching any
#     sequence of characters at the start of the input,
#   - followed by the original capture-group regex (though without escaping of / as \/,
#     because that is not necessary in PowerShell, and \d used instead of [0-9])
#   - ends with .*, a greedy expression that matches everything through the
#     end of the input.
$re = '(?s).*?(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2}).*'

# Using -replace, we replace the entire input string - by virtue
# of the overall regex matching the entire string - with only 
# what the capture group captured ($1).
# The net effect is that only the capture group value is output.
# With the sample input, this outputs '11/22/2016 15:30:00', the first
# timestamp encountered.
$var -replace $re, '$1'
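Both solutions above return the first timestamp in the input, which here happens to be the server date/time. If you would rather tie the match to the "Server date/time:" label explicitly, one variation (a sketch, not part of either answer) is to include the label in the regex and read capture group 1. The here-string is an abbreviated stand-in for the file content read with -Raw, and [regex]::Matches from the .NET Regex class is shown as one way to list every timestamp instead of just the first:

```powershell
# Stand-in for: $var = Get-Content -Raw file.txt  (abbreviated sample)
$var = @'
Storage Manager Command Line Administrative Interface - Version 7, Release 1, Level 1.4
Server date/time: 11/22/2016 15:30:00 Last access: 11/22/2016 15:25:00
'@

# Anchor on the literal label so that the 'Last access' timestamp
# (or any date appearing earlier in the file) is never picked up.
$re = 'Server date/time:\s*(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2})'

if ($var -match $re) {
  $serverTime = $Matches[1]   # capture group 1 only, without the label
}
$serverTime   # 11/22/2016 15:30:00

# To list every timestamp instead, use the .NET Regex class directly.
$all = [regex]::Matches($var, '\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2}') |
  ForEach-Object { $_.Value }
$all   # two strings: '11/22/2016 15:30:00' and '11/22/2016 15:25:00'
```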

mklement0 answered Oct 20 '22 10:10