
Use subpatterns in FINDSTR

I must check the validity of a string stored in a variable. I cannot use external CLI utilities (grep, awk, etc.), so I chose FINDSTR. The string has this format (as a regular expression):

([1-9][0-9]*:".*"(|".*")*)

I do not know how to check the subpattern (|".*")*. Currently my code is:

((ECHO.) | (SET /P "=(11:"a"|"b"|"c")") | (FINDSTR /R /C:"^([1-9][0-9]*:".*")$"))

Regards.

networkcode asked Sep 23 '12 18:09


1 Answer

Mat M is correct about the limitation of FINDSTR. The FINDSTR regex support is very primitive and non-standard. Type HELP FINDSTR or FINDSTR /? from the command line to get a brief synopsis of what is supported. For an in-depth explanation, refer to "What are the undocumented features and limitations of the Windows FINDSTR command?"

I like Harry Johnston's comment: it would be quite easy to create a solution using VBScript or JavaScript. I think that would be a much better choice.
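To illustrate why a full regular-expression engine makes this easy, here is a minimal sketch. It is written in Python purely for illustration (the comment suggests VBScript or JavaScript), and it rests on two assumptions that are not in the question: an item may not contain `"` or `|`, and the leading number must equal the number of items. The names `PATTERN` and `is_valid` are hypothetical.

```python
import re

# The outer parentheses and the pipe are literal characters in this format,
# so they are escaped. [^"|]* (no quote or pipe inside an item) is an assumption.
PATTERN = re.compile(r'\(([1-9][0-9]*):("[^"|]*"(?:\|"[^"|]*")*)\)')


def is_valid(text):
    """Return True if text has the expected shape and the declared count matches."""
    m = PATTERN.fullmatch(text)
    if m is None:
        return False
    expected = int(m.group(1))
    # Items cannot contain '|', so the number of pipes plus one is the item count.
    found = m.group(2).count("|") + 1
    return expected == found
```

The single pattern handles the repeated `|"..."` group directly, which is the part FINDSTR cannot express.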

But, here is a native batch solution. I've incorporated the extra rule about the number of subpatterns that the OP stated in the comment to Mat M's answer.

The solution is surprisingly tricky. Special characters can cause problems when piping the ECHO output to FINDSTR because of the way pipes work. Each side of the pipe is executed in its own CMD session. The special characters must either be quoted, escaped twice, or only exposed via delayed expansion. I chose to use delayed expansion, but the ! characters must be escaped twice to make sure the delayed expansion occurs at the correct time.

The easiest way to parse a variable number of subpatterns is to replace the delimiter with a newline and use FOR /F to iterate each subpattern.
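The substitute-then-iterate idea can be sketched outside of batch as follows. This is only an illustration in Python, not part of the batch solution; `split_on_delimiter` and `is_quoted` are hypothetical helper names, and the rule that an item must start and end with a double quote is a simplification.

```python
def split_on_delimiter(body, delimiter="|"):
    # Substitute a newline for each delimiter, then walk the result line by line.
    return body.replace(delimiter, "\n").split("\n")


def is_quoted(item):
    # An item counts as well-formed here if it starts and ends with a double quote.
    return len(item) >= 2 and item.startswith('"') and item.endswith('"')


if __name__ == "__main__":
    for piece in split_on_delimiter('"a"|"b|c"'):
        print(piece, is_quoted(piece))
```

One difference worth keeping in mind: FOR /F skips empty lines, so an empty item produced by `||` would not be seen by a batch loop, whereas Python's `split` keeps it as an empty string. That is presumably why a batch routine needs a separate test for `||`.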

The top half of my code is a brittle coding harness to conveniently iterate and test a set of strings. It will not work properly with any of <space> ; , = <tab> * or ? in the string. Also, the quotes must be balanced in each string.

But the more important validate routine can handle any string in the var variable.

@echo off
setlocal
set LF=^


::Above 2 blank lines are critical for creating a linefeed variable. Do not remove

set test=a

for %%S in (
  "(3:"a"|"c"|"c")"
  "(11:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")"
  "(4:"a"|"b"|"c")"
  "(10:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")"
  "(3:"a"|"b"|"c""
  "(3:"a"|"b^|c")"
  "(3:"a"|"b"|c)"
  "(3:"a"|"b"||"c")"
  "(3:"a"|"b"|;|"c")"
) do (
  set "var=%%~S"
  call :validate
)
exit /b

:validate
setlocal enableDelayedExpansion
cmd /v:on /c echo ^^^!var^^^!|findstr /r /c:"^([1-9][0-9]*:.*)$" >nul || (call :invalid  FINDSTR fail& exit /b)
if "!var:||=!" neq "!var!" (call :invalid double pipe fail& exit /b)
for /f "delims=(:" %%N in ("!var!") do set "expectedCount=%%N"
set "str=!var:*:=!"
set "str=!str:~0,-1!"
set foundCount=0
for %%A in ("!LF!") do for /f eol^=^%LF%%LF%^ delims^=  %%B in ("!str:|=%%~A!") do (
  if %%B neq "%%~B" (call :invalid sub-pattern fail& exit /b)
  set /a foundCount+=1
)
if %foundCount% neq %expectedCount% (call :invalid count fail& exit /b)
echo Valid: !var!
exit /b
:invalid
echo Invalid - %*: !var!
exit /b

Here are the results after running the batch script:

Valid: (3:"a"|"c"|"c")
Valid: (11:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")
Invalid - count fail: (4:"a"|"b"|"c")
Invalid - count fail: (10:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")
Invalid - FINDSTR fail: (3:"a"|"b"|"c"
Invalid - sub-pattern fail: (3:"a"|"b|c")
Invalid - sub-pattern fail: (3:"a"|"b"|c)
Invalid - double pipe fail: (3:"a"|"b"||"c")
Invalid - sub-pattern fail: (3:"a"|"b"|;|"c")
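As a cross-check of the rules, the four checks in the :validate routine can be restated in a language with ordinary string handling. The sketch below is in Python and is only an approximation: `validate` and `COARSE` are hypothetical names, the quoting test is simplified, and edge cases (for example a leading or trailing `|`, which FOR /F would silently skip) are not guaranteed to match the batch routine.

```python
import re

# Same coarse shape the FINDSTR call checks: "(", a positive integer, ":", anything, ")".
COARSE = re.compile(r'\([1-9][0-9]*:.*\)')


def validate(var):
    """Return 'Valid' or 'Invalid - <reason>', checking in the same order as :validate."""
    if COARSE.fullmatch(var) is None:
        return "Invalid - FINDSTR fail"
    if "||" in var:
        return "Invalid - double pipe fail"
    # The text before the first colon, minus the "(", is the declared item count.
    prefix, _, rest = var.partition(":")
    expected = int(prefix.lstrip("("))
    body = rest[:-1]  # drop the closing ")"
    found = 0
    for item in body.split("|"):
        # Simplified test: each item must start and end with a double quote.
        if not (len(item) >= 2 and item[0] == '"' and item[-1] == '"'):
            return "Invalid - sub-pattern fail"
        found += 1
    if found != expected:
        return "Invalid - count fail"
    return "Valid"
```

Fed the nine test strings from the harness, this sketch returns the same verdicts and reasons as the results listed above.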


Update

The :validate routine can be simplified a bit by postponing the enablement of delayed expansion until after the CMD /V:ON pipe. This means I no longer have to worry about double escaping the ! on the left side of the pipe.

:validate
cmd /v:on /c echo !var!|findstr /r /c:"^([1-9][0-9]*:.*)$" >nul || (call :invalid  FINDSTR fail& exit /b)
setlocal enableDelayedExpansion
... remainder unchanged
dbenham answered Oct 02 '22 20:10