
Why doesn't this FINDSTR example with multiple literal search strings find a match?

The following FINDSTR example fails to find a match.

echo ffffaaa|findstr /l "ffffaaa faffaffddd"

Why?

dbenham asked Jan 19 '12 05:01



2 Answers

Apparently this is a long-standing FINDSTR bug. I think it can be a crippling bug, depending on the circumstances.

I have confirmed the command fails on two different Vista machines, a Windows 7 machine, and an XP machine. I found a post titled "findstr - broken ???" that reports a similar search fails on Windows Server 2003, but succeeds on Windows 2000.

I've done a number of experiments, and it seems all of the following conditions must be met for a failure to be possible:

  • The search is using multiple literal search strings
  • The search strings are of different lengths
  • A short search string has some amount of overlap with a longer search string
  • The search is case sensitive (no /I option)

In every failure I have seen, it is always one of the shorter search strings that fails.

It does not matter how the search strings are specified. The same faulty result is achieved using multiple /C:"search" options and also with the /G:file option.
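For reference, the same pair of search strings can be passed in each of the three ways mentioned above. This is only a sketch for reproducing the report on your own machine: the file name search_strings.txt is arbitrary, and the outputs have not been verified here. Per the observations above, each multi-string command is expected to print nothing on affected Windows versions, while the single-string control should print the line.

```batch
@echo off
rem Control: a single literal search string.
echo ffffaaa|findstr /l /c:"ffffaaa"

rem Form 1: space-separated strings in one quoted argument.
echo ffffaaa|findstr /l "ffffaaa faffaffddd"

rem Form 2: one /C option per search string.
echo ffffaaa|findstr /l /c:"ffffaaa" /c:"faffaffddd"

rem Form 3: search strings read from a file, one per line.
> "search_strings.txt" echo ffffaaa
>> "search_strings.txt" echo faffaffddd
echo ffffaaa|findstr /l /g:"search_strings.txt"
```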

The only 3 workarounds I have been able to come up with are:

  • Use the /I option if you don't care about case. Obviously this might not meet your needs.

  • Use the /R regular expression option. If you do, you must escape any meta-characters in the search strings so that the result matches what a literal search would return. This can be problematic as well.

  • If you are using the /V option, then use multiple piped FINDSTR commands with one search string each instead of one FINDSTR with multiple searches. This also can be a problem if you have a lot of search strings for which you want to use the /G:file option.
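Applied to the example from the question, the three workarounds might look like the following. Treat this as a sketch rather than a tested recipe: input.txt is a hypothetical file to be filtered, and the /R variant only works unchanged because these two search strings contain no regular expression meta-characters.

```batch
@echo off
rem Workaround 1: case-insensitive literal search.
echo ffffaaa|findstr /l /i "ffffaaa faffaffddd"

rem Workaround 2: regular expression search. These two strings contain no
rem meta-characters, so nothing needs escaping here.
echo ffffaaa|findstr /r "ffffaaa faffaffddd"

rem Workaround 3, for /V only: chain one FINDSTR per search string.
rem input.txt is a hypothetical file name.
findstr /l /v /c:"ffffaaa" "input.txt" | findstr /l /v /c:"faffaffddd"
```

In the third form, a line survives only if it contains neither string, which is the same result a single FINDSTR /V with both strings is meant to give.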

I hate this bug!!!!

Note - See What are the undocumented features and limitations of the Windows FINDSTR command? for a comprehensive list of FINDSTR idiosyncrasies.

dbenham answered Sep 28 '22 07:09


I cannot tell why findstr may fail with multiple literal strings. However, I can provide a method to work around that annoying bug.

Given that the literal search strings are listed in a text file called search_strings.txt...:

ffffaaa
faffaffddd

..., you can convert it to regular expressions by inserting a backslash in front of every single character:

@echo off
setlocal EnableExtensions DisableDelayedExpansion
> "regular_expressions.txt" (
    for /F usebackq^ delims^=^ eol^= %%S in ("search_strings.txt") do (
        set "REGEX=" & set "STRING=%%S"
        for /F delims^=^ eol^= %%T in ('
            cmd /U /V /C echo(!STRING!^| find /V ""
        ') do (
            rem Escape every character except the angle brackets, which would otherwise form word anchors
            set "ESCCHR=\%%T"
            if "%%T"=="<" (set "ESCCHR=%%T") else if "%%T"==">" (set "ESCCHR=%%T")
            setlocal EnableDelayedExpansion
            for /F "delims=" %%U in ("REGEX=!REGEX!!ESCCHR!") do (
                endlocal & set "%%U"
            )
        )
        setlocal EnableDelayedExpansion
        echo(!REGEX!
        endlocal
    )
)
endlocal

Then use the converted file regular_expressions.txt...:

\f\f\f\f\a\a\a
\f\a\f\f\a\f\f\d\d\d

...to do a regular expression search, which seems to work fine also with multiple search strings:

echo ffffaaa| findstr /R /G:"regular_expressions.txt"

The preceding backslashes simply escape every character including those that have a particular meaning in regular expression searches.

The characters < and > are excluded from being escaped in order to avoid conflicts with word boundaries, which are expressed by \< and \> when appearing at the beginning and at the end of a search string, respectively.

Regular expressions are limited to 254 characters for findstr versions past Windows XP (as opposed to literal strings, which are limited to 511 characters). Because escaping turns every character into two, the original search strings may therefore be at most 127 characters long.


Here is an alternative approach that only escapes the meta-characters ., *, ^, $, [, ], \, ":

@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "_META=.*^$[]\"^" & rem (including `"`)
> "regular_expressions.txt" (
    for /F usebackq^ delims^=^ eol^= %%S in ("search_strings.txt") do (
        set "REGEX=" & set "STRING=%%S"
        for /F delims^=^ eol^= %%T in ('
            cmd /U /V /C echo(!STRING!^| find /V ""
        ') do (
            set "CHR=%%T"
            setlocal EnableDelayedExpansion
            if not "!_META!"=="!_META:*%%T=!" set "CHR=\!CHR!"
            for /F "delims=" %%U in ("REGEX=!REGEX!!CHR!") do (
                endlocal & set "%%U"
            )
        )
        setlocal EnableDelayedExpansion
        echo(!REGEX!
        endlocal
    )
)
endlocal

The advantage of this method is that, for findstr versions past Windows XP, the search strings are no longer limited to 127 characters but to 254 characters minus 1 for every occurrence of one of the aforementioned meta-characters.


Here is another work-around, which first does a case-insensitive search with findstr and then post-filters the result with case-sensitive comparisons:

echo ffffaaa|findstr /L /I "ffffaaa faffaffddd"|cmd /V /C set /P STR=""^&if @^^!STR^^!==@^^!STR:ffffaaa=ffffaaa^^! (echo(^^!STR^^!) else if @^^!STR^^!==@^^!STR:faffaffddd=faffaffddd^^! (echo(^^!STR^^!)

The double-escaped exclamation marks ensure the variable STR is expanded in the explicitly invoked cmd instance, even if delayed expansion is enabled in the hosting cmd instance.


By the way, due to what I call a design flaw, literal searches with findstr never work reliably as soon as the search strings contain backslashes, because a backslash may still be consumed to escape a following meta-character, even though that is not necessary. For example, the search string \. actually matches .; to truly match \. literally, you must specify the search string \\.. I do not understand why meta-characters are still recognised when doing literal searches; that is not what I call literal.
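To try the backslash behaviour described above, a pair of test commands along these lines could be used. This is a sketch only; the outputs have not been verified here and may differ between Windows versions.

```batch
@echo off
rem Per the description above, this literal search is expected to match a lone dot.
echo .|findstr /l /c:"\."

rem Doubling the backslash is described as the way to match backslash-dot literally.
echo \.|findstr /l /c:"\\."
```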

aschipfl answered Sep 28 '22 08:09