Powershell, File system provider, Get-ChildItem filtering... where are the official docs?

Tags:

powershell

As mentioned in another question, if you try to do a Get-ChildItem -filter ... command you are more limited than if you used -include instead of -filter. I'd like to read the official docs for the file system provider's filtering syntax but after a half hour of searching I still haven't found them. Anyone know where to look?

Vimes asked Jul 08 '11 18:07


3 Answers

tl;dr: -Filter uses .NET's implementation of FsRtlIsNameInExpression, which is documented on MSDN along with basic pattern matching info. The algorithm is unintuitive for compatibility reasons, and you should probably avoid using this feature. Additionally, .NET has numerous bugs in its implementation.

-Filter does not use the filtering system provided by PowerShell; that is, it does not use the wildcard syntax described by Get-Help about_Wildcards. Rather, it passes the filter to the Windows API. Therefore, the filtering works the same as it does in any other program that uses the Windows API, such as cmd.exe.

More precisely, PowerShell uses a FsRtlIsNameInExpression-like algorithm for -Filter pattern matching. The algorithm is based on old MS-DOS behavior, so it's riddled with caveats that are preserved for legacy purposes. It's typically said to have three common special characters. The exact behavior is complex, but it's more or less like the following:

  • *: Matches any number of characters (zero-inclusive)
  • ?: Matches exactly one character, excluding the last period in a name
  • .: If the last period in a pattern, anchors to the last period in the filename, or the end of the filename if it doesn't have a period; can also match a literal period

Just to make things more complicated, Windows added three additional special characters that behave exactly the same as the old MS-DOS special characters. The original special characters have slightly different behavior now to account for more flexible filesystems.

  • " is equivalent to MS-DOS . (DOS_DOT and ANSI_DOS_DOT in ntifs.h)
  • < is equivalent to MS-DOS ? (DOS_QM and ANSI_DOS_QM in ntifs.h)
  • > is equivalent to MS-DOS * (DOS_STAR and ANSI_DOS_STAR in ntifs.h)

Quite a few sources seem to reverse < and >. Frighteningly, Microsoft confuses them in their .NET implementation, which means they are also reversed in PowerShell. Additionally, all three compatibility wildcards are inaccessible from -Filter, as System.IO.Path mistakenly treats "<> as invalid, non-wildcard characters. (It allows .*?.) This contributes to the notion that -Filter is incomplete, unstable, and buggy. You can see .NET's (buggy) implementation of the algorithm on GitHub.

This is additionally complicated by the algorithm's support for 8.3 compatibility filenames, otherwise known as "short" filenames. (You've probably seen them before; they look something like: SOMETH~1.TXT) A file matches the pattern if either its full filename or its short filename matches. PatrickFranchise has more information about this caveat in his answer.

The previously-linked MSDN article on FsRtlIsNameInExpression has the most up-to-date documentation on Windows filename pattern matching, but it's not particularly verbose. For a more thorough explanation of how matching used to work on MS-DOS and how this affects modern matching, this MSDN blog article is the best source I've found. Here's the basic idea:

  • Every filename was exactly 11 bytes.
    • The first 8 bytes stored the body of the filename, right-padded with spaces
    • The last 3 bytes stored the extension, right-padded with spaces
  • Letters were converted to uppercase
  • Letters, numbers, spaces, and some symbols matched only themselves
  • ? matched any single character, except spaces in the extension
  • . would fill the remainder of the first 8 bytes with spaces, then advance to the 9th byte (the start of the extension)
  • * would fill the remainder of the current section (body or extension) with question marks, then advance to the next section (or the end of the pattern)

The transformations would look like this:

                          11
User             12345678901
------------     -----------
ABC.TXT       >  ABC     TXT
WILDCARD.TXT  >  WILDCARDTXT
ABC.???       >  ABC     ???
*.*           >  ???????????
*.            >  ????????   
ABC.          >  ABC        
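
The expansion rules listed above can be sketched in a few lines of Python. This is only an illustration of the rules as described in this answer, not code taken from MS-DOS or Windows. The function name to_fcb is made up, and two behaviors are assumptions of the sketch: characters that overflow a section are dropped, and everything after a second period is ignored.

```python
def to_fcb(pattern: str) -> str:
    """Expand a user-typed name or pattern into the 11-byte form (8 body + 3 extension)."""
    body, ext = [], []
    current, width = body, 8
    for ch in pattern.upper():          # letters were converted to uppercase
        if ch == ".":
            if current is ext:
                break                   # assumption: ignore anything after a second period
            current, width = ext, 3     # jump to the 9th byte (start of the extension)
        elif ch == "*":
            # fill the remainder of the current section with question marks
            current.extend("?" * (width - len(current)))
        elif len(current) < width:
            current.append(ch)          # ordinary characters and '?' are stored as-is
        # assumption: characters that do not fit in the section are dropped
    # right-pad both sections with spaces
    return "".join(body).ljust(8) + "".join(ext).ljust(3)


for user_input in ["ABC.TXT", "WILDCARD.TXT", "ABC.???", "*.*", "*.", "ABC."]:
    print(f"{user_input:<13} >  {to_fcb(user_input)!r}")
```

For the six patterns in the table, the sketch produces the same right-hand column.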

Extrapolating this to work with modern-day filesystems is an unintuitive process at best. For example, take a directory such as the following:

Name                 Compat Name
-----------------------------------------------
Apple1.txt           APPLE1  .TXT
Banana               BANANA  .
Something.txt        SOMETH~1.TXT
SomethingElse.txt    SOMETH~2.TXT
TXT.exe              TXT     .EXE
TXT.eexe             TXT~1   .EEX
Wildcard.txt         WILDCARD.TXT

I've done quite a bit of testing of these wildcards on Windows 10 and have gotten very inconsistent results, especially with DOS_DOT ("). If you test these on your own from the command prompt, you'll likely need to escape them (e.g., dir ^>^"^> in cmd.exe to emulate MS-DOS *.*).

*.*                 (everything)
<"<                 (everything)
*                   (everything)
<                   Banana
.                   (everything)
"                   (everything)
*.                  Banana
<"                  Banana
*g.txt              Something.txt
<g.txt              Something.txt
<g"txt              (nothing)
*1.txt              Apple1.txt, Something.txt
<1.txt              Apple1.txt, Something.txt
<1"txt              (nothing)
*xe                 TXT.eexe, TXT.exe
<xe                 (nothing)
*exe                TXT.eexe, TXT.exe
<exe                TXT.exe
??????.???          Apple1.txt, Asdf.tx, Banana, TXT.eexe, TXT.exe
>>>>>>.>>>          Apple1.txt, Asdf.tx, TXT.eexe, TXT.exe
>>>>>>">>>          Banana
????????.???        (everything)
>>>>>>>>.>>>        (everything except Banana)
>>>>>>>>">>>        Banana
???????????.???     (everything)
>>>>>>>>>>>.>>>     (everything except Banana)
>>>>>>>>>>>">>>     Banana
??????              Banana
>>>>>>              Banana
???????????         Banana
>>>>>>>>>>>         Banana
????????????        Banana
????                (nothing)
>>>>                (nothing)
Banana??.           Banana
Banana>>.           Banana
Banana>>"           Banana
Banana????.         Banana
Banana>>>>.         Banana
Banana>>>>"         Banana
Banana.             Banana
Banana"             Banana
*txt                Apple1.txt, Something.txt, SomethingElse.txt, Wildcard.txt
<txt                Apple1.txt, Something.txt, SomethingElse.txt, Wildcard.txt
*t                  Apple1.txt, Something.txt, SomethingElse.txt, Wildcard.txt
<t                  (nothing)
*txt*               Apple1.txt, Something.txt, SomethingElse.txt, TXT.eexe, TXT.exe, Wildcard.txt
<txt<               Apple1.txt, Something.txt, SomethingElse.txt, Wildcard.txt
*txt<               Apple1.txt, Something.txt, SomethingElse.txt, Wildcard.txt
<txt*               Apple1.txt, Something.txt, SomethingElse.txt, TXT.eexe, TXT.exe, Wildcard.txt

Note: As of writing, WINE's matching algorithm yields significantly different results when testing these "gotchas". Tested with WINE 1.9.6.

As you can see, the backwards-compatible MS-DOS wildcards are obscure and buggy. Even Microsoft has implemented them incorrectly at least once, and it's unclear whether their current behavior in Windows is intentional. The behavior of " seems completely random, and I expected the results of the last two tests to be swapped.

Zenexer answered Dec 16 '22 13:12


There is almost nothing on -filter.

There is a little bit when you do Get-Help Get-ChildItem -full, but I'm sure you've seen it. There is a post on the PowerShell blog, as well. Neither gives examples.

The best example I could find is this one. It shows that the filter is a string the provider uses to return a subset of what it would otherwise return, although it merely uses -filter in passing rather than demonstrating it directly. Still, it gives a better glimpse than the other links.

However, because the provider is doing the filtering before the results get back to the cmdlet, there are certain caveats. For example, if I want to recursively find all files and directories that begin with "test", I would not want to start with this:

Get-ChildItem -filter 'test*' -recurse

This would filter all results in the current directory before returning anything for the recursion. If I had a directory that began with "test", it would recurse that directory (since the provider would return it to the cmdlet), but no others.

As the example shows, it can address properties in some providers. In the FileSystem provider, you may only be able to use wildcard matching strings on the directory's or file's name (the leaf, not the fully qualified path).

Joel B Fant answered Dec 16 '22 12:12


To follow up on what Zenexer mentioned, you should see about the same results that you would see using the same filters with cmd.exe. This includes things you might not expect like 8.3 short file names. You can test this yourself.

Create some example files with PowerShell:
md filtertest | cd
(1..1000) | % { New-Item -Name ("aaaaa{0:D6}.txt" -f $_) -ItemType File }
Now open up a cmd prompt and run:
dir /x
dir aaab*

The first command shows the 8.3 short-names. The second matches some files, even though there is no 'b' character in any of the normal names, because those files contain a 'b' in the short-name.

Now you can flip back to PowerShell and run ls -Filter aaab* to see the same files again. The -Filter string is passed to the WinAPI, which matches against those files with 'b' in the 8.3 short-names, just like dir in cmd.exe. So beware of unexpected results when using -Filter: you might be matching against the 8.3 short-name.

This is all assuming that 8.3 short-names are enabled on your computer.
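
The "either name matches" rule is easy to sketch outside of Windows. The Python below is a rough illustration only. It uses the standard fnmatch module as a stand-in for the real Windows matcher (which behaves differently in the edge cases covered in Zenexer's answer), the helper name filter_matches is made up, and the short names in the sample data are invented for the example rather than copied from a real dir /x listing.

```python
from fnmatch import fnmatchcase


def filter_matches(pattern, long_name, short_name=None):
    """Return True if the pattern matches the long name OR the 8.3 short name."""
    pattern = pattern.upper()                      # matching is case-insensitive
    candidates = [long_name]
    if short_name:                                 # short names may be disabled
        candidates.append(short_name)
    return any(fnmatchcase(name.upper(), pattern) for name in candidates)


# (long name, short name) pairs; the short names are invented for illustration
files = [
    ("aaaaa000001.txt", "AAAAA0~1.TXT"),
    ("aaaaa000123.txt", "AAAB3F~1.TXT"),
    ("aaaaa000456.txt", "AA7C20~1.TXT"),
]

hits = [long_name for long_name, short_name in files
        if filter_matches("aaab*", long_name, short_name)]
print(hits)
```

With short names taken into account, the second file is reported even though its long name contains no 'b'. Calling filter_matches without a short name checks the long name alone, which is roughly what happens when 8.3 names are disabled.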

PatrickFranchise answered Dec 16 '22 13:12