I can count all the files in a folder and its sub-folders; the folders themselves are not counted.
(gci -Path *Fill_in_path_here* -Recurse -File | where Name -like "*STB*").Count
However, PowerShell is too slow for the number of files involved (up to 700k). I read that cmd is faster at this kind of task.
Unfortunately, I have no knowledge of cmd code at all. In the example above I am counting all the files with STB in the file name.
That is what I would like to do in cmd as well.
Any help is appreciated.
Theo's helpful answer based on direct use of .NET ([System.IO.Directory]::EnumerateFiles()) is the fastest option (in my tests; YMMV - see the benchmark code below[1]).
Its limitations in the .NET Framework (FullCLR) - on which Windows PowerShell is built - are:
An exception is thrown when an inaccessible directory is encountered (due to lack of permissions). You can catch the exception, but you cannot continue the enumeration; that is, you cannot robustly enumerate all items that you can access while ignoring those that you cannot.
Hidden items are invariably included.
With recursive enumeration, symlinks / junctions to directories are invariably followed.
By contrast, the cross-platform .NET Core framework, since v2.1 - on which PowerShell Core is built - offers ways around these limitations, via the EnumerationOptions options - see this answer for an example.
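For reference, here is a minimal sketch of both variants, assuming $somePath holds the target directory (a placeholder) and using the OP's *STB* pattern; the EnumerationOptions property values shown are illustrative choices under those assumptions, not the only possible ones:

# Windows PowerShell / .NET Framework: fast, but subject to the limitations above.
@([System.IO.Directory]::EnumerateFiles($somePath, '*STB*', 'AllDirectories')).Count

# PowerShell Core / .NET Core 2.1+: EnumerationOptions offers ways around them.
$opts = [System.IO.EnumerationOptions] @{
  RecurseSubdirectories = $true                                     # recurse via the options object
  IgnoreInaccessible    = $true                                     # skip items you lack permission for
  AttributesToSkip      = [System.IO.FileAttributes]::ReparsePoint  # don't descend into symlinks / junctions
}
@([System.IO.Directory]::EnumerateFiles($somePath, '*STB*', $opts)).Count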
Note that you can also perform enumeration via the related [System.IO.DirectoryInfo] type, which - similar to Get-ChildItem - returns rich objects rather than mere path strings, allowing for much more versatile processing; e.g., to get an array of all file sizes (property .Length, implicitly applied to each file object):
([System.IO.DirectoryInfo] $somePath).EnumerateFiles('*STB*', 'AllDirectories').Length
A native PowerShell solution that addresses these limitations and is still reasonably fast is to use Get-ChildItem with the -Filter parameter.
(Get-ChildItem -LiteralPath $somePath -Filter *STB* -Recurse -File).Count
Hidden items are excluded by default; add -Force to include them.
To ignore permission problems, add -ErrorAction SilentlyContinue or -ErrorAction Ignore; the advantage of SilentlyContinue is that you can later inspect the $Error collection to determine the specific errors that occurred, so as to ensure that the errors truly only stem from permission problems - such as those triggered by legacy hidden system junctions like $env:USERPROFILE\Cookies.
In Windows PowerShell, Get-ChildItem -Recurse invariably follows symlinks / junctions to directories, unfortunately; more sensibly, PowerShell Core by default does not, and offers opt-in via -FollowSymlink.
Like the [System.IO.DirectoryInfo]-based solution, Get-ChildItem outputs rich objects ([System.IO.FileInfo] / [System.IO.DirectoryInfo]) describing each enumerated file-system item, allowing for versatile processing.
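Putting these points together, a sketch of a robust counting command (again assuming $somePath is a placeholder for the target directory): -Force includes hidden items, -ErrorAction SilentlyContinue keeps the enumeration going past permission problems, and $Error can be inspected afterwards:

$Error.Clear()   # start with an empty error collection
$files = Get-ChildItem -LiteralPath $somePath -Filter *STB* -Recurse -File -Force -ErrorAction SilentlyContinue
$files.Count     # the number of matching files
$Error | ForEach-Object { $_.Exception.GetType().FullName } | Sort-Object -Unique   # verify that only permission errors occurred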
Note that while you can also pass wildcard arguments to -Path (the implied first positional parameter) and -Include (as in TobyU's answer), it is only -Filter that provides significant speed improvements, due to filtering at the source (the filesystem driver), so that PowerShell only receives the already-filtered results; by contrast, -Path / -Include must first enumerate everything and match against the wildcard pattern afterwards.[2]
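To illustrate the difference, a side-by-side sketch (same $somePath placeholder as above); the latter two commands are functionally similar to the first, but only filter after everything has been enumerated:

# Filtering at the source (fast): the filesystem driver returns only matching entries.
(Get-ChildItem -LiteralPath $somePath -Filter *STB* -Recurse -File).Count

# Filtering after the fact (slower): everything is enumerated, then matched in PowerShell.
(Get-ChildItem -Path $somePath -Include *STB* -Recurse -File).Count
(Get-ChildItem -LiteralPath $somePath -Recurse -File | Where-Object Name -like '*STB*').Count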
Caveats re -Filter use:
Its wildcard language is not the same as PowerShell's; notably, it doesn't support character sets / ranges (e.g., *[0-9]) and it has legacy quirks - see this answer.
It only supports a single pattern, whereas -Include supports multiple (as an array).
That said, -Filter processes wildcards the same way as cmd.exe's dir.
Finally, for the sake of completeness, you can adapt MC ND's helpful answer based on cmd.exe's dir command for use in PowerShell, which simplifies matters:
(cmd /c dir /s /b /a-d "$somePath/*STB*").Count
PowerShell captures an external program's stdout output as an array of lines, whose element count you can simply query with the .Count (or .Length) property.
That said, this may or may not be faster than PowerShell's own Get-ChildItem -Filter, depending on the filtering scenario; also note that dir /s can only ever return path strings, whereas Get-ChildItem returns rich objects whose properties you can query.
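For instance (a sketch, not from any of the referenced answers): to get the total size of the matching files, the dir-based path strings must first be converted back into file objects, whereas Get-ChildItem emits [System.IO.FileInfo] objects directly:

# dir /s yields mere path strings; round-trip them through Get-Item to query properties.
((cmd /c dir /s /b /a-d "$somePath\*STB*") | Get-Item -Force | Measure-Object -Property Length -Sum).Sum

# Get-ChildItem already emits rich objects whose .Length you can aggregate directly.
(Get-ChildItem -LiteralPath $somePath -Filter *STB* -Recurse -File | Measure-Object -Property Length -Sum).Sum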
Caveats re dir use:
/a-d excludes directories, i.e., it only reports files, but then also includes hidden files, which dir doesn't do by default.
dir /s invariably descends into hidden directories too during the recursive enumeration; an /a (attribute-based) filter is only applied to the leaf items of the enumeration (only to files in this case).
dir /s invariably follows symlinks / junctions to other directories (assuming it has the requisite permissions - see the next point).
dir /s quietly ignores directories or symlinks / junctions to directories if it cannot enumerate their contents due to lack of permissions - while this is helpful in the specific case of the aforementioned hidden system junctions (you can find them all with cmd /c dir C:\ /s /ashl), it can cause you to miss the content of directories that you do want to enumerate but can't, for true lack of permissions, because dir /s will give no indication that such content may even exist (if you directly target an inaccessible directory, you get a somewhat misleading File Not Found error message, and the exit code is set to 1).
Performance comparison:
The following tests compare pure enumeration performance without filtering, for simplicity, using a sizable directory tree assumed to be present on all systems, c:\windows\winsxs; that said, it's easy to adapt the tests to also compare filtering performance.
The tests are run from PowerShell, which means that some overhead is introduced by creating a child process for cmd.exe in order to invoke dir /s, though (a) that overhead should be relatively low and (b) the larger point is that staying in the realm of PowerShell is well worthwhile, given its vastly superior capabilities compared to cmd.exe.
The tests use the function Time-Command, which can be downloaded from this Gist and which averages 10 runs by default.
# Warm up the filesystem cache for the target dir.,
# both from PowerShell and cmd.exe, to be safe.
gci 'c:\windows\winsxs' -rec >$null; cmd /c dir /s 'c:\windows\winsxs' >$null
Time-Command `
{ @([System.IO.Directory]::EnumerateFiles('c:\windows\winsxs', '*', 'AllDirectories')).Count },
{ (Get-ChildItem -Force -Recurse -File 'c:\windows\winsxs').Count },
{ (cmd /c dir /s /b /a-d 'c:\windows\winsxs').Count },
{ cmd /c 'dir /s /b /a-d c:\windows\winsxs | find /c /v """"' }
On my single-core VMWare Fusion VM with Windows PowerShell v5.1.17134.407 on Microsoft Windows 10 Pro (64-bit; Version 1803, OS Build 17134.523) I get the following timings, from fastest to slowest (scroll to the right to see the Factor column, which shows relative performance):
Command Secs (10-run avg.) TimeSpan Factor
------- ------------------ -------- ------
@([System.IO.Directory]::EnumerateFiles('c:\windows\winsxs', '*', 'AllDirectories')).Count 11.016 00:00:11.0158660 1.00
(cmd /c dir /s /b /a-d 'c:\windows\winsxs').Count 15.128 00:00:15.1277635 1.37
cmd /c 'dir /s /b /a-d c:\windows\winsxs | find /c /v """"' 16.334 00:00:16.3343607 1.48
(Get-ChildItem -Force -Recurse -File 'c:\windows\winsxs').Count 24.525 00:00:24.5254979 2.23
Interestingly, both [System.IO.Directory]::EnumerateFiles() and the Get-ChildItem solution are significantly faster in PowerShell Core, which runs on top of .NET Core (as of PowerShell Core 6.2.0-preview.4, .NET Core 2.1):
Command Secs (10-run avg.) TimeSpan Factor
------- ------------------ -------- ------
@([System.IO.Directory]::EnumerateFiles('c:\windows\winsxs', '*', 'AllDirectories')).Count 5.094 00:00:05.0940364 1.00
(cmd /c dir /s /b /a-d 'c:\windows\winsxs').Count 12.961 00:00:12.9613440 2.54
cmd /c 'dir /s /b /a-d c:\windows\winsxs | find /c /v """"' 14.999 00:00:14.9992965 2.94
(Get-ChildItem -Force -Recurse -File 'c:\windows\winsxs').Count 16.736 00:00:16.7357536 3.29
[1] [System.IO.Directory]::EnumerateFiles() is inherently and undoubtedly faster than a Get-ChildItem solution. In my tests (see section "Performance comparison:" above), [System.IO.Directory]::EnumerateFiles() beat out cmd /c dir /s as well, slightly in Windows PowerShell, and clearly so in PowerShell Core, but others report different findings. That said, finding the overall fastest solution is not the only consideration, especially if more than just counting files is needed and if the enumeration needs to be robust. This answer discusses the tradeoffs of the various solutions.
[2] In fact, due to an inefficient implementation as of Windows PowerShell v5.1 / PowerShell Core 6.2.0-preview.4, use of -Path and -Include is actually slower than using Get-ChildItem unfiltered and instead using an additional pipeline segment with ... | Where-Object Name -like *STB*, as in the OP - see this GitHub issue.
One of the fastest ways to do it in the cmd command line or a batch file could be
dir "x:\some\where\*stb*" /s /b /a-d | find /c /v ""
Just a recursive (/s) dir command to list all files (no folders, /a-d) in bare format (/b), with all the output piped to the find command, which counts (/c) the number of non-empty lines (/v "").
But, in any case, you will need to enumerate the files, and that takes time.
Edited to adapt to comments, BUT
note: The approach below does not work for this case because, at least in Windows 10, the space padding in the summary lines of the dir command is set to five positions. File counts greater than 99999 are not correctly padded, so the sort /r output is not correct.
As pointed out by Ben Personick, the dir command also outputs the number of files, and we can retrieve this information:
@echo off
setlocal enableextensions disabledelayedexpansion
rem Configure where and what to search
set "search=x:\some\where\*stb*"
rem Retrieve the number of files
set "numFiles=0"
for /f %%a in ('
dir "%search%" /s /a-d /w 2^>nul %= get the list of the files =%
^| findstr /r /c:"^ *[1-9]" %= retrieve only summary lines =%
^| sort /r 2^>nul %= reverse sort, greater line first =%
^| cmd /e /v /c"set /p .=&&echo(!.!" %= retrieve only first line =%
') do set "numFiles=%%a"
echo File(s) found: %numFiles%
The basic idea is to use a series of piped commands to handle the different parts of the data retrieval:
The dir command generates the list of files (/w is included just to generate fewer lines).
findstr is used to retrieve only those lines starting with spaces (the header/summary lines) that contain a number greater than 0 (the file count summary lines; as we are using /a-d, the directory count summary lines will have a value of 0).
sort /r reverses the order so that the line with the greater count comes first.
That first line is retrieved with a set /p command in a separate cmd instance. As the full sequence is wrapped in a for /f, which has a performance problem when retrieving long lists from command execution, we try to retrieve as little as possible.
The for /f will tokenize the retrieved line, get the first token (the number of files) and set the variable used to hold the data (the variable has been initialized, as it is possible that no files are found).