I'm trying to write a script that would go through 1.6 million files in a folder and move them to the correct folder based on the file name.
The reason is that NTFS can't handle a large number of files within a single folder without degraded performance.
The script calls "Get-ChildItem" to get all the items within that folder, and as you might expect, this consumes a lot of memory (about 3.8 GB).
I'm curious if there are any other ways to iterate through all the files in a directory without using up so much memory.
If you do
$files = Get-ChildItem $dirWithMillionsOfFiles
#Now, process with $files
you WILL face memory issues.
Use PowerShell piping to process the files:
Get-ChildItem $dirWithMillionsOfFiles | %{
#process here
}
The second way consumes less memory, and memory use should ideally not grow beyond a certain point as files are handed to the script block one at a time.
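For the original task (routing each file into a destination folder derived from its name), a minimal sketch of the pipelined approach might look like the following. The source and destination paths and the bucketing rule (first character of the file name) are assumptions for illustration; substitute your own logic:
$source   = "D:\inbox"    # assumed source folder
$destRoot = "D:\sorted"   # assumed destination root
Get-ChildItem $source | ForEach-Object {
    # Derive a subfolder from the file name; here we simply use its first character.
    $bucket = $_.Name.Substring(0, 1).ToUpper()
    $target = Join-Path $destRoot $bucket
    # Create the destination folder on first use, then move the file.
    if (-not (Test-Path $target)) {
        New-Item -ItemType Directory -Path $target | Out-Null
    }
    Move-Item -LiteralPath $_.FullName -Destination $target
}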
If you need to reduce the memory footprint further, you can skip Get-ChildItem
and instead use a .NET API directly. I'm assuming you are on PowerShell v2; if so, first follow the steps here to enable .NET 4 to load in PowerShell v2.
In .NET 4 there are some nice APIs for enumerating files and directories, as opposed to returning them in arrays.
[IO.Directory]::EnumerateFiles("C:\logs") | %{ <# move file $_ here #> }
By using this API instead of [IO.Directory]::GetFiles(), only one file name is held in memory at a time, so memory consumption stays relatively small.
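The same move logic can sit inside that script block. A rough sketch, assuming the same illustrative first-character rule and destination root as above; note that $_ here is a plain path string, not a FileInfo object:
$destRoot = "D:\sorted"   # assumed destination root
[IO.Directory]::EnumerateFiles("C:\logs") | ForEach-Object {
    # Work out the destination from the file name portion of the path string.
    $name   = [IO.Path]::GetFileName($_)
    $target = Join-Path $destRoot $name.Substring(0, 1).ToUpper()
    if (-not (Test-Path $target)) {
        New-Item -ItemType Directory -Path $target | Out-Null
    }
    Move-Item -LiteralPath $_ -Destination $target
}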
Edit
I was also assuming you had tried a simple pipelined approach like Get-ChildItem | ForEach { process }. If that is sufficient, I agree it's the way to go.
But I want to clear up a common misconception: in v2, Get-ChildItem (or really, the FileSystem provider) does not truly stream. The implementation uses the APIs Directory.GetDirectories and Directory.GetFiles, which in your case will build a 1.6M-element array before any processing can occur. Once this is done, then yes, the remainder of the pipeline is streaming. And yes, this initial low-level piece has relatively minimal impact, since it is simply a string array, not an array of rich FileInfo objects. But it is incorrect to claim that O(1) memory is used in this pattern.
PowerShell v3, in contrast, is built on .NET 4, and thus takes advantage of the streaming APIs I mention above (Directory.EnumerateDirectories and Directory.EnumerateFiles). This is a nice change, and it helps in scenarios just like yours.
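To make the distinction concrete, here is a small illustration (the path is a placeholder): GetFiles materializes the whole listing before returning, while EnumerateFiles yields names lazily as the pipeline pulls them.
# Materializes the entire listing as a string[] before anything runs downstream:
$all = [IO.Directory]::GetFiles("C:\logs")
$all.Length
# Yields one path at a time; downstream cmdlets start processing immediately:
[IO.Directory]::EnumerateFiles("C:\logs") | Select-Object -First 5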