
The most efficient way to delete millions of files based on modified date in Windows

Goal: Use a script to go through 5 to 10 million XML files, check each file's last-modified date, and delete any file older than 90 days. The script would be run daily.

Problem: Using PowerShell's Get-ChildItem -Recurse causes the script to lock up and fail to delete any files. I assume this is because Get-ChildItem needs to build the whole array before any action can be taken on any file.

Solution?: After a lot of research I found that [System.IO.Directory]::EnumerateFiles returns a lazily evaluated collection, so each file can be acted on before the full list has been built, which should make things more efficient (https://msdn.microsoft.com/library/dd383458%28v=vs.100%29.aspx). After more testing I found that foreach ($1 in $2) is more efficient than $1 | % {}. Before I run this new code and potentially crash this server again, can anyone suggest any adjustment that would make this script more efficient?
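The practical effect of lazy enumeration can be seen in a small sketch. Get-FirstFile is a hypothetical helper name used only for illustration: it returns as soon as the first path arrives, which is only cheap because EnumerateFiles hands out paths one at a time instead of building the whole list first (as [System.IO.Directory]::GetFiles does).

```powershell
# Hypothetical helper (not from the original post): returns the first file
# path found under $Root, or nothing if the tree contains no files.
function Get-FirstFile {
    param([Parameter(Mandatory)] [string] $Root)

    $all = [System.IO.SearchOption]::AllDirectories
    foreach ($path in [System.IO.Directory]::EnumerateFiles($Root, '*', $all)) {
        # Returning here abandons the enumerator after a single item,
        # so the remainder of the tree does not have to be listed.
        return $path
    }
}
```

On a tree with millions of files, a call such as Get-FirstFile -Root 'C:\temp' should come back almost immediately, whereas anything that materializes the full array has to finish listing before the first file can be touched.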

For testing I created 15,000 txt files of 0.02 KB each, filled with random data, in 15,000 directories and ran the code below. For the test only, I set the $date variable to 90 seconds ago instead of 90 days ago. It took 6 seconds to delete all the txt files.

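# Lazily enumerate every .txt file under C:\temp, including subdirectories
$getfiles = [System.IO.Directory]::EnumerateFiles("C:\temp", "*.txt", "AllDirectories")
# Test cutoff: 90 seconds ago (the real run would use AddDays(-90))
$date = ([System.DateTime]::Now).AddSeconds(-90)
foreach ($file in $getfiles) {
    # Delete files last written at or before the cutoff
    if ([System.IO.File]::GetLastWriteTime($file) -le $date) {
        [System.IO.File]::Delete($file)
    }
}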
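One caveat with the "AllDirectories" overload: if enumeration reaches a directory it cannot read, it can throw UnauthorizedAccessException, and the foreach ends at that point. A hedged sketch of one way around this is to walk the tree one directory at a time and catch failures per directory and per file. Remove-OldFiles, its parameters, the -DryRun switch, and the returned counters are all hypothetical names introduced for illustration, not part of the original script.

```powershell
# Hypothetical function (illustrative names, not from the original post).
# Walks the tree one directory at a time so that a directory or file that
# cannot be read or deleted is counted and skipped instead of ending the run.
function Remove-OldFiles {
    param(
        [Parameter(Mandatory)] [string] $Root,
        [Parameter(Mandatory)] [datetime] $Cutoff,
        [string] $Filter = '*.xml',
        [switch] $DryRun
    )
    $deleted = 0
    $failed = 0
    $pending = New-Object 'System.Collections.Generic.Stack[string]'
    $pending.Push($Root)
    while ($pending.Count -gt 0) {
        $dir = $pending.Pop()
        try {
            # Queue subdirectories for later instead of recursing.
            foreach ($sub in [System.IO.Directory]::EnumerateDirectories($dir)) {
                $pending.Push($sub)
            }
            foreach ($file in [System.IO.Directory]::EnumerateFiles($dir, $Filter)) {
                # Keep files written after the cutoff.
                if ([System.IO.File]::GetLastWriteTime($file) -gt $Cutoff) { continue }
                try {
                    if (-not $DryRun) { [System.IO.File]::Delete($file) }
                    $deleted++
                } catch {
                    $failed++   # locked, read-only, or otherwise undeletable
                }
            }
        } catch {
            $failed++   # directory could not be enumerated (e.g. access denied)
        }
    }
    # Summary: files deleted (or that would be, in a dry run) and failures.
    New-Object psobject -Property @{ Deleted = $deleted; Failed = $failed }
}
```

A dry run such as Remove-OldFiles -Root 'C:\temp' -Cutoff (Get-Date).AddDays(-90) -DryRun reports how many files would be removed without touching them, which may be a safer first step on a production server.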
Will asked Dec 14 '22 08:12



1 Answer

PowerShell one-liner that examines the first 100,000 files in a folder and deletes those more than 90 days old.

[IO.Directory]::EnumerateFiles("C:\FOLDER_WITH_FILES_TO_DELETE") |
    select -first 100000 |
    where { [IO.File]::GetLastWriteTime($_) -lt (Get-Date).AddDays(-90) } |
    foreach { rm $_ }

or with progress shown:

[IO.Directory]::EnumerateFiles("C:\FOLDER_WITH_FILES_TO_DELETE") |
    select -first 100000 |
    where { [IO.File]::GetLastWriteTime($_) -lt (Get-Date).AddDays(-90) } |
    foreach { $c = 0 } {
        Write-Progress -Activity "Delete Files" -CurrentOperation $_ -PercentComplete ((++$c/100000)*100)
        rm $_
    }

This works on folders that have a very large number of files. Thanks to my co-worker Doug!
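A possible variant of the same idea, offered as a sketch rather than as the answer author's method: it computes the cutoff once, reads the timestamp from the FileInfo objects returned by DirectoryInfo.EnumerateFiles, and refreshes the progress bar only every so many files, on the assumption that calling Write-Progress for every file adds noticeable overhead. The function name, parameters, and default values are hypothetical.

```powershell
# Illustrative variant (assumed names; not the answer author's code).
# Examines at most $Limit files in a single folder (no recursion) and
# deletes those last written more than $Days days ago.
function Remove-OldFilesWithProgress {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $Days = 90,
        [int] $Limit = 100000,
        [int] $ReportEvery = 1000
    )
    $cutoff = (Get-Date).AddDays(-$Days)   # computed once, not per file
    $dirInfo = New-Object System.IO.DirectoryInfo $Path
    $seen = 0
    $deleted = 0
    foreach ($f in $dirInfo.EnumerateFiles()) {
        if ($seen -ge $Limit) { break }
        $seen++
        if ($f.LastWriteTime -lt $cutoff) {
            $f.Delete()
            $deleted++
        }
        # Update the progress bar only once per $ReportEvery files examined.
        if ($seen % $ReportEvery -eq 0) {
            Write-Progress -Activity 'Delete Files' -Status "$seen examined, $deleted deleted" -PercentComplete ([int](100 * $seen / $Limit))
        }
    }
    Write-Progress -Activity 'Delete Files' -Completed
    $deleted   # number of files deleted
}
```

Because each call stops after $Limit files, a daily scheduled run would work through a very large folder in bounded batches; whether 100,000 is the right batch size depends on the server.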

Robot70 answered Dec 29 '22 10:12
