Removing duplicate files with PowerShell

Tags:

powershell

I have several thousand duplicate files (JAR files, for example), and I'd like to use PowerShell to:

  1. Search through the file system recursively
  2. Find the duplicates (by name only, by a checksum method, or both)
  3. Delete all duplicates but one.

I'm new to PowerShell and am throwing this out there to the PS folks who might be able to help.

notec asked May 30 '13 20:05


2 Answers

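Try this (Get-FileHash requires PowerShell 4.0 or later):

# Hash every .txt file, group the files by hash, and delete all but the first file in each group.
# Append -WhatIf to Remove-Item to preview the deletions before running it for real.
Get-ChildItem -Filter *.txt -Recurse -File | Get-FileHash | Group-Object -Property Hash | Where-Object { $_.Count -gt 1 } | ForEach-Object { $_.Group | Select-Object -Skip 1 } | Remove-Item

Source: http://n3wjack.net/2015/04/06/find-and-delete-duplicate-files-with-just-powershell/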
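
Because the one-liner above deletes files immediately, it may help to list the duplicates first and decide afterwards. The following is only a minimal sketch of the same hash-and-group idea wrapped in a function. The function name Get-DuplicateFile and its parameters are hypothetical (they are not part of the linked post), and it assumes PowerShell 4.0 or later for Get-FileHash.

```powershell
# Hypothetical helper (not from the original answer): returns one entry for
# every file whose SHA-256 hash matches an earlier file. The first file in
# each group of identical files is left out of the result, so it is kept.
function Get-DuplicateFile {
    param(
        [string]$Path = '.',        # root folder to search (assumption)
        [string]$Filter = '*.jar'   # file pattern, following the question's example
    )
    Get-ChildItem -Path $Path -Filter $Filter -Recurse -File |
        Get-FileHash -Algorithm SHA256 |
        Group-Object -Property Hash |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object { $_.Group | Select-Object -Skip 1 }
}
```

For example, Get-DuplicateFile -Path C:\libs | Remove-Item -WhatIf (where C:\libs is a placeholder path) would print what would be deleted without removing anything; dropping -WhatIf performs the deletion.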

Kai Wang answered Oct 24 '22 14:10


Keep a dictionary of the files seen so far, and delete a file when its name has already been encountered:

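$dict = @{}
# -File (PowerShell 3.0+) skips directories so that only file names are compared
Get-ChildItem c:\admin -Recurse -File | ForEach-Object {
  $key = $_.Name  # replace this with your checksum function
  if ($dict.ContainsKey($key)) {
    # The current file is a duplicate; uncomment the next line to delete it
    # Remove-Item -LiteralPath $_.FullName
  } else {
    $dict[$key] = 0  # dummy placeholder to save memory
  }
}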

I used the file name as the key, but you can use a checksum instead (or both); see the code comment.
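
As a rough sketch of the checksum variant mentioned above, the key can be built from the file's hash, optionally combined with its name. The function name Find-DuplicateByKey and the -IncludeName switch are hypothetical, and Get-FileHash assumes PowerShell 4.0 or later. The function only reports duplicates and leaves deletion to the caller.

```powershell
# Hypothetical helper: walks the tree once and emits every file whose key
# (SHA-256 hash, or name plus hash with -IncludeName) was already seen.
function Find-DuplicateByKey {
    param(
        [string]$Path = '.',     # root folder to search (assumption)
        [switch]$IncludeName     # also require the file names to match
    )
    $seen = @{}
    Get-ChildItem -Path $Path -Recurse -File | ForEach-Object {
        $key = (Get-FileHash -LiteralPath $_.FullName -Algorithm SHA256).Hash
        if ($IncludeName) { $key = "$($_.Name)|$key" }
        if ($seen.ContainsKey($key)) {
            $_                   # duplicate: pass the file object down the pipeline
        } else {
            $seen[$key] = $true  # first occurrence: remember it and keep the file
        }
    }
}
```

Something like Find-DuplicateByKey -Path c:\admin | Remove-Item -WhatIf would then preview the deletions. Hashing every file is slower than comparing names, so for very large trees it may be worth comparing file sizes first and hashing only the files whose sizes match.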

Neolisk answered Oct 24 '22 15:10