I have about 20,000 JPG images, some of which are duplicates. Unfortunately, some of the files have been tagged with EXIF metadata, so a simple file hash cannot identify the duplicates.
I am attempting to create a PowerShell script to process these, but I can find no way to extract only the bitmap data.
System.Drawing.Bitmap can only return a bitmap object, not bytes. There's a GetHashCode() method, but it apparently does not hash just the pixel data.
How can I hash these files in a way that the EXIF information is excluded? I'd prefer to avoid external dependencies if possible.
This is a PowerShell V2.0 advanced function implementation. It is a bit long, but I have verified that it gives the same hash (generated from the bitmap pixels) for the same picture saved with different metadata and file sizes. This version is pipeline capable and also accepts wildcards and literal paths:
function Get-BitmapHashCode
{
    [CmdletBinding(DefaultParameterSetName="Path")]
    param(
        [Parameter(Mandatory=$true,
                   Position=0,
                   ParameterSetName="Path",
                   ValueFromPipeline=$true,
                   ValueFromPipelineByPropertyName=$true,
                   HelpMessage="Path to bitmap file")]
        [ValidateNotNullOrEmpty()]
        [string[]]
        $Path,

        [Alias("PSPath")]
        [Parameter(Mandatory=$true,
                   Position=0,
                   ParameterSetName="LiteralPath",
                   ValueFromPipelineByPropertyName=$true,
                   HelpMessage="Path to bitmap file")]
        [ValidateNotNullOrEmpty()]
        [string[]]
        $LiteralPath
    )

    Begin {
        Add-Type -AssemblyName System.Drawing
        $sha = new-object System.Security.Cryptography.SHA256Managed
    }

    Process {
        if ($psCmdlet.ParameterSetName -eq "Path")
        {
            # In -Path case we may need to resolve a wildcarded path
            $resolvedPaths = @($Path | Resolve-Path | Convert-Path)
        }
        else
        {
            # Must be -LiteralPath
            $resolvedPaths = @($LiteralPath | Convert-Path)
        }

        # Hash the decoded pixel data of each resolved file
        foreach ($rpath in $resolvedPaths)
        {
            Write-Verbose "Processing $rpath"
            # Reset so the finally block never touches objects from a previous file
            $bmp = $null
            $writer = $null
            try {
                $bmp = new-object System.Drawing.Bitmap $rpath
                $stream = new-object System.IO.MemoryStream
                $writer = new-object System.IO.BinaryWriter $stream
                # Write every pixel's 32-bit ARGB value into the memory stream
                for ($w = 0; $w -lt $bmp.Width; $w++) {
                    for ($h = 0; $h -lt $bmp.Height; $h++) {
                        $pixel = $bmp.GetPixel($w,$h)
                        $writer.Write($pixel.ToArgb())
                    }
                }
                $writer.Flush()
                # Rewind so ComputeHash reads the stream from the start
                [void]$stream.Seek(0,'Begin')
                $hash = $sha.ComputeHash($stream)
                # Emit the hash as a hex string without dashes
                [BitConverter]::ToString($hash) -replace '-',''
            }
            finally {
                if ($bmp) { $bmp.Dispose() }
                # Closing the writer also closes the underlying stream
                if ($writer) { $writer.Close() }
            }
        }
    }
}
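Since the function only emits hash strings, finding the actual duplicates still requires pairing each hash with its file and grouping on it. Here is a minimal sketch of that step. Find-DuplicateImage and its -HashScript parameter are hypothetical names introduced for illustration, not part of the function above, and the default script block assumes Get-BitmapHashCode is already loaded in the session:

```powershell
# Hypothetical helper (not part of the original answer): pairs each file with
# its pixel hash and returns only the groups that contain two or more files.
function Find-DuplicateImage
{
    param(
        [string[]]$Path,
        # Assumption: defaults to the Get-BitmapHashCode function shown above.
        [scriptblock]$HashScript = { param($p) Get-BitmapHashCode -LiteralPath $p }
    )

    $Path |
        Where-Object { $_ } |
        ForEach-Object {
            # One record per file: its path and its pixel hash
            New-Object PSObject -Property @{
                Path = $_
                Hash = & $HashScript $_
            }
        } |
        Where-Object { $_.Hash } |          # skip files that produced no hash
        Group-Object -Property Hash |
        Where-Object { $_.Count -gt 1 } |   # keep only hashes seen more than once
        ForEach-Object {
            New-Object PSObject -Property @{
                Hash  = $_.Name
                Files = @($_.Group | ForEach-Object { $_.Path })
            }
        }
}

# Usage (assumes the images are in the current directory):
#   $files = Get-ChildItem -Path . -Filter *.jpg | ForEach-Object { $_.FullName }
#   Find-DuplicateImage -Path $files | Format-List
```

Each output object lists one hash together with every file that produced it, so you can decide which copy to keep. With around 20,000 images the per-pixel GetPixel loop will likely be the slow part, so it may be worth running this on a small folder first.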