Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Powershell binary grep

Is there a way to determine whether a specified file contains a specified byte array (at any position) in powershell?

Something like:

fgrep --binary-files=binary "$data" "$filepath"

Of course, I can write a naive implementation:

function posOfArrayWithinArray {
    param ([byte[]] $arrayA, [byte[]]$arrayB)
    if ($arrayB.Length -ge $arrayA.Length) {
        foreach ($pos in 0..($arrayB.Length - $arrayA.Length)) {
            if ([System.Linq.Enumerable]::SequenceEqual(
                $arrayA,
                [System.Linq.Enumerable]::Skip($arrayB, $pos).Take($arrayA.Length)
            )) {return $pos}
        }
    }
    -1
}

function posOfArrayWithinFile {
    param ([byte[]] $array, [string]$filepath)
    posOfArrayWithinArray $array (Get-Content $filepath -Raw -AsByteStream)
}

// They return position or -1, but simple $false/$true are also enough for me.

— but it's extremely slow.

like image 481
Sasha Avatar asked Jun 16 '20 03:06

Sasha


2 Answers

Sorry, for the additional answer. It is not usual to do so, but the universal question intrigues me and the approach and information of my initial "using -Like" answer is completely different. Btw, if you looking for a positive response to the question "I believe that it must exist in .NET" to accept an answer, it probably not going to happen, the same quest exists for StackOverflow searches in combination with C#, .Net or Linq.
Anyways, the fact that nobody is able to find the single assumed .Net command for this so far, it is quiet understandable that several semi-.Net solutions are being purposed instead but I believe that this will cause some undesired overhead for a universal function.
Assuming that you ByteArray (the byte array being searched) and SearchArray (the byte array to be searched) are completely random. There is only a 1/256 chance that each byte in the ByteArray will match the first byte of the SearchArray. In that case you don't have to look further, and if it does match, the chance that the second byte also matches is 1/2562, etc. Meaning that the inner loop will only run about 1.004 times as much as the outer loop. In other words, the performance of everything outside the inner loop (but in the outer loop) is almost as important as what is in the inner loop!
Note that this also implies that the chance a 500Kb random sequence exists in a 100Mb random sequence is virtually zero. (So, how random are your given binary sequences actually?, If they are far from random, I think you need to add some more details to your question). A worse case scenario for my assumption will be a ByteArray existing of the same bytes (e.g. 0, 0, 0, ..., 0, 0, 0) and a SearchArray of the same bytes ending with a different byte (e.g. 0, 0, 0, ..., 0, 0, 1).

Based on this, it shows again (I have also proven this in some other answers) that native PowerShell commands aren't that bad and possibly could even outperform .Net/Linq commands in some cases. In my testing, the below Find-Bytes function is about 20% till twice as fast as the function in your question:

Find-Bytes

Returns the index of where the -Search byte sequence is found in the -Bytes byte sequence. If the search sequence is not found a $Null ([System.Management.Automation.Internal.AutomationNull]::Value) is returned.

Parameters

-Bytes
The byte array to be searched

-Search
The byte array to search for

-Start
Defines where to start searching in the Bytes sequence (default: 0)

-All
By default, only the first index found will be returned. Use the -All switch to return the remaining indexes of any other search sequences found.

Function Find-Bytes([byte[]]$Bytes, [byte[]]$Search, [int]$Start, [Switch]$All) {
    For ($Index = $Start; $Index -le $Bytes.Length - $Search.Length ; $Index++) {
        For ($i = 0; $i -lt $Search.Length -and $Bytes[$Index + $i] -eq $Search[$i]; $i++) {}
        If ($i -ge $Search.Length) { 
            $Index
            If (!$All) { Return }
        } 
    }
}

Usage example:

$a = [byte[]]("the quick brown fox jumps over the lazy dog".ToCharArray())
$b = [byte[]]("the".ToCharArray())

Find-Bytes -all $a $b
0
31

Benchmark
Note that you should open a new PowerShell session to properly benchmark this as Linq uses a large cache that properly doesn't apply to your use case.

$a = [byte[]](&{ foreach ($i in (0..500Kb)) { Get-Random -Maximum 256 } })
$b = [byte[]](&{ foreach ($i in (0..500))   { Get-Random -Maximum 256 } })

Measure-Command {
    $y = Find-Bytes $a $b
}

Measure-Command {
    $x = posOfArrayWithinArray $b $a
}
like image 134
iRon Avatar answered Oct 05 '22 15:10

iRon


The below code may prove to be faster, but you will have to test that out on your binary files:

function Get-BinaryText {
    # converts the bytes of a file to a string that has a
    # 1-to-1 mapping back to the file's original bytes. 
    # Useful for performing binary regular expressions.
    Param (
        [Parameter(Mandatory = $true, ValueFromPipeline = $true, ValueFromPipelineByPropertyName = $true)]
        [ValidateScript( { Test-Path $_ -PathType Leaf } )]
        [Alias('FullName','FilePath')]
        [string]$Path
    )

    $Stream = New-Object System.IO.FileStream -ArgumentList $Path, 'Open', 'Read'

    # Note: Codepage 28591 returns a 1-to-1 char to byte mapping
    $Encoding     = [Text.Encoding]::GetEncoding(28591)
    $StreamReader = New-Object System.IO.StreamReader -ArgumentList $Stream, $Encoding
    $BinaryText   = $StreamReader.ReadToEnd()

    $Stream.Dispose()
    $StreamReader.Dispose()

    return $BinaryText
}

# enter the byte array to search for here
# for demo, I'll use 'SearchMe' in bytes
[byte[]]$searchArray = 83,101,97,114,99,104,77,101

# create a regex from the $searchArray bytes
# 'SearchMe' --> '\x53\x65\x61\x72\x63\x68\x4D\x65'
$searchString = ($searchArray | ForEach-Object { '\x{0:X2}' -f $_ }) -join ''
$regex = [regex]$searchString

# read the file as binary string
$binString = Get-BinaryText -Path 'D:\test.bin'

# use regex to return the 0-based starting position of the search string
# return -1 if not found
$found = $regex.Match($binString)
if ($found.Success) { $found.Index } else { -1}
like image 26
Theo Avatar answered Oct 05 '22 15:10

Theo