
Using C# to edit text within a binary file

Tags:

c#

I have a binary file (i.e., it contains bytes with values between 0x00 and 0xFF). There are also ASCII strings in the file (e.g., "Hello World") that I want to find and edit using Regex. I then need to write out the edited file so that it's exactly the same as the old one but with my ASCII edits having been performed. How?

        byte[] inbytes = File.ReadAllBytes(wfile);
        string instring = utf8.GetString(inbytes);
        // use Regex to find/replace some text within instring
        byte[] outbytes = utf8.GetBytes(instring);
        File.WriteAllBytes(outfile, outbytes);

Even if I don't do any edits, the output file is different from the input file. What's going on, and how can I do what I want?


EDIT: Ok, I'm trying to use the offered suggestion and am having trouble understanding how to actually implement it. Here's my sample code:

        string infile = @"C:\temp\in.dat";
        string outfile = @"C:\temp\out.dat";
        Regex re = new Regex(@"H[a-z]+ W[a-z]+");  // looking for "Hello World"
        byte[] inbytes = File.ReadAllBytes(infile);
        string instring = new SoapHexBinary(inbytes).ToString();
        Match match = re.Match(instring);
        if (match.Success)
        {
            // do work on 'instring'
        }
        File.WriteAllBytes(outfile, SoapHexBinary.Parse(instring).Value);

Obviously, I know I'll not get a match doing it that way, but if I convert my Regex to a string (or whatever), then I can't use Match, etc. Any ideas? Thanks!

Barry Dysert asked Mar 29 '26 02:03


2 Answers

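Not all byte sequences are valid UTF-8. When you decode arbitrary binary data as UTF-8, .NET's default decoder replaces every invalid sequence with the replacement character U+FFFD, and encoding the string again writes that character's bytes (EF BF BD) instead of the original ones. That is why the output differs even when you make no edits. Basically, if the whole file is not encoded text, then interpreting it as encoded text will not yield sensible results.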
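One way to sidestep this, offered as a sketch under assumptions and not as something the answers here propose, is to decode with a single-byte encoding that gives every byte exactly one character. ISO-8859-1 (Latin-1) maps bytes 0x00 to 0xFF onto U+0000 to U+00FF, so the round trip is lossless and Regex can run on the resulting string. The class and method names below are made up for illustration.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class Latin1RegexSketch
{
    // ISO-8859-1 maps each byte 0x00..0xFF to the char with the same code point,
    // so GetString followed by GetBytes returns the original bytes.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Hypothetical helper: run a regex replace over binary data.
    public static byte[] Edit(byte[] input, Regex pattern, string replacement)
    {
        string text = Latin1.GetString(input);          // lossless decode
        string edited = pattern.Replace(text, replacement);
        return Latin1.GetBytes(edited);                 // lossless encode for chars <= U+00FF
    }
}
```

With this, something like `Latin1RegexSketch.Edit(File.ReadAllBytes(infile), new Regex(@"H[a-z]+ W[a-z]+"), "Howdy Folks")` gives bytes you can pass to `File.WriteAllBytes`. Two caveats: the replacement text should contain only characters up to U+00FF, since anything beyond that will not survive the encoding step, and if the binary format stores lengths or offsets, a replacement of a different length will likely corrupt the file.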

Thom Smith answered Apr 03 '26 18:04


An alternative to working on the raw bytes is to convert the file to a hex string, work on that (Regex can be used here), and then convert it back and save it:

byte[] buf = File.ReadAllBytes(file);
var str = new SoapHexBinary(buf).ToString();

//str=89504E470D0A1A0A0000000D49484452000000C8000000C808030000009A865EAC00000300504C544......
//Do your work

File.WriteAllBytes(file,SoapHexBinary.Parse(str).Value);

PS: SoapHexBinary is in the System.Runtime.Remoting.Metadata.W3cXsd2001 namespace.
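As for the follow-up in the question (the text pattern no longer matches once the data is a hex string), one possible approach is to write the pattern in hex as well. The sketch below is an illustration under assumptions, not the answerer's code: it uses `BitConverter` instead of `SoapHexBinary` so it does not depend on the remoting namespace, the class and method names are hypothetical, and the regex is a hand translation of `H[a-z]+ W[a-z]+`, where a lowercase ASCII letter (0x61 to 0x7A) becomes `6[1-9A-F]|7[0-9A]`. The lookbehind keeps matches on byte boundaries so that, for example, the digits `48` spanning bytes 0x04 0x86 are not mistaken for 'H'.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class HexRegexSketch
{
    // Bytes -> uppercase hex digits without separators, e.g. {0x48, 0x69} -> "4869".
    public static string ToHex(byte[] bytes) =>
        BitConverter.ToString(bytes).Replace("-", "");

    // Inverse of ToHex: every two hex digits become one byte.
    public static byte[] FromHex(string hex)
    {
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        return bytes;
    }

    // One lowercase ASCII letter (0x61..0x7A) written as two hex digits.
    private const string Lower = "(?:6[1-9A-F]|7[0-9A])";

    // Hex form of H[a-z]+ W[a-z]+ ('H' = 48, ' ' = 20, 'W' = 57).
    // The lookbehind requires an even number of hex digits before the match.
    private static readonly Regex HelloWorld = new Regex(
        "(?<=^(?:[0-9A-F]{2})*)48" + Lower + "+2057" + Lower + "+");

    // Replaces the first byte-aligned match with the given ASCII text.
    public static byte[] ReplaceFirst(byte[] input, string asciiReplacement)
    {
        string hex = ToHex(input);
        string replacementHex = ToHex(Encoding.ASCII.GetBytes(asciiReplacement));
        return FromHex(HelloWorld.Replace(hex, replacementHex, 1));
    }
}
```

Usage would be along the lines of `File.WriteAllBytes(outfile, HexRegexSketch.ReplaceFirst(File.ReadAllBytes(infile), "Howdy Folks"))`. Translating patterns to hex by hand gets tedious for anything beyond simple character classes, and the hex string is twice the size of the file, so this is mainly practical for small files and simple patterns.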

L.B answered Apr 03 '26 16:04


