I have a binary stream on standard input in a fixed-size format: a continuous stream of packets, where each packet has a header of length X and a body of length Y.
So if X=2 and Y=6, it looks something like 00abcdef01ghijkl02mnopqr03stuvwx, but it's binary and both the header and the data can contain any "characters" (including '\0' and newline); the example is just for readability.
I want to get rid of the header data so the output looks like this: abcdefghijklmnopqrstuvwx.
Are there any commands in the Unix toolchain that allow me to do this? And in general, are there any tools for handling binary data? The only tool I could think of is od/hexdump, but how do you convert the result back to binary?
When you issue a command to Bash, it searches specific directories on your system to see whether such a command exists. If the command does exist, then Bash executes it. Bash is also a command, and it's usually the default command executed when you open a terminal window or log into a text console.
(Some older shells may have bigger trouble if the input doesn't end with a newline.) You can't process binary data in the shell, but modern versions of utilities on most unices can cope with arbitrary data. To pass all input through to the output, use cat.
If you want to deal with a binary file in the shell, the best (only?) option is to work with the hexdump tool.
hexdump -v -e '/1 "%u\n"' binary.file | while read c; do
echo $c
done
head -cX binary.file | hexdump -v -e '/1 "%u\n"' | while read c; do
echo $c
done
Read length (and work with 0 as length) and then "string" as byte decimal value:
Bash can be used to manipulate strings when the requirements are simple. However, when things get complicated, such as working with complex patterns and logic, Bash does not fare well. In such cases, the sophisticated and commonly used data manipulation tool awk is preferred.
Use xxd, which goes to and from a hexdump.
xxd -c 123 -ps
will output your stream with 123 bytes per line. To reverse use
xxd -r -p
You should now be able to put this together with cut to drop characters, since you can do something like
cut -c 3-
to get all characters from 3 to the end of a line. Do not forget to use a number of characters equal to 2X to account for two hex characters per byte.
So something along the lines of
xxd -c X+Y -ps | cut -c 2X+1- | xxd -r -p
where X+Y and 2X+1 are replaced with actual numerical values (for X=2 and Y=6 that is xxd -c 8 -ps | cut -c 5- | xxd -r -p). You'll need to feed your data stream into the above command at the appropriate place.
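To make the 2X arithmetic concrete, here is a minimal Python sketch of the same hex round trip on an in-memory buffer. The function name `strip_headers_via_hex` and its parameters are hypothetical, chosen only for illustration; it is not a replacement for the xxd pipeline, just a way to check the offsets.

```python
import binascii


def strip_headers_via_hex(data: bytes, header_len: int, body_len: int) -> bytes:
    """Hypothetical helper mirroring: xxd -c X+Y -ps | cut -c 2X+1- | xxd -r -p"""
    # Step 1 (xxd -ps): turn every byte into two hex characters.
    hex_text = binascii.hexlify(data).decode("ascii")
    # One "line" per packet: (X + Y) bytes means 2 * (X + Y) hex characters.
    line_width = 2 * (header_len + body_len)
    lines = [hex_text[i:i + line_width] for i in range(0, len(hex_text), line_width)]
    # Step 2 (cut -c 2X+1-): cut counts from 1, so dropping the first 2X
    # characters is the slice [2X:] in Python.
    kept = [line[2 * header_len:] for line in lines]
    # Step 3 (xxd -r -p): convert the remaining hex back to raw bytes.
    return binascii.unhexlify("".join(kept))


if __name__ == "__main__":
    sample = b"00abcdef01ghijkl02mnopqr03stuvwx"
    print(strip_headers_via_hex(sample, header_len=2, body_len=6))
```

Because the data is hex-encoded before any slicing happens, NUL bytes and newlines in the input are treated like any other byte.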
Perl is a pretty standard Unix tool, so you can pipe the stream to perl. If the data is fixed-length and byte-aligned, a simple substr operation should work. Here is a Perl sample that should work.
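#!/usr/bin/env perl
use strict;
use warnings;
# Treat both streams as raw bytes.
binmode STDIN;
binmode STDOUT;
my $buf;
my $len = 8;    # packet size: header (X=2) + body (Y=6)
my $off = 2;    # header size X
# read() is buffered and keeps reading until $len bytes or EOF, so packets
# stay aligned even when a pipe delivers the data in smaller chunks
# (sysread may return fewer bytes than requested).
while (read(STDIN, $buf, $len)) {
    print substr($buf, $off);
}
exit 0;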
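For comparison, the same fixed-size read loop can be sketched in Python. This is only an illustration under the question's assumptions (X=2, Y=6); the function name `strip_headers` and its parameters are hypothetical.

```python
import io


def strip_headers(src, dst, header_len=2, body_len=6):
    """Copy src to dst, dropping the first header_len bytes of every packet.

    src and dst are binary file objects; in a real pipeline they would be
    sys.stdin.buffer and sys.stdout.buffer.
    """
    packet_len = header_len + body_len
    while True:
        # A buffered binary read returns fewer than packet_len bytes only
        # at end of input, so packets stay aligned.
        packet = src.read(packet_len)
        if not packet:
            break
        dst.write(packet[header_len:])


if __name__ == "__main__":
    # In-memory demonstration with two packets.
    source = io.BytesIO(b"00abcdef01ghijkl")
    sink = io.BytesIO()
    strip_headers(source, sink)
    print(sink.getvalue())  # b'abcdefghijkl'
```

A trailing partial packet is passed through with whatever follows its header, which may or may not be the behaviour you want for a truncated stream.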
As a one-liner, I'd write:
perl -00 -ne 'chomp; while (/(?:..)(......)/sg) {print $1}'
example:
echo '00abcdef01ghijkl02mnopqr03stuvw
00abcdef01ghi
kl02mnopqr' | perl -00 -ne 'chomp; while (/(?:..)(......)/sg) {print $1}' | od -c
produces
0000000 a b c d e f g h i j k l m n o p
0000020 q r s t u v w \n a b c d e f g h
0000040 i \n k l m n o p q r
0000052
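The regular expression in the one-liner can be tried out on raw bytes as well. The following Python sketch applies an equivalent pattern to a byte string; `strip_headers_regex` is a hypothetical name, and unlike the Perl version it does no paragraph splitting or chomp, it simply matches over the whole buffer.

```python
import re

# Two header bytes (skipped) followed by six body bytes (captured).
# re.DOTALL lets "." match newline bytes too, like the /s flag in Perl.
PACKET = re.compile(rb"..(......)", re.DOTALL)


def strip_headers_regex(data: bytes) -> bytes:
    """Hypothetical helper: keep only the 6-byte body of each 8-byte packet."""
    return b"".join(match.group(1) for match in PACKET.finditer(data))


if __name__ == "__main__":
    sample = b"00abcdef01ghijkl02mnopqr03stuvw\n00abcdef01ghi\nkl02mnopqr"
    print(strip_headers_regex(sample))
```

As with the Perl pattern, a trailing packet shorter than eight bytes does not match and is dropped from the output.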