
bash/unix toolchain binary stream processing/slicing

I have a binary stream on standard input in a fixed-size format: a continuous stream of packets, where each packet has a header of length X and a body of length Y.

So if X=2 and Y=6, it looks something like 00abcdef01ghijkl02mnopqr03stuvwx, but the data is binary and both the header and the body can contain any "characters" (including '\0' and newline); the example is just for readability.

I want to get rid of the header data so the output looks like this: abcdefghijklmnopqrstuvwx.

Are there any commands in the unix toolchain that allow me to do this? And in general are there any tools for handling binary data? The only tool I could think of is od/hexdump but how do you convert the result back to binary?

Karoly Horvath asked Aug 16 '11 15:08



3 Answers

Use xxd, which converts to and from a hex dump.

xxd -c 123 -ps

outputs your stream as plain hex with 123 bytes per line. To reverse the conversion, use

xxd -r -p

You can combine this with cut to drop the header characters, since something like

cut -c 3-

keeps all characters from the third to the end of each line. Each byte becomes two hex characters, so an X-byte header occupies the first 2X characters of each line.

So something along the lines of

xxd -c X+Y -ps | cut -c 2X+1- | xxd -r -p

where X+Y and 2X+1 are replaced with actual numerical values, and the data stream is piped into the first xxd on standard input.
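
As a minimal sketch of that pipeline with the numbers filled in, the following wraps it in a shell function. The function name strip_headers and its two arguments are made up for illustration, and it assumes xxd (shipped with vim on many systems) is installed.

```shell
# strip_headers X Y (hypothetical helper name):
# reads packets of X header bytes + Y body bytes on stdin,
# writes only the body bytes to stdout.
strip_headers() {
    hdr=$1
    body=$2
    # 1. hex-dump one packet per line (2 hex characters per byte)
    # 2. drop the first 2*X hex characters of every line (the header)
    # 3. turn the remaining hex back into raw bytes
    xxd -c "$((hdr + body))" -ps \
        | cut -c "$((2 * hdr + 1))-" \
        | xxd -r -p
}

# The readable example from the question, with X=2 and Y=6:
printf '00abcdef01ghijkl02mnopqr03stuvwx' | strip_headers 2 6
# prints abcdefghijklmnopqrstuvwx (no trailing newline)
```

Because the data only ever passes through cut as hex text, NUL bytes and newlines in the original stream should not be a problem for it.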

borrible answered Sep 18 '22 22:09


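Perl is a pretty standard Unix tool, so pipe the stream to perl. If the packets are fixed-length and byte-aligned, a simple substr operation should work. Here is a Perl sample that should work:

#!/usr/bin/env perl

use strict;
use warnings;

# Treat both streams as raw bytes.
binmode STDIN;
binmode STDOUT;

my $buf;
my $len = 8;   # packet size: header (X) + body (Y)
my $off = 2;   # header size (X)

# read() is buffered and keeps reading until $len bytes arrive or EOF,
# so short reads on a pipe cannot misalign the packets (sysread can).
while (read(STDIN, $buf, $len)) {
  print substr($buf, $off);
}

exit 0;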

bot403 answered Sep 17 '22 22:09


As a one-liner, I'd write:

perl -00 -ne 'chomp; while (/(?:..)(......)/sg) {print $1}'

example:

echo '00abcdef01ghijkl02mnopqr03stuvw
00abcdef01ghi
kl02mnopqr' | perl -00 -ne 'chomp; while (/(?:..)(......)/sg) {print $1}' | od -c

produces

0000000   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p
0000020   q   r   s   t   u   v   w  \n   a   b   c   d   e   f   g   h
0000040   i  \n   k   l   m   n   o   p   q   r
0000052
glenn jackman answered Sep 17 '22 22:09