Split one file into multiple files based on pattern

I have a binary file that I convert into a plain text file using hexdump and a few awk and sed commands. The output file looks something like this:

$ cat temp
3d3d01f87347545002f1d5b2be4ee4d700010100018000cc57e5820000000000000000000
000000087d3f513000000000000000000000000000000000001001001010f000000000026 
58783100b354c52658783100b43d3d0000ad6413400103231665f301010b9130194899f2f
fffffffffff02007c00dc015800a040402802f1d5b2b8ca5674504f433031000000000004
6363070000000000000000000000000065450000b4fb6b4000393d3d1116cdcc57e58287d
3f55285a1084b

The temp file has a few eye-catchers (3d3d) that don't repeat very often. They denote the start of a new binary record. I need to split the file at those eye-catchers.

My desired output is multiple files, one per eye-catcher in my temp file.

So my output would look something like this:

$ cat temp1
3d3d01f87347545002f1d5b2be4ee4d700010100018000cc57e582000000000000000
0000000000087d3f513000000000000000000000000000000000001001001010f00000000
002658783100b354c52658783100b4

$ cat temp2
3d3d0000ad6413400103231665f301010b9130194899f2ffffffffffff02007c00dc0
15800a040402802f1d5b2b8ca5674504f4330310000000000046363070000000000000000
000000000065450000b4fb6b400039

$ cat temp3
3d3d1116cdcc57e58287d3f55285a1084b

asked Nov 09 '11 07:11

jaypal singh

3 Answers

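The RS variable in awk is nice for this, since it lets you define the record separator. You then just need to write each record to its own temp file. Note that a multi-character RS works in gawk and mawk, but POSIX leaves the behavior unspecified when RS is longer than one character. The simplest version is: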

cat temp |
  awk -v RS="3d3d" '{ print $0 > ("temp" NR) }'

The sample text starts with the eye-catcher 3d3d, so temp1 will be an empty file. Also, the eye-catcher itself won't appear at the start of each temp file, unlike the desired output shown in the question. Finally, if there are a lot of records, you could run into the system limit on open files. A few minor additions bring it closer to what you want and make it safer:

cat temp |
  awk -v RS="3d3d" 'NR > 1 { print RS $0 > "temp" (NR-1); close("temp" (NR-1)) }'
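
To check that no bytes are lost, here is a minimal sketch on a made-up three-record sample (the sample data and the scratch directory are assumptions, not the asker's real file). It assumes an awk that accepts a multi-character RS, such as gawk or mawk, and it swaps print for printf so that no extra newline is appended to each piece:

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample: three records, each starting with the 3d3d marker.
printf '%s' '3d3daaaa3d3dbbbb3d3dcc' > temp

# Same approach as above; printf writes the marker plus the record text
# without the trailing newline that print would add.
awk -v RS='3d3d' 'NR > 1 {
    f = "temp" (NR - 1)
    printf "%s%s", RS, $0 > f
    close(f)
}' temp

# Concatenating the pieces in order should reproduce the input exactly.
cat temp1 temp2 temp3 | cmp - temp && echo "round trip OK"
```

With more than nine pieces, a shell glob such as temp[0-9]* sorts lexically (temp10 before temp2), so list the files in numeric order when gluing them back together.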

answered Nov 07 '22 02:11

Michael J. Barber

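#!/usr/bin/perl
use strict;
use warnings;

# Slurp the whole input (files named on the command line, or stdin).
local $/;
my $data = <>;
my $n = 0;

# Split before every 3d3d; the zero-width lookahead keeps the marker
# at the start of each piece.
for my $match (split /(?=3d3d)/, $data) {
    my $name = 'temp' . ++$n;
    open(my $out, '>', $name) or die "Cannot open $name: $!";
    print {$out} $match;
    close($out) or die "Cannot close $name: $!";
}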

answered Nov 07 '22 00:11

rob mayoff

This might work:

# sed 's/3d3d/\n&/2g' temp | split -dl1 - temp
# ls
temp temp00  temp01  temp02
# cat temp00
3d3d01f87347545002f1d5b2be4ee4d700010100018000cc57e5820000000000000000000000000087d3f513000000000000000000000000000000000001001001010f000000000026 58783100b354c52658783100b4
# cat temp01
3d3d0000ad6413400103231665f301010b9130194899f2ffffffffffff02007c00dc015800a040402802f1d5b2b8ca5674504f4330310000000000046363070000000000000000000000000065450000b4fb6b400039
# cat temp02
3d3d1116cdcc57e58287d3f55285a1084b

EDIT:

If there are newlines in the source file, you can remove them first with tr -d '\n' <temp and then pipe the output through the sed command above. If, however, you wish to preserve them, then:

 sed 's/3d3d/\n&/g;s/^\n\(3d3d\)/\1/' temp |csplit -zf temp - '/^3d3d/' {*}

That should do the trick.
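
The newline-stripping variant might look like the sketch below. It assumes GNU sed (for the 2g flag and \n in the replacement) and GNU split (for -d). The sample input and the part prefix are made up for illustration, with one marker deliberately wrapped across a line break:

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical wrapped input: the second marker straddles a line break.
printf '3d3daaaa3d\n3dbbbb\n3d3dcc\n' > temp

# Join the lines, start a new line before every marker from the second
# match onward, then write one line per output file.
tr -d '\n' < temp | sed 's/3d3d/\n&/2g' | split -d -l 1 - part

ls part*
```

Stripping the newlines first means a marker split across two lines of the dump is still found, which the line-preserving csplit variant would miss.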

answered Nov 07 '22 01:11

potong

Tags: bash, split, sed, awk