 

Splitting A File On Delimiter

Tags: linux, split, awk

I have a file on a Linux system that is roughly 10GB. It contains 20,000,000 binary records, but each record is separated by an ASCII delimiter "$". I would like to use the split command or some combination thereof to chunk the file into smaller parts. Ideally I would be able to specify that the command should split every 1,000 records (therefore every 1,000 delimiters) into separate files. Can anyone help with this?

Jeffrey Kevin Pry asked Jun 01 '11


1 Answer

The only unorthodox part of the problem seems to be the record separator. I'm sure this is fixable pretty simply in awk, but I happen to hate awk.
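For the record, a rough sketch of the awk route might look like this (assuming GNU awk; the chunk_ output prefix is just an example, and awk may mishandle binary records that contain raw newline or NUL bytes):

awk 'BEGIN { RS = "$"; ORS = "$" }
{
    # Records 1-1000 go to chunk_00000, 1001-2000 to chunk_00001, and so on.
    file = sprintf("chunk_%05d", int((NR - 1) / 1000))
    # Close the previous output file once we move on, to avoid piling up open files.
    if (file != prev) { if (prev != "") close(prev); prev = file }
    print > file
}' large_records.txt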

I would rather translate it into the realm of 'normal' problems first:

tr '$' '\n' < large_records.txt | split -l 1000

By default this creates files named xaa, xab, xac, and so on; see man split for more options.
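If the chunks should keep the original '$' delimiter and get predictable names, something along these lines should work (the -d and -a flags are GNU split options; the records_ prefix and .bin suffix are just examples, and this assumes the binary records contain no raw newline bytes of their own):

# Numeric five-digit suffixes: records_00000, records_00001, ...
# (roughly 20,000 chunks are expected here, so a longer suffix avoids running out of names)
tr '$' '\n' < large_records.txt | split -l 1000 -d -a 5 - records_

# Turn the newlines back into '$' so each chunk matches the original record format.
for f in records_*; do
    tr '\n' '$' < "$f" > "$f.bin" && rm "$f"
done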

sehe answered Oct 19 '22