I have a file on a Linux system that is roughly 10GB. It contains 20,000,000 binary records, but each record is separated by an ASCII delimiter "$". I would like to use the split command or some combination thereof to chunk the file into smaller parts. Ideally I would be able to specify that the command should split every 1,000 records (therefore every 1,000 delimiters) into separate files. Can anyone help with this?
Open the Zip file. Open the Tools tab. Click the Split Size dropdown button and select the appropriate size for each of the parts of the split Zip file. If you choose Custom Size in the Split Size dropdown list, another small window will open and allow you to enter in a custom size specified in megabytes.
To split a file into pieces, you simply use the split command. By default, the split command uses a very simple naming scheme. The file chunks will be named xaa, xab, xac, etc., and, presumably, if you break up a file that is sufficiently large, you might even get chunks named xza and xzz.
The only unorthodox part of the problem seems to be the record separator. I'm sure this is fixable in awk pretty simply - but I happen to hate awk.
I would transfer it in the realm of 'normal' problems first:
tr '$' '\n' < large_records.txt | split -l 1000
This will by default create xaa, xab, xac... files; look at man split for more options
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With