I'm having a bit of trouble splitting a large text file into multiple smaller ones. The syntax of my text file is the following:
dasdas #42319 blaablaa 50 50 content content more content content conclusion

asdasd #92012 blaablaa 30 70 content again more of it content conclusion

asdasd #299 yadayada 60 40 content content contend done

...and so on
A typical information table in my file has anywhere between 10-40 rows.
I would like this file to be split into n smaller files, where n is the number of content tables.
That is
dasdas #42319 blaablaa 50 50 content content more content content conclusion
would be its own separate file (whateverN.txt)
and
asdasd #92012 blaablaa 30 70 content again more of it content conclusion
again a separate file whateverN+1.txt
and so forth.
It seems like awk or Perl are nifty tools for this, but having never used them before, I find the syntax kinda baffling.
I found these two questions that almost correspond to my problem, but failed to modify the syntax to fit my needs:
Split text file into multiple files & How can I split a text file into multiple text files? (on Unix & Linux)
How should one modify the command line inputs so that they solve my problem?
To split text on empty lines, split the string on two consecutive newline characters, e.g. my_str.split('\n\n') for files with Unix (LF) line endings and my_str.split('\r\n\r\n') for files with Windows (CRLF) line endings.
To split a file into pieces, you simply use the split command. By default, the split command uses a very simple naming scheme. The file chunks will be named xaa, xab, xac, etc., and, presumably, if you break up a file that is sufficiently large, you might even get chunks named xza and xzz.
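A minimal sketch of that default naming, assuming a POSIX-style split is available. The file name sample.txt and the 10-line chunk size are made up for illustration, and the commands write into the current directory, so try them in an empty scratch directory.

```shell
# Build a 25-line sample file (sample.txt is a hypothetical name).
awk 'BEGIN { for (i = 1; i <= 25; i++) print "line " i }' > sample.txt

# Cut it into chunks of at most 10 lines each, using the default output names.
split -l 10 sample.txt

# Lists xaa, xab and xac: two full 10-line chunks plus a 5-line remainder.
ls x??
```

Note that split cuts by line count or size only, so it cannot keep blank-line-separated tables together; that is what the awk approach is for.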
Setting RS to null tells awk to use one or more blank lines as the record separator. Then you can simply use NR to set the name of the file corresponding to each new record:
awk -v RS= '{print > ("whatever-" NR ".txt")}' file.txt
RS: This is awk's input record separator. Its default value is a string containing a single newline character, which means that an input record consists of a single line of text. It can also be the null string, in which case records are separated by runs of blank lines, or a regexp, in which case records are separated by matches of the regexp in the input text.
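To see the difference between the default RS and the null RS, here is a small sketch that counts records both ways. The file rs-demo.txt and its contents are invented for this illustration.

```shell
# Two blank-line-separated blocks: one with 2 lines, one with 3 lines.
printf 'a 1\na 2\n\nb 1\nb 2\nb 3\n' > rs-demo.txt

# Default RS (a newline): every line is a record, including the blank one.
awk 'END { print NR }' rs-demo.txt          # prints 6

# Null RS (paragraph mode): each run of non-blank lines is one record.
awk -v RS= 'END { print NR }' rs-demo.txt   # prints 2

# In paragraph mode a newline also separates fields, so NF counts every word in the block.
awk -v RS= '{ print NF }' rs-demo.txt       # prints 4, then 6
```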
$ cat file.txt
dasdas #42319 blaablaa 50 50 content content more content content conclusion

asdasd #92012 blaablaa 30 70 content again more of it content conclusion

asdasd #299 yadayada 60 40 content content contend done
$ awk -v RS= '{print > ("whatever-" NR ".txt")}' file.txt
$ ls whatever-*.txt
whatever-1.txt whatever-2.txt whatever-3.txt
$ cat whatever-1.txt
dasdas #42319 blaablaa 50 50 content content more content content conclusion
$ cat whatever-2.txt
asdasd #92012 blaablaa 30 70 content again more of it content conclusion
$ cat whatever-3.txt
asdasd #299 yadayada 60 40 content content contend done
$
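If the input holds a very large number of tables, some awk implementations can run out of simultaneously open output files. A possible variant, sketched here under the assumption that each output file is written exactly once, closes each file right after writing it. The sample input is recreated so the snippet runs on its own; the line breaks inside each table are guessed for illustration.

```shell
# Recreate a small input shaped like the example (line breaks within tables are assumed).
printf '%s\n' \
  'dasdas #42319 blaablaa 50 50' 'content content' 'content conclusion' '' \
  'asdasd #92012 blaablaa 30 70' 'content again' 'content conclusion' '' \
  'asdasd #299 yadayada 60 40' 'content' 'contend done' > file.txt

# Paragraph mode (null RS): each blank-line-separated table is one record.
# close(out) releases the output file as soon as its record has been written.
awk -v RS= '{ out = "whatever-" NR ".txt"; print > out; close(out) }' file.txt

# Lists whatever-1.txt, whatever-2.txt and whatever-3.txt.
ls whatever-*.txt
```

The output is the same as with the shorter command; only the number of files held open at any one time changes.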