
Splitting large text file on every blank line

I'm having a bit of trouble splitting a large text file into multiple smaller ones. The syntax of my text file is the following:

    dasdas #42319 blaablaa 50 50 content content more content content conclusion

    asdasd #92012 blaablaa 30 70 content again more of it content conclusion

    asdasd #299 yadayada 60 40 content content contend done
    ...and so on

A typical information table in my file has anywhere between 10 and 40 rows.

I would like this file to be split into n smaller files, where n is the number of content tables.

That is

dasdas #42319 blaablaa 50 50 content content more content content conclusion 

would be its own separate file (whateverN.txt),

and

asdasd #92012 blaablaa 30 70 content again more of it content conclusion 

would again be a separate file (whateverN+1.txt), and so forth.

It seems like awk or Perl are nifty tools for this, but since I have never used them before, the syntax is kinda baffling.

I found these two questions that closely match my problem, but I failed to modify the syntax to fit my needs:

Split text file into multiple files & How can I split a text file into multiple text files? (on Unix & Linux)

How should one modify those commands so that they solve my problem?

tropical e asked Oct 23 '15 04:10



1 Answer

Setting RS to the null (empty) string tells awk to use one or more blank lines as the record separator, so each table becomes a single record. Then you can simply use NR, the number of the current record, to build the name of the output file for each record:

 awk -v RS= '{print > ("whatever-" NR ".txt")}' file.txt 

RS: This is awk's input record separator. Its default value is a string containing a single newline character, which means that an input record consists of a single line of text. It can also be the null string, in which case records are separated by runs of blank lines, or a regexp, in which case records are separated by matches of the regexp in the input text.
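To see this paragraph mode in isolation, here is a minimal sketch. It assumes a throwaway file named sample.txt (a hypothetical name, not part of the question) and only counts records and prints the first line of each one. Setting FS to a newline is an extra choice made for this illustration, so that each line of a record becomes one field.

```shell
# Build a small sample whose blocks are separated by one or more blank lines.
# sample.txt is a hypothetical file name used only for this sketch.
printf 'a 1\nb 2\n\n\nc 3\n\nd 4\ne 5\n' > sample.txt

# With RS set to the empty string, each blank-line-separated block is one
# record, so NR at the end is the number of blocks.
count=$(awk -v RS= 'END { print NR }' sample.txt)
echo "$count"

# With FS set to a newline, every line of a block is its own field,
# so $1 is the first line of each record.
first_lines=$(awk -v RS= -v FS='\n' '{ print $1 }' sample.txt)
echo "$first_lines"
```

Note that the run of two blank lines between the first and second block still counts as a single separator.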

    $ cat file.txt
    dasdas #42319 blaablaa 50 50 content content more content content conclusion

    asdasd #92012 blaablaa 30 70 content again more of it content conclusion

    asdasd #299 yadayada 60 40 content content contend done

    $ awk -v RS= '{print > ("whatever-" NR ".txt")}' file.txt

    $ ls whatever-*.txt
    whatever-1.txt  whatever-2.txt  whatever-3.txt

    $ cat whatever-1.txt
    dasdas #42319 blaablaa 50 50 content content more content content conclusion

    $ cat whatever-2.txt
    asdasd #92012 blaablaa 30 70 content again more of it content conclusion

    $ cat whatever-3.txt
    asdasd #299 yadayada 60 40 content content contend done
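One caveat for a genuinely large input: every `print > file` leaves that output file open, and some awk implementations may stop with an error once they reach the per-process limit on open files. A possible variant, sketched under the assumption that each record should land in its own file, closes every file as soon as it is written. The names tables.txt and part-NNNN.txt are hypothetical and chosen only for this example; the zero-padding is an optional extra so that the files sort in record order.

```shell
# Recreate a small input in the same shape as the question's file.
# tables.txt and the part- prefix are hypothetical names for this sketch.
printf 'dasdas #42319\ncontent\n\nasdasd #92012\nmore\n\nasdasd #299\ndone\n' > tables.txt

awk -v RS= '{
    out = sprintf("part-%04d.txt", NR)  # zero-padded so names sort in record order
    print > out                         # write the whole record to its own file
    close(out)                          # release the file before the next record
}' tables.txt

ls part-*.txt
```

Because each record is written exactly once, closing the file right after the `print` does not risk truncating earlier output.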
jas answered Oct 13 '22 00:10