
Splitting large text file on every blank line

I'm having a bit of trouble splitting a large text file into multiple smaller ones. The syntax of my text file is the following:

    dasdas #42319 blaablaa 50 50
    content content
    more content
    content conclusion

    asdasd #92012 blaablaa 30 70
    content again
    more of it
    content conclusion

    asdasd #299 yadayada 60 40
    content
    content
    contend done
    ...and so on

A typical information table in my file has anywhere between 10 and 40 rows.

I would like this file to be split into n smaller files, where n is the number of content tables.

That is

    dasdas #42319 blaablaa 50 50
    content content
    more content
    content conclusion

would be its own separate file (whateverN.txt)

and

    asdasd #92012 blaablaa 30 70
    content again
    more of it
    content conclusion

would again be a separate file, whateverN+1.txt, and so forth.

It seems like awk or Perl are nifty tools for this, but having never used them before, I find the syntax kinda baffling.

I found these two questions that closely correspond to my problem, but I failed to modify the syntax to fit my needs:

Split text file into multiple files & How can I split a text file into multiple text files? (on Unix & Linux)

How should one modify the command-line inputs so that they solve my problem?

518

asked Oct 23 '15 04:10

tropical e

1 Answer

Setting RS to the null (empty) string tells awk to treat runs of one or more blank lines as the record separator. You can then use NR, the number of the current record, to name the file for each record:

 awk -v RS= '{print > ("whatever-" NR ".txt")}' file.txt

RS: This is awk's input record separator. Its default value is a string containing a single newline character, which means that an input record consists of a single line of text. It can also be the null string, in which case records are separated by runs of blank lines, or a regexp, in which case records are separated by matches of the regexp in the input text.
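To see what "separated by runs of blank lines" means in practice, here is a small sketch. The scratch directory, the `sample.txt` name and its contents are made up for illustration; only `RS`, `NR`, `NF` and `-F` come from awk itself. The two records are deliberately separated by two blank lines, and `-F'\n'` is added so that each line of a record counts as one field.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample: two records separated by TWO blank lines.
printf 'a 1\nb\n\n\nc 2\nd\ne\n' > sample.txt

# RS= turns on paragraph mode; -F'\n' makes every line of a record one field,
# so NF is the number of lines in the record and $1 is its first line.
summary=$(awk -v RS= -F'\n' '{print NR ": " NF " lines, first line: " $1}' sample.txt)
echo "$summary"
```

The run of two blank lines should be treated as a single separator, so awk reports two records rather than three.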

    $ cat file.txt
    dasdas #42319 blaablaa 50 50
    content content
    more content
    content conclusion

    asdasd #92012 blaablaa 30 70
    content again
    more of it
    content conclusion

    asdasd #299 yadayada 60 40
    content
    content
    contend done

    $ awk -v RS= '{print > ("whatever-" NR ".txt")}' file.txt

    $ ls whatever-*.txt
    whatever-1.txt  whatever-2.txt  whatever-3.txt

    $ cat whatever-1.txt
    dasdas #42319 blaablaa 50 50
    content content
    more content
    content conclusion

    $ cat whatever-2.txt
    asdasd #92012 blaablaa 30 70
    content again
    more of it
    content conclusion

    $ cat whatever-3.txt
    asdasd #299 yadayada 60 40
    content
    content
    contend done
    $
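Since the question mentions a large file, one possible refinement is to close each output file as soon as its record has been written. The one-liner above keeps every output file open until awk exits, and some awk implementations may run into a limit on simultaneously open files when there are thousands of records. The following is only a sketch of that variation: the scratch directory and sample data are invented for the demonstration, and `out` is just a variable name chosen here.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample with three blank-line-separated records.
printf 'r1 a\nr1 b\n\nr2 a\n\nr3 a\nr3 b\nr3 c\n' > file.txt

# Build the output name once, write the record, then close the file,
# so that at most one output file is open at any time.
awk -v RS= '{
    out = "whatever-" NR ".txt"
    print > out
    close(out)
}' file.txt

ls whatever-*.txt
```

The resulting files should match those of the original command; the only difference is that each file is closed right after its record is written.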

157

answered Oct 13 '22 00:10

jas
