
Splitting a file in Linux based on content [duplicate]

I have an email dump of around 400 MB. I want to split it into .txt files, with one mail per file. Every email starts with the standard HTML header specifying the doctype.

This means I have to split the file on that header. How do I go about it in Linux?

asked Dec 17 '11 10:12 by Greenhorn


1 Answer

Suppose you have a file mail.txt:

$ cat mail.txt
<html>
    mail A
</html>

<html>
    mail B
</html>

<html>
    mail C
</html>

Run csplit to split on every <html> line:

$ csplit mail.txt '/^<html>$/' '{*}'

  - mail.txt    => input file
  - /^<html>$/  => pattern match every `<html>` line
  - {*}         => repeat the previous pattern as many times as possible

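Check the output. Four files appear for three mails because the input starts with the pattern, so xx00 is empty: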

$ ls
mail.txt  xx00  xx01  xx02  xx03
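
The question asks for .txt files, while csplit's default names are xx00, xx01 and so on. As a rough sketch, assuming GNU coreutils csplit (the -z and -b options and the '{*}' repeat count are GNU extensions), the pieces can be named directly and the empty first piece dropped. The prefix mail_ and the scratch directory are illustrative choices, not part of the original answer.

```shell
# Work in a scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input mirroring the mail.txt shown in the answer.
printf '%s\n' '<html>' '    mail A' '</html>' '' \
              '<html>' '    mail B' '</html>' '' \
              '<html>' '    mail C' '</html>' > mail.txt

# -s  do not print the size of each piece
# -z  drop empty pieces (the empty file before the first <html>)
# -f  output file prefix (hypothetical name "mail_")
# -b  printf-style suffix: a 3-digit counter followed by ".txt"
csplit -s -z -f mail_ -b '%03d.txt' mail.txt '/^<html>$/' '{*}'

ls
```

For a real dump, replace the <html> pattern with a regular expression matching the actual first line of each mail.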

If you want to do it in awk (each output file is named after the line number of its <html> line):

$ awk '/<html>/{filename=NR".txt"}; {print >filename}' mail.txt
$ ls
1.txt  5.txt  9.txt  mail.txt
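
A 400 MB dump may contain far more mails than the number of files a process is allowed to keep open, and some awk implementations stop with an error once that limit is reached. The variant below is a sketch of one way to handle this: it closes each output file before starting the next and numbers the files sequentially. The names out, n and mail_NNNN.txt are made up for illustration.

```shell
# Work in a scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input mirroring the mail.txt shown in the answer.
printf '%s\n' '<html>' '    mail A' '</html>' '' \
              '<html>' '    mail B' '</html>' '' \
              '<html>' '    mail C' '</html>' > mail.txt

awk '
    /^<html>$/ {
        if (out != "") close(out)            # release the previous file
        out = sprintf("mail_%04d.txt", ++n)  # mail_0001.txt, mail_0002.txt, ...
    }
    out != "" { print > out }                # ignore lines before the first header
' mail.txt

ls
```

Anchoring the pattern with ^ and $ keeps a stray <html> inside a mail body from starting a new file, matching what the csplit pattern does.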
answered Sep 22 '22 15:09 by kev