I have an email dump of around 400 MB. I want to split it into .txt files, with one mail in each file. Every e-mail starts with the standard HTML header specifying the doctype.
This means I will have to split the file on that header. How do I go about it in Linux?
To split a file into pieces, you can use the split command. By default, split uses a very simple naming scheme: the chunks are named xaa, xab, xac, etc., and, presumably, if the file is large enough to need that many pieces, you will get chunks named all the way up to xza and xzz. Note that split cuts at a fixed number of lines or bytes, not at a pattern, so on its own it will not put exactly one mail in each file.
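As a quick illustration of that naming scheme, here is a minimal sketch that cuts a six-line file every two lines. The directory `split_demo` and the file `sample.txt` are made-up names for this example only.

```shell
# Build a small 6-line sample file (hypothetical name) in a scratch directory.
mkdir -p split_demo
printf '%s\n' one two three four five six > split_demo/sample.txt

# -l 2 starts a new chunk every 2 lines; the last argument is the output prefix,
# so the chunks are written as split_demo/xaa, split_demo/xab, split_demo/xac.
split -l 2 split_demo/sample.txt split_demo/x

ls split_demo
```

The chunk boundaries depend only on the line count, which is why a pattern-based tool is a better fit for cutting at each mail header.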
Open the Zip file. Open the Tools tab. Click the Split Size dropdown button and select the appropriate size for each of the parts of the split Zip file.
The command "csplit" splits a file into several files based on a pattern in the file or on line numbers. Each new file holds part of the contents of the original file, and the original is left unchanged.
Suppose you have a file mail.txt:

    $ cat mail.txt
    <html>
    mail A
    </html>

    <html>
    mail B
    </html>

    <html>
    mail C
    </html>
Run csplit to split on every <html> line:

    $ csplit mail.txt '/^<html>$/' '{*}'

- mail.txt => input file
- /^<html>$/ => pattern; matches every line that is exactly `<html>`
- {*} => repeat the previous pattern as many times as possible
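Check the output:

    $ ls
    mail.txt xx00 xx01 xx02 xx03

xx00 is empty here, because the very first line of mail.txt already matches the pattern and nothing comes before it. The three mails are in xx01, xx02 and xx03.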
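For the dump in the question, where each mail begins with a doctype line rather than a bare `<html>` line, the same idea might look like the sketch below. It assumes GNU csplit (the `-z` and `-b` options are not available in every csplit), assumes each header line starts with `<!DOCTYPE`, and uses a tiny made-up `dump.txt` in a scratch directory `csplit_demo`; the `mail_` prefix is also just an illustrative choice.

```shell
# Build a tiny stand-in for the dump (hypothetical content) in a scratch directory.
mkdir -p csplit_demo
printf '%s\n' \
  '<!DOCTYPE html>' '<html>mail A</html>' \
  '<!DOCTYPE html>' '<html>mail B</html>' \
  '<!DOCTYPE html>' '<html>mail C</html>' > csplit_demo/dump.txt

# -s  suppress the byte counts csplit normally prints
# -z  do not keep empty pieces (here, the empty piece before the first header)
# -f  output file prefix
# -b  printf-style suffix format, giving names ending in 00000.txt, 00001.txt, ...
csplit -s -z -f csplit_demo/mail_ -b '%05d.txt' \
  csplit_demo/dump.txt '/^<!DOCTYPE/' '{*}'

ls csplit_demo
```

The five-digit suffix leaves room for up to 100000 mails; widen it if the dump might hold more.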
If you want to do it in awk:

    $ awk '/<html>/{filename=NR".txt"}; {print >filename}' mail.txt
    $ ls
    1.txt 5.txt 9.txt mail.txt

Each output file is named after the line number of the <html> line that starts it.
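On a large dump, a variant that closes each output file before opening the next one may be safer, since some awk implementations limit how many files can be open at once. The following is a sketch under the same assumptions as the question (each mail starts with a line beginning `<!DOCTYPE`); the names `awk_demo`, `dump.txt` and the `mail_` prefix are made up for illustration.

```shell
# Build a tiny stand-in for the dump (hypothetical content) in a scratch directory.
mkdir -p awk_demo
printf '%s\n' \
  '<!DOCTYPE html>' '<html>mail A</html>' \
  '<!DOCTYPE html>' '<html>mail B</html>' \
  '<!DOCTYPE html>' '<html>mail C</html>' > awk_demo/dump.txt

# On each header line: close the previous output file, then pick the next
# sequential, zero-padded file name. Every line after the first header is
# written to the current file; lines before the first header are skipped.
awk -v dir=awk_demo '
  /^<!DOCTYPE/ {
    if (out != "") close(out)
    out = sprintf("%s/mail_%05d.txt", dir, ++n)
  }
  out != "" { print > out }
' awk_demo/dump.txt

ls awk_demo
```

Sequential names sort in mail order, unlike line-number names such as 1.txt, 5.txt, 9.txt, which sort oddly once the numbers have different lengths.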