I have an email dump of around 400 MB. I want to split it into .txt files, with one mail in each file. Every e-mail starts with the standard HTML header specifying the doctype, so I need to split the dump at that header. How do I go about it in Linux?


Splitting a file in linux based on content [duplicate]

1 Answer

If you have a mail.txt

    $ cat mail.txt
    <html>
        mail A
    </html>

    <html>
        mail B
    </html>

    <html>
        mail C
    </html>

run csplit to split by <html>

    $ csplit mail.txt '/^<html>$/' '{*}'

    - mail.txt    => input file
    - /^<html>$/  => pattern match every `<html>` line
    - {*}         => repeat the previous pattern as many times as possible

check output

    $ ls
    mail.txt  xx00  xx01  xx02  xx03
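In the sample above the first line already matches the pattern, so csplit writes an empty first piece (xx00), which is why three mails produce four files. Below is a minimal sketch of a variant, assuming GNU csplit and assuming each mail starts with a line beginning with <!DOCTYPE, as the question describes. The mail_ prefix and the three-digit .txt suffix are illustrative choices, not part of the original answer.

```shell
# Work in a scratch directory so the generated files are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample dump: each mail is ASSUMED to start with a <!DOCTYPE ...> line.
printf '%s\n' \
  '<!DOCTYPE html>' '<html>mail A</html>' \
  '<!DOCTYPE html>' '<html>mail B</html>' \
  '<!DOCTYPE html>' '<html>mail C</html>' > mail.txt

# GNU csplit options:
#   -z  do not keep zero-length pieces (such as the empty first one)
#   -s  do not print the byte count of each piece
#   -f  prefix for output file names (illustrative: mail_)
#   -b  printf-style suffix format (illustrative: 3 digits plus .txt)
csplit -z -s -f mail_ -b '%03d.txt' mail.txt '/^<!DOCTYPE/' '{*}'

ls
```

On a system with GNU coreutils this should leave three non-empty files with the mail_ prefix next to mail.txt. The -z, -b and '{*}' features are GNU extensions, so a BSD or macOS csplit may reject them.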

If you want to do it in awk

    $ awk '/<html>/{filename=NR".txt"}; {print >filename}' mail.txt
    $ ls
    1.txt  5.txt  9.txt  mail.txt
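With a dump of around 400 MB, the one-liner above keeps every output file open until awk exits, and some awk implementations stop with a "too many open files" error once there are enough mails. Here is a hedged variant that closes each file before starting the next. The mail_NNNN.txt naming and the variable names out and n are illustrative assumptions, not part of the original answer.

```shell
# Work in a scratch directory so the generated files are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Same sample layout as mail.txt above: three mails separated by blank lines.
printf '%s\n' \
  '<html>' '    mail A' '</html>' '' \
  '<html>' '    mail B' '</html>' '' \
  '<html>' '    mail C' '</html>' > mail.txt

awk '
  /^<html>$/ {
      if (out) close(out)                   # release the previous output file
      out = sprintf("mail_%04d.txt", ++n)   # illustrative naming scheme
  }
  out { print > out }                       # ignore lines before the first header
' mail.txt

ls
```

Numbering the files with a counter instead of NR gives consecutive names, and the `out` guard avoids a redirection to an empty file name if the dump has text before the first header.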

165

answered Sep 22 '22 15:09

kev

Tags: file, linux, bash, sed, awk

Greenhorn
