
Split a text file in two using a bash script

I have a text file with a marker somewhere in the middle:

one
two
three
blah-blah *MARKER* blah-blah
four
five
six
...

I just need to split this file into two files: the first containing everything before MARKER, and the second containing everything after MARKER. It seems this can be done in one line with awk or sed, but I can't figure out how.

I tried the easy way, using csplit, but csplit doesn't play well with Unicode text.

Sergey Kovalev asked Sep 04 '10 22:09



4 Answers

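You can do it easily with awk by using MARKER as the record separator (a multi-character RS is not POSIX, but gawk and mawk support it):

awk -v RS="MARKER" '{ print $0 > (NR ".txt") }' file

This writes the text before MARKER to 1.txt and the text after it to 2.txt, so the line that contains the marker is itself cut in two.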
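
Where a multi-character RS is not available, roughly the same split can be sketched with line-based POSIX awk. This is only an illustration: the file names sample.txt, part1.txt and part2.txt are placeholders chosen here, and only the first occurrence of MARKER is treated as the split point.

```shell
# Build the sample file from the question (sample.txt is a placeholder name).
printf '%s\n' one two three 'blah-blah *MARKER* blah-blah' four five six > sample.txt

# Text before the first MARKER goes to part1.txt, text after it to part2.txt.
awk -v m='MARKER' '
  # First line that contains the marker: cut it at the marker.
  !found && (i = index($0, m)) {
    found = 1
    print substr($0, 1, i - 1)      > "part1.txt"
    print substr($0, i + length(m)) > "part2.txt"
    next
  }
  # Every other line goes to whichever part is current.
  { print > (found ? "part2.txt" : "part1.txt") }
' sample.txt
```

On the sample input, part1.txt ends with the fragment "blah-blah *" and part2.txt starts with "* blah-blah".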
ghostdog74 answered Oct 01 '22 02:10


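Try this:

awk '/MARKER/{n++} {print > ("out" n ".txt")}' final.txt

It reads final.txt and writes the lines before the first marker to out.txt. Each line containing MARKER starts a new file (out1.txt, out2.txt, etc.), and the marker line itself is kept as the first line of that file.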
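
If the marker line itself should be dropped, which is what the question seems to ask for, one possible variation is sketched below. It assumes that only the first MARKER line is the split point; the variable name seen and the output names out1.txt and out2.txt are chosen here for illustration.

```shell
# Build the sample file from the question.
printf '%s\n' one two three 'blah-blah *MARKER* blah-blah' four five six > final.txt

# Skip the first line that matches MARKER; lines before it go to out1.txt,
# lines after it go to out2.txt.
awk '/MARKER/ && !seen { seen = 1; next }
     { print > ("out" (seen + 1) ".txt") }' final.txt
```

Any later lines that also contain MARKER are kept in out2.txt unchanged.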

Leniel Maccaferri answered Oct 01 '22 01:10


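With GNU sed (other sed implementations may need the label and branch commands on separate lines):

sed -n '/MARKER/q;p' inputfile > outputfile1
sed -n '/MARKER/{:a;n;p;ba}' inputfile > outputfile2

Or all in one:

sed -n -e '/MARKER/! w outputfile1' -e '/MARKER/{:a;n;w outputfile2' -e 'ba}' inputfile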
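
A simpler alternative that avoids labels and branches is to delete an address range in each pass. This is a sketch rather than a drop-in replacement: it reads the input twice, and it assumes the marker is not on line 1, because in POSIX sed the end of the range 1,/MARKER/ is only checked from line 2 onward.

```shell
# Build the sample file from the question.
printf '%s\n' one two three 'blah-blah *MARKER* blah-blah' four five six > inputfile

# Delete from the first marker line through the end: keeps what comes before.
sed '/MARKER/,$d' inputfile > outputfile1

# Delete from line 1 through the first marker line: keeps what comes after.
sed '1,/MARKER/d' inputfile > outputfile2
```

Both commands drop the marker line itself.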
Dennis Williamson answered Oct 01 '22 01:10


The split command will almost do what you want (the -p pattern option is available in BSD and macOS split, not in GNU split):

$ split -p '\*MARKER\*' splitee
$ cat xaa
one
two
three
$ cat xab
blah-blah *MARKER* blah-blah
four
five
six
$ tail -n+2 xab
four
five
six

Perhaps it's close enough for your needs.

I have no idea if it does any better with Unicode than csplit, though.
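
On systems whose split lacks -p, a similar result can be approximated with grep, head and tail. The sketch below reuses the names splitee, xaa and xab from the session above purely for continuity; the variable n is introduced here. Unlike split -p, it leaves the marker line out of xab.

```shell
# Build the sample file from the question.
printf '%s\n' one two three 'blah-blah *MARKER* blah-blah' four five six > splitee

# Line number of the first line containing the literal text *MARKER*.
n=$(grep -n -F '*MARKER*' splitee | head -n 1 | cut -d: -f1)

# Only split when a marker line was found.
if [ -n "$n" ]; then
  head -n "$((n - 1))" splitee > xaa    # lines before the marker line
  tail -n "+$((n + 1))" splitee > xab   # lines after the marker line
fi
```

Because grep -F matches the marker as a fixed string, the asterisks need no escaping here.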

Marcelo Cantos answered Oct 01 '22 01:10