Looking for correct Regular Expression for csplit

Tags: regex, split

I have a file that contains several lines like these:

1291126929200 started 88 videolist15.txt 4 Good 4
1291126929250 59.875 29.0 29.580243595150186 43.016096916037604
1291126929296 59.921 29.0 29.52749417740926 42.78632483544682
1291126929359 59.984 29.0 29.479540161281143 42.56031951027556
1291126929437 60.046 50.0 31.345036510255586 42.682281485516945
1291126932859 started 88 videolist15.txt 5 Good 4

I want to split the file at every line that contains started (or videolist; either works).

The following command only produces 2 output files:

$ csplit -k input.txt /started/

However I expect a lot more, as can be seen in:

$ grep -i started input.txt |wc -l
146

What would be the correct csplit command?

slhck asked Dec 01 '10 11:12


2 Answers

Add {*} at the end:

$ csplit -k input.txt /started/ {*}

The man page says:

{*}    repeat the previous pattern as many times as possible.

Note that some shells treat braces or * as special characters. If yours does, quote the argument: '{*}'.

Also, make sure you use the GNU version of csplit, since the {*} repeat count is a GNU extension. On macOS it is available with brew install coreutils (Homebrew installs it as gcsplit by default).

Demo:

$ cat file
1
foo
2
foo
3
foo
$ csplit -k file /foo/ {*}
2
6
6
4
$ ls -tr xx*             
xx03  xx02  xx01  xx00
$ csplit --version
csplit (GNU coreutils) 7.4
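
Applied to data shaped like the question's, the same idea can be sketched as follows. This assumes GNU csplit; the sample lines and the part_ prefix are made up for illustration. Because the first line of the input already matches, csplit would otherwise write an empty first piece, so the sketch adds -z to drop it.

```shell
# Work in a scratch directory so the generated pieces are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample modeled on the question's data: three "started" blocks.
cat > input.txt <<'EOF'
1291126929200 started 88 videolist15.txt 4 Good 4
1291126929250 59.875 29.0 29.580243595150186 43.016096916037604
1291126929296 59.921 29.0 29.52749417740926 42.78632483544682
1291126932859 started 88 videolist15.txt 5 Good 4
1291126932900 60.046 50.0 31.345036510255586 42.682281485516945
1291126935000 started 88 videolist15.txt 6 Good 4
EOF

# -s: do not print byte counts
# -z: remove empty output files (GNU option), here the piece before line 1
# -k: keep the pieces already written if an error occurs
# -f: output file prefix (part_ is an arbitrary choice)
# '{*}' is quoted so the shell passes it to csplit unchanged.
csplit -s -z -k -f part_ input.txt '/started/' '{*}'

# Compare the number of pieces with the number of matching lines.
pieces=$(ls part_* | wc -l)
matches=$(grep -c started input.txt)
echo "pieces=$pieces matches=$matches"
```

With this input the two counts should agree, which is the check the question was after with grep and wc.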
codaddict answered Oct 18 '22 21:10


According to the Open Group specifications, the csplit command accepts basic regular expressions.

Basic regular expressions (BREs) are a limited subset of what full regex implementations offer. They support literal characters, the asterisk (*), the dot (.), character classes ([0-9]), and anchors (^, $). They do not support one-or-more (+) or alternation (a|b).
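
The practical effect of the missing alternation can be seen with grep, which also defaults to basic regular expressions. This is only a stand-in for csplit's pattern handling, and the three sample lines are invented for the illustration.

```shell
# Hypothetical three-line sample; only the first two lines contain a keyword.
sample=$(printf '%s\n' 'started 88' 'videolist15.txt' '59.875 29.0')

# Basic regex (grep's default): "|" is an ordinary character, so the pattern
# only matches the literal text "started|videolist".
# "|| true" keeps the exit status clean when grep finds nothing.
bre_count=$(printf '%s\n' "$sample" | grep -c 'started|videolist' || true)

# Extended regex (grep -E): "|" means alternation.
ere_count=$(printf '%s\n' "$sample" | grep -E -c 'started|videolist' || true)

echo "BRE matches: $bre_count, ERE matches: $ere_count"
```

For the question itself this limitation does not matter, because a single literal word such as started is already a valid basic regular expression.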

cbare answered Oct 18 '22 22:10