(I'm using Mac OS X, and this question might be specific to that variant of Unix)
I'm trying to split a file using csplit with a regular expression. The file consists of various articles merged into one single long text file, and each article ends with "All Rights Reserved" at the end of a line. grep Reserved$ finds them all. csplit, however, claims there is no match.
csplit filename /Reserved$/
yields
csplit: Reserved$: no match
which is a clear and obvious lie. If I leave out the $, it works, but I want to be sure that I don't get any stray occurrences of 'Reserved' in the middle of the text. I tried a different word anchored with the beginning-of-line character ^, and that seems to work. Other words that do occur at the end of a line in the data also fail to match when anchored with $ (e.g. and$).
Is this a known bug with OS X?
[Update: I made sure it's not a DOS/Unix line-ending issue by removing all carriage return characters.]
I have downloaded the source code of csplit from http://www.opensource.apple.com/source/text_cmds/text_cmds-84/csplit/csplit.c and tested this in the debugger.
The pattern is compiled with
    if (regcomp(&cre, re, REG_BASIC|REG_NOSUB) != 0)
        errx(1, "%s: bad regular expression", re);
and the lines are matched with
    /* Read and output lines until we get a match. */
    first = 1;
    while ((p = csplit_getline()) != NULL) {
        if (fputs(p, ofp) == EOF)
            break;
        if (!first && regexec(&cre, p, 0, NULL, 0) == 0)
            break;
        first = 0;
    }
The problem is that the lines returned by csplit_getline() still have their trailing newline character \n. "Reserved" is therefore not at the end of the string, and the pattern "Reserved$" does not match.
After a quick-and-dirty insertion of
    p[strlen(p)-1] = 0;
to remove the trailing newline from the input string, the "Reserved$" pattern worked as expected.
There seem to be more problems with csplit on Mac OS X; see the comments on the answer to "Looking for correct Regular Expression for csplit" (the repetition count {*} does not work either).
Remark: you can match "Reserved" at the end of a line with the following trick:
    csplit filename /Reserved<Ctrl-V><Ctrl-J>/
where you actually press Ctrl-V followed by Ctrl-J to enter a literal newline character on the command line.