 

Does csplit on OS X not recognise '$' as end-of-line character?

Tags:

unix

macos

(I'm using Mac OS X, and this question might be specific to that variant of Unix)

I'm trying to split a file using csplit with a regular expression. The file consists of various articles merged into one long text file. Each article ends with "All Rights Reserved", and that phrase sits at the end of a line: grep Reserved$ finds every occurrence. csplit, however, claims there is no match.

csplit filename /Reserved$/

yields

csplit: Reserved$: no match

which is a clear and obvious lie. If I leave out the $, it works, but I want to be sure that I don't split on stray occurrences of 'Reserved' in the middle of the text. I tried a different word with the beginning-of-line anchor ^, and that seems to work. Other words that do occur at the end of a line in the data (e.g. and$) also fail to match.

Is this a known bug with OS X?

[Update: I made sure it's not a DOS/Unix line end character issue by removing all carriage return characters]

Oliver Mason asked Oct 05 '22 23:10


1 Answer

I have downloaded the source code of csplit from http://www.opensource.apple.com/source/text_cmds/text_cmds-84/csplit/csplit.c and tested this in the debugger.

The pattern is compiled with

if (regcomp(&cre, re, REG_BASIC|REG_NOSUB) != 0)
    errx(1, "%s: bad regular expression", re);

and the lines are matched with

/* Read and output lines until we get a match. */
first = 1;
while ((p = csplit_getline()) != NULL) {
    if (fputs(p, ofp) == EOF)
        break;
    if (!first && regexec(&cre, p, 0, NULL, 0) == 0)
        break;
    first = 0;
}

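The problem is that the lines returned by csplit_getline() still have a trailing newline character \n. Because the pattern is compiled without REG_NEWLINE, $ matches only at the very end of the string. "Reserved" is therefore not the last thing in the string, and the pattern "Reserved$" does not match. The ^ anchor is unaffected, which is why beginning-of-line patterns work.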
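The effect can be reproduced outside csplit. The following is a minimal sketch, not code taken from csplit: the helper name line_matches is hypothetical, and the flags are written as REG_NOSUB alone because REG_BASIC is a BSD-specific name for the default (basic) syntax, assumed here to have the value 0.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `line` matches the basic regular
 * expression `pattern`, 0 if it does not, and -1 if the pattern does
 * not compile. `extra_flags` is OR-ed into the regcomp() flags. */
int line_matches(const char *pattern, const char *line, int extra_flags)
{
    regex_t cre;
    int rc;

    if (regcomp(&cre, pattern, REG_NOSUB | extra_flags) != 0)
        return -1;
    rc = regexec(&cre, line, 0, NULL, 0);
    regfree(&cre);
    return rc == 0 ? 1 : 0;
}
```

Under these assumptions, line_matches("Reserved$", "All Rights Reserved\n", 0) returns 0, whereas the same call with the newline removed from the line, or with REG_NEWLINE passed as extra_flags, returns 1.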

After a quick-and-dirty insertion of

    p[strlen(p)-1] = 0;

to remove the trailing newline from the input string, the "Reserved$" pattern worked as expected.
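The one-line patch above assumes that every line is non-empty and ends in a newline. A slightly more defensive variant, sketched here with a hypothetical helper name chomp_newline (not part of csplit), uses strcspn so that an empty string or a final line without a newline is left intact:

```c
#include <string.h>

/* Hypothetical helper: cut the string at its first newline, if any.
 * strcspn() returns the length of the string when no newline is found,
 * so in that case the existing terminator is simply rewritten. */
void chomp_newline(char *s)
{
    s[strcspn(s, "\n")] = '\0';
}
```

In the loop shown above, such a call would have to come after the fputs(), since the newline still needs to be written to the output file.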

There seem to be more problems with csplit in Mac OS X; see the comments on the answer to "Looking for correct Regular Expression for csplit" (the repetition count {*} does not work either).

Remark: You can match "Reserved" at the end of the line with the following trick:

csplit filename /Reserved<Ctrl-V><Ctrl-J>/

where you actually press the Control key combinations to enter a literal newline character on the command line.
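The trick works because a newline inside the pattern is an ordinary character that matches the newline left on each line. A minimal sketch of the same match through regcomp and regexec, with a hypothetical helper name and REG_NOSUB standing in for the flags csplit uses:

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `line` contains "Reserved"
 * immediately followed by a newline, 0 otherwise, and -1 if the
 * pattern does not compile. */
int reserved_before_newline(const char *line)
{
    regex_t cre;
    int rc;

    /* The "\n" is a literal newline, as typed with Ctrl-V Ctrl-J. */
    if (regcomp(&cre, "Reserved\n", REG_NOSUB) != 0)
        return -1;
    rc = regexec(&cre, line, 0, NULL, 0);
    regfree(&cre);
    return rc == 0 ? 1 : 0;
}
```

A line such as "All Rights Reserved\n" matches, while "Reserved words\n" does not, which is the behaviour the original $ anchor was meant to give.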

Martin R answered Oct 13 '22 10:10