 

OS X sed -E doesn't accept extended regular expressions

Tags:

regex

macos

sed

I've been trying various ways to do some basic things with sed on OS X. Here are the results of some simple tests.

echo "foo bar 2011-03-17 17:31:47 foo bar" | sed 's/foo/FOUND/g' 

returns (as expected)

FOUND bar 2011-03-17 17:31:47 FOUND bar 

but

echo "foo bar 2011-03-17 17:31:47 foo bar" | sed -E 's/\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/FOUND/g' 

returns

foo bar 2011-03-17 17:31:47 foo bar 

and (even more irritatingly)

echo "food bar 2011-03-17 17:31:47 food bar" | sed -E 's/\d/FOUND/g' 

returns

fooFOUND bar 2011-03-17 17:31:47 fooFOUND bar 

Now, the sed man page says:

The following options are available:
-E      Interpret regular expressions as extended (modern) regular expressions rather than basic regular expressions (BRE's).  The re_format(7) manual page fully describes both formats.

and man re_format says

\d  Matches a digit character.  This is equivalent to `[[:digit:]]'.

And indeed:

echo "foo bar 2011-03-17 17:31:47 foo bar" | sed -E 's/[[:digit:]]{4}/FOUND/g' 

gives me

foo bar FOUND-03-17 17:31:47 foo bar 

...but this is annoying. Either I'm being dense, or the man pages are lying to me (to be honest, I'd prefer the former).

A quick literature review here on SO suggests that I am not alone in this, and that many recommend installing GNU coreutils (or using something else, say perl -pe). However, I'd like to be certain:

Do EREs work with sed as it is bundled with OS X -- as implied by the man pages -- or not?

(I'm on 10.8 and 10.6.8)

asked Aug 29 '12 at 13:08 by mediaczar


People also ask

Does sed support regular expressions?

A regular expression is a string that can be used to describe several sequences of characters. Regular expressions are used by several different Unix commands, including ed, sed, awk, grep, and to a more limited extent, vi.

What type of regex does sed use?

As Avinash Raj has pointed out, sed uses basic regular expression (BRE) syntax by default, which requires (, ), {, and } to be preceded by \ to activate their special meaning. The -r option (GNU sed) or the -E option (accepted by both GNU sed and BSD/macOS sed) switches to extended regular expression (ERE) syntax, which treats (, ), {, and } as special without a preceding \.
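A minimal sketch of the BRE/ERE difference described above: the same pattern ("foo" repeated twice) written once for sed's default BRE mode and once for -E. The function names bre_demo and ere_demo are hypothetical, chosen only for illustration; -E is used because both GNU sed and BSD/macOS sed accept it.

```shell
#!/bin/sh
# Hypothetical helper names; only the sed invocations matter.

# BRE (sed's default): grouping \( \) and interval \{ \} must be escaped.
bre_demo() {
  printf '%s\n' "$1" | sed 's/\(foo\)\{2\}/X/'
}

# ERE (sed -E): the same metacharacters are special without backslashes.
ere_demo() {
  printf '%s\n' "$1" | sed -E 's/(foo){2}/X/'
}

bre_demo 'foofoo bar'   # prints: X bar
ere_demo 'foofoo bar'   # prints: X bar
```

Both commands describe the same match; only the escaping differs.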

How do you make sed not greedy?

sed does not support a "non-greedy" operator. Instead, use a bracket expression that excludes "/" (such as [^/]*) so the match cannot extend past it. P.S. There is no need to backslash "/" inside the brackets.
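The workaround above can be sketched as follows. The function names are hypothetical, and the s command uses | as its delimiter (an illustrative choice) so that no "/" needs a backslash anywhere. The greedy .* runs to the last "/", while [^/]* cannot cross a "/" and so stops at the first one.

```shell
#!/bin/sh
# Hypothetical helper names for illustration.

# Greedy: ^.*/ matches up to the LAST "/".
strip_greedy() {
  printf '%s\n' "$1" | sed 's|^.*/||'
}

# Emulated non-greedy: ^[^/]*/ matches only up to the FIRST "/".
strip_first_segment() {
  printf '%s\n' "$1" | sed 's|^[^/]*/||'
}

strip_greedy 'a/b/c'          # prints: c
strip_first_segment 'a/b/c'   # prints: b/c
```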

What does R mean in sed?

On most versions of sed (but not all), the 'r' (read) and 'w' (write) commands must be followed by exactly one space, then the filename, and then terminated by a newline. Any additional characters before or after the filename are interpreted as part of the filename.
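A small sketch of the r command described above, assuming a throwaway file created with mktemp (its contents, "INSERTED", are made up for illustration, as is the function name). There is exactly one space after r, and the filename runs to the end of the script, so nothing may follow it.

```shell
#!/bin/sh
# Hypothetical helper: append a file's contents after each line matching ^a$.
insert_after_match() {
  tmp=$(mktemp) || return 1
  printf 'INSERTED\n' > "$tmp"
  # Script is "/^a$/r FILE"; \$ keeps the shell from expanding $ inside "...".
  printf 'a\nb\n' | sed "/^a\$/r $tmp"
  rm -f "$tmp"
}

insert_after_match
# prints:
# a
# INSERTED
# b
```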


1 Answer

On macOS, \d belongs to a regex feature set called enhanced features. Note the distinction in name: enhanced is NOT the same as extended.

Enhanced features are a separate dimension from basic vs. extended and can be activated for both basic and extended regexes. In other words, you can have enhanced basic regexes as well as enhanced extended regexes.

However, whether enhanced features are available in a given utility appears to be compiled into it: a given utility either supports enhanced features or it doesn't, and no option can change that. (Options only let you choose between basic and extended, such as -E for sed and grep.) This fits with re_format(7), which describes enhanced features as being activated by a program passing the REG_ENHANCED flag to regcomp(3).

For a description of all enhanced features, see section ENHANCED FEATURES in man re_format.

It should also be noted that enhanced features are not part of POSIX, so they should be avoided if POSIX compatibility is important.

Some POSIX utilities, such as awk and grep -E, do support EREs (extended regular expressions), but (a) only where the POSIX spec explicitly says so, and (b) the syntax is limited to POSIX EREs, which are less powerful than the EREs offered by specific platforms.
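As a brief illustration of awk using a POSIX ERE, here is a sketch with a hypothetical function name: the + quantifier works without a backslash, and only POSIX syntax ([0-9]) is used, since \d is not part of POSIX EREs.

```shell
#!/bin/sh
# Hypothetical helper: replace every run of digits with "N" using awk's ERE.
digits_to_n() {
  printf '%s\n' "$1" | awk '{ gsub(/[0-9]+/, "N"); print }'
}

digits_to_n 'a10 b2'   # prints: aN bN
```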


In practice:

Sadly, the man pages for the various utilities do NOT state whether a given utility supports enhanced regex features, so it comes down to trial and error.
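The trial and error can be scripted. This is a sketch under stated assumptions: the function name sed_matches_digits and the probe input 'a10' are illustrative choices, and a regex counts as "treated as a digit" only if sed -E turns 'a10' into 'aXX'. The loop prints whatever the local sed does with \d rather than assuming a result.

```shell
#!/bin/sh
# Hypothetical probe: succeeds (exit 0) if the regex in $1 matches each
# digit of 'a10' under sed -E. Errors from sed are discarded.
sed_matches_digits() {
  [ "$(printf 'a10\n' | sed -E "s/$1/X/g" 2>/dev/null)" = 'aXX' ]
}

for re in '[[:digit:]]' '\d'; do
  if sed_matches_digits "$re"; then
    printf '%s: treated as a digit\n' "$re"
  else
    printf '%s: NOT treated as a digit\n' "$re"
  fi
done
```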

As of macOS 10.15:

macOS sed does NOT support enhanced features, which explains the OP's experience.

  • E.g., sed -E 's/\d//g' <<<'a10' has no effect, because \d isn't recognized as representing a digit (only [[:digit:]] is).
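Applied to the OP's original command, a sketch of the workaround is to spell every \d as [[:digit:]], which macOS sed -E does understand (GNU sed -E does too). The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: the OP's timestamp substitution without enhanced features.
replace_timestamp() {
  printf '%s\n' "$1" |
    sed -E 's/[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2} [[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}/FOUND/g'
}

replace_timestamp 'foo bar 2011-03-17 17:31:47 foo bar'
# prints: foo bar FOUND foo bar
```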

I have found only one utility that supports enhanced features: grep:

grep    -o '\d\+' <<<'a10'  # -> '10' - enhanced basic regex
grep -E -o '\d+'  <<<'a10'  # -> '10' - enhanced extended regex

If you know of others that do, please let us know.
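If a script also has to run with GNU grep, which does not treat \d as a digit in its basic or extended modes, a sketch of equivalents for the two macOS grep commands is to use [[:digit:]] instead. The function names are hypothetical; -o is not POSIX, but both BSD/macOS grep and GNU grep provide it.

```shell
#!/bin/sh
# Hypothetical helpers: print only the digit runs, without enhanced features.
digits_bre() { printf '%s\n' "$1" | grep -o '[[:digit:]]\{1,\}'; }   # basic regex
digits_ere() { printf '%s\n' "$1" | grep -E -o '[[:digit:]]+'; }     # extended regex

digits_bre 'a10'   # prints: 10
digits_ere 'a10'   # prints: 10
```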

answered Sep 22 '22 at 17:09 by mklement0