Escaping sed strings correctly

Tags: bash, escaping, sed

I have a regex and replacement pattern that have both been tested in Notepad++ on my input data and work correctly. When I put them into a sed expression, however, nothing gets matched.

Here is the sed command:

 # SEARCH = ([a-zA-Z0-9.]+) [0-9] (.*)
 # REPLACE = \2 (\1)

 sed -e 's/\([a-zA-Z0-9.]+\) [0-9] \(.*\)/\2 \(\1\)/g'

Here is a sampling of the data:

jdoe 1 Doe, John
jad 1 Doe, Jane
smith 2 Smith, Jon

and the desired output:

Doe, John  (jdoe)
Doe, Jane  (jad)
Smith, Jon (smith)

I have tried adding and removing escapes on different characters in the sed expression, but I either get no matches or an error along the lines of:

sed: -e expression #1, char 42: invalid reference \2 on `s' command's RHS

How can I get this escaped correctly?

Chris Lieb asked Jan 16 '10 00:01

2 Answers

I usually find it easier to use the -r switch, since escaping then works much as it does in most other languages:

sed -r 's/([a-zA-Z0-9.]+) [0-9] (.*)/\2 (\1)/g' file1.txt
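As a quick check, the command can be run against the sample data from the question. This is only a sketch: the `input` and `result` variable names are made up for illustration, and it uses `-E`, which both GNU and BSD sed accept, in place of `-r`.

```shell
# Sample input copied from the question (variable name is illustrative).
input='jdoe 1 Doe, John
jad 1 Doe, Jane
smith 2 Smith, Jon'

# -E selects extended regular expressions, so ( ) and + need no backslashes.
# GNU sed treats -r as a synonym for -E.
result=$(printf '%s\n' "$input" | sed -E 's/([a-zA-Z0-9.]+) [0-9] (.*)/\2 (\1)/g')

printf '%s\n' "$result"
```

Each line should come out as the name followed by the login in parentheses, separated by a single space.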
Mark Byers answered Sep 30 '22 17:09

A few warnings and additions to what everyone else has already said:

  1. The -r option is a GNU extension that enables extended regular expressions. BSD-derived seds use -E instead, which GNU sed also accepts.
  2. sed and grep use Basic Regular Expressions (BRE) by default. In a BRE, + is an ordinary character, which is why the expression in the question matches nothing.
  3. awk uses Extended Regular Expressions (ERE).
  4. You should become comfortable with the POSIX specifications, such as IEEE Std 1003.1, if you want to write portable scripts, makefiles, etc.
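The difference between the two regex flavors is easy to see with a few one-liners. The following is a small sketch, with variable names chosen only for illustration, showing how `+` behaves in sed's default BRE mode compared with awk's ERE patterns.

```shell
# In a BRE, "+" is an ordinary character: "a+" matches the literal text "a+".
bre_literal=$(printf 'a+\n' | sed 's/a+/X/')

# ...so a run of "a" characters is left untouched.
bre_nomatch=$(printf 'aaa\n' | sed 's/a+/X/')

# The POSIX BRE way to say "one or more" is the interval \{1,\}.
bre_interval=$(printf 'aaa\n' | sed 's/a\{1,\}/X/')

# awk patterns are EREs, where "+" means "one or more" without escaping.
ere_awk=$(printf 'aaa\n' | awk '{ sub(/a+/, "X"); print }')

printf '%s %s %s %s\n' "$bre_literal" "$bre_nomatch" "$bre_interval" "$ere_awk"
```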

I would recommend rewriting the expression as

's/\([a-zA-Z0-9.]\{1,\}\) [0-9] \(.*\)/\2 (\1)/g'

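which should do exactly what you want in any POSIX-compliant sed: \{1,\} is the BRE spelling of "one or more", and the parentheses in the replacement need no escaping. If you do indeed care about such things, consider defining the POSIXLY_CORRECT environment variable, which asks GNU utilities to follow POSIX behavior more strictly.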
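For illustration, here is the portable expression applied to the question's sample data. It is a sketch only: the `input` and `portable` names are invented for the example, and setting POSIXLY_CORRECT for the single sed invocation is an assumption about how one might use it. It should not change the result here, since the expression uses no extensions, and seds that do not recognize the variable simply ignore it.

```shell
# Sample input copied from the question (variable name is illustrative).
input='jdoe 1 Doe, John
jad 1 Doe, Jane
smith 2 Smith, Jon'

# Plain BRE syntax: \( \) for groups, \{1,\} for "one or more".
# POSIXLY_CORRECT is set only for this one sed process.
portable=$(printf '%s\n' "$input" |
  POSIXLY_CORRECT=1 sed 's/\([a-zA-Z0-9.]\{1,\}\) [0-9] \(.*\)/\2 (\1)/g')

printf '%s\n' "$portable"
```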

D.Shawley answered Sep 30 '22 18:09