
grep on unix / linux: how to replace or capture text?

So I'm pretty good with regular expressions, but I'm having some trouble with them on unix. Here are two things I'd love to know how to do:

1) Replace all text except letters, numbers, and underscore

In PHP I'd do this: (works great)

preg_replace('#[^a-zA-Z0-9_]#','',$text).

In bash I tried this (with limited success); it seems like it doesn't allow you to use the full set of regex:

text="my #1 example!"
${text/[^a-zA-Z0-9_]/''}

I tried it with sed but it still seems to have problems with the full regex set:

echo "my #1 example!" | sed s/[^a-zA-Z0-9\_]//

I'm sure there is a way to do it with grep, too, but it was breaking the output into multiple lines when I tried:

echo abc\!\@\#\$\%\^\&\*\(222 | grep -Eos '[a-zA-Z0-9\_]+'

And finally I also tried using expr but it seemed like that had really limited support for extended regex...


2) Capture (multiple) parts of text

In PHP I could just do something like this:

preg_match('#(word1).*(word2)#',$text,$matches);

I'm not sure how that would be possible in *nix...

cwd asked Jan 22 '11 06:01


2 Answers

Part 1

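You are almost there with sed: just add the g modifier so that the replacement happens globally. Without g, only the first match on each line is replaced. It is also safer to quote the sed script so the shell does not interpret the brackets or the backslash.

$ echo "my #1 example!" | sed 's/[^a-zA-Z0-9_]//g'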
my1example
$

You made the same mistake with your bash pattern replacement: it is not global either. (Bash matches a glob pattern here rather than a regex, but a bracket expression like this one works in both.)

$ text="my #1 example!"

# Non-global replacement: only the first match (the space) is deleted.
$ echo ${text/[^a-zA-Z0-9_]/''}
my#1 example!

# Global replacement: use // instead of / after the variable name.
$ echo ${text//[^a-zA-Z0-9_]/''}
my1example
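
As for the grep attempt in the question: -o prints each match on its own line, which is why the output was split across lines. Here is a minimal sketch of one way to rejoin the pieces, assuming it is acceptable to pipe the result through tr to drop the newlines (the variable names are only illustrative):

```shell
input='abc!@#$%^&*(222'

# grep -o emits one match per line ("abc", then "222");
# tr -d '\n' deletes the newlines so the matches are concatenated.
cleaned=$(printf '%s\n' "$input" | grep -Eo '[a-zA-Z0-9_]+' | tr -d '\n')

printf '%s\n' "$cleaned"   # abc222
```

For plain character stripping, the sed or bash forms above are usually simpler. The grep route mainly makes sense when the matches are wanted one per line anyway.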

Part 2

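Capturing works in sed much as it does in PHP's regex: enclosing part of the pattern in parentheses captures it, and \1, \2 and so on refer back to the captured groups. The -r option (GNU sed; -E is the more portable spelling) turns on extended regex so the parentheses need no backslashes:

# Swap the numbers after foo and bar using capture groups and back-references.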
$ echo 'foo1 bar2' | sed -r 's/foo([0-9]+) bar([0-9]+)/foo\2 bar\1/'
foo2 bar1
$ 
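
To get the captured pieces into variables, closer to what preg_match does with $matches, one option is bash's own [[ string =~ regex ]] operator, which fills the BASH_REMATCH array. What follows is only a sketch: it assumes bash (not plain sh), and the sample text and variable names are made up for illustration. The last line shows the same groups pulled out with sed.

```shell
text='xx word1 middle word2 yy'
re='(word1).*(word2)'   # kept in a variable so the regex is not quoted inside [[ ]]

if [[ $text =~ $re ]]; then
    whole=${BASH_REMATCH[0]}    # the entire match
    first=${BASH_REMATCH[1]}    # first capture group
    second=${BASH_REMATCH[2]}   # second capture group
    printf '%s | %s | %s\n' "$whole" "$first" "$second"
fi
# prints: word1 middle word2 | word1 | word2

# With sed: -n suppresses the default output, and the p flag prints only lines that matched.
pair=$(printf '%s\n' "$text" | sed -nE 's/.*(word1).*(word2).*/\1 \2/p')
printf '%s\n' "$pair"   # word1 word2
```

The =~ operator uses POSIX extended regular expressions, so PCRE-only features from PHP such as \d or lazy quantifiers are not available.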
codaddict answered Sep 27 '22 23:09


As an alternative to codaddict's nice answer using sed, you could also use tr for the first part of your question.

echo "my #1 _ example!" | tr -d -C '[:alnum:]_'

I've also made use of the [:alnum:] character class, just to show another option.
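
One detail worth checking with tr: its arguments are plain character sets, not regex bracket expressions, so an outer [ ] around the set is treated as two literal characters to keep. Here is a small sketch comparing the two spellings (the sample input is invented for illustration):

```shell
sample='a[1]_b!'

# Set is [:alnum:] plus "_": the brackets in the input are deleted too.
plain=$(printf '%s' "$sample" | tr -d -c '[:alnum:]_')

# Set also contains a literal "[" and "]": the brackets survive.
bracketed=$(printf '%s' "$sample" | tr -d -c '[[:alnum:]_]')

printf '%s\n' "$plain"       # a1_b
printf '%s\n' "$bracketed"   # a[1]_b
```

Note that tr -d with a complemented set also deletes newlines, so a trailing newline from echo disappears unless '\n' is added to the set.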

Michael J. Barber answered Sep 27 '22 23:09