
Java regex and sed aren't the same...?

Tags: string, regex, sed

Given these strings:

00543515703528
00582124628575
0034911320020
0034911320020
005217721320739
0902345623
067913187056
00543515703528

I apply this expression in Java: ^(06700|067|00)([0-9]*)

My intention is to remove a leading "06700", "067" or "00" from the beginning of the string.

In Java it all works: group 2 always holds the number I want. In sed, though, the result isn't the same:
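The question doesn't show its Java code, so here is a minimal sketch of the behaviour it describes, assuming the pattern is applied with Matcher.find() and group 2 is taken on a match. The class name StripPrefix and the method strip are hypothetical. Java treats |, ( and ) as operators without any backslashes, and alternatives are tried left to right, so "06700" wins over "067" when both could match.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripPrefix {
    // The expression from the question; no backslashes are needed for | ( ) in Java.
    private static final Pattern PREFIX = Pattern.compile("^(06700|067|00)([0-9]*)");

    // Returns group 2 when a known prefix is present, otherwise the input unchanged.
    static String strip(String s) {
        Matcher m = PREFIX.matcher(s);
        return m.find() ? m.group(2) : s;
    }

    public static void main(String[] args) {
        List<String> inputs = List.of(
                "00543515703528", "0034911320020", "0902345623", "067913187056");
        for (String s : inputs) {
            System.out.println(s + " -> " + strip(s));
        }
    }
}
```

A string such as 0902345623 has none of the three prefixes, so find() fails and it is returned as-is.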

$ cat strings|sed -e 's/^\(06700|067|00\)\([0-9]*\)/\2/g'
00543515703528
00582124628575
0034911320020
0034911320020
005217721320739
0902345623
067913187056
00543515703528

What the heck am I missing?

Cheers,

f.

asked May 18 '11 12:05 by filippo


2 Answers

For | to act as alternation you need extended regular expressions (-r), and when using them you must also omit the \ before ( and ). This works for me:

sed -r 's/^(06700|067|00)([0-9]*)/\2/g' strings 

Note also that there's no need for a separate call to cat: sed can read the file directly.
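For comparison with the Java side of the question, here is a sketch of the same line-by-line substitution in Java. The class name SedEquivalent, the method stripAll and the sample fallback lines are illustrative assumptions. The one syntax difference worth noting is that a Java replacement string refers to group 2 as $2, where sed writes \2. Because the pattern is anchored with ^, it can match at most once per line, so replaceFirst does the same job as sed's g flag here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

public class SedEquivalent {
    // Same expression as the working `sed -r` command above.
    static final String REGEX = "^(06700|067|00)([0-9]*)";

    // Rewrites each line to its group 2 when a prefix matches; other lines pass through.
    static List<String> stripAll(List<String> lines) {
        return lines.stream()
                .map(line -> line.replaceFirst(REGEX, "$2")) // $2 plays the role of sed's \2
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Reads the file named on the command line (e.g. "strings"); falls back to samples.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : List.of("00582124628575", "0902345623");
        stripAll(lines).forEach(System.out::println);
    }
}
```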

answered Nov 10 '22 00:11 by OpenSauce


I believe your problem is this:

sed defaults to BRE: The default behaviour of sed is to support Basic Regular Expressions (BRE). To use all the features described on this page set the -r (Linux) or -E (BSD) flag to use Extended Regular Expressions

Source

Without this flag, the | character is matched literally instead of acting as alternation. Try this example:

echo "06700|067|0055555" | sed -e 's/^\(06700|067|00\)\([0-9]*\)/\2/g'
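To see the difference from the Java side, the sketch below compiles two Java patterns: one that reads | as a literal character (written \| in Java), mimicking how the question's pattern is interpreted as a BRE, and the pattern as intended. The class and method names are hypothetical. With the literal-pipe reading, only the string 06700|067|00 followed by digits can match, which is why the echo example above prints 55555 while the real numbers come through unchanged.

```java
import java.util.regex.Pattern;

public class LiteralPipeDemo {
    // The question's pattern as a BRE sees it: \( \) group, but | is an ordinary character.
    static final Pattern AS_BRE_READS_IT = Pattern.compile("^(06700\\|067\\|00)([0-9]*)");
    // The intended pattern: | is alternation, as in ERE and Java.
    static final Pattern AS_INTENDED = Pattern.compile("^(06700|067|00)([0-9]*)");

    // Replaces the matched prefix plus digits with group 2, or returns s unchanged.
    static String strip(Pattern p, String s) {
        return p.matcher(s).replaceFirst("$2");
    }

    public static void main(String[] args) {
        System.out.println(strip(AS_BRE_READS_IT, "06700|067|0055555")); // 55555
        System.out.println(strip(AS_BRE_READS_IT, "00543515703528"));    // unchanged
        System.out.println(strip(AS_INTENDED, "00543515703528"));        // 543515703528
    }
}
```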
answered Nov 10 '22 00:11 by Dark Falcon