
How do you escape a hyphen as character range in a POSIX regex

Tags: regex, grep, bash

I have a csv file full of values such as this:

0.00145423,3.03795e-05

I wanted to check that all the lines were consistent so I tried to grep for any unexpected characters like so...

grep '[^0-9,e\-\.]' myfile

In my mind it goes like this: find a line with any character ([]) that is not (^) a digit (0-9), a comma (,), the letter e, a hyphen (\-, which I attempted to escape with \), or a period (\.). However, lines containing hyphens still match.

[EDIT] This does not happen in Python, only with bash/grep:

>>> re.search("[^0-9,e\-\.]", "0.00145423,3.03795e-05")
>>> 

Unsatisfying solution:
If I move the escaped hyphen to the end, it works:

grep '[^0-9,e\.\-]' myfile

Putting the escaped hyphen next to the 0-9 range results in grep: Invalid range end.

Can someone explain what's going on? Is this some bash argument parsing issue or something specific to grep?

bash 4.3.33, grep 2.21

jozxyqk asked Feb 13 '15 09:02



1 Answer

The way to include a literal - in a character list is to put it in the first or last position of the bracket expression, exactly as shown in the answer at: "Get final special character with a regular expression".

From POSIX 9.3.5 RE Bracket Expression: The <hyphen> character shall be treated as itself if it occurs first (after an initial '^', if any) or last in the list, or as an ending range point in a range expression.
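To connect this rule to the patterns in the question: inside a POSIX bracket expression the backslash is an ordinary character, so the `\-\` in `[^0-9,e\-\.]` is read as a range running from `\` to `\`, and the hyphen never becomes part of the list. The same reading explains the "Invalid range end" error, since a `\-` placed before the comma forms a range from `\` down to `,`. What follows is a minimal sketch, assuming GNU grep and the C locale; the file name `sample.csv` and its two lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample data: one valid line and one line with a stray "x".
tmp=$(mktemp -d)
printf '%s\n' '0.00145423,3.03795e-05' '0.5,1.0x-03' > "$tmp/sample.csv"

# Backslash is literal inside [...], so \-\ is the range from \ to \ and
# the hyphen is not in the list: every line containing "-" is reported.
broken=$(LC_ALL=C grep -c '[^0-9,e\-\.]' "$tmp/sample.csv")

# Hyphen last in the list: treated as itself, so only the bad line is reported.
last=$(LC_ALL=C grep -c '[^0-9,e.-]' "$tmp/sample.csv")

# Hyphen first (right after the initial ^): same result.
first=$(LC_ALL=C grep -c '[^-0-9,e.]' "$tmp/sample.csv")

echo "broken=$broken last=$last first=$first"
rm -rf "$tmp"
```

Note that the period needs no backslash either, because `.` has no special meaning inside a bracket expression.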

Some tools might have additional ways of doing it with some kind of escaping, but you're always safe to just put it first or last. Note that - isn't the only character whose behavior depends on where it shows up in a bracket expression; consider ] and ^ as well.
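The positional rules for ] and ^ can be sketched the same way. In POSIX bracket expressions a ] is literal when it comes first in the list (after an initial ^, if any), and a ^ is literal anywhere except the first position. This is only an illustration, assuming GNU grep (for the `-o` option) and the C locale; the input strings are invented for the example.

```shell
#!/bin/sh
# ] placed first in the list is literal: this list contains ] and a.
# -o prints each match on its own line; tr joins them back together.
bracket=$(printf '%s\n' 'a]b' | LC_ALL=C grep -o '[]a]' | tr -d '\n')

# ^ placed anywhere but first is literal: this list contains a and ^.
caret=$(printf '%s\n' 'a^b' | LC_ALL=C grep -o '[a^]' | tr -d '\n')

# All three together: negating ^, then literal ], literal ^, and - last.
# Only characters other than ], ^ and - are matched.
rest=$(printf '%s\n' ']x^y-' | LC_ALL=C grep -o '[^]^-]' | tr -d '\n')

echo "bracket=$bracket caret=$caret rest=$rest"
```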

Ed Morton answered Sep 20 '22 15:09
