
POSIX Regular Expressions: Excluding a word in an expression?

I am trying to create a regular expression using POSIX (Extended) Regular Expressions that I can use in my C program code.

Specifically, I have come up with the expression below, but I want to exclude the word "http" from the matched expressions. After some searching, it doesn't look like POSIX offers an obvious way to exclude specific strings. I am using a "negative lookahead" in the example below (i.e. the (?!http:) ). However, I fear that this may only be available in regular expression dialects other than POSIX. Is negative lookahead allowed? Is the logical NOT operator (i.e. ! ) allowed in POSIX?

Working regular expression example:

href|HREF|src[[:space:]]=[[:space:]]\"(?!http:)[^\"]+\"[/]

If I cannot use negative-lookahead like in other dialects, what can I do to the above regular expression to filter out the specific word "http:"? Ideally, is there any way without inverse logic and ultimately creating a ridiculously long regular expression in the process? (the one I have above is quite long already, I'd rather it not look more confusing if possible)

[NOTE: I have consulted other related threads on Stack Overflow, but the most relevant ones only ask this question "generically", so the answers given were not necessarily POSIX-flavored. In another thread or two, I've seen the above (?!insertWordToExcludeHere) negative lookahead, but I fear it's only for PHP.]

[NOTE 2: I will take any POSIX regular expression phrasings as well, any help would be appreciated. Does anyone have a suggestion on how whatever regular expression that would filter out "http:" would look like and how it could be fit into my current regular expression, replacing the (?!http:)?]

asked Mar 13 '13 05:03 by 9codeMan9




1 Answer

According to http://www.regular-expressions.info/refflavors.html, lookaheads and lookbehinds are not part of the POSIX flavour.

You may consider thinking in terms of lexing (tokenization) and parsing if your problem is too complex to be represented cleanly as a regex.
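As a rough illustration of that idea (not code from the original answer), the sketch below lets a POSIX ERE find and capture the quoted attribute value, then applies the "not http:" rule in plain C instead of in the pattern. It uses the standard regcomp, regexec and regfree functions from <regex.h>. The function name find_local_link is hypothetical, and the pattern is an assumed reading of the question's intent: the alternation is parenthesised, whitespace around = is optional, and the trailing [/] is dropped.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: scan `text` for href/HREF/src attributes and copy the
 * first quoted value that does NOT start with "http:" into `out`.
 * Returns true if such a value was found and fits in `out`.
 */
bool find_local_link(const char *text, char *out, size_t out_size)
{
    /* Group 1 is the attribute name, group 2 the quoted value (assumed pattern). */
    const char *pattern =
        "(href|HREF|src)[[:space:]]*=[[:space:]]*\"([^\"]+)\"";
    regex_t re;
    regmatch_t m[3];
    bool found = false;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return false;

    const char *cursor = text;
    while (regexec(&re, cursor, 3, m, 0) == 0) {
        const char *value = cursor + m[2].rm_so;
        size_t len = (size_t)(m[2].rm_eo - m[2].rm_so);

        /* This check stands in for the (?!http:) lookahead. */
        bool is_http = len >= 5 && strncmp(value, "http:", 5) == 0;

        if (!is_http && len < out_size) {
            memcpy(out, value, len);
            out[len] = '\0';
            found = true;
            break;
        }

        /* Rejected (or too long for `out`): resume after this match. */
        cursor += m[0].rm_eo;
    }

    regfree(&re);
    return found;
}
```

Called on a string containing both href="http://x.org/a" and src = "img/logo.png", this helper would skip the first value and copy img/logo.png into the buffer. Keeping the exclusion in C avoids spelling out "anything that does not begin with http:" as a long chain of bracket expressions inside the ERE itself.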

answered Sep 21 '22 04:09 by Patashu