
How to use the pipe operator as part of a regular expression?

Tags: python, regex

I want to match the URL within strings like these:

u1 = "Check this out http://www.cnn.com/stuff lol"
u2 = "see http://www.cnn.com/stuff2"
u3 = "http://www.espn.com/stuff3 is interesting"

Something like the following works, but it's cumbersome because I have to repeat the whole pattern for each site:

re.findall("[^ ]*.cnn.[^ ]*|[^ ]*.espn.[^ ]*", u1)

In particular, my real code needs to match a much larger number of websites. Ideally I could write something like

re.findall("[^ ]*.cnn|espn.[^ ]*", u1)

but of course this doesn't work, because I am not specifying the website name correctly. How can this be done better? Thanks.

ceiling cat asked Apr 24 '11 21:04



1 Answer

Non-capturing groups allow you to group characters without having that group also be returned as a match.

cnn|espn becomes (?:cnn|espn):

re.findall(r"[^ ]*\.(?:cnn|espn)\.[^ ]*", u1)
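To see why the group matters, the sketch below compares three variants with re.findall: the ungrouped pattern from the question, a capturing group, and the non-capturing group. The variable names (ungrouped, captured, whole) are only illustrative. It uses raw strings so that \. reaches the regex engine unchanged.

```python
import re

u1 = "Check this out http://www.cnn.com/stuff lol"
u3 = "http://www.espn.com/stuff3 is interesting"

# Without a group, | splits the WHOLE pattern into two alternatives:
# "[^ ]*.cnn" or "espn.[^ ]*", so the URL prefix is lost for espn.
ungrouped = re.findall(r"[^ ]*.cnn|espn.[^ ]*", u3)

# With a capturing group, findall returns only the group's text.
captured = re.findall(r"[^ ]*\.(cnn|espn)\.[^ ]*", u1)

# With a non-capturing group, findall returns the full match.
whole = re.findall(r"[^ ]*\.(?:cnn|espn)\.[^ ]*", u1)

print(ungrouped)  # ['espn.com/stuff3']
print(captured)   # ['cnn']
print(whole)      # ['http://www.cnn.com/stuff']
```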

Also note that . is a regex special character (it matches any character except a newline). To match a literal . character, escape it as \.
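Since the question mentions a much larger number of sites, one possible approach is to build the alternation from a list. This is a minimal sketch, not part of the original answer: the sites list and the names alternation, pattern and results are hypothetical, and re.escape is applied in case a site name ever contains a regex metacharacter.

```python
import re

u1 = "Check this out http://www.cnn.com/stuff lol"
u2 = "see http://www.cnn.com/stuff2"
u3 = "http://www.espn.com/stuff3 is interesting"

# Hypothetical list of site names; extend it as needed.
sites = ["cnn", "espn"]

# Escape each name, then join the names with | inside a non-capturing group.
alternation = "|".join(re.escape(site) for site in sites)
pattern = re.compile(r"[^ ]*\.(?:" + alternation + r")\.[^ ]*")

# Collect the matches for each sample string.
results = [pattern.findall(text) for text in (u1, u2, u3)]

for found in results:
    print(found)
```

Compiling the pattern once keeps the loop cheap when many strings are scanned; each entry in results is the list of URLs found in the corresponding string.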

Ignacio Vazquez-Abrams answered Sep 22 '22 01:09
