Python regex to match words that are not inside http:// URLs

I am having trouble matching and replacing certain words that are not contained in an http:// URL.

Current regex:

 http://.*?\s+

This matches URLs such as http://www.egg1.com and http://www.egg2.com.
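A quick check of what the current regex actually matches on the sample sentence from the example that follows (joined onto one line here for convenience):

```python
import re

# The example sentence from the question, joined onto one line.
sample = ("This is a sample. http://www.egg1.com and http://egg2.com. "
          "This regex will only match this egg1 and egg2 and not the others "
          "contained inside http:// ")

# The current regex: "http://", then as few characters as possible,
# then one or more whitespace characters.
present = re.compile(r"http://.*?\s+")

matches = present.findall(sample)
print(matches)
# ['http://www.egg1.com ', 'http://egg2.com. ', 'http:// ']
```

Note that the lazy `.*?\s+` also swallows the trailing space and, in the second match, the sentence's final period.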

I need a regex that matches certain words only when they appear outside an http:// URL.

Example:

"This is a sample. http://www.egg1.com and http://egg2.com. This regex will only match 
 this egg1 and egg2 and not the others contained inside http:// "

 Match: egg1 egg2

 Replaced: replaced1 replaced2

Final output:

 "This is a sample. http://www.egg1.com and http://egg2.com. This regex will only 
  match this replaced1 and replaced2 and not the others contained inside http:// "

QUESTION: I need to match certain patterns (egg1 and egg2 in the example) everywhere except inside an http:// URL; occurrences of egg1 and egg2 within a URL must be left alone.

asked Jul 28 '11 at 13:07 by c_prog_90


2 Answers

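One solution is to build a combined pattern that matches either an HTTP URL or your target pattern, then filter the matches accordingly. Because the regex engine scans left to right and the URL alternative consumes each URL in full, an egg inside a URL is never matched by the second group: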

import re

t = "http://www.egg1.com http://egg2.com egg3 egg4"

p = re.compile(r'(http://\S+)|(egg\d)')
for url, egg in p.findall(t):
    if egg:
        print(egg)

prints:

egg3
egg4
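The `if egg:` test works because of how `findall()` reports groups when a pattern has more than one. A quick check with the same `t` and pattern:

```python
import re

t = "http://www.egg1.com http://egg2.com egg3 egg4"
p = re.compile(r'(http://\S+)|(egg\d)')

# With two groups, findall() returns one (group1, group2) tuple per match.
# The alternative that did not take part in a match contributes ''.
pairs = p.findall(t)
print(pairs)
# [('http://www.egg1.com', ''), ('http://egg2.com', ''), ('', 'egg3'), ('', 'egg4')]
```

Every URL comes back with an empty egg slot, so skipping falsy egg values keeps only the standalone matches.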

UPDATE: To use this idiom with re.sub(), supply a replacement function that returns URL matches unchanged:

p = re.compile(r'(http://\S+)|(egg(\d+))')

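def repl(match):
    # group(2) is set only when a standalone eggN matched
    if match.group(2):
        return 'spam{0}'.format(match.group(3))
    # otherwise a URL matched: return it unchanged
    return match.group(0)

print(p.sub(repl, t))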

prints:

http://www.egg1.com http://egg2.com spam3 spam4
answered Oct 14 '22 at 23:10 by Ferdinand Beyer
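Applied to the question's own sample text, the combined-alternation idiom from the answer above might look like the sketch below. The `replacements` dict and the `\b` word boundaries are assumptions added for this sketch, not part of the answer; URLs are still matched with `http://\S+` and returned unchanged.

```python
import re

sample = ("This is a sample. http://www.egg1.com and http://egg2.com. "
          "This regex will only match this egg1 and egg2 and not the others "
          "contained inside http:// ")

# Assumed mapping from target word to replacement, taken from the
# question's "Match"/"Replaced" lines.
replacements = {"egg1": "replaced1", "egg2": "replaced2"}

# Group 1: an http:// URL up to the next whitespace.
# Group 2: one of the target words, as a whole word (\b boundaries).
pattern = re.compile(
    r"(http://\S+)|\b(" + "|".join(map(re.escape, replacements)) + r")\b"
)

def repl(match):
    if match.group(1) is not None:
        # A URL matched: return it untouched, so words inside it survive.
        return match.group(1)
    # A standalone target word matched: swap in its replacement.
    return replacements[match.group(2)]

result = pattern.sub(repl, sample)
print(result)
```

This reproduces the "Final output" from the question: egg1 and egg2 inside the two URLs are kept, while the standalone ones become replaced1 and replaced2.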


This pattern wraps the URL alternative in a non-capturing group, so only an egg1 outside a URL is captured:

(?:http://.*?\s+)|(egg1)
answered Oct 15 '22 at 00:10 by Karolis
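In Python, the pattern from this answer could be combined with re.sub() roughly as follows. This is a minimal sketch: the input string and the "replaced1" replacement are illustrative, and the callback keeps any match in which group 1 did not participate.

```python
import re

text = "http://www.egg1.com egg1 http://egg1.org/x egg1"

# The URL part is in a non-capturing group (?:...), so group 1 is set
# only when a standalone egg1 matched.
pattern = re.compile(r"(?:http://.*?\s+)|(egg1)")

def repl(match):
    if match.group(1) is None:
        # URL (plus its trailing whitespace): put it back unchanged.
        return match.group(0)
    return "replaced1"

result = pattern.sub(repl, text)
print(result)
# http://www.egg1.com replaced1 http://egg1.org/x replaced1
```

One caveat follows from the pattern itself: the URL alternative requires trailing whitespace, so a URL at the very end of the string is not consumed, and the engine can then match egg1 inside it ("see http://egg1.com" becomes "see http://replaced1.com"). Using `http://\S+` for the URL part, as in the other answer, avoids this.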