
python re.sub, only replace part of match [duplicate]

I am very new to Python.

I need to match all cases with a single regular expression and do a replacement. This is a sample substring --> desired result:

<cross_sell id="123" sell_type="456"> --> <cross_sell>

I am trying to do this in my code:

myString = re.sub(r'\<[A-Za-z0-9_]+(\s[A-Za-z0-9_="\s]+)', "", myString)

Instead of replacing only what comes after <cross_sell, it replaces the whole match and just returns '>'.

Is there a way for re.sub to replace only the capturing group instead of the entire pattern?

bruinsgirl29 asked Sep 21 '15 15:09



2 Answers

You can capture the part you want to keep in a group and refer to it in the replacement:

>>> import re
>>> my_string = '<cross_sell id="123" sell_type="456"> --> <cross_sell>'
>>> re.sub(r'(\<[A-Za-z0-9_]+)(\s[A-Za-z0-9_="\s]+)', r"\1", my_string)
'<cross_sell> --> <cross_sell>'

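Notice that I put the first group (the part you want to keep) in parentheses and kept it in the output by using the "\1" backreference (first group) in the replacement string. re.sub always replaces the entire match, so anything you want to survive has to be captured and written back this way.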
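If this is needed in more than one place, the same substitution can be wrapped in a small helper. This is only a sketch: the names `strip_attributes` and `TAG_WITH_ATTRS` are made up for illustration, the pattern is the one from the answer above (so it assumes attribute text contains only letters, digits, `_`, `=`, `"` and whitespace), and `\g<1>` is simply the unambiguous spelling of `\1`.

```python
import re

# Group 1 keeps "<tagname"; group 2 is the attribute text that gets dropped.
TAG_WITH_ATTRS = re.compile(r'(<[A-Za-z0-9_]+)(\s[A-Za-z0-9_="\s]+)')

def strip_attributes(text):
    """Return text with the attributes removed from every matching opening tag."""
    # The whole match is replaced, but group 1 is written back unchanged.
    return TAG_WITH_ATTRS.sub(r'\g<1>', text)

print(strip_attributes('<cross_sell id="123" sell_type="456">'))  # <cross_sell>
```

Tags that have no attributes are left alone, because the second group requires at least one whitespace character after the tag name.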

mgilson answered Nov 09 '22 22:11


You can use a group reference to keep the tag name and a negated character class to match the rest of the text up to the closing >:

>>> import re
>>> s = '<cross_sell id="123" sell_type="456">'
>>> re.sub(r'(\w+)[^>]+',r'\1',s)
'<cross_sell>'

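For ASCII text, \w is equivalent to [A-Za-z0-9_]. In Python 3, with a str pattern, it also matches non-ASCII word characters unless the re.ASCII flag is set.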
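One caveat with this pattern: `[^>]+` needs at least one character, so on a tag that has no attributes `\w+` backtracks and the last letter of the name is lost. The sketch below shows that behaviour next to a tighter variant. The function names `loose` and `strict` and the tighter pattern are assumptions added for illustration, not part of the answer.

```python
import re

def loose(text):
    # Pattern from the answer above: first run of word characters, then anything up to ">".
    return re.sub(r'(\w+)[^>]+', r'\1', text)

def strict(text):
    # Hypothetical tighter variant: anchor on "<", require whitespace before
    # the attributes, and consume the closing ">" so it can be written back.
    return re.sub(r'<(\w+)\s[^>]*>', r'<\1>', text)

s = '<cross_sell id="123" sell_type="456">'
print(loose(s))                 # <cross_sell>
print(loose('<cross_sell>'))    # <cross_sel>  (last letter eaten by [^>]+)
print(strict(s))                # <cross_sell>
print(strict('<cross_sell>'))   # <cross_sell>
```

Neither version handles a `>` inside a quoted attribute value; for real XML or HTML a parser is usually the safer choice.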

Mazdak answered Nov 09 '22 22:11