Python finding substring between certain characters using regex and replace()

Question

Suppose I have a string with lots of random stuff in it like the following:

strJunk = "asdf2adsf29Value=five&lakl23ljk43asdldl"

And I'm interested in obtaining the substring sitting between 'Value=' and '&', which in this example would be 'five'.

I can use a regex like the following:

 >>> import re
 >>> match = re.search(r'Value=?([^&>]+)', strJunk)
 >>> print(match.group(0))
 Value=five
 >>> print(match.group(1))
 five

Why is match.group(0) the whole thing, 'Value=five', while group(1) is just 'five'? And is there a way for me to get 'five' as the only result? (This question stems from my having only a tenuous grasp of regex.)

I am also going to have to make a substitution in this string, such as the following:

 val1 = match.group(1)
 strJunk.replace(val1, "six", 1)

Which yields:

 'asdf2adsf29Value=six&lakl23ljk43asdldl'

Considering that I plan on performing the above two tasks (finding the string between 'Value=' and '&', as well as replacing that value) over and over, I was wondering if there are any other more efficient ways of looking for the substring and replacing it in the original string. I'm fine sticking with what I've got but I just want to make sure that I'm not taking up more time than I have to be if better methods are out there.

David German · Accepted Answer

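Named groups make it easier to get the group contents afterwards. (match.group(0) is always the entire text the pattern matched, while group(1) is only what the first parenthesized group captured, which is why you see 'Value=five' and 'five'.) Compiling your regex once and reusing the compiled object skips the pattern-cache lookup that re.search performs on every call; the re module caches recently compiled patterns, so the saving is modest, but it adds up in a tight loop. You can use positive lookbehind and lookahead assertions so that the match contains only the value, which makes this regex suitable for the substitution you want to do.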

>>> import re
>>> value_regex = re.compile(r"(?<=Value=)(?P<value>.*?)(?=&)")
>>> match = value_regex.search(strJunk)
>>> match.group('value')
'five'
>>> value_regex.sub("six", strJunk)
'asdf2adsf29Value=six&lakl23ljk43asdldl'
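As a rough sketch of how the pieces above fit together in Python 3, the snippet below compares group(0) and the named group for the lookaround pattern against the pattern from the question, and limits the substitution to the first match with count=1. The variable names and the use of [^&]* in place of .*? are illustrative choices, not part of the original answer.

```python
import re

str_junk = "asdf2adsf29Value=five&lakl23ljk43asdldl"

# Compile once. The lookbehind and lookahead must match, but they are not
# part of the matched text, so the match itself is only the value.
value_regex = re.compile(r"(?<=Value=)(?P<value>[^&]*)(?=&)")

match = value_regex.search(str_junk)
whole_match = match.group(0)        # entire text matched by the pattern
named_group = match.group("value")  # text captured by the named group

# count=1 replaces only the first match, like str.replace(old, new, 1).
replaced = value_regex.sub("six", str_junk, count=1)

# For comparison: the question's pattern consumes "Value=" as well,
# so group(0) and group(1) differ.
question_match = re.search(r"Value=?([^&>]+)", str_junk)

print(whole_match, named_group, replaced)
print(question_match.group(0), question_match.group(1))
```

One thing to keep in mind: strJunk.replace(val1, "six", 1) from the question replaces the first occurrence of 'five' anywhere in the string, which is not necessarily the one after 'Value='. Substituting through the regex avoids that, because only the text at the matched position is replaced.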

Mahmoud Abdelkader · Answer

I'm not sure whether you're parsing URLs; if you are, you should definitely be using the urlparse module (urllib.parse in Python 3).

However, since that is not what you asked: splitting on multiple delimiters with a regular expression is fast in Python, so you should be able to do what you want as follows:

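import re

strJunk = "asdf2adsf29Value=five&lakl23ljk43asdldl"
# Splitting on '&' or '=' yields ['asdf2adsf29Value', 'five', 'lakl23ljk43asdldl']
split_result = re.split(r'[&=]', strJunk)
split_result[1] = 'six'  # replace the value
# Reassemble; this assumes exactly one '=' and one '&' in the string
print("{0}={1}&{2}".format(*split_result))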

Hope this helps!

EDIT:

If you will be splitting many times, you can use re.compile() to compile the regular expression once and reuse it. So you'll have:

import re
rx_split_on_delimiters = re.compile(r'[&=]')  # store this somewhere

strJunk = "asdf2adsf29Value=five&lakl23ljk43asdldl"
split_result = rx_split_on_delimiters.split(strJunk)
split_result[1] = 'six'  # replace the value
# Reassemble; this assumes exactly one '=' and one '&' in the string
print("{0}={1}&{2}".format(*split_result))
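If the data does turn out to be a URL query string, the urlparse suggestion at the top of this answer might look roughly like the following in Python 3, where the module is urllib.parse. The query string here is a made-up example, since the string in the question is not a well-formed query.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical query string, used only for illustration.
query = "id=29&Value=five&page=3"

# parse_qsl returns (name, value) pairs in their original order.
pairs = parse_qsl(query)

# Look up the current value of the "Value" field.
current_value = dict(pairs)["Value"]

# Swap in the new value for "Value", leaving the other fields untouched.
updated_pairs = [
    (name, "six" if name == "Value" else value) for name, value in pairs
]
new_query = urlencode(updated_pairs)

print(current_value)
print(new_query)
```

This handles any number of fields and takes care of percent-encoding, at the cost of assuming the input really is a query string.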

Tags:

python

string

regex

replace

jCuga
