Suppose I have a string with lots of random stuff in it like the following:
strJunk = "asdf2adsf29Value=five&lakl23ljk43asdldl"
And I'm interested in obtaining the substring sitting between 'Value=' and '&', which in this example would be 'five'.
I can use a regex like the following:
match = re.search(r'Value=?([^&>]+)', strJunk)
>>> print match.group(0)
Value=five
>>> print match.group(1)
five
How come match.group(0) is the whole thing 'Value=five' and group(1) is just 'five'? And is there a way for me to just get 'five' as the only result? (This question stems from me only having a tenuous grasp of regex)
I am also going to have to make a substitution in this string, such as the following:
val1 = match.group(1)
strJunk.replace(val1, "six", 1)
Which yields:
'asdf2adsf29Value=six&lakl23ljk43asdldl'
Considering that I plan on performing the above two tasks (finding the string between 'Value=' and '&', as well as replacing that value) over and over, I was wondering if there are any other more efficient ways of looking for the substring and replacing it in the original string. I'm fine sticking with what I've got but I just want to make sure that I'm not taking up more time than I have to be if better methods are out there.
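Named groups make it easier to get the group contents afterwards. Compiling your regex once and reusing the compiled object saves the per-call pattern lookup that re.search has to do. The re module caches recently compiled patterns, so the saving per call is modest, but it adds up when you run the same search over and over. You can use positive lookbehind and lookahead assertions to make this regex suitable for the substitution you want to do: they require 'Value=' before and '&' after the match without making them part of it, so sub() replaces only the value.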
>>> value_regex = re.compile("(?<=Value=)(?P<value>.*?)(?=&)")
>>> match = value_regex.search(strJunk)
>>> match.group('value')
'five'
>>> value_regex.sub("six", strJunk)
'asdf2adsf29Value=six&lakl23ljk43asdldl'
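On the group(0) versus group(1) part of the question: group(0) is always the entire text the pattern matched, and group(1) is whatever the first pair of parentheses captured. Because lookarounds are not part of the match, a pattern built only from them returns the bare value as group(0), with no capturing group needed. A minimal sketch, assuming the value never contains '&' (the `[^&]*` pattern and the `count=1` argument are my choices here, not something the question requires):

```python
import re

strJunk = "asdf2adsf29Value=five&lakl23ljk43asdldl"

# The lookbehind and lookahead assert the delimiters without consuming them,
# so the whole match (group 0) is only the value itself.
value_regex = re.compile(r"(?<=Value=)[^&]*(?=&)")

match = value_regex.search(strJunk)
whole_match = match.group(0)  # same as match.group()

# count=1 replaces only the first occurrence, mirroring str.replace(..., 1).
replaced = value_regex.sub("six", strJunk, count=1)

print(whole_match)
print(replaced)
```

Unlike strJunk.replace(val1, "six", 1), the substitution here is tied to the position after 'Value=', so it cannot accidentally hit an earlier occurrence of the same text elsewhere in the string.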
I'm not sure whether you're parsing URLs; if you are, you should definitely be using the urlparse module (urllib.parse in Python 3).
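If the data really is a URL query string, a rough sketch of that route could look like the following. It uses urllib.parse, the Python 3 home of urlparse, and the query string shown is a made-up example, since the string in the question is not a well-formed query:

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical query string, invented for illustration.
query = "id=29&Value=five&page=3"

# parse_qs maps each key to a list of its values.
params = parse_qs(query)
current_value = params["Value"][0]

# Replace the value and rebuild the query string.
# doseq=True makes urlencode expand the value lists back into key=value pairs.
params["Value"] = ["six"]
new_query = urlencode(params, doseq=True)

print(current_value)
print(new_query)
```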
However, given that this is not your question: splitting on several delimiters at once with a regular expression is very fast in Python, so you should be able to do what you want as follows:
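import re
strJunk = "asdf2adsf29Value=five&lakl23ljk43asdldl"
# Split on either delimiter: ['asdf2adsf29Value', 'five', 'lakl23ljk43asdldl']
split_result = re.split(r'[&=]', strJunk)
split_result[1] = 'six'  # replace the value between '=' and '&'
# Reassemble the string with the original delimiters
print("{0}={1}&{2}".format(*split_result))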
Hope this helps!
EDIT:
If you are going to split many times, you can use re.compile() to compile the regular expression once and reuse it. So you'll have:
import re
rx_split_on_delimiters = re.compile(r'[&=]')  # compile once and store this somewhere
strJunk = "asdf2adsf29Value=five&lakl23ljk43asdldl"
# Reuse the compiled pattern for every split
split_result = rx_split_on_delimiters.split(strJunk)
split_result[1] = 'six'  # replace the value between '=' and '&'
print("{0}={1}&{2}".format(*split_result))
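One thing to keep in mind is that the split approach relies on the string containing exactly one '=' and one '&'. With more delimiters, the format call would silently drop the extra fields. A small sketch of how that assumption could be made explicit follows; replace_middle_field is a hypothetical helper name, not anything from the re module:

```python
import re

rx_split_on_delimiters = re.compile(r'[&=]')  # compiled once, reused per call

def replace_middle_field(text, new_value):
    """Hypothetical helper: swap the value between the single '=' and '&'."""
    fields = rx_split_on_delimiters.split(text)
    # Exactly one '=' and one '&' produce exactly three fields.
    if len(fields) != 3:
        raise ValueError("expected exactly one '=' and one '&' in the input")
    fields[1] = new_value
    return "{0}={1}&{2}".format(*fields)

result = replace_middle_field("asdf2adsf29Value=five&lakl23ljk43asdldl", "six")
print(result)
```

If the junk around the value can itself contain '=' or '&', the lookaround regex from the other answer is probably the safer choice.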