
Python finding substring between certain characters using regex and replace()

Suppose I have a string with lots of random stuff in it like the following:

strJunk ="asdf2adsf29Value=five&lakl23ljk43asdldl"

And I'm interested in obtaining the substring sitting between 'Value=' and '&', which in this example would be 'five'.

I can use a regex like the following:

 >>> import re
 >>> match = re.search(r'Value=?([^&>]+)', strJunk)
 >>> print(match.group(0))
 Value=five
 >>> print(match.group(1))
 five

How come match.group(0) is the whole thing, 'Value=five', and group(1) is just 'five'? And is there a way for me to get 'five' as the only result? (This question stems from my having only a tenuous grasp of regex.)

I am also going to have to make a substitution in this string, such as the following:

 val1 = match.group(1)
 strJunk.replace(val1, "six", 1)    

Which yields:

 'asdf2adsf29Value=six&lakl23ljk43asdldl'

Considering that I plan on performing the above two tasks (finding the string between 'Value=' and '&', as well as replacing that value) over and over, I was wondering if there are any other more efficient ways of looking for the substring and replacing it in the original string. I'm fine sticking with what I've got but I just want to make sure that I'm not taking up more time than I have to be if better methods are out there.

jCuga asked Jan 07 '11 04:01


2 Answers

Named groups make it easier to get the group contents afterwards. Compiling your regex once and reusing the compiled object saves a little work on each call: re.search does not recompile the pattern every time, because the re module caches recently compiled patterns, but it still has to look the pattern up in that cache. You can use positive lookbehind and lookahead assertions so that the match covers only the value, which makes the same regex suitable for the substitution you want to do.

>>> import re
>>> value_regex = re.compile(r"(?<=Value=)(?P<value>.*?)(?=&)")
>>> match = value_regex.search(strJunk)
>>> match.group('value')
'five'
>>> value_regex.sub("six", strJunk)
'asdf2adsf29Value=six&lakl23ljk43asdldl'
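
On the group(0) question: in Python's re module, group(0) is always the entire text matched by the pattern, and group(1) is the text captured by the first parenthesized group, so getting 'Value=five' and 'five' is expected. Here is a minimal sketch of both behaviors, plus a substitution limited to the first match via count=1. The helper name replace_value and the simplified patterns are illustrative assumptions, not part of the original post:

```python
import re

str_junk = "asdf2adsf29Value=five&lakl23ljk43asdldl"

# Plain capturing group: group(0) is the whole match, group(1) the capture.
match = re.search(r"Value=([^&]+)", str_junk)
whole, captured = match.group(0), match.group(1)

# The lookbehind keeps "Value=" out of the match, so group(0) is just the value.
VALUE_RE = re.compile(r"(?<=Value=)[^&]+")


def replace_value(text, new_value):
    """Replace only the first value that follows 'Value=' (hypothetical helper)."""
    # A function replacement is inserted literally, so backslashes in
    # new_value are not interpreted as group references.
    return VALUE_RE.sub(lambda _m: new_value, text, count=1)


print(whole)                               # Value=five
print(captured)                            # five
print(VALUE_RE.search(str_junk).group(0))  # five
print(replace_value(str_junk, "six"))
```

Unlike str.replace(val1, "six", 1), this only touches text that directly follows 'Value=', so an earlier stray 'five' elsewhere in the string would be left alone.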
David German answered Sep 22 '22 12:09


I'm not sure whether you're parsing URLs; if you are, you should definitely be using the urlparse module (urllib.parse in Python 3).
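
If the input does turn out to be a URL, a sketch along these lines may be more robust than a regex. It uses Python 3's urllib.parse (the successor to the Python 2 urlparse module). The URL and the helper name set_query_value are made up for illustration and do not come from the question:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_query_value(url, key, new_value):
    """Return url with the first occurrence of key in its query set to new_value."""
    parts = urlsplit(url)
    replaced = False
    updated = []
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name == key and not replaced:
            value = new_value
            replaced = True
        updated.append((name, value))
    # SplitResult is a namedtuple, so _replace returns a modified copy.
    return urlunsplit(parts._replace(query=urlencode(updated)))


url = "http://example.com/page?Value=five&other=1"  # hypothetical URL
print(dict(parse_qsl(urlsplit(url).query))["Value"])  # five
print(set_query_value(url, "Value", "six"))
```

Note that urlencode re-escapes the query, so values containing special characters may come back in a differently encoded but equivalent form.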

However, since that is not your question: splitting on multiple delimiters with a regular expression is fast in Python, so you should be able to do what you want as follows:

import re

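strJunk = "asdf2adsf29Value=five&lakl23ljk43asdldl"
# Splitting on '&' or '=' gives ['asdf2adsf29Value', 'five', 'lakl23ljk43asdldl']
split_result = re.split(r'[&=]', strJunk)
split_result[1] = 'six'  # swap in the new value
print("{0}={1}&{2}".format(*split_result))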

Hope this helps!

EDIT:

If you will be splitting many times, you can compile the regular expression once with re.compile() and reuse it. So you'll have:

import re
rx_split_on_delimiters = re.compile(r'[&=]')  # store this somewhere

strJunk = "asdf2adsf29Value=five&lakl23ljk43asdldl"
split_result = rx_split_on_delimiters.split(strJunk)
split_result[1] = 'six'  # swap in the new value
print("{0}={1}&{2}".format(*split_result))
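
Since the question is about speed, one way to check rather than guess is to time the candidates on your own data. The sketch below uses the standard timeit module. The function names and the repetition count are arbitrary choices for illustration, and the timings will vary by machine and Python version:

```python
import re
import timeit

STR_JUNK = "asdf2adsf29Value=five&lakl23ljk43asdldl"
VALUE_RE = re.compile(r"(?<=Value=)[^&]*")
SPLIT_RE = re.compile(r"[&=]")


def via_search_and_replace(text):
    # The approach from the question: find the value, then str.replace it.
    match = re.search(r"Value=([^&]+)", text)
    return text.replace(match.group(1), "six", 1)


def via_compiled_sub(text):
    # Precompiled pattern with a lookbehind; substitutes in one pass.
    return VALUE_RE.sub("six", text, count=1)


def via_split(text):
    # Assumes the text contains exactly one '=' and one '&'.
    prefix, _old, suffix = SPLIT_RE.split(text)
    return "{0}=six&{1}".format(prefix, suffix)


for func in (via_search_and_replace, via_compiled_sub, via_split):
    seconds = timeit.timeit(lambda: func(STR_JUNK), number=10000)
    print("{0}: {1:.4f}s".format(func.__name__, seconds))
```

All three return the same string for this input. The split version raises ValueError if the text contains a different number of delimiters, so it only suits inputs with a fixed shape.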
Mahmoud Abdelkader answered Sep 21 '22 12:09