Replace commas enclosed in curly braces

Tags: python, json, regex, python-3.x

I am trying to replace commas that are enclosed in curly braces with semicolons.

Sample string:

text = "a,b,{'c','d','e','f'},g,h"

I am aware that it comes down to lookbehinds and lookaheads, but somehow it won't work like I want it to:

substr = re.sub(r"(?<=\{)(.+?)(,)(?=.+\})",r"\1;", text)

It returns:

a,b,{'c';'d','e','f'},g,h

However, I am aiming for the following:

a,b,{'c';'d';'e';'f'},g,h

Any idea how I can achieve this? Any help much appreciated :)

asked Jan 14 '16 10:01

Vincent Hahn

3 Answers

You can match the whole block {...} (with {[^{}]+}) and replace commas inside it only with a lambda:

import re
text = "a,b,{'c','d','e','f'},g,h"
print(re.sub(r"{[^{}]+}", lambda x: x.group(0).replace(",", ";"), text))

See IDEONE demo

Output: a,b,{'c';'d';'e';'f'},g,h

By declaring lambda x we get access to each match object, and x.group(0) returns the whole match value. Then all we need to do is replace each comma in it with a semicolon.
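To make the role of the match object more visible, the same replacement can be written with a named function instead of a lambda. This is only a sketch of the approach above; the function name commas_to_semicolons is made up for illustration.

```python
import re

text = "a,b,{'c','d','e','f'},g,h"


def commas_to_semicolons(match):
    # match.group(0) is the full text matched by the pattern, braces included
    return match.group(0).replace(",", ";")


# re.sub calls the function once per {...} block and inserts its return value
result = re.sub(r"{[^{}]+}", commas_to_semicolons, text)
print(result)
```

Text outside the braces never reaches the function, so commas there are left alone.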

This regex does not handle nested braces, because Python's built-in re module does not support recursive patterns. To use a recursive pattern, you need the regex module from PyPI. Something like m = regex.sub(r"\{(?:[^{}]|(?R))*}", lambda x: x.group(0).replace(",", ";"), text) should work.

answered Sep 30 '22 09:09

Wiktor Stribiżew

Below is a solution that does not rely on a regular expression. It uses a stack (a list) to determine whether a character is inside a curly bracket {. Regular expressions are more elegant, but they can be harder to modify when requirements change. Note that the example below also works for nested brackets.

text = "a,b,{'c','d','e','f'},g,h"
output = ''
stack = []  # one entry per currently open '{'
for char in text:
    if char == '{':
        stack.append(char)
    elif char == '}':
        stack.pop()
    # Replace the comma only while at least one brace is open
    if stack and char == ',':
        output += ';'
    else:
        output += char
print(output)

This gives:

a,b,{'c';'d';'e';'f'},g,h

You can also rewrite this with map if you use a global variable for the stack:

stack = []  # global state shared across calls


def replace_comma_in_curly_brackets(char):
    if char == '{':
        stack.append(char)
    elif char == '}':
        stack.pop()
    # Replace the comma only while at least one brace is open
    if stack and char == ',':
        return ';'

    return char

text = "a,b,{'c','d','e','f'},g,h"
print(''.join(map(replace_comma_in_curly_brackets, text)))
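Since the stack is only ever used to check whether any brace is open, an integer depth counter kept inside a function does the same job without global state. The following is a sketch of that variation, not part of the original answer; the function name is made up, and ignoring an unmatched } instead of raising an error is an assumption about the desired behavior.

```python
def replace_commas_in_braces(text):
    depth = 0   # number of currently open '{'
    parts = []  # collected characters, joined once at the end
    for char in text:
        if char == '{':
            depth += 1
        elif char == '}':
            # Assumption: an unmatched '}' is ignored rather than treated as an error
            depth = max(depth - 1, 0)
        parts.append(';' if char == ',' and depth > 0 else char)
    return ''.join(parts)


print(replace_commas_in_braces("a,b,{'c','d','e','f'},g,h"))
print(replace_commas_in_braces("a,{b,{c,d},e},f"))  # nested braces
```

Because the counter is local, the function can be called repeatedly without resetting anything between calls.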

Regarding performance, when running the above two methods and the regular expression solution proposed by @stribizhev on the test string at the end of this post, I get the following timings:

Regular expression (@stribizhev): 0.38 seconds
Map function: 26.3 seconds
For loop: 251 seconds

This is the test string, which is 55,300,000 characters long:

 text = "a,able,about,across,after,all,almost,{also,am,among,an,and,any,are,as,at,be,because},been,but,by,can,cannot,could,dear,did,do,does,either,else,ever,every,for,from,get,got,had,has,have,he,her,hers,him,his,how,however,i,if,in,into,is,it,its,just,least,let,like,likely,may,me,might,most,must,my,neither,no,nor,not,of,off,often,on,only,or,other,our,own,rather,said,say,says,she,should,since,so,some,than,that,the,their,them,then,there,these,they,this,tis,to,too,twas,us,wants,was,we,were,what,when,where,which,while,who,whom,why,will,with,would,yet,you,your" * 100000
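For anyone who wants to repeat such a comparison, here is a rough sketch using the standard timeit module. It is not the harness used for the numbers above: the input is a shortened stand-in string, the function names are made up, and the loop variant collects characters in a list rather than concatenating strings. Absolute timings will differ by machine and Python version.

```python
import re
import timeit

# Shortened stand-in for the test string; the real one is much longer
base = "a,able,{also,am,among},been,but"
text = base * 1000


def with_regex(s):
    # Regex approach: rewrite commas inside each {...} block
    return re.sub(r"{[^{}]+}", lambda m: m.group(0).replace(",", ";"), s)


def with_loop(s):
    # Character loop with a depth counter instead of a stack
    depth = 0
    out = []
    for ch in s:
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth = max(depth - 1, 0)
        out.append(';' if ch == ',' and depth > 0 else ch)
    return ''.join(out)


# Both approaches should agree before their speed is compared
assert with_regex(text) == with_loop(text)

for fn in (with_regex, with_loop):
    seconds = timeit.timeit(lambda: fn(text), number=10)
    print(f"{fn.__name__}: {seconds:.4f} s for 10 runs")
```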

answered Sep 30 '22 08:09

Alex

If you don't have nested braces, it might be enough to look ahead at each , and check
whether there is a closing } ahead without any opening { in between. Search for

,(?=[^{]*})

and replace with ;

, matches a comma literally
(?=...) is the lookahead; it checks what follows without consuming it
[^{]* inside the lookahead matches any number of characters that are not {
} requires that those characters are followed by a closing curly brace

See demo at regex101
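In Python, the search-and-replace described above could be applied with re.sub. This is a minimal sketch that assumes the sample string from the question and balanced, non-nested braces.

```python
import re

text = "a,b,{'c','d','e','f'},g,h"

# Replace a comma only if a '}' follows with no '{' in between
result = re.sub(r",(?=[^{]*})", ";", text)
print(result)
```

With nested braces, a comma that sits directly before an inner { is left unchanged, which is why this pattern is only suggested for the non-nested case.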