 

Replace commas enclosed in curly braces

I am trying to replace the commas enclosed in curly braces with semicolons.

Sample string:

text = "a,b,{'c','d','e','f'},g,h"

I am aware that it comes down to lookbehinds and lookaheads, but somehow it won't work like I want it to:

substr = re.sub(r"(?<=\{)(.+?)(,)(?=.+\})",r"\1;", text)

It returns:

a,b,{'c';'d','e','f'},g,h

However, I am aiming for the following:

a,b,{'c';'d';'e';'f'},g,h

Any idea how I can achieve this? Any help much appreciated :)

Vincent Hahn asked Jan 14 '16 10:01





3 Answers

You can match the whole block {...} (with {[^{}]+}) and replace commas inside it only with a lambda:

import re
text = "a,b,{'c','d','e','f'},g,h"
print(re.sub(r"{[^{}]+}", lambda x: x.group(0).replace(",", ";"), text))


Output: a,b,{'c';'d';'e';'f'},g,h

By declaring lambda x we get access to each match object, and x.group(0) returns the whole matched text. Then all we need to do is replace each comma in it with a semicolon.
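As a minimal sketch, the same idea can be wrapped in a reusable function. The names BRACE_BLOCK and replace_commas_in_braces, and the old/new parameters, are my own for illustration and are not part of the answer above. The second brace block in the sample input shows that every non-nested block is handled, not just the first one.

```python
import re

# Matches one {...} block that contains no further braces inside it.
BRACE_BLOCK = re.compile(r"\{[^{}]+\}")


def replace_commas_in_braces(text, old=",", new=";"):
    """Replace `old` with `new` only inside non-nested {...} blocks."""
    # The callback receives each match object; group(0) is the whole block.
    return BRACE_BLOCK.sub(lambda m: m.group(0).replace(old, new), text)


print(replace_commas_in_braces("a,b,{'c','d'},e,{f,g},h"))
# a,b,{'c';'d'},e,{f;g},h
```

Compiling the pattern once is only a convenience here; re.sub with the raw pattern string behaves the same way.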

Python's built-in re module does not support recursive patterns, so this regex does not handle nested braces. To use a recursive pattern, you need the PyPI regex module. Something like m = regex.sub(r"\{(?:[^{}]|(?R))*}", lambda x: x.group(0).replace(",", ";"), text) should work.

Wiktor Stribiżew answered Sep 30 '22 09:09



Below I have posted a solution that does not rely on a regular expression. It uses a stack (a list) to determine whether a character is inside curly brackets. Regular expressions are more elegant; however, they can be harder to modify when requirements change. Please note that the example below also works for nested brackets.

text = "a,b,{'c','d','e','f'},g,h"
output = ''
stack = []  # holds one entry per currently open '{'
for char in text:
    if char == '{':
        stack.append(char)
    elif char == '}' and stack:  # ignore an unmatched '}' instead of raising IndexError
        stack.pop()
    # Check if we are inside a curly bracket
    if stack and char == ',':
        output += ';'
    else:
        output += char
print(output)

This gives:

a,b,{'c';'d';'e';'f'},g,h

You can also rewrite this as a map function if you use a global variable for the stack:

stack = []  # global state shared by every call below


def replace_comma_in_curly_brackets(char):
    if char == '{':
        stack.append(char)
    elif char == '}' and stack:  # ignore an unmatched '}'
        stack.pop()
    # Check if we are inside a curly bracket
    if stack and char == ',':
        return ';'

    return char

text = "a,b,{'c','d','e','f'},g,h"
print(''.join(map(replace_comma_in_curly_brackets, text)))
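The global stack above keeps its contents between calls, so a string with an unclosed brace would affect the next string processed. As a hedged alternative sketch (the function name replace_commas_by_depth and its parameters are my own, not from this answer), the same logic can keep an integer nesting depth local to one function:

```python
def replace_commas_by_depth(text, old=",", new=";"):
    """Replace `old` with `new` wherever the brace nesting depth is above zero."""
    depth = 0    # number of currently open '{'
    pieces = []  # collected output characters, joined once at the end
    for char in text:
        if char == "{":
            depth += 1
        elif char == "}":
            depth = max(depth - 1, 0)  # ignore an unmatched closing brace
        pieces.append(new if depth > 0 and char == old else char)
    return "".join(pieces)


print(replace_commas_by_depth("a,{b,{c,d},e},f"))
# a,{b;{c;d};e},f
```

Because only the depth matters, a counter carries the same information as the stack, and each call starts from a clean state.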

Regarding performance, when running the above two methods and the regular expression solution proposed by @stribizhev on the test string at the end of this post, I get the following timings:

  1. Regular expression (@stribizhev): 0.38 seconds
  2. Map function: 26.3 seconds
  3. For loop: 251 seconds

This is the test string, which is 55,300,000 characters long:

text = "a,able,about,across,after,all,almost,{also,am,among,an,and,any,are,as,at,be,because},been,but,by,can,cannot,could,dear,did,do,does,either,else,ever,every,for,from,get,got,had,has,have,he,her,hers,him,his,how,however,i,if,in,into,is,it,its,just,least,let,like,likely,may,me,might,most,must,my,neither,no,nor,not,of,off,often,on,only,or,other,our,own,rather,said,say,says,she,should,since,so,some,than,that,the,their,them,then,there,these,they,this,tis,to,too,twas,us,wants,was,we,were,what,when,where,which,while,who,whom,why,will,with,would,yet,you,your" * 100000
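For anyone who wants to repeat this kind of comparison, here is a rough timing sketch using the standard timeit module. It is not the harness used for the numbers above: the sample string is a much smaller, made-up one, the function names are hypothetical, and with_loop collects characters in a list rather than concatenating strings, so the absolute timings will differ from the ones reported.

```python
import re
import timeit


def with_regex(text):
    # Regex approach: rewrite each non-nested {...} block.
    return re.sub(r"\{[^{}]+\}", lambda m: m.group(0).replace(",", ";"), text)


def with_loop(text):
    # Character loop that tracks the brace nesting depth.
    depth = 0
    out = []
    for char in text:
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
        out.append(";" if depth > 0 and char == "," else char)
    return "".join(out)


# Hypothetical sample, far smaller than the test string above.
sample = "a,able,{also,am,among},been,but," * 10_000

# Both approaches should agree before their timings are compared.
assert with_regex(sample) == with_loop(sample)

for func in (with_regex, with_loop):
    seconds = timeit.timeit(lambda: func(sample), number=5)
    print(f"{func.__name__}: {seconds:.3f} s for 5 runs")
```

Results depend on the machine and the Python version, so treat the output as a relative comparison only.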
Alex answered Sep 30 '22 08:09



If you don't have nested braces, it might be enough to look ahead from each comma and check whether a closing } follows without any opening { in between. Search for

,(?=[^{]*})

and replace with ;

  • , matches a comma literally
  • (?=...) is a lookahead that checks whether, after the comma,
  • [^{]* any number of characters that are not { are
  • followed by a closing curly brace }
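In Python, the search and replace described above could be sketched with re.sub as follows. The variable names are my own, and the second call uses a made-up nested input to show where the no-nesting assumption matters.

```python
import re

text = "a,b,{'c','d','e','f'},g,h"

# Replace a comma only if a '}' follows with no '{' in between.
result = re.sub(r",(?=[^{]*})", ";", text)
print(result)
# a,b,{'c';'d';'e';'f'},g,h

# With nested braces, the comma right before the inner '{' is missed.
nested = re.sub(r",(?=[^{]*})", ";", "{a,{b,c},d}")
print(nested)
# {a,{b;c};d}
```

Note that the lookahead only checks for a closing brace ahead, so a stray } without a matching { would also cause the commas before it to be replaced.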


bobble bubble answered Sep 30 '22 09:09
