I try to replace commas with semicolons enclosed in curly braces.
Sample string:
text = "a,b,{'c','d','e','f'},g,h"
I am aware that it comes down to lookbehinds and lookaheads, but somehow it won't work like I want it to:
substr = re.sub(r"(?<=\{)(.+?)(,)(?=.+\})",r"\1;", text)
It returns:
a,b,{'c';'d','e','f'},g,h
However, I am aiming for the following:
a,b,{'c';'d';'e';'f'},g,h
Any idea how I can achieve this? Any help much appreciated :)
How are curly brackets used? Curly brackets are commonly used in programming languages such as C, Java, Perl, and PHP to enclose groups of statements or blocks of code.
A compound statement is a sequence of zero or more statements enclosed within curly braces. Compound statements are frequently used in selection and loop statements. They enable you to write loop bodies that are more than one statement long, among other things. A compound statement is sometimes called a block.
If it is a windows keyboard you can do (alt+123) for '{' and (alt+125) for '}'. On a Mac the shortcuts are (shift + alt + 8) for '{' and (shift + alt + 9) for '}'.
In programming, curly braces (the { and } characters) are used in a variety of ways. In C/C++, they are used to signify the start and end of a series of statements. In the following expression, everything between the { and } are executed if the variable mouseDOWNinText is true. See event loop.
You can match the whole block {...}
(with {[^{}]+}
) and replace commas inside it only with a lambda:
import re
text = "a,b,{'c','d','e','f'},g,h"
print(re.sub(r"{[^{}]+}", lambda x: x.group(0).replace(",", ";"), text))
See IDEONE demo
Output: a,b,{'c';'d';'e';'f'},g,h
By declaring lambda x
we can get access to each match object, and get the whole match value using x.group(0)
. Then, all we need is replace a comma with a semi-colon.
This regex does not support recursive patterns. To use a recursive pattern, you need PyPi regex module. Something like m = regex.sub(r"\{(?:[^{}]|(?R))*}", lambda x: x.group(0).replace(",", ";"), text)
should work.
Below I have posted a solution that does not rely on an regular expression. It uses a stack (list
) to determine if a character is inside a curly bracket {
. Regular expression are more elegant, however, they can be harder to modify when requirements change. Please note that the example below also works for nested brackets.
text = "a,b,{'c','d','e','f'},g,h"
output=''
stack = []
for char in text:
if char == '{':
stack.append(char)
elif char == '}':
stack.pop()
#Check if we are inside a curly bracket
if len(stack)>0 and char==',':
output += ';'
else:
output += char
print output
This gives:
'a,b,{'c';'d';'e';'f'},g,h
You can also rewrite this as a map
function if you use a the global variable for stack
:
stack = []
def replace_comma_in_curly_brackets(char):
if char == '{':
stack.append(char)
elif char == '}':
stack.pop()
#Check if we are inside a curly bracket
if len(stack)>0 and char==',':
return ';'
return char
text = "a,b,{'c','d','e','f'},g,h"
print ''.join(map(str, map(replace_comma_in_curly_brackets,text)))
Regarding performance, when running the above two methods and the regular expression solution proposed by @stribizhev on the test string at the end of this post, I get the following timings:
This is the test string that is 55,300,00 characters long:
text = "a,able,about,across,after,all,almost,{also,am,among,an,and,any,are,as,at,be,because},been,but,by,can,cannot,could,dear,did,do,does,either,else,ever,every,for,from,get,got,had,has,have,he,her,hers,him,his,how,however,i,if,in,into,is,it,its,just,least,let,like,likely,may,me,might,most,must,my,neither,no,nor,not,of,off,often,on,only,or,other,our,own,rather,said,say,says,she,should,since,so,some,than,that,the,their,them,then,there,these,they,this,tis,to,too,twas,us,wants,was,we,were,what,when,where,which,while,who,whom,why,will,with,would,yet,you,your" * 100000
If you don't have nested braces, it might be enough to just look ahead at each ,
if there is a closing }
ahead without any opening {
in between. Search for
,(?=[^{]*})
and replace with ;
,
match a comma literally(?=
...)
the lookahead to check[^{]*
any amount of characters, that are not {
}
See demo at regex101
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With