re.sub not replacing all occurrences

Tags:

python

regex

I'm not a Python developer, but I'm using a Python script to convert SQLite to MySQL.

The suggested script gets close, but no cigar, as they say.

The line giving me a problem is:

line = re.sub(r"([^'])'t'(.)", r"\1THIS_IS_TRUE\2", line)

...along with the equivalent line for false ('f'), of course.

The problem I'm seeing is that only the first occurrence of 't' in any given line is replaced.

So, input to the script,

INSERT INTO "cars" VALUES(56,'Bugatti Veyron','BUG 1',32,'t','t','2011-12-14 18:39:16.556916','2011-12-15 11:25:03.675058','81');

...gives...

INSERT INTO "cars" VALUES(56,'Bugatti Veyron','BUG 1',32,THIS_IS_TRUE,'t','2011-12-14 18:39:16.556916','2011-12-15 11:25:03.675058','81');

I mentioned I'm not a Python developer, but I have tried to fix this myself. According to the documentation, I understand that re.sub should replace all occurrences of 't'.

I'd appreciate a hint as to why I'm only seeing the first occurrence replaced, thanks.

asked Nov 13 '12 15:11 by Snips



4 Answers

The two substitutions you'd want in your example overlap - the comma between your two instances of 't' will be matched by (.) in the first case, so ([^']) in the second case never gets a chance to match it. This slightly modified version might help:

line = re.sub(r"(?<!')'t'(?=.)", r"THIS_IS_TRUE", line)

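This version uses lookahead and lookbehind syntax, described in the documentation for Python's re module. Lookarounds check the neighbouring characters without consuming them, so the comma between the two values stays available for the next match.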
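To make the difference concrete, here is a minimal sketch that runs both patterns against the line from the question. The variable names (`consuming`, `lookaround`) are just illustrative, not part of the original script.

```python
import re

# The sample line from the question, split over two literals for readability.
line = ("INSERT INTO \"cars\" VALUES(56,'Bugatti Veyron','BUG 1',32,'t','t',"
        "'2011-12-14 18:39:16.556916','2011-12-15 11:25:03.675058','81');")

# Original pattern: the groups consume the character on each side of 't',
# so two adjacent 't' values cannot both match.
consuming = re.sub(r"([^'])'t'(.)", r"\1THIS_IS_TRUE\2", line)

# Lookaround pattern: the neighbours are only asserted, never consumed.
lookaround = re.sub(r"(?<!')'t'(?=.)", "THIS_IS_TRUE", line)

print(consuming.count("THIS_IS_TRUE"))   # 1
print(lookaround.count("THIS_IS_TRUE"))  # 2
```

The same change should apply to the equivalent 'f' line.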

answered Sep 21 '22 11:09 by Zero Piraeus


How about

line = line.replace("'t'", "THIS_IS_TRUE").replace("'f'", "THIS_IS_FALSE")

without using re. This replaces all occurrences of 't' and 'f'. Just make sure that no car is named t.
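A quick sketch of how this behaves, including the caveat about a value that is literally t. The two INSERT lines below are made-up sample data, not taken from the question.

```python
# Hypothetical sample rows, shortened for illustration.
line = "INSERT INTO \"cars\" VALUES(1,'t','f','t');"
converted = line.replace("'t'", "THIS_IS_TRUE").replace("'f'", "THIS_IS_FALSE")
print(converted)
# INSERT INTO "cars" VALUES(1,THIS_IS_TRUE,THIS_IS_FALSE,THIS_IS_TRUE);

# Caveat: str.replace cannot tell a boolean 't' from a text column whose
# whole value happens to be t (here, a car named t in the second field).
named_t = "INSERT INTO \"cars\" VALUES(2,'t','BUG 2','f');"
converted_named_t = named_t.replace("'t'", "THIS_IS_TRUE").replace("'f'", "THIS_IS_FALSE")
print(converted_named_t)
# INSERT INTO "cars" VALUES(2,THIS_IS_TRUE,'BUG 2',THIS_IS_FALSE);
```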

answered Sep 20 '22 11:09 by eumiro


The first match is ,'t', (both commas included). Python then resumes searching at the next character, which is the ' before the second t. The comma that ([^']) would need has already been consumed by the first match, so the second 't' cannot match and is skipped.

In other words, subsequent matches to be replaced cannot overlap.
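One way to see this is to print what the original pattern actually matches. The sketch below uses a shortened fragment of the question's line (an assumption made for readability) and re.finditer to list every match with its span.

```python
import re

# Shortened fragment of the problem line: two adjacent 't' values.
fragment = "32,'t','t','2011'"
pattern = re.compile(r"([^'])'t'(.)")

# Each entry is (span, matched text); only one match is found.
matches = [(m.span(), m.group(0)) for m in pattern.finditer(fragment)]
print(matches)
# [((2, 7), ",'t',")]

# The search resumes at index 7, where the remaining text starts with a quote,
# so there is no non-quote character left in front of the second 't'.
print(fragment[7:])
# 't','2011'
```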

answered Sep 18 '22 11:09 by Alexander Pavlov


Using re.sub(r"\bt\b", "THIS_IS_TRUE", line) replaces every standalone t, although the surrounding quotes are left in place:

In [21]: strs="""INSERT INTO "cars" VALUES(56,'Bugatti Veyron','BUG 1',32,'t','t','2011-12-14 18:39:16.556916','2011-12-15 11:25:03.675058','81');"""

In [22]: print(re.sub(r"\bt\b","THIS_IS_TRUE",strs))

INSERT INTO "cars" VALUES(56,'Bugatti Veyron','BUG 1',32,'THIS_IS_TRUE','THIS_IS_TRUE','2011-12-14 18:39:16.556916','2011-12-15 11:25:03.675058','81');
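One thing to watch with the word-boundary approach: \b only checks for a word/non-word transition, so a lone t inside a longer string value matches as well. A small sketch with made-up values (the 'Model t' entry is hypothetical), compared against the quote-anchored lookaround pattern:

```python
import re

# Hypothetical values: a text field ending in a lone t, then two booleans.
values = "VALUES('Model t','t','f')"

# \bt\b matches any standalone t, including the one inside 'Model t'.
by_boundary = re.sub(r"\bt\b", "THIS_IS_TRUE", values)
print(by_boundary)
# VALUES('Model THIS_IS_TRUE','THIS_IS_TRUE','f')

# Anchoring on the quotes only touches values that are exactly 't'.
by_quotes = re.sub(r"(?<!')'t'(?=.)", "THIS_IS_TRUE", values)
print(by_quotes)
# VALUES('Model t',THIS_IS_TRUE,'f')
```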
answered Sep 21 '22 11:09 by Ashwini Chaudhary