In my text I want to replace all leading tabs with two spaces but leave the non-leading tabs alone.
For example:
a
\tb
\t\tc
\td\te
f\t\tg
("a\n\tb\n\t\tc\n\td\te\nf\t\tg")
should turn into:
a
  b
    c
  d\te
f\t\tg
("a\n  b\n    c\n  d\te\nf\t\tg")
For my case I could do that with multiple replacement operations, repeating as many times as the maximum nesting level or until nothing changes.
But wouldn't it also be possible to do in a single run?
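For reference, the multi-pass idea could be sketched roughly as follows. This is only an illustration: the names `ONE_LEVEL` and `replace_leading_tabs_multipass` are made up here, and it assumes lines are indented with tabs only (a line that already starts with two spaces followed by a tab would also be converted).

```python
import re

# One pass converts, on each line, the first tab that is preceded only by
# already-converted two-space units at the start of the line.
ONE_LEVEL = re.compile(r"^((?:  )*)\t", flags=re.M)

def replace_leading_tabs_multipass(text):
    """Repeat the substitution until nothing changes; return (text, passes)."""
    passes = 0
    while True:
        text, count = ONE_LEVEL.subn(r"\1  ", text)
        if count == 0:
            return text, passes
        passes += 1

sample = "a\n\tb\n\t\tc\n\td\te\nf\t\tg"
result, passes = replace_leading_tabs_multipass(sample)
print(repr(result), passes)
```

The number of effective passes equals the deepest nesting level in the input, which is exactly the cost a single-run solution would avoid.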
I tried but couldn't come up with a working pattern; the best I have so far uses lookarounds:
re.sub(r'(^|(?<=\t))\t', '  ', a, flags=re.MULTILINE)
This makes "only" one wrong replacement (the second tab between f and g).
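To make the failure concrete, here is a small sketch that runs the lookaround attempt on the sample, assuming `a` holds the sample string and the replacement is two spaces. The lookbehind inspects the original text, so any tab preceded by another tab qualifies, wherever it sits in the line.

```python
import re

a = "a\n\tb\n\t\tc\n\td\te\nf\t\tg"

# Matches a tab at the start of a line, or a tab directly preceded by a tab.
attempt = re.sub(r'(^|(?<=\t))\t', '  ', a, flags=re.MULTILINE)

print(repr(attempt))
# The last line comes out as 'f\t  g': the second tab after f was replaced
# even though it is not part of the leading indentation.
```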
Now it might be that it's simply impossible to do this with a regex in a single run, because the already replaced parts can't be matched again (or rather, the replacement does not happen right away) and you can't sort of "count" in a regex. In that case I would love to see a more detailed explanation of why (as long as this doesn't shift too far into [cs.se] territory).
I am working in Python currently but this could apply to pretty much any similar regex implementation.
You can match the run of tabs at the start of each line and pass a lambda to re.sub that replaces it with two spaces multiplied by the length of the match:
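import re
s = "a\n\tb\n\t\tc\n\td\te\nf\t\tg"
# With re.M, ^\t+ matches only a run of tabs at the start of a line;
# the lambda emits two spaces per matched tab.
print(re.sub(r"^\t+", lambda m: "  " * len(m.group()), s, flags=re.M))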
See the Python demo
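If this is needed in more than one place, the same idea can be wrapped in a small helper. The sketch below is only one way to do it: the function name `expand_leading_tabs` and the `indent` parameter are hypothetical, not something from the standard library. It works in a single pass because the callable receives the whole leading run at once, so the "counting" happens in Python rather than inside the pattern.

```python
import re

# Hypothetical helper; the name and the `indent` parameter are illustrative.
_LEADING_TABS = re.compile(r"^\t+", flags=re.M)

def expand_leading_tabs(text, indent="  "):
    """Replace every tab in each line's leading run of tabs with `indent`."""
    # m.group() is the whole leading run, so its length is the tab count.
    return _LEADING_TABS.sub(lambda m: indent * len(m.group()), text)

sample = "a\n\tb\n\t\tc\n\td\te\nf\t\tg"
result = expand_leading_tabs(sample)
print(repr(result))
```

Tabs that appear after the first non-tab character are never part of the match, so they are left untouched.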