Using a single replacement operation replace all leading tabs with spaces

Tags: python, regex

In my text I want to replace all leading tabs with two spaces but leave the non-leading tabs alone.

For example:

a
\tb
\t\tc
\td\te
f\t\tg

("a\n\tb\n\t\tc\n\td\te\nf\t\tg")

should turn into:

a
  b
    c
  d\te
f\t\tg

("a\n  b\n    c\n  d\te\nf\t\tg")

In my case I could do that with multiple replacement operations, repeating as many times as the maximum nesting level or until nothing changes.
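
A minimal sketch of that multi-pass approach, assuming two spaces per leading tab and assuming no line already starts with spaces. The helper name `replace_leading_tabs_multipass` is made up for illustration. Each pass converts the one tab that follows the already-converted indentation, and the loop stops once nothing changes:

```python
import re

def replace_leading_tabs_multipass(text):
    # Hypothetical helper: each pass converts the first tab that follows
    # the already-converted indentation (pairs of spaces) on every line.
    pattern = re.compile(r"^((?:  )*)\t", flags=re.MULTILINE)
    while True:
        new_text = pattern.sub(r"\1  ", text)
        if new_text == text:  # nothing changed, so we are done
            return new_text
        text = new_text

result = replace_leading_tabs_multipass("a\n\tb\n\t\tc\n\td\te\nf\t\tg")
print(repr(result))
```

The number of passes is the deepest nesting level plus one final pass that detects no change.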

But wouldn't it also be possible to do in a single run?

I tried but didn't manage to come up with anything that works; the best I have come up with so far uses lookarounds:

re.sub(r'(^|(?<=\t))\t', '  ', a, flags=re.MULTILINE)

Which "only" makes one wrong replacement (second tab between f and g).
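
To make the problem concrete, here is that attempt run on the example input (the variable names `attempt` and `expected` are only for illustration). The lookbehind inspects the original text, so it cannot tell a tab preceded by a leading tab from a tab preceded by another tab in the middle of a line:

```python
import re

a = "a\n\tb\n\t\tc\n\td\te\nf\t\tg"
attempt = re.sub(r'(^|(?<=\t))\t', '  ', a, flags=re.MULTILINE)
expected = "a\n  b\n    c\n  d\te\nf\t\tg"

print(repr(attempt))        # last line comes out as 'f\t  g'
print(attempt == expected)  # False
```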

Now it might be that this is simply impossible to do with a regex in a single run, because already replaced parts can't be matched again (or rather, the replacement does not happen right away) and you can't really "count" in regex. In that case I would love to see a more detailed explanation of why (as long as this doesn't drift too far into [cs.se] territory).

I am working in Python currently but this could apply to pretty much any similar regex implementation.

phk asked Dec 24 '22 02:12


1 Answer

You can match the run of tabs at the start of each line and pass a lambda to re.sub that replaces it with two spaces multiplied by the length of the match:

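import re

s = "a\n\tb\n\t\tc\n\td\te\nf\t\tg"
# ^\t+ matches a run of tabs only at the start of a line (re.M);
# the lambda emits two spaces per matched tab.
print(re.sub(r"^\t+", lambda m: "  " * len(m.group()), s, flags=re.M))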

See the Python demo
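
As a quick check, the same idea can be wrapped in a small function and compared against the expected output from the question. The function name `expand_leading_tabs` and its `width` parameter are assumptions for illustration, not part of the answer above:

```python
import re

def expand_leading_tabs(text, width=2):
    # Hypothetical wrapper around the approach above: every run of tabs
    # at the start of a line becomes `width` spaces per tab.
    return re.sub(r"^\t+", lambda m: " " * (width * len(m.group())),
                  text, flags=re.MULTILINE)

s = "a\n\tb\n\t\tc\n\td\te\nf\t\tg"
converted = expand_leading_tabs(s)
print(repr(converted))
```

The callable is what makes a single pass work here: the replacement is computed per match, so its length can depend on how many tabs were matched, which a fixed replacement string for this pattern cannot express.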

Wiktor Stribiżew answered Jun 03 '23 01:06