
How to match spaces that are NOT in a multiple of 4?

I'm reformatting a Python script in Notepad++, but some lines are not indented by a multiple of 4 spaces (4, 8, 12, 16, etc.).

So I need to match consecutive leading spaces (i.e. the indentation at the beginning of each line) whose count is NOT a multiple of 4, i.e. 1, 2, 3, 5, 6, 7, 9, 10, 11, etc. spaces.

e.g.

>>>   a = 1      # match this, as there're 3 spaces at the beginning
>>>       b = a  # match this too, as indent by 7 spaces
>>>    c = 2     # but not this, since it's indented exactly by 4 spaces
>>>        d = c # not this either, since indented by 8 spaces

I was able to match spaces in multiples of 4 using something like:

^( {16}| {12}| {8}| {4})

then I tried to match the opposite of this with something like:

^[^( {16}| {12}| {8}| {4})]

but it only matches empty lines or lines that start with a non-space character, which is not what I want.

I'm a complete newbie to regex, but I've searched for hours with no luck. I know I could always match with all the non-multiple-of-4 numbers listed, but I was hoping someone could help and provide a less-cumbersome method.

Thanks.

Update 1

using regex (@user2864740)

^(?:\s{4})*\s{1,3}\S

or (@alpha bravo)

^(?!(\s{4})+\S)(.*)

matches non-multiple-of-4 indents, but it also matches whitespace-only line(s) containing 4 (8, 16, etc.) spaces together with the first character of the first non-empty line that follows them.

e.g. (on regex101.com)

How can I avoid matching the extra cases described above?

H S asked Aug 15 '14 02:08



2 Answers

A character class can only contain a set of characters, so [^..] is not suitable for general negation. Inside the brackets, the parentheses, braces, pipes and digits are all literal characters, so the regular expression [^( {16}| {12}| {8}| {4})] is equivalent to [^( {16}|284)], which matches any single character that is not listed.
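
As a rough illustration of the point above, the negated class can be tried in Python's `re` module. Python is used here only as a convenient test bed, on the assumption that Notepad++'s engine treats character classes the same way; the helper name `starts_with_allowed_char` is made up for this sketch.

```python
import re

# Inside [...], characters such as ( ) { } | and the digits are plain literals,
# so this class excludes exactly: ( ) { } | space 1 2 4 6 8
negated = re.compile(r"^[^( {16}| {12}| {8}| {4})]")

def starts_with_allowed_char(line):
    """True if the first character of line is outside the excluded set."""
    return negated.match(line) is not None

for sample in ["a = 1", "   a = 1", "1 + 1", "3 + 3"]:
    print(repr(sample), starts_with_allowed_char(sample))
```

A line starting with a space (or with `1`) is rejected and a line starting with `a` (or `3`) is accepted, regardless of how many spaces are involved, which is why the pattern says nothing about indentation width.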

Now, matching a run of spaces whose length is not a multiple of 4 is the same as matching n spaces where n mod 4 is 1, 2, or 3 (that is, anything except n mod 4 = 0). This can be done with a pattern such as the following:

(?:\s{4})*\s{1,3}\S

Explanation:

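(?:\s{4})*  - match any number of whole groups of 4 whitespace characters, and then ..
\s{1,3}     - match 1, 2, or 3 more whitespace characters, such that ..
\S          - the next character is not whitespace

The regular expression may need a trailing .* (to consume the rest of the line) or a leading line anchor (^), depending on how it is used. Note that \s matches any whitespace, including tabs and line breaks; put a literal space in its place to match spaces only.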
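
The following Python sketch compares the anchored pattern with a variant that uses literal spaces and a lookahead. The variant is an assumption of this example, not something from the original answer, and the name `bad_lines` is hypothetical. It also assumes Notepad++ treats `\s` as Python does (matching line breaks), which the behaviour described in the question's update suggests.

```python
import re

# \s also matches tabs and newlines, so a whitespace-only line can "borrow"
# its line break to satisfy \s{1,3}. Literal spaces cannot cross a line break.
WITH_S = re.compile(r"^(?:\s{4})*\s{1,3}\S", re.MULTILINE)
WITH_SPACE = re.compile(r"^(?: {4})* {1,3}(?=\S)", re.MULTILINE)

text = (
    "   a = 1\n"        # 3 spaces: misindented
    "       b = a\n"    # 7 spaces: misindented
    "    c = 2\n"       # 4 spaces: fine
    "        d = c\n"   # 8 spaces: fine
    "    \n"            # whitespace-only line with 4 spaces
    "e = d\n"
)

def bad_lines(pattern, text):
    """Return the 1-based numbers of the lines on which a match starts."""
    return [text.count("\n", 0, m.start()) + 1 for m in pattern.finditer(text)]

print(bad_lines(WITH_S, text))      # includes the whitespace-only line
print(bad_lines(WITH_SPACE, text))  # only the misindented code lines
```

With `\s`, the whitespace-only line 5 is reported because the match runs across the line break into `e`; with literal spaces only lines 1 and 2 are reported. The lookahead `(?=\S)` keeps the first non-space character out of the match, so a replacement would touch the indentation only.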

user2864740 answered Sep 20 '22 13:09



I could offer a Python script that'll tell you which lines are improperly indented:

with open('path/to/code/file') as infile:
    for i, line in enumerate(infile, 1):  # number lines starting from 1
        # count leading spaces only (tabs are not counted)
        whitespace = len(line) - len(line.lstrip(' '))
        if whitespace % 4:  # non-zero remainder: not a multiple of 4
            print("Inconsistent indenting on line", i)
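
If whitespace-only lines should be ignored (the situation raised in the question's update), the same idea could be wrapped in a function along these lines. This is only a sketch: the name `misindented_lines` and the `width` parameter are made up for illustration.

```python
def misindented_lines(source, width=4):
    """Return 1-based numbers of lines whose leading-space count
    is not a multiple of width."""
    bad = []
    for number, line in enumerate(source.splitlines(), 1):
        if not line.strip():
            continue  # skip empty and whitespace-only lines
        indent = len(line) - len(line.lstrip(" "))  # leading spaces only
        if indent % width:
            bad.append(number)
    return bad

sample = "   a = 1\n       b = a\n    c = 2\n        d = c\n  \ne = d\n"
print(misindented_lines(sample))
```

Taking the text as a string rather than a file path keeps the check easy to try on small snippets; reading a file first with `open(...).read()` would feed it a whole script.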

inspectorG4dget answered Sep 19 '22 13:09
