
Regular expression - replace all spaces in beginning of line with periods

I don't care whether I achieve this through vim, sed, awk, python, etc. I tried all of them and could not get it done.

For an input like this:

top           f1    f2    f3
   sub1       f1    f2    f3
   sub2       f1    f2    f3
      sub21   f1    f2    f3
   sub3       f1    f2    f3

I want:

top           f1    f2    f3
...sub1       f1    f2    f3
...sub2       f1    f2    f3
......sub21   f1    f2    f3
...sub3       f1    f2    f3

Then I want to just load this up in Excel (delimited by whitespace) and still be able to look at the hierarchy-ness of the first column!

I tried many things, but ended up losing the hierarchy information.

shikhanshu asked Oct 03 '17 23:10


2 Answers

With this as the input:

$ cat file
top           f1    f2    f3
   sub1       f1    f2    f3
   sub2       f1    f2    f3
      sub21   f1    f2    f3
   sub3       f1    f2    f3

Try:

$ sed -E ':a; s/^( *) ([^ ])/\1.\2/; ta' file
top           f1    f2    f3
...sub1       f1    f2    f3
...sub2       f1    f2    f3
......sub21   f1    f2    f3
...sub3       f1    f2    f3

How it works:

  • :a

    This creates a label a.

  • s/^( *) ([^ ])/\1.\2/

    If the line begins with spaces, this replaces the last space in the leading spaces with a period.

    In more detail, ^( *) matches all leading blanks except the last and stores them in group 1. The rest of the regex is a single blank followed by ([^ ]) (the blank is easy to miss when rendered). It matches the last leading blank and the nonblank character after it, and stores the nonblank in group 2.

    \1.\2 replaces the matched text with group 1, followed by a period, followed by group 2.

  • ta

    If the substitute command made a substitution, branch back to label a and try again. The loop ends once no leading space remains.
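Since the question also mentions Python, here is a minimal sketch of the same loop there, reusing the regex from the sed script. The function name dot_leading_spaces_loop is made up for illustration, and the sketch assumes each line is passed in without its trailing newline.

```python
import re

# Same pattern as the sed script: group 1 is every leading space but the
# last, group 2 is the first non-space character that follows it.
LAST_LEADING_SPACE = re.compile(r"^( *) ([^ ])")


def dot_leading_spaces_loop(line: str) -> str:
    """Turn one leading space into a period per pass, like ':a; s/.../; ta'."""
    while True:
        line, count = LAST_LEADING_SPACE.subn(r"\1.\2", line)
        if count == 0:
            # No substitution happened, so 'ta' would not branch back.
            return line


print(dot_leading_spaces_loop("      sub21   f1    f2    f3"))
```

Each pass shortens the run of leading spaces by one, so the loop runs once per leading space and then stops. Spaces between the columns are never touched because the pattern is anchored with ^.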

Compatibility:

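  1. The above was tested on modern GNU sed. BSD/OSX sed may treat everything after : up to the end of the line as the label name, so there it is safer to pass each command as a separate -e expression: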

    sed -E -e :a -e 's/^( *) ([^ ])/\1.\2/' -e ta file
    

    On older GNU sed versions that do not accept -E, use -r instead:

    sed -r ':a; s/^( *) ([^ ])/\1.\2/; ta' file
    
  2. The above assumes that the leading whitespace consists of space characters. If it contains tabs, you will have to decide what your tabstop is and make substitutions accordingly.
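For the tab case, one possible approach in Python is to expand the tabs in the indent first and then emit one period per resulting column. This is only a sketch: the tabstop of 3 and the name dot_indent_with_tabs are assumptions chosen to match the sample data, not something taken from the question.

```python
import re

TABSTOP = 3  # assumed tab width; change it to whatever your file uses


def dot_indent_with_tabs(line: str, tabstop: int = TABSTOP) -> str:
    """Replace a leading mix of spaces and tabs with one period per column."""
    match = re.match(r"[ \t]*", line)  # always matches, possibly an empty string
    # The indent starts at column 0, so expandtabs gives its true width.
    indent = match.group(0).expandtabs(tabstop)
    return "." * len(indent) + line[match.end():]


print(dot_indent_with_tabs("\t\tsub21   f1    f2    f3"))
```

Only the indent is expanded here; tabs that appear later in the line are left as they are.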

John1024 answered Oct 21 '22 03:10


There are two different ways to do this in vim.

  1. With a regex:

    :%s/^\s\+/\=repeat('.', len(submatch(0)))
    

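    This is fairly straightforward, but a little verbose. It uses a sub-replace-expression (\=) to generate a string of '.'s whose length equals the number of whitespace characters at the beginning of each line. Note that \s also matches tabs, so each leading tab becomes a single period.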

  2. With a norm command:

    :%norm ^hviwr.
    

    This command is much shorter, although it is a little harder to understand. ^ moves to the first non-blank character, h steps back onto the last leading space, viw visually selects the whole run of leading whitespace, and r. replaces every selected character with a period. If there is no leading space, h fails because the cursor cannot move left of the first column, and the rest of the command is abandoned for that line, so the line is left unchanged.

    To watch it happen, type ^hviwr. in normal mode on a line that has leading spaces.
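The idea behind the first vim command (replace the whole leading run with a string of the same length) carries over to Python almost directly. A minimal sketch, assuming the indent is made of spaces only (unlike vim's \s, the pattern below does not match tabs); the function name dot_leading_spaces is made up for illustration:

```python
import re


def dot_leading_spaces(text: str) -> str:
    """Replace each line's leading spaces with the same number of periods."""
    # re.MULTILINE makes ^ match at the start of every line, not just the text.
    return re.sub(r"^ +", lambda m: "." * len(m.group(0)), text, flags=re.MULTILINE)


sample = (
    "top           f1    f2    f3\n"
    "   sub1       f1    f2    f3\n"
    "   sub2       f1    f2    f3\n"
    "      sub21   f1    f2    f3\n"
    "   sub3       f1    f2    f3\n"
)
print(dot_leading_spaces(sample), end="")
```

The lambda plays the role of repeat('.', len(submatch(0))): it receives the matched run of spaces and returns a run of periods of equal length, so the column alignment is preserved.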

DJMcMayhem answered Oct 21 '22 04:10