How to replace paired square brackets with other syntax with sed?

Tags:

bash

sed

I want to replace every pair of square brackets in a file, such as [some text], with \macro{some text}. For example:

This is some [text].
This [line] has [some more] text.

This becomes:

This is some \macro{text}.
This \macro{line} has \macro{some more} text.
  • The pairs only occur on individual lines, never across multiple lines.
  • Sometimes there might be more than one pair on a single line, but they are never nested.
  • If a bracket is found alone on a line, without a pair, then it should not be changed.

How can I replace these pairs of brackets with the \macro{...} syntax?

asked May 18 '12 03:05 by Village



1 Answer

It took a little doing, but here it is:

sed -i.bkup  's/\[\([^]]*\)\]/\\macro{\1}/g' test.txt
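To try the command without touching a real file, here is a minimal sketch that recreates the question's sample input in a scratch directory (created with mktemp -d) and runs the same sed command on it. It assumes GNU or BSD/macOS sed, both of which accept the backup suffix attached to -i as in -i.bkup. The third input line is an extra case added here to check that an unpaired bracket is left unchanged.

```shell
#!/bin/sh
# Work in a scratch directory so no real test.txt is overwritten.
cd "$(mktemp -d)" || exit 1

# Sample input from the question, plus one line with an unpaired bracket.
printf '%s\n' \
  'This is some [text].' \
  'This [line] has [some more] text.' \
  'A lone [ bracket is left alone.' > test.txt

# Rewrite every [...] pair in place; the original is saved as test.txt.bkup.
sed -i.bkup 's/\[\([^]]*\)\]/\\macro{\1}/g' test.txt

cat test.txt
# This is some \macro{text}.
# This \macro{line} has \macro{some more} text.
# A lone [ bracket is left alone.
```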

Let's see if I can explain this regular expression:

  1. The \[ matches an opening square bracket. Since [ is a special character in regular expressions, the backslash tells sed to match it literally.
  2. The \(...\) is a capture group. It captures the part of the match I want to keep. I can have several capture groups, and in sed I can refer to them as \1, \2, etc.
  3. Inside the capture group \(...\), I have [^]]*:
    1. The [^...] syntax means "any character except ...".
    2. The [^]] means any character except a closing square bracket. A ] placed right after the ^ is taken as a literal member of the set, not as the end of it.
    3. The * means zero or more of the preceding item. That means I am capturing zero or more characters that are not closing square brackets.
  4. The \] matches the closing square bracket.
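Each piece can also be tried on its own. This sketch (plain sh with GNU or BSD sed; the <...> and (...) around the output are only display markers chosen for the example) shows that a ] directly after ^ is literal, that \1 holds only the text inside the brackets, and that * also lets an empty pair match.

```shell
#!/bin/sh
# [^]]* : a ] right after ^ is a literal member of the set, so this matches
# the longest run of characters that are not ]. & is the whole match.
run=$(printf '%s\n' 'some] more [text]' | sed 's/^[^]]*/<&>/')
printf '%s\n' "$run"        # <some>] more [text]

# \1 holds only what \(...\) captured; the brackets themselves are dropped.
group=$(printf '%s\n' '[some]' | sed 's/\[\([^]]*\)\]/(\1)/')
printf '%s\n' "$group"      # (some)

# * means zero or more, so an empty pair [] is converted as well.
empty=$(printf '%s\n' 'empty []' | sed 's/\[\([^]]*\)\]/\\macro{\1}/g')
printf '%s\n' "$empty"      # empty \macro{}
```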

Let's look at the line: this is [some] more [text]

  • In #1 above, I match the first opening square bracket, the one in front of the word some. It is not inside the capture group, but it is the first character of the text I'm going to substitute.
  • I now start a capture group. Following 3.2 and 3.3 above, I capture, starting with the letter s in some, as many characters as possible that are not closing square brackets. This means I am matching [some, but capturing only some.
  • In #4, with the capture group closed, I match the closing square bracket that immediately follows some, so the full match is [some]. Note that regular expressions are normally greedy; I'll explain below why this is important.
  • Now for the replacement string, which is much easier: \\macro{\1}. The \1 is replaced by the text of my capture group, and \\ produces a single literal backslash, because a lone backslash is sed's escape character. Thus, I replace [some] with \macro{some}.
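The walkthrough above covers the first pair only. What makes sed go on to [text] is the g flag at the end of the s command; without it, sed replaces only the first match on each line. A small sketch comparing the two:

```shell
#!/bin/sh
line='this is [some] more [text]'

# Without g, sed stops after the first match, so only [some] changes.
once=$(printf '%s\n' "$line" | sed 's/\[\([^]]*\)\]/\\macro{\1}/')
printf '%s\n' "$once"       # this is \macro{some} more [text]

# With g, matching resumes after [some] and [text] is converted too.
all=$(printf '%s\n' "$line" | sed 's/\[\([^]]*\)\]/\\macro{\1}/g')
printf '%s\n' "$all"        # this is \macro{some} more \macro{text}
```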

It would be much easier if I could be guaranteed a single set of square brackets in each line. Then I could have done this:

sed -i.bkup 's/\[\(.*\)\]/\\macro{\1}/g' test.txt

The capture group now says "anything between two square brackets". The problem is that regular expressions are greedy, so .* would match from the s in some all the way to the final t in text. In the diagram below, the x's show the captured text, and the [ and ] show the square brackets being matched:

 this is [some] more [text]
         [xxxxxxxxxxxxxxxx]
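Running both cases makes the greedy behavior concrete. In this sketch the single-pair line is an extra example added for comparison; the two-pair line is the one from the walkthrough.

```shell
#!/bin/sh
# The simpler .* pattern is fine when a line holds a single pair...
single=$(printf '%s\n' 'only one [pair] here' | sed 's/\[\(.*\)\]/\\macro{\1}/g')
printf '%s\n' "$single"     # only one \macro{pair} here

# ...but with two pairs, .* runs on to the LAST ] on the line, so one
# match swallows "some] more [text" as the captured text.
greedy=$(printf '%s\n' 'this is [some] more [text]' | sed 's/\[\(.*\)\]/\\macro{\1}/g')
printf '%s\n' "$greedy"     # this is \macro{some] more [text}
```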

This got complicated because I had to match characters that have special meaning in regular expressions, hence all the backslashes. I also had to account for greediness, which is where the odd-looking [^]]* comes in: it matches anything that is not a closing bracket. Add the escaped square brackets before and after it, \[[^]]*\], then wrap the middle in the \(...\) capture group, \[\([^]]*\)\], and you get one big mess of a regular expression.
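If the backslashes are the main complaint, sed -E (extended regular expressions) lets plain ( and ) form the group without escaping. This is a sketch assuming GNU sed or BSD/macOS sed, both of which accept -E; the escaped brackets and the \\ in the replacement are still required. Both forms should produce the same result:

```shell
#!/bin/sh
line='This [line] has [some more] text.'

# Basic regex, as in the answer: the group needs \( and \).
bre=$(printf '%s\n' "$line" | sed 's/\[\([^]]*\)\]/\\macro{\1}/g')

# Extended regex (-E): plain ( and ) form the group.
ere=$(printf '%s\n' "$line" | sed -E 's/\[([^]]*)\]/\\macro{\1}/g')

printf '%s\n' "$bre"        # This \macro{line} has \macro{some more} text.
printf '%s\n' "$ere"        # This \macro{line} has \macro{some more} text.
```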

answered Sep 29 '22 16:09 by David W.