
How to restrict sed to replace only data appearing after the first closing square bracket?

Tags: csv, sed

I have a CSV file which uses a highly customized format. Here, each number represents the data in one of the 4 columns:

1 2 [3] 4

I need to restrict sed to only search and modify data appearing in the fourth column. Essentially, it must ignore all data on the line appearing before the first occurrence of a closing square bracket followed by a space ("] ") and only modify data appearing after it. E.g., file1.txt might contain this:

penguin bird [lives in Antarctica] The penguin lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins.

The replacement might be sed 's/penguin/animal/g' file1.txt. After running the script, the output would look like this:

penguin bird [lives in Antarctica] The animal lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat animals.

In this case, all appearances of penguin before the first ] were ignored, and only those appearing after it were changed.

  • Additional closing brackets might appear later in the line, but only the first should be regarded as the division.

How can I have sed ignore the first three columns of this custom CSV format while it finds and replaces text?

I have GNU sed version 4.2.1.

Village asked Feb 19 '23 03:02


1 Answer

You tell sed to search for the '] ' combination followed by .* (anything), and then as part of your replacement, you put back the ] chars.

The only problem is that sed usually "thinks" that a ] char is part of a character-class definition, so you have to escape it. Try

echo "a b [c] d" | sed 's/\] .*$/\] XYZ/'
a b [c] XYZ

Note that because there is no opening [ char to start a character-class definition, you can get away with

echo "a b [c] d" | sed 's/] .*$/] XYZ/'
a b [c] XYZ
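The character-class behaviour can also work in your favour. A ] placed first inside a bracket expression is taken literally, so [^]] means "any char that is not a ]". As a minimal sketch (the sample input with a second bracket pair is made up for illustration), this pins the match to the first "] " on the line even when more brackets appear later:

```shell
# [^]]* cannot cross a "]", so the captured prefix ends at the first "] ".
out=$(echo "a b [c] d [e] f" | sed 's/^\([^]]*\] \).*$/\1XYZ/')
echo "$out"   # a b [c] XYZ
```

This assumes the first ] on the line is followed by a space; if it is not, the pattern does not match and the line is left unchanged.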

Edit

To fix just the 4th word,

echo "a b [c] d e" | sed 's/\] [^ ][^ ]*/\] XYZ/'
a b [c] XYZ e 

The addition from above, [^ ][^ ]*, says "any-char-that-is-not-a-space" followed by any number of "any-char-that-is-not-a-space", so when the matcher finds the next space it stops matching.
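If your sed supports extended regular expressions (GNU sed does, via -r), the same "one or more non-space chars" idea can probably be written more compactly with +. A small sketch using the same sample input:

```shell
# -r switches GNU sed to extended regular expressions, where + means "one or more".
out=$(echo "a b [c] d e" | sed -r 's/\] [^ ]+/] XYZ/')
echo "$out"   # a b [c] XYZ e
```

Newer GNU and BSD seds also accept -E for the same purpose.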

final edit

echo "penguin bird [lives in Antarctica] The penguin lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins." \
| sed 's/\] The penguin \(.*$\)/] The animal \1/'

and as you're using GNU sed, you can add the -r option (extended regular expressions), in which case you don't need to escape the (...) capturing parens.

echo "penguin bird [lives in Antarctica] The penguin lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins." \
| sed -r 's/\] The penguin (.*$)/] The animal \1/'

output

penguin bird [lives in Antarctica] The animal lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins.
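The commands above only touch one fixed phrase right after the bracket. To replace every penguin after the first "] " while leaving the first three columns alone, one possible approach is to use sed's hold space: save the line, strip the prefix, run the global replacement on what is left, then glue the original prefix back on. This is a sketch, assuming the first ] on each line is followed by a space; lines without a "] " are passed through unchanged. The input variable is just a stand-in for file1.txt.

```shell
# Stand-in for file1.txt from the question.
input='penguin bird [lives in Antarctica] The penguin lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins.'

out=$(printf '%s\n' "$input" | sed '/^[^]]*\] /{
  # Keep a copy of the whole line in the hold space.
  h
  # Drop everything up to and including the first "] ".
  s/^[^]]*\] //
  # Run the real replacement on the fourth column only.
  s/penguin/animal/g
  # Swap: pattern space = original line, hold space = edited column.
  x
  # Reduce the original line to its untouched prefix.
  s/^\([^]]*\] \).*/\1/
  # Append the edited column and remove the joining newline.
  G
  s/\n//
}')
echo "$out"
```

With a real file, the same script would be run as sed '...' file1.txt. Because the replacement runs on the fourth column in isolation, it also behaves when the replacement text happens to contain the search text.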

The exact syntax depends on the version of sed you are using. There is a pretty large difference between the sed for AIX vs. Solaris vs. the GNU seds usually found on Linux.

If you have other questions about using sed, it is usually helpful to include the output of sed --version, or sed -V. If there is no response from those commands, try what sed. Otherwise, include the OS name from uname.

IHTH

shellter answered May 18 '23 20:05