Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

sed rare-delimiter (other than & | / ?...)

Tags:

sed

delimiter

I am using the Unix sed command on a string that can contain all types of characters (&, |, !, /, ?, etc).

Is there a complex delimiter (with two characters?) that can fix the error:

sed: -e expression #1, char 22: unknown option to `s'
like image 795
Unitech Avatar asked Jan 30 '11 19:01

Unitech


People also ask

What are common delimiters?

A delimiter is one or more characters that separate text strings. Common delimiters are commas (,), semicolon (;), quotes ( ", ' ), braces ({}), pipes (|), or slashes ( / \ ). When a program stores sequential or tabular data, it delimits each item of data with a predefined character.

What are good delimiters?

Any character may be used to separate the values, but the most common delimiters are the comma, tab, and colon. The vertical bar (also referred to as pipe) and space are also sometimes used.

How do I change my sed delimiter?

An even less-known fact is that you can use a different delimiter even for patterns used in addresses, using a special syntax: # do this (ugly)... sed '/\/a\/b\/c\//{do something;}' # ...or these (better) sed '\#/a/b/c/#{do something;}' sed '\_/a/b/c/_{do something;}' sed '\%/a/b/c/%{do something;}' # etc.

What is the best delimiter for CSV?

CSV and importing into most programs is usually not a problem. Just specify TAB delimited rather than comma when you import your file. If there are commas in your data you WILL have a problem when specifying comma delimited as you are well aware.


1 Answers

The characters in the input file are of no concern - sed parses them fine. There may be an issue, however, if you have most of the common characters in your pattern - or if your pattern may not be known beforehand.

At least on GNU sed, you can use a non-printable character that is highly improbable to exist in your pattern as a delimiter. For example, if your shell is Bash:

$ echo '|||' | sed s$'\001''|'$'\001''/'$'\001''g'

In this example, Bash replaces $'\001' with the character that has the octal value 001 - in ASCII it's the SOH character (start of heading).

Since such characters are control/non-printable characters, it's doubtful that they will exist in the pattern. Unless, that is, you are doing something weird like modifying binary files - or Unicode files without the proper locale settings.

like image 182
thkala Avatar answered Oct 01 '22 21:10

thkala