I am trying to vocab list for a Greek text we are translating in class. I want to replace every space or tab character with a paragraph mark so that every word appears on its own line. Can anyone give me the sed command, and explain what it is that I'm doing? I’m still trying to figure sed out.
Remove All Line Breaks from a String We can remove all line breaks by using a regex to match all the line breaks by writing: str = str. replace(/(\r\n|\n|\r)/gm, ""); \r\n is the CRLF line break used by Windows.
For reasonably modern versions of sed, edit the standard input to yield the standard output with
$ echo 'τέχνη βιβλίο γη κήπος' | sed -E -e 's/[[:blank:]]+/\n/g' τέχνη βιβλίο γη κήπος
If your vocabulary words are in files named lesson1
and lesson2
, redirect sed’s standard output to the file all-vocab
with
sed -E -e 's/[[:blank:]]+/\n/g' lesson1 lesson2 > all-vocab
What it means:
[[:blank:]]
matches either a single space character or a single tab character. [[:space:]]
instead to match any single whitespace character (commonly space, tab, newline, carriage return, form-feed, and vertical tab).+
quantifier means match one or more of the previous pattern.[[:blank:]]+
is a sequence of one or more characters that are all space or tab.\n
in the replacement is the newline that you want./g
modifier on the end means perform the substitution as many times as possible rather than just once.-E
option tells sed to use POSIX extended regex syntax and in particular for this case the +
quantifier. Without -E
, your sed command becomes sed -e 's/[[:blank:]]\+/\n/g'
. (Note the use of \+
rather than simple +
.)For those familiar with Perl-compatible regexes and a PCRE-capable sed, use \s+
to match runs of at least one whitespace character, as in
sed -E -e 's/\s+/\n/g' old > new
or
sed -e 's/\s\+/\n/g' old > new
These commands read input from the file old
and write the result to a file named new
in the current directory.
Going back to almost any version of sed since Version 7 Unix, the command invocation is a bit more baroque.
$ echo 'τέχνη βιβλίο γη κήπος' | sed -e 's/[ \t][ \t]*/\ /g' τέχνη βιβλίο γη κήπος
Notes:
+
quantifier and simulate it with a single space-or-tab ([ \t]
) followed by zero or more of them ([ \t]*
).\n
for newline, we have to include it on the command line verbatim. \
and the end of the first line of the command is a continuation marker that escapes the immediately following newline, and the remainder of the command is on the next line. The commands above all used single quotes (''
) rather than double quotes (""
). Consider:
$ echo '\\\\' "\\\\" \\\\ \\
That is, the shell applies different escaping rules to single-quoted strings as compared with double-quoted strings. You typically want to protect all the backslashes common in regexes with single quotes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With