\s does not seem to work with
sed 's/[\s]\+//' tempfile
while it is working for
sed 's/[ ]\+//' tempfile
I am trying to remove the white space that appears at the beginning of each line because of this command:
nl -s ') ' file > tempfile
e.g. file:
A Storm of Swords, George R. R. Martin, 1216
The Two Towers, J. R. R. Tolkien, 352
The Alchemist, Paulo Coelho, 197
The Fellowship of the Ring, J. R. R. Tolkien, 432
The Pilgrimage, Paulo Coelho, 288
A Game of Thrones, George R. R. Martin, 864
tempfile:
     1) A Storm of Swords, George R. R. Martin, 1216
     2) The Two Towers, J. R. R. Tolkien, 352
     3) The Alchemist, Paulo Coelho, 197
     4) The Fellowship of the Ring, J. R. R. Tolkien, 432
     5) The Pilgrimage, Paulo Coelho, 288
     6) A Game of Thrones, George R. R. Martin, 864
i.e. there are spaces before numbers
Please explain why the white space appears and why \s does not work.
The reason is simple: a POSIX regex engine does not treat Perl-like shorthand character classes as such inside bracket expressions.
See this reference:
One key syntactic difference is that the backslash is NOT a metacharacter in a POSIX bracket expression. So in POSIX, the regular expression
[\d] matches a \ or a d.
So, [\s] in a POSIX regex matches one of two symbols: either \ or s.
Consider the following demo:
echo 'ab\sc' | sed 's/[\s]\+//'
The output is abc: the \s substring is removed.
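As a rough sketch of what this means for the command in the question (assuming GNU sed, because \+ in a basic regex is a GNU extension, and using a shortened sample line for illustration): the leading spaces are left alone, and the first lowercase s is deleted instead.

```shell
# One line as nl would produce it (5 spaces of padding), shortened for illustration.
line='     1) A Storm of Swords'

# [\s] is a bracket expression holding a backslash and the letter s,
# so the first match is the final "s" of "Swords", not the leading spaces.
out=$(printf '%s\n' "$line" | sed 's/[\s]\+//')

# Brackets make the surviving leading spaces visible.
printf '[%s]\n' "$out"   # prints [     1) A Storm of Sword]
```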
Consider using POSIX character classes instead of Perl-like shorthands:
echo 'ab\s c' | sed 's/[[:space:]]\+//'
The output is ab\sc. POSIX character classes have the form [:<NAME_OF_CLASS>:], and they can only be used inside bracket expressions.
NOTE: if you want to make sure the spaces at the start of the line are removed, add ^ at the pattern start:
sed 's/^[[:space:]]\+//'
       ^
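Put together with the nl command from the question, the anchored pattern could be used roughly like this. This is only a sketch: the temporary file is a stand-in for the real input, and it uses * instead of \+ so that it does not depend on the GNU extension.

```shell
# Recreate two lines of the sample file in a temporary file (stand-in for "file").
tmp=$(mktemp)
printf '%s\n' 'A Storm of Swords, George R. R. Martin, 1216' \
              'The Two Towers, J. R. R. Tolkien, 352' > "$tmp"

# nl right-justifies each number in a padded column;
# sed then strips any white space at the start of each line.
result=$(nl -s ') ' "$tmp" | sed 's/^[[:space:]]*//')
printf '%s\n' "$result"

rm -f "$tmp"
```

Because the pattern is anchored with ^, only the padding in front of the number is removed; spaces inside the titles are untouched.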
MORE PATTERNS:
\w = [[:alnum:]_]
\W = [^[:alnum:]_]
\d = [[:digit:]] (or [0-9])
\D = [^[:digit:]] (or [^0-9])
\h = [[:blank:]]
\S = [^[:space:]]
You could also format the numbers without fixed width. From coreutils.info:
‘-w NUMBER’
‘--number-width=NUMBER’
Use NUMBER characters for line numbers (default 6).
E.g.:
nl -w 1 -s ') ' infile
Output:
1) A Storm of Swords, George R. R. Martin, 1216
2) The Two Towers, J. R. R. Tolkien, 352
3) The Alchemist, Paulo Coelho, 197
4) The Fellowship of the Ring, J. R. R. Tolkien, 432
5) The Pilgrimage, Paulo Coelho, 288
6) A Game of Thrones, George R. R. Martin, 864
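To see where the leading white space comes from, a small comparison like the following may help. It is a sketch that feeds a single made-up line (x) to nl on standard input, once with the default width of 6 and once with -w 1.

```shell
# Default width: the number is right-justified in 6 columns.
padded=$(printf 'x\n' | nl -s ') ')

# Width 1: no padding in front of a single-digit number.
compact=$(printf 'x\n' | nl -w 1 -s ') ')

# Brackets make the leading spaces visible.
printf '[%s]\n' "$padded" "$compact"
```

The first result carries five spaces before the 1, which is the white space the question asks about; the second starts directly with the number.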