I'm using awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}' in.txt > out.txt
to remove both leading and trailing whitespace.
The problem is that the output file still has trailing whitespace! All lines are the same length; they are right-padded with spaces.
What am I missing?
UPDATE 1
The problem is probably due to the fact that the trailing spaces are not "normal" spaces but \x20 characters (DC4).
UPDATE 2
I used gsub (/'[[:cntrl:]]|[[:space:]]|\x20/,"")
and it worked.
Two strange things:
Why isn't \x20 considered a control character?
Using '[[:cntrl:][:space:]\x20
does NOT work. Why?
This command works for me:
$ awk '{$1=$1}1' file.txt
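One caveat worth checking before relying on this: assigning to $1 makes awk rebuild the record from its fields, so with the default field separator it also squeezes runs of inner blanks into single spaces. A minimal sketch, assuming a made-up sample line; the helper names trim_fields and trim_edges are hypothetical and only there for illustration:

```shell
# Hypothetical helpers for illustration; neither name comes from the thread.

# Rebuilds $0 from its fields: trims the edges AND collapses inner blanks.
trim_fields() { awk '{$1=$1}1'; }

# Strips only leading/trailing spaces and tabs; inner blanks are kept.
trim_edges() { awk '{gsub(/^[ \t]+|[ \t]+$/,""); print}'; }

printf '  foo   bar  \n' | trim_fields   # prints "foo bar"
printf '  foo   bar  \n' | trim_edges    # prints "foo   bar"
```

If the inner spacing matters (fixed-width data, for example), the gsub form is probably the safer of the two.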
Your code works for me. You may have something other than spaces and tabs in the file; hexdump -C may help you check what is wrong:
awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}' in.txt | hexdump -C | less
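For reference when reading the dump: in ASCII, DC4 is byte 0x14 (octal 024), while 0x20 is the ordinary space, so the two are easy to tell apart. A small sketch using od, which should be available where hexdump is not; the sample line and the helper name dump_bytes are made up for illustration:

```shell
# Hypothetical helper: print each input byte as two hex digits.
dump_bytes() { od -An -tx1; }

# Sample line padded with two DC4 bytes (octal 024) instead of spaces.
printf 'TEXT\024\024\n' | dump_bytes
# The padding shows up as 14 14, not 20 20.
```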
OK you identified DC4 (there may be some other control characters...)
Then, you can improve your command:
awk '{gsub(/^[[:cntrl:][:space:]]+|[[:cntrl:][:space:]]+$/,""); print;}' in.txt > out.txt
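A quick way to check this on DC4-padded input, as a sketch only: the sample line is invented, trim_ctrl is a hypothetical name, and it assumes an awk with POSIX character class support (gawk, or a recent mawk).

```shell
# Hypothetical helper wrapping the command above so it can read from a pipe.
trim_ctrl() {
  awk '{gsub(/^[[:cntrl:][:space:]]+|[[:cntrl:][:space:]]+$/,""); print}'
}

# Leading space + tab, trailing DC4 DC4 space (DC4 is octal 024).
printf ' \tTEXT\024\024 \n' | trim_ctrl | od -An -tx1
# Only the bytes of "TEXT" plus the newline should remain: 54 45 58 54 0a
```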
See the awk manpage:
[:alnum:] Alphanumeric characters.
[:alpha:] Alphabetic characters.
[:blank:] Space or tab characters.
[:cntrl:] Control characters.
[:digit:] Numeric characters.
[:graph:] Characters that are both printable and visible. (A space is printable, but not visible, while an a is both.)
[:lower:] Lower-case alphabetic characters.
[:print:] Printable characters (characters that are not control characters.)
[:punct:] Punctuation characters (characters that are not letter, digits, control characters, or space characters).
[:space:] Space characters (such as space, tab, and formfeed, to name a few).
[:upper:] Upper-case alphabetic characters.
[:xdigit:] Characters that are hexadecimal digits.
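This list also bears on the question of why \x20 is not a control character: 0x20 is the plain space, which falls under [:space:] and [:blank:], whereas DC4 (0x14) falls under [:cntrl:]. A sketch for checking which class a single byte belongs to, assuming the C locale; class_of is a hypothetical helper, not something from the manpage:

```shell
# Hypothetical helper: reads lines holding one character each and
# reports the first matching class.
class_of() {
  LC_ALL=C awk '{
    if ($0 ~ /^[[:cntrl:]]$/)      print "cntrl"
    else if ($0 ~ /^[[:space:]]$/) print "space"
    else                           print "other"
  }'
}

printf ' \n'    | class_of   # 0x20, plain space: prints "space"
printf '\024\n' | class_of   # 0x14, DC4:         prints "cntrl"
```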
0x20 removal
For me the command is OK. I tested it like this:
$ echo -e "\x20 \tTEXT\x20 \t" | hexdump -C
00000000  20 20 09 54 45 58 54 20  20 09 0a                 |  .TEXT  ..|
0000000b
$ echo -e "\x20 \tTEXT\x20 \t" | awk '{gsub(/^[[:cntrl:][:space:]]+|[[:cntrl:][:space:]]+$/,""); print;}' | hexdump -C
00000000  54 45 58 54 0a                                    |TEXT.|
00000005
However, if you have 0x20 in the middle of your text, it is not removed.
But that is not your question, is it?
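If characters inside the text should go as well, dropping the ^ and $ anchors makes gsub remove every match rather than only those at the edges. A sketch under that assumption; strip_all is a hypothetical name, and this is a different operation from the trimming command above:

```shell
# Hypothetical helper: removes ALL control and whitespace characters in a line,
# not just the leading and trailing ones.
strip_all() { awk '{gsub(/[[:cntrl:][:space:]]+/,""); print}'; }

printf ' \tTE XT \t\n' | strip_all   # prints "TEXT"
```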