Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to remove leading and trailing whitespaces?

I'm using awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}' in.txt > out.txt to remove both leading and trailing whitespaces.

The problem is the output file actually has trailing whitespaces! All lines are of the same length - they are right padded with spaces.

What am I missing?

UPDATE 1

The problem is probably due to the the fact that the trailing spaces are nor "normal" spaces but \x20 characters (DC4).

UPDATE 2

I used gsub (/'[[:cntrl:]]|[[:space:]]|\x20/,"") an it worked. Two strange things:

  1. Why isn't \x20 considered a control character?

  2. Using '[[:cntrl:][:space:]\x20 does NOT work. Why?

like image 291
user1194552 Avatar asked Feb 07 '12 11:02

user1194552


People also ask

Which method is used to remove leading and trailing?

Use the Trim() method to remove leading and trailing whitespace from a string.

How do you remove leading and trailing whitespaces from a string in Python?

strip() Python String strip() function will remove leading and trailing whitespaces. If you want to remove only leading or trailing spaces, use lstrip() or rstrip() function instead.

How do you remove leading and trailing whitespaces from a string in JS?

JavaScript String trim() The trim() method removes whitespace from both sides of a string. The trim() method does not change the original string.

How do I get rid of whitespaces?

The replaceAll() method of the String class replaces each substring of this string that matches the given regular expression with the given replacement. You can remove white spaces from a string by replacing " " with "".


2 Answers

This command works for me:

$ awk '{$1=$1}1' file.txt
like image 193
kev Avatar answered Sep 17 '22 14:09

kev


Your code is OK for me.
You may have something else than space and tabulation...
hexdump -C may help you to check what is wrong:

awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}' in.txt | hexdump -C | less

UPDATE:

OK you identified DC4 (there may be some other control characters...)
Then, you can improve your command:

awk '{gsub(/^[[:cntrl:][:space:]]+|[[:cntrl:][:space:]]+$/,""); print;}' in.txt > out.txt

See awk manpage:

[:alnum:] Alphanumeric characters.
[:alpha:] Alphabetic characters.
[:blank:] Space or tab characters.
[:cntrl:] Control characters.
[:digit:] Numeric characters.
[:graph:] Characters that are both printable and visible. (A space is printable, but not visible, while an a is both.)
[:lower:] Lower-case alphabetic characters.
[:print:] Printable characters (characters that are not control characters.)
[:punct:] Punctuation characters (characters that are not letter, digits, control characters, or space characters).
[:space:] Space characters (such as space, tab, and formfeed, to name a few).
[:upper:] Upper-case alphabetic characters.
[:xdigit:] Characters that are hexadecimal digits.

Leading/Trailing 0x20 removal

For me the command is OK, I have tested like this:

$ echo -e "\x20 \tTEXT\x20 \t" | hexdump -C
00000000  20 20 09 54 45 58 54 20  20 09 0a                 |  .TEXT  ..|
0000000b
$ echo -e "\x20 \tTEXT\x20 \t" | awk '{gsub(/^[[:cntrl:][:space:]]+|[[:cntrl:][:space:]]+$/,""); print;}' | hexdump -C
00000000  54 45 58 54 0a                                    |TEXT.|
00000005

However if you have 0x20 in the middle of your text
=> then it is not removed.
But this is not your question, isn't it?

like image 40
oHo Avatar answered Sep 16 '22 14:09

oHo