How to clean a codebase: trailing whitespace, newlines, etc.

Tags:

removing-whitespace

I have a code base that is driving me nuts with conflicts due to trailing whitespace. I'd like to clean it up.

I'd want to:

Remove all trailing whitespace
Remove any newline characters at the end of files
Convert all line endings to Unix (dos2unix)
Convert all leading spaces to tabs, i.e. every 4 spaces to a tab
Do all of this while ignoring the .git directory.

I'm on OS X Snow Leopard, using zsh.

So far, I have:

sed -i "" 's/[ \t]*$//' **/*(.)

This works great, but sed adds a newline to the end of every file it touches, which is no good. I don't think sed can be stopped from doing this, so how can I remove these newlines? There's probably some awk magic to be applied here.

(Complete answers also welcome)

asked Feb 16 '11 01:02

jhogendorn

1 Answer

[EDIT: Fixed whitespace trimming]
[EDIT #2: Strip trailing blank lines from end of file]

perl -i.bak -pe 'if (defined $x && /\S/) { print $x; $x = ""; } $x .= "\n" x chomp; s/\s*?$//; 1 while s/^(\t*) {4}/$1\t/; if (eof) { $_ .= "\n"; $x = ""; }' **/*(.)

This strips trailing blank lines from the file, but leaves exactly one \n at the end of the file. Most tools expect this, and it will not show up as a blank line in most editors. However, if you do want to strip that very last \n, just delete the $_ .= "\n"; part from the command. (The newline is appended to $_ rather than printed directly because -p prints $_ only after the code has run, so a print "\n" at that point would land before the last line instead of after it.)

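The command works by "saving up" \n characters until a line containing a non-blank character is seen; it then prints them all before printing that line. Blank lines at the very end of a file are never followed by such a line, so their saved-up \n characters are simply discarded at end of file.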

Drop the .bak (i.e. use plain -i) to avoid creating backups of the original files (use at your own risk!)

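\s*? matches zero or more whitespace characters non-greedily, including \r, which is the first character of the DOS \r\n line break; since chomp has already removed the \n, stripping the \r is what converts the line ending to Unix style. In Perl, $ matches either at the end of the string or immediately before a final \n, so combined with the fact that *? matches non-greedily (trying a 0-width match first, then a 1-width match, and so on), the substitution removes trailing spaces, tabs and \r but would stop short of a final \n even if one were still present.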

1 while s/^(\t*) {4}/$1\t/ is just a loop that repeatedly replaces a line's leading run of tabs plus the 4 spaces after it with that run plus one more tab, until this is no longer possible. So it will work even if some lines have already been partially converted to tabs, provided all \t characters start at a column divisible by 4.

I haven't seen the **/*(.) syntax before, presumably that's a zsh extension? If it worked with sed, it will work with perl.
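In zsh, **/ recurses into subdirectories and the (.) qualifier restricts matches to plain files; as far as I know, with the GLOB_DOTS option unset (the default) the glob also skips dot-directories such as .git. For shells without that syntax, find can do the same selection. This is a swapped-in alternative, not part of the original answer, and the temporary tree below is made up for the demonstration.

```shell
#!/bin/sh
# Sketch: a find(1) stand-in for zsh's **/*(.) that lists regular files
# but prunes the .git directory. The temporary tree below is hypothetical.
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/src" "$dir/.git"
printf 'x\n' > "$dir/src/a.txt"
printf 'x\n' > "$dir/.git/config"

# -prune stops find from descending into ./.git; every other regular file
# (-type f, roughly what zsh's (.) qualifier selects) is printed.
files=$(cd "$dir" && find . -path ./.git -prune -o -type f -print | sort)
printf '%s\n' "$files"
```

This should list only ./src/a.txt. To run the clean-up over the same set of files, the -print could be replaced with -exec perl -i.bak -pe 'PROGRAM' {} +, where PROGRAM stands for the Perl code from the one-liner; because eof is checked per file, the end-of-file handling still applies to each file separately.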

answered Oct 01 '22 07:10

j_random_hacker
