How to parse HTML for minification in PHP?

I'm looking to write an algorithm to compress the HTML output of a CMS I'm writing in PHP with the CodeIgniter framework.

I was thinking of removing whitespace between any angle brackets, except inside <script>, <pre>, and <style> elements, which I would simply skip for simplicity. To clarify, I mean whitespace between consecutive tags, with no text between them.

How should I go about parsing the HTML to find the whitespace I want to remove?

Edit: To start off, I want to remove all tab characters that are not in <pre> tags. This can be done with regex, I'm sure, but what are the alternatives?

timw4mail asked Jun 30 '10 15:06


2 Answers

Don't. The savings from stripping whitespace are negligible. You're better off using output compression, for example with zlib.
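
To make that concrete, here is a rough sketch that compares raw and gzip-compressed sizes for an indented page and for the same page with its inter-tag whitespace stripped. The sample markup is made up and far more repetitive than a real page, so real output will compress less dramatically; the point is only that compression tends to absorb most of what whitespace stripping would save. It assumes PHP's zlib extension is loaded.

```php
<?php
// Sketch only: compares raw vs. gzip-compressed size for an indented page
// and for the same page with its inter-tag whitespace stripped.
// Assumes the zlib extension is available (gzencode, ob_gzhandler).

// Hypothetical sample markup standing in for the CMS output.
$row = "\t\t<li>\n\t\t\t<a href=\"/page\">Page</a>\n\t\t</li>\n";
$indented = "<ul>\n" . str_repeat($row, 200) . "</ul>\n";

// Crude stand-in for a minifier: drop whitespace between consecutive tags.
$stripped = preg_replace('/>\s+</', '><', $indented);

$sizes = [
    'indented'          => strlen($indented),
    'stripped'          => strlen($stripped),
    'indented, gzipped' => strlen(gzencode($indented, 6)),
    'stripped, gzipped' => strlen(gzencode($stripped, 6)),
];

foreach ($sizes as $label => $bytes) {
    printf("%-18s %6d bytes\n", $label, $bytes);
}

// In the application itself, one common way to turn compression on is to
// start an output buffer with the built-in gzip handler before any output:
//     ob_start('ob_gzhandler');
// or to set zlib.output_compression = On in php.ini (use one, not both).
```

Compression can also be enabled at the web server level instead of in PHP, which keeps the application code unchanged.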

Pete answered Nov 01 '22 12:11


Is there something wrong with the existing HTML minification solutions?

Minify does HTML (as well as CSS and JS).

(That second link goes to the source code, which comments the steps it takes - should be a good leg up if you did want to create your own - it's BSD licensed.)
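
If you do roll your own, a minimal sketch of the approach described in the question might look like the following. This is not how Minify does it; it is a single regex pass that matches either a whole protected element, which is returned unchanged, or a run of whitespace sitting directly between two tags, which is dropped. The function name collapse_inter_tag_whitespace() is made up for illustration, and <textarea> is added to the protected list as an assumption, since its whitespace is also significant.

```php
<?php
// Sketch only: removes whitespace between consecutive tags while leaving
// <pre>, <script>, <style>, and <textarea> elements untouched.
// collapse_inter_tag_whitespace() is a hypothetical helper name.

function collapse_inter_tag_whitespace(string $html): string
{
    // Alternative 1: an entire protected element (group 1 holds its name).
    // Alternative 2: whitespace directly between a '>' and a '<'.
    $pattern = '#<(pre|script|style|textarea)\b[^>]*>.*?</\1\s*>'
             . '|(?<=>)\s+(?=<)#is';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Group 1 is only set when a protected element matched:
            // keep it verbatim. Otherwise drop the inter-tag whitespace.
            return !empty($m[1]) ? $m[0] : '';
        },
        $html
    );

    // preg_replace_callback() returns null on a PCRE error (for example a
    // backtrack limit on very large input); fall back to the original.
    return $result ?? $html;
}

// Usage: the tab and newline inside <pre> survive, the rest is collapsed.
$in = "<div>\n\t<p>Hi there</p>\n\t<pre>\n\tkeep\t<b>\n</b></pre>\n</div>";
echo collapse_inter_tag_whitespace($in), "\n";
```

Be aware that whitespace between inline elements (two adjacent links, say) is rendered as a space, so removing it can change how the page looks. A regex like this also does not understand comments or nested elements of the same name, which is a good reason to prefer an existing, tested library.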

Also, as Pete says, you'll benefit much more from using gzip compression for your HTML (and CSS/JS/etc.), and you won't get tripped up by problems such as the one Gordon mentioned in his comment.

Peter Boughton answered Nov 01 '22 11:11