So I have a string in Ruby that looks something like this:
str = "<html>\n<head>\n\n <title>My Page</title>\n\n\n</head>\n\n<body>" +
" <h1>My Page</h1>\n\n<div id=\"pageContent\">\n <p>Here is a para" +
"graph. It can contain spaces that should not be removed.\n\nBut\n" +
"line breaks that should be removed.</p></body></html>"
How would I remove all whitespace (spaces, tabs, and line breaks) that is outside of a tag, or not inside a tag that has content like <p>, using only native Ruby?
(I'd like to avoid using XSLT or something for a task this simple.)
# gsub! returns nil when nothing matches, so the two calls are not chained
str.gsub!(/[\n\t]/, " ")
str.gsub!(/>\s*</, "><")
The first gsub! replaces every line break and tab with a space; the second removes whitespace between tags.
You will end up with multiple spaces inside your tags, but if you just removed all \n and \t, you would get something like "not be removed.Butline breaks", which is not very readable. Another regular expression or the aforementioned .squeeze(" ") could take care of that.
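Putting the pieces together, here is a minimal sketch of the whole cleanup as one method. The name collapse_whitespace is hypothetical (it is not from the answer above), and the sketch assumes the markup has no <pre> blocks or attribute values where runs of spaces matter, since squeeze(" ") collapses those too.

```ruby
# Hypothetical helper; the name is illustrative, not part of the original answer.
# Uses the non-bang gsub so chaining is safe (gsub! returns nil on no match).
def collapse_whitespace(html)
  html
    .gsub(/[\n\t]/, " ")   # turn each line break or tab into a space
    .gsub(/>\s*</, "><")   # drop whitespace sitting directly between two tags
    .squeeze(" ")          # collapse any remaining runs of spaces into one
end

sample = "<body>\n\n <p>Keep these spaces.\n\nBut\nnot the breaks.</p>\n</body>"
puts collapse_whitespace(sample)
# prints: <body><p>Keep these spaces. But not the breaks.</p></body>
```

Because the method returns a new string, the original str is left untouched; assign the result back (str = collapse_whitespace(str)) if you want to replace it.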