I'm looking for advice on how to clean submitted HTML in a web app so it can be redisplayed later without styles or unclosed tags wrecking the layout of the app.
In my app, rich HTML is submitted by users through the YUI Rich Text Editor, which by default runs a few regexps to clean the input, and I'm also calling [filter_MSWord][1] to catch any crap sent in from Office.
On the back end, I'm running ruby-tidy to sanitize the HTML before it is displayed as comments, but on occasion badly pasted HTML still affects the layout of the app. How can I safeguard against this?
FWIW, here are the sanitizer settings I'm using:

    require 'tidy'

    module HTMLSanitizer
      def tidy_html(input)
        Tidy.open(:show_warnings => false) do |tidy|
          # don't output body and html tags
          tidy.options.show_body_only = true
          # output HTML (set output_xhtml instead if XHTML is wanted)
          tidy.options.output_html = true
          # don't write newlines all over the place
          tidy.options.wrap = 0
          # use utf8 to play nice with rails
          tidy.options.char_encoding = 'utf8'
          # the block's value (the cleaned markup) is what the method returns
          tidy.clean(input)
        end
      end
    end

What else are my options here?
I personally use the sanitize gem.

    require 'sanitize'

    # Note the malformed closing tag; the output is still "wow!".
    # Sanitize 3.0 and later renamed this method to Sanitize.fragment.
    op = Sanitize.clean("<html><body>wow!</body></hhhh>")
