
How best to sanitize rich HTML with Rails?

I'm looking for advice on how to clean submitted HTML in a web app so it can be redisplayed later without styles or unclosed tags wrecking the layout of the app.

In my app, users submit rich HTML through the YUI Rich Text Editor, which by default runs a few regexps to clean the input, and I'm also calling [filter_MSWord][1] to catch any crap sent in from Office.

On the back end, I'm running ruby-tidy to sanitize the HTML before it is displayed as comments, but on occasion badly pasted HTML still affects the layout of the app. How can I safeguard against this?

FWIW, here are the sanitizer settings I'm using:

require 'tidy'

module HTMLSanitizer
  # Runs the input through Tidy and returns the cleaned-up markup.
  def tidy_html(input)
    Tidy.open(:show_warnings => false) do |tidy|
      # don't output body and html tags
      tidy.options.show_body_only = true
      # output HTML (Tidy's output-xhtml option would be the one to set for XHTML)
      tidy.options.output_html = true
      # don't write newlines all over the place
      tidy.options.wrap = 0
      # use utf8 to play nice with rails
      tidy.options.char_encoding = 'utf8'
      # the value of the block is what Tidy.open returns
      tidy.clean(input)
    end
  end
end

What else are my options here?

Chris Adams asked Dec 29 '22 19:12


1 Answer

I personally use the sanitize gem.

require 'sanitize'
# Notice the incorrect HTML. It still outputs "wow!"
# (Newer releases of the gem, 3.0 onward, name this method Sanitize.fragment.)
op = Sanitize.clean("<html><body>wow!</body></hhhh>")
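
To show the general idea behind this kind of cleanup (keep an allowlist of tags, drop every attribute, and close whatever was left open), here is a rough stdlib-only sketch. It is not how the sanitize gem or ruby-tidy work internally. The helper name `balance_and_strip` and the tag lists are made up for illustration, and the tokenizer is regex-based, so treat it as a teaching aid and use a parser-backed library for anything real.

```ruby
require 'strscan'

# Hypothetical allowlists, chosen for illustration only.
ALLOWED_TAGS = %w[p b i em strong ul ol li blockquote].freeze
VOID_TAGS    = %w[br].freeze

# Hypothetical helper: rebuilds the markup from scratch so that the output
# contains only allowlisted tags (without attributes), all of them closed.
def balance_and_strip(html)
  scanner   = StringScanner.new(html)
  out       = +''
  open_tags = []

  until scanner.eos?
    if scanner.skip(/<!--.*?-->/m) ||
       scanner.skip(%r{<(script|style)\b.*?</\1\s*>}mi)
      # comments and script/style blocks are dropped together with their content
    elsif scanner.scan(%r{<(/?)([a-zA-Z][a-zA-Z0-9]*)[^<>]*>})
      closing = scanner[1] == '/'
      name    = scanner[2].downcase
      if VOID_TAGS.include?(name)
        out << "<#{name}>" unless closing
      elsif !ALLOWED_TAGS.include?(name)
        # unknown tag: drop the tag itself, keep the text around it
      elsif closing
        # close everything opened since the matching tag; ignore stray closers
        if open_tags.include?(name)
          loop do
            tag = open_tags.pop
            out << "</#{tag}>"
            break if tag == name
          end
        end
      else
        open_tags.push(name)
        out << "<#{name}>" # attributes (including style) are not copied
      end
    elsif scanner.scan(/[^<]+/)
      out << scanner.matched # plain text cannot contain '<', so it passes through
    else
      scanner.getch          # a '<' that does not start a recognised tag
      out << '&lt;'
    end
  end

  # close anything the user left open so it cannot leak into the page layout
  out << "</#{open_tags.pop}>" until open_tags.empty?
  out
end

puts balance_and_strip('<p style="color:red">Hi <b>there<script>alert(1)</script>')
# prints: <p>Hi <b>there</b></p>
```

Because the output is rebuilt only from allowlisted tag names and text that contains no `<`, a token the regex misreads ends up dropped or escaped and never reaches the page as raw markup.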
Sinan Taifour answered Jan 12 '23 02:01