
Preventing HTML character entities in locale files from getting munged by Rails3 xss protection

We're building an app, our first using Rails 3, and we're having to build I18n in from the outset. Being perfectionists, we want real typography to be used in our views: dashes, curled quotes, ellipses et al.

This means in our locales/xx.yml files we have two choices:

  1. Use real UTF-8 characters inline. Should work, but hard to type, and scares me due to the amount of software which still does naughty things to Unicode.
  2. Use HTML character entities (&#8217;, &mdash;, etc.). Easier to type, and probably more compatible with misbehaving software.

I'd rather take the second option. However, the auto-escaping in Rails 3 makes this problematic: the ampersands in the YAML get converted into character entities themselves, resulting in 'visible' &#8217;s in the browser.
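The double escaping can be reproduced outside Rails. This is a minimal sketch, assuming that the standard library's ERB::Util.html_escape is close enough to the view layer's escaping for illustration (Rails ships its own extended version); the sample string is made up:

```ruby
require "erb"

# A locale string that uses a numeric character entity for the apostrophe.
translation = "We&#8217;ll be in touch"

# Escaping rewrites the ampersand itself, so the browser no longer sees an entity.
escaped = ERB::Util.html_escape(translation)

puts escaped
# => We&amp;#8217;ll be in touch
```

The browser then renders the literal text &#8217; instead of a curled apostrophe.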

Obviously this can be worked around by using raw on strings, i.e.:

raw t('views.signup.organisation_details')

But we're not happy going down the route of globally raw-ing every time we t something as it leaves us open to making an error and producing an XSS hole.

We could selectively raw strings which we know contain character entities, but this would be hard to scale, and just feels wrong - besides, a string which contains an entity in one language may not in another.

Any suggestions on a clever rails-y way to fix this? Or are we doomed to crap typography, XSS holes, hours of wasted effort or all three?

Chris S asked Aug 13 '10 13:08


2 Answers

There is a ticket in Lighthouse for this problem, and the resolution is to append _html to the i18n key in the locales/xx.yml file and use the t alias [1] to denote an html_safe string. For example:

en:
  hello: "This is a string with an accent: &oacute;"

becomes:

en:
  hello_html: "This is a string with an accent: &oacute;"

And it would create the following output:

This is a string with an accent: ó

This would save you from writing raw t('views.signup.organisation_details') and leave you with the cleaner call t('views.signup.organisation_details_html'). And while exchanging raw for _html doesn't seem like the greatest of trades, it does make it clear that you're outputting what is assumed to be an html_safe string.


[1] I've tested the code suggested in the Lighthouse ticket. What I found was that you have to use the t alias specifically. If you use I18n.t or I18n.translate, the translation isn't treated as html_safe despite the _html suffix:

# Calling the I18n module directly returns a plain string, so the view escapes it
I18n.t('hello_html')
I18n.translate('hello_html')
# Produces => "This is a string with an accent: &oacute;"

# The t view helper marks the _html translation as html_safe
t('hello_html')
# Produces => "This is a string with an accent: ó"

I don't think this is the intended behavior per the RoR TranslationHelper documentation.
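The difference between the view helper and a bare I18n call can be pictured with a small plain-Ruby model. This is only a sketch of the convention, not Rails' implementation: TrustedString, TRANSLATIONS, lookup, render and the t defined here are all hypothetical stand-ins, and ERB::Util.html_escape stands in for the view layer's escaping.

```ruby
require "erb"

# Hypothetical stand-in for a string the view layer trusts as-is.
class TrustedString < String; end

# Hypothetical translation table; real values would live in locales/en.yml.
TRANSLATIONS = {
  "hello"      => "Accent: &oacute; <b>bold</b>",
  "hello_html" => "Accent: &oacute; <b>bold</b>"
}.freeze

# Plain lookup, comparable to calling I18n.t directly: no trust is attached.
def lookup(key)
  TRANSLATIONS.fetch(key)
end

# View-helper style lookup: only keys ending in "_html" are marked as trusted.
def t(key)
  value = lookup(key)
  key.end_with?("_html") ? TrustedString.new(value) : value
end

# Escape on output unless the string has been marked as trusted.
def render(value)
  value.is_a?(TrustedString) ? value.to_s : ERB::Util.html_escape(value)
end

puts render(t("hello"))           # => Accent: &amp;oacute; &lt;b&gt;bold&lt;/b&gt;
puts render(t("hello_html"))      # => Accent: &oacute; <b>bold</b>
puts render(lookup("hello_html")) # => Accent: &amp;oacute; &lt;b&gt;bold&lt;/b&gt;
```

In this model the trust comes from the helper, not from the key name alone, which matches the observation that only t treats the _html key as safe. It also shows the cost: any markup in an _html translation reaches the page unescaped.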

Gavin Miller answered Nov 08 '22 12:11


Well. I bookmarked this question yesterday because of the i18n angle, but didn't answer it as I'm a Python person who's never used Rails. I'm still not going to answer it, but given you aren't being overrun by helpful Railsians who could point you at a good way of getting around Rails' innards, here's my perspective nonetheless.

First of all I think it's great that you're thinking about the problem from the outset. That's pretty rare. Second, I completely agree that using raw strings or selectively picking strings with entities to give a special treatment to sounds like a brittle, ugly, bug-prone hack.

Now if I understand Rails correctly (I read this i18n guide), the YAML files contain the localised strings for each language. In this case, I'd strongly recommend using regular characters in them (in UTF-8). Otherwise, maintaining localizations, or even reading through a translation file -- think of languages in non-Latin scripts! -- is going to be hell.

Yeah, it would mean you have to figure out input methods, but the solution is clean and straightforward.
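To illustrate why plain UTF-8 sidesteps the escaping problem, here is a minimal sketch. The locale snippet and its keys are made up, and ERB::Util.html_escape again stands in for the view layer's escaping. It also shows that YAML's \u escapes inside double-quoted values are one possible way around awkward input methods:

```ruby
require "erb"
require "yaml"

# Hypothetical locale snippet: one value typed as real UTF-8 characters,
# one spelled with YAML \u escapes inside a double-quoted scalar.
locale_yaml = <<~'LOCALE'
  en:
    typed: "It’s ready…"
    escaped: "It\u2019s ready\u2026"
LOCALE

strings = YAML.safe_load(locale_yaml).fetch("en")

# Both spellings load as the same UTF-8 string.
puts strings["typed"] == strings["escaped"] # => true

# HTML escaping leaves these characters untouched, so no raw or _html is needed.
puts ERB::Util.html_escape(strings["typed"]) == strings["typed"] # => true
```

Because the curled apostrophe and ellipsis are not HTML metacharacters, they survive auto-escaping unchanged.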

chryss answered Nov 08 '22 11:11