 

What's a working strategy to serve umlauts generated by JS and CSS to all possible host page encodings

I'm serving an embeddable <script> that users can copy/paste into their websites and have content displayed.

The script loads a stylesheet and renders some HTML that is injected into the host page.

I'm facing problems displaying special characters (ü, ö, ä, you name it) when the host page uses an encoding different from my script's (which is encoded in UTF-8), such as ISO-8859-1. Special characters get garbled.

Content is injected like:

var content = template.render(model);
$('#some-el').html(content);

The same problem goes for content that is generated via CSS pseudos like:

.some-class::after{
  content: 'Ümläüts äré fün';
}

My current solution is to convert all umlauts into escape sequences (&uuml; for HTML, \00FC for CSS) when precompiling my templates (Mustache, compiled via hogan.js) and CSS in the build step. This works, but it feels cumbersome and easy to break.
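For reference, the HTML side of that build-step conversion might be sketched like this (the function name is illustrative, not from any particular library):

```javascript
// Replace every non-ASCII character with a numeric HTML entity,
// e.g. "ü" -> "&#252;", so the markup is pure ASCII and survives
// any host-page encoding.
function entityEncode(str) {
  return str.replace(/[\u0080-\uFFFF]/g, function (ch) {
    return '&#' + ch.charCodeAt(0) + ';';
  });
}
```

Running the template output through such a function before injecting it with `.html()` means the browser parses the entities back into the right characters, regardless of the host page's declared charset.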

What are the factors in play that determine the encoding of content generated by JavaScript? Is there a way to have the host site "respect" my script output's encoding? Might this be due to some server misconfiguration?

asked Nov 02 '22 by m90

1 Answer

I am not quite sure why you feel escaping is cumbersome ...

For HTML you can escape all characters with code points greater than 127 (pseudocode):

uint code = ...
if (code < ' ' || code > 127) {
  print("&#");
  print(toString(code));
  print(";");
} else {
  print(code);
}

This will escape all non-ASCII characters (the `code < ' '` check also catches control characters).
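Rendered in JavaScript, one possible version of that loop looks like this (a sketch; characters outside the BMP, i.e. surrogate pairs, would need extra handling):

```javascript
// Walk the string and escape everything outside printable ASCII
// as a numeric HTML entity; printable ASCII passes through.
function toAsciiEntities(str) {
  var out = '';
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    if (code < 32 || code > 127) {
      out += '&#' + code + ';';
    } else {
      out += str.charAt(i);
    }
  }
  return out;
}
```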

And pretty much the same goes for CSS. Such characters can appear in CSS only in string literals or comments, so you can simply escape every non-ASCII character in a CSS file without parsing the CSS structure.
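A minimal sketch of that whole-stylesheet pass, using the questioner's backslash-hex style of escape (zero-padding to six hex digits avoids the need for a terminating space after the escape):

```javascript
// Escape all non-ASCII characters in a CSS source string with
// six-digit hex escapes, e.g. "ü" -> "\0000FC". No CSS parsing
// is needed, since such characters only occur in strings/comments.
function escapeCss(css) {
  return css.replace(/[\u0080-\uFFFF]/g, function (ch) {
    var hex = ch.charCodeAt(0).toString(16).toUpperCase();
    return '\\' + ('000000' + hex).slice(-6);
  });
}
```

Applied to the example above, `content: 'Ümläüts äré fün';` becomes an ASCII-only rule that displays correctly whatever the host page's encoding is.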

All this is quite reliable, I think.

answered Nov 15 '22 by c-smile