
Why does ASP.NET Core convert Persian (or Arabic) text to character references (&#xhhhh;) in a view?

The source code:

@{ ViewBag.Title = "سلام علیک"; }

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>@ViewBag.Title</title>
</head>
<body>

    <div class="container" dir="rtl">
        @RenderBody()
    </div>

</body>
</html>

It renders correctly in the browser, but I want the same text in the HTML source (for some search engine optimization software).

[Screenshot: ViewBag problem in Arabic text]

And the output:

<!DOCTYPE html>
<html>
<head>
    <title>&#x633;&#x644;&#x627;&#x645; &#x639;&#x644;&#x6CC;&#x6A9;</title>
</head>
<body>
...
</body>
</html>
Soren asked Oct 25 '16 06:10



1 Answer

Because, by default, the HTML encoder safelists only the Basic Latin Unicode block. Browsers have bugs, so the default is there to protect against unknown problems. The &#xhhhh; values you see still render correctly, as your screenshots show, so there's no real harm aside from the increased page size.
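The default behaviour can be reproduced outside a view with a small console sketch. It assumes Razor is using the framework's stock encoder, HtmlEncoder.Default; the class name EncoderDemo and the method EncodeDefault are made up for illustration.

```csharp
using System;
using System.Text.Encodings.Web;

public static class EncoderDemo
{
    // Encodes text with the stock HTML encoder, which leaves
    // only Basic Latin characters unescaped.
    public static string EncodeDefault(string text) =>
        HtmlEncoder.Default.Encode(text);

    public static void Main()
    {
        // The page title from the question; each Persian letter
        // should come back as a character reference.
        Console.WriteLine(EncodeDefault("سلام علیک"));
    }
}
```

Plain ASCII text passes through this encoder unchanged, which is why only the Persian title is affected.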

If the increased page size bothers you, you can customise the encoder to safelist your own character blocks (not languages; Unicode doesn't think in terms of language).

To widen the set of characters the encoder treats as safe, insert the following line into the ConfigureServices() method in Startup.cs (HtmlEncoder lives in the System.Text.Encodings.Web namespace and UnicodeRanges in System.Text.Unicode):

services.AddSingleton<HtmlEncoder>(HtmlEncoder.Create(allowedRanges: new[] { UnicodeRanges.BasicLatin, UnicodeRanges.Arabic }));

Arabic has quite a few blocks in Unicode, so you may need to add more blocks to get the full range you need.
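As a rough sketch of what adding more blocks might look like, the registration below also safelists the Arabic Supplement, Arabic Extended-A and the two Arabic Presentation Forms blocks. Which blocks you actually need is an assumption here and depends on your content; the letters in the question's title all fall inside the main Arabic block already.

```csharp
using System.Text.Encodings.Web;
using System.Text.Unicode;
using Microsoft.Extensions.DependencyInjection;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // Existing registrations (MVC and so on) stay as they are.

        // Safelist Basic Latin plus several Arabic-script blocks.
        // The block selection is illustrative, not a recommendation.
        services.AddSingleton<HtmlEncoder>(HtmlEncoder.Create(allowedRanges: new[]
        {
            UnicodeRanges.BasicLatin,
            UnicodeRanges.Arabic,
            UnicodeRanges.ArabicSupplement,
            UnicodeRanges.ArabicExtendedA,
            UnicodeRanges.ArabicPresentationFormsA,
            UnicodeRanges.ArabicPresentationFormsB
        }));
    }
}
```

Keeping the list explicit, rather than allowing every block, stays close to the cautious default described above: only the blocks you have deliberately chosen are emitted unescaped.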

blowdart answered Nov 04 '22 02:11
