Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Perl6 (Rakudo) - How to handle special characters from file?

How can I read special characters from a external file ? Here a simple .txt file in French, which content is the first paragraph of https://fr.lipsum.com/ : as you can see on my screenshot, the file encoding is UTF-8 but the accents are not displayed correctly.

I tried various encodings within notepad++ and in my perl6 script, like these :

enc => "utf8"
enc => "latin1"

With Python or Ruby scripts I don't encounter the problem. I can't found any precise example about that matter, probably because perl 6 is still quite recent (??). Thank you.

My script as it is displayed in the screenshot :

my $text_contents = slurp "testfile.txt", enc => "utf8";
say $text_contents;
prompt;

Perl6 script, input file in notepad++, exec in cmd.exe


Final edit : the solution is to enable an option, available in beta state with Windows 10 1803, to make the OS handle unicode characters properly : see answers and comments below ...

like image 576
Frenzowski Avatar asked Mar 15 '19 09:03

Frenzowski


1 Answers

If you're not using Windows

This SO is either entirely or almost entirely irrelevant to you.

If you're using Windows 10

Check the "Beta: Use Unicode UTF-8 for worldwide language support" option checkbox.

At least at the time I originally wrote this answer, text near this Unicode related checkbox claimed it's for programs that do not support Unicode, but you should just ignore that.[1]

At the time I originally wrote this answer the checkbox was found under control panel, "Region" entry, "Administrative" tab, "Change system locale" button.

Microsoft may have changed this stuff since I wrote this answer, and may change it again, eg by moving and/or renaming the checkbox, or making things more involved than just clicking a single checkbox.

Per their comment below this answer, the OP notes:

For those who are interested in that particular option, it can be found in the "legacy" Control panel of windows -> Region -> Administrative -> Edit settings...

If you're using an older version of Windows

Arguably, the good news is that Raku and Rakudo have some of the world's best modern support for Unicode, and the OK news is that it relies on Microsoft correctly supporting Unicode, which they're now trying to do.

The bad news is that they made a lot of mistakes in older versions of Windows (and even in Windows 10, which they're now trying to fix), so any solution will be constrained by those mistakes. (Perhaps the biggest problem is Microsoft's doublespeak on the topic[1], but let's hope we can work around that.)

That all said, please read the following and then either return to searching for solutions or post a fresh SO question and we'll try to help.


Quoting Wikipedia's page Unicode in Microsoft Windows:

they are still in 2018 improving their operating system support for UTF-8

Microsoft got off on the wrong foot with their Unicode support last century. The good news is that they have at last begun digging their way out of the hole they dug for themselves and everyone else.

But they're definitely not there yet -- not at the time of originally writing this answer, and, I suspect not for another N years -- at least inasmuch as things don't work correctly out of the box for many end users. I think this is the root of most problems with Unicode on Windows.

Older languages like Python, Ruby and Perl came up with a range of hacks that hid the many problems with Microsoft's older UTF8 support from most users in simple scenarios by using what Microsoft ironically described as "Unicode support".

This has always come with the trade-off that things get very hairy or even completely unworkable for more complex applications in many locales around the world. (So much so that even the mighty Microsoft finally capitulated in 2018.)

In essence, until this new Microsoft effort to get with the program, software that ran on Windows has had no alternative but to either use their fundamentally broken "Unicode support" or to actually support Unicode properly.[1]

Raku and Rakudo focused on the latter, and problems with it when run on Windows are related to this conflicting with Microsoft's old broken approach. Fortunately Microsoft is now getting with the program and so we may be able find a way to get around problems you have with Unicode on Windows provided you are patient.

In particular, if you are using an older Windows version, please expect it to not work at first with modern Unicode aware software unless you are lucky. We'll still help if we can, but it'll likely involve you being patient with us and Microsoft and Rakudo and vice-versa.

Footnotes

[1] At the time I originally wrote this answer, there is text near the checkbox that it's for programs that do not support Unicode. This is entirely the opposite of what's really going on, but hey, it's Microsoft.

like image 133
raiph Avatar answered Nov 17 '22 19:11

raiph