I've used a good few programming languages over the years, and I'm an armchair linguist and contributor to Wiktionary. I've been making some of my own tools to look up Wiktionary from the command line, but I've run into a surprising problem.
Neither Perl nor Python can output Unicode to the console natively under both *nix and Windows (though there are various workarounds). The main reason is that *nix OSes like their Unicode in UTF-8 and Windows likes its Unicode in UTF-16. But it also seems that Windows makes it very difficult to use wide characters with the console even though both the console and wprintf are wide character native.
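To make the UTF-8 versus UTF-16 difference concrete, here is a small Python sketch (Python is used purely for illustration, and the variable names are arbitrary). It encodes the same string both ways and shows that the two platforms expect different byte sequences for identical text:

```python
# The same text, encoded the way each platform family prefers it.
text = "hello, 世界"

utf8 = text.encode("utf-8")        # byte form that *nix terminals usually expect
utf16 = text.encode("utf-16-le")   # 16-bit code units, as used by Windows wide-character APIs

print(len(text), "code points")
print(len(utf8), "bytes as UTF-8:", utf8.hex())
print(len(utf16), "bytes as UTF-16LE:", utf16.hex())

# Both byte strings decode back to the same text; only the wire format differs.
assert utf8.decode("utf-8") == utf16.decode("utf-16-le") == text
```

A program that writes one of these byte sequences to a console expecting the other will show garbage, which is why the output encoding has to be chosen per platform or set explicitly.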
So the question is: is the situation any better if I look beyond these languages to Java, C#, Scala, etc.? Or are there any scripting languages which started out on Windows and were then ported to *nix?
Here is some ideal pseudocode:
function main()
{
    print( L"hello, 世界" );
}
Does any language do Unicode and cross-platform properly and fully?
C# supports Unicode very extensively, and its standard library (the .NET Framework) also has outstanding Unicode support. Cross-platform support is reasonable but not perfect: it's achieved via Mono, and on mobile platforms via Xamarin.
Command-line programs are pretty portable but can get screwed by ancient relics, like SSH terminals that haven't been updated for a decade or more.
The question asks for some ideal pseudocode. C# gets pretty close:
using System;
class Program
{
    static void Main(string[] args)
    {
        Console.OutputEncoding = System.Text.Encoding.UTF8;
        Console.WriteLine("tést, тест, τεστ, ←↑→↓∏∑√∞①②③④, Bài viết chọn lọc");
    }
}
[Screenshot of the output not included.] To display it correctly, use Consolas or another font that has all the above characters.
Of course C# is not a scripting language; it is quite different in its approach to pretty much everything.
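For comparison with a scripting language, the same idea (force the output encoding to UTF-8 before printing) can be sketched in Python. This is only an illustration, not a claim that it behaves identically everywhere: it assumes Python 3.7 or later, where text streams have a `reconfigure` method, and `force_utf8` is a hypothetical helper name rather than part of any library. Whether every glyph actually shows up still depends on the terminal and its font.

```python
import sys


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure() (Python 3.7+)."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is not None:
        # Rough analogue of Console.OutputEncoding = Encoding.UTF8 in the C# example.
        reconfigure(encoding="utf-8")
    # Streams without reconfigure() (e.g. io.StringIO) are returned unchanged.
    return stream


if __name__ == "__main__":
    force_utf8(sys.stdout)
    print("hello, 世界")
```

The helper takes the stream as a parameter so it can also be applied to `sys.stderr` or to a file opened in text mode.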