Let me state first that this problem concerns only input typed directly on the keyboard and read with Perl's diamond operator (<>).
Had I been asking about the diamond operator reading input that is piped in or read from a file, then yes, this would be a duplicate of question 519309, "How do I read Utf-8 with diamond operator".
This question, however, is about keyboard input, not piped or file data, so I argue that it is not a duplicate of 519309.
Here are the details of my question:
I am trying to type umlaut characters ('ä', 'ö', 'ü', ...) on my keyboard.
I have a very simple Perl script that reads a line from the keyboard and immediately prints it back to the screen.
If I type umlaut characters under code page 1252, everything works as expected:
C:\>chcp 1252 & perl -CS -we"print '*** '; $txt = <>; print '--- ', $txt;"
Page de codes active : 1252
*** ü
--- ü
However, if I type the same umlaut characters under code page 65001 (UTF-8), Perl warns about an uninitialized value and the umlaut is not accepted:
C:\>chcp 65001 & perl -CS -we"print '*** '; $txt = <>; print '--- ', $txt;"
Page de codes active : 65001
*** ü
Use of uninitialized value $txt in print at -e line 1.
---
If I pipe the umlaut into the same Perl one-liner instead, there is no problem:
C:\>chcp 65001 & echo ü | perl -CS -we"print '*** '; $txt = <>; print '--- ', $txt;"
Page de codes active : 65001
*** --- ü
Why do I get this warning with code page 65001 (UTF-8)?
I am using Windows 7 x64, with Strawberry Perl 5.22.
For the record, with pure batch commands (that is, without Perl) I can type umlaut characters under code page 65001 (UTF-8) without any problem:
C:\>chcp 65001 & set /p txt=*** & echo --- %txt%
Page de codes active : 65001
*** ü
--- ü
The real question is: why can Perl not accept umlaut characters typed on the keyboard under code page 65001, when the very same keyboard input under the same code page works fine with a plain batch command?
There seems to be something fundamentally different between piping umlaut characters and typing them directly on the keyboard.
Why does typing an umlaut on the keyboard fail, when the same character works perfectly fine when piped?
Try changing the console font to "Lucida Console".
You can also run chcp 65001 in the console; this command switches the console code page to UTF-8.
If characters are still displayed incorrectly, install the required font on the system.
Actually, the problem is not in Perl but in the Windows console. Try how it behaves in this console. You can log the binary data read from the input to a file and compare the two cases (Windows console vs. Cygwin).
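A minimal Perl sketch of that logging idea, written for illustration and not taken from the answer: it switches STDIN to :raw so no decoding layer (such as the one -CS adds) alters the input, reads one line, and prints its bytes in hex, or an EOF marker if readline returned undef. The helper names hex_bytes and describe_read are hypothetical. For reference, a typed 'ü' is the single byte fc in code page 1252 and the two bytes c3 bc in UTF-8; if the console reports zero bytes read, the script would print the EOF line instead, which would be consistent with the uninitialized-value warning in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a byte string into space-separated hex,
# e.g. "\xC3\xBC" becomes "c3 bc".
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02x', ord $_) } split //, $bytes;
}

# Hypothetical helper: describe one readline result, either EOF (undef)
# or the raw bytes that were actually delivered.
sub describe_read {
    my ($line) = @_;
    return defined $line ? hex_bytes($line) : '<EOF: readline returned undef>';
}

# Read STDIN as raw bytes so no PerlIO decoding layer hides what arrived.
binmode STDIN, ':raw';
$| = 1;    # flush the prompt before waiting for keyboard input

print '*** ';
my $line = <STDIN>;
print '--- ', describe_read($line), "\n";
```

Running it once in cmd.exe and once in Cygwin, under each code page, shows whether the console delivered any bytes at all.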
This is a Microsoft bug: the Windows APIs ReadFile() and ReadConsoleA() always return 0 bytes read (which signals EOF) on code page 65001. See this blog for details.
As Microsoft will not fix this, the only available solution is to ask the Perl maintainers to switch to ReadConsoleW() and convert the resulting wide characters to UTF-8 with WideCharToMultiByte(CP_UTF8, ...).
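The conversion step described above can be illustrated from Perl with the core Encode module. This is only a sketch of the data transformation, not a replacement for calling ReadConsoleW(): the buffer is a hard-coded assumption standing in for what ReadConsoleW() would return for 'ü' (one UTF-16LE code unit, U+00FC, stored as the bytes fc 00). Decoding it and re-encoding as UTF-8 yields the same bytes WideCharToMultiByte(CP_UTF8, ...) would produce; the cp1252 encoding is included for comparison with the code page that works. The hex_bytes helper is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: space-separated hex dump of a byte string.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02x', ord $_) } split //, $bytes;
}

# Assumed stand-in for a ReadConsoleW() buffer holding "ü":
# the UTF-16LE code unit U+00FC, little-endian, i.e. bytes FC 00.
my $utf16_buffer = "\xFC\x00";

# Decode the wide-character buffer into a Perl character string.
my $text = decode('UTF-16LE', $utf16_buffer);

# Re-encode as UTF-8, the analogue of WideCharToMultiByte(CP_UTF8, ...).
my $utf8_bytes = encode('UTF-8', $text);

# For comparison: the same character in code page 1252 is a single byte.
my $cp1252_bytes = encode('cp1252', $text);

printf "UTF-8 : %s\n", hex_bytes($utf8_bytes);      # UTF-8 : c3 bc
printf "cp1252: %s\n", hex_bytes($cp1252_bytes);    # cp1252: fc
```

The multi-byte UTF-8 form is what a reader on code page 65001 has to produce, whereas code page 1252 maps the umlaut to one byte.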