UTF8 Script in PowerShell outputs incorrect characters

I've created a UTF-8-encoded PowerShell script that contains non-ASCII characters.

characters.ps1:

Write-Host "ç â ã á à" 

When I run the script in the PowerShell console, it outputs the wrong characters.

However, if I type the characters directly into the console, they are displayed as expected.

Does anyone know what causes this behavior?

The problem arose from a script I wrote that has hardcoded paths containing non-ASCII characters. When I pass such a path as an argument to a command (in my case, robocopy, to copy a folder), the command fails because it cannot find the path, which is also displayed incorrectly on screen.

asked Jan 23 '13 at 14:01 by Arthur Nunes




1 Answer

Changing the encoding of the script to UTF-8 with BOM solved the issue.

I was using SublimeText with the EncodingHelper plugin to control the script's character set, and it was correctly set to UTF-8.

I changed the encoding of the script in SublimeText to "UTF-8 with BOM" and the output was shown correctly.
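If you would rather not depend on an editor setting, the same conversion can be scripted. This is a minimal sketch, not part of the original answer: `ConvertTo-Utf8Bom` is a hypothetical helper name, and it assumes the file's current contents are UTF-8 (with or without a BOM). It uses the .NET `System.Text.UTF8Encoding` constructor, whose Boolean argument controls whether a BOM is written, so it should behave the same in Windows PowerShell and in later PowerShell versions.

```powershell
# Hypothetical helper: re-save a UTF-8 text file so it starts with a BOM (EF BB BF).
# Assumes the existing content is UTF-8; a file in another encoding would be mangled.
function ConvertTo-Utf8Bom {
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )

    # .NET file APIs resolve relative paths against the process directory,
    # not the current PowerShell location, so resolve to a full path first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    # Read as UTF-8; an existing BOM is detected and stripped,
    # so running the helper twice does not duplicate the BOM.
    $text = [System.IO.File]::ReadAllText($fullPath, [System.Text.Encoding]::UTF8)

    # $true => emit the UTF-8 byte order mark when writing.
    $utf8WithBom = New-Object System.Text.UTF8Encoding -ArgumentList $true
    [System.IO.File]::WriteAllText($fullPath, $text, $utf8WithBom)
}

# Usage (path is illustrative):
# ConvertTo-Utf8Bom -Path .\characters.ps1
```

To confirm the result, `[System.IO.File]::ReadAllBytes($fullPath)[0..2]` should return 239, 187, 191 (0xEF 0xBB 0xBF).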

I created the same script with Notepad++, which defaults to "UTF-8 with BOM", and the string was shown correctly in the console.

I changed the encoding of the script in Notepad++ to "UTF-8 without BOM" and it was shown incorrectly.

It seems PowerShell cannot correctly detect the encoding of UTF-8 files that have no BOM.
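The garbled output is consistent with the script's UTF-8 bytes being decoded with a single-byte Western code page: Windows PowerShell reads a BOM-less script using the system's legacy ANSI code page (for example Windows-1252), whereas PowerShell 6 and later assume UTF-8 for BOM-less files. The sketch below reproduces the effect. It uses ISO-8859-1 (code page 28591) as a stand-in for Windows-1252 because it is built into every .NET runtime and maps the bytes involved here (0xA0 to 0xFF) to the same characters. The test string is built from `[char]` code points so the snippet itself is pure ASCII and behaves the same however it is saved.

```powershell
# Build "ç â ã á à" from code points so this snippet contains only ASCII.
$text = -join @(
    [char]0x00E7, ' ', [char]0x00E2, ' ', [char]0x00E3, ' ', [char]0x00E1, ' ', [char]0x00E0
)

# What a UTF-8 editor writes to disk: each accented letter becomes two bytes (0xC3 0xNN).
$bytes = [System.Text.Encoding]::UTF8.GetBytes($text)

# Decode the same bytes as a single-byte code page (ISO-8859-1 standing in for Windows-1252).
$latin1  = [System.Text.Encoding]::GetEncoding(28591)
$misread = $latin1.GetString($bytes)

# Prints: UTF-8 bytes : C3 A7 20 C3 A2 20 C3 A3 20 C3 A1 20 C3 A0
"UTF-8 bytes : $(($bytes | ForEach-Object { '{0:X2}' -f $_ }) -join ' ')"

# Each accented letter now appears as two characters, starting with "Ã".
"Misread text: $misread"
```

Every 0xC3 lead byte turns into "Ã", which is the typical signature of UTF-8 text being read with a single-byte code page.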

answered Oct 07 '22 at 17:10 by Arthur Nunes