Create a file utf8.txt. Ensure the encoding is UTF-8 (no BOM). Set its content to €
In cmd.exe:
type utf8.txt > out.txt
Content of out.txt is €
In PowerShell (v4):
cat .\utf8.txt > out.txt
or
type .\utf8.txt > out.txt
Content of out.txt is â‚¬
How do I globally make PowerShell work correctly?
There is no official difference between UTF-8 and BOM-ed UTF-8. A BOM-ed UTF-8 string starts with the following three bytes: EF BB BF. Those bytes, if present, must be ignored when extracting the string from the file/stream.
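To see whether a given file carries those three bytes, you can inspect its start directly. A minimal sketch; the function name Test-Utf8Bom is hypothetical (it is not a built-in cmdlet), and it reads the whole file, which is fine for small files only:

```powershell
# Hypothetical helper (not a built-in cmdlet): returns $true if the file
# starts with the UTF-8 BOM bytes EF BB BF.
function Test-Utf8Bom {
    param([Parameter(Mandatory)][string] $Path)

    # Resolve to a full file-system path, because .NET methods use the process's
    # working directory, which can differ from PowerShell's current location.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    # Compare the first three bytes with the BOM.
    $bytes.Length -ge 3 -and
        $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
}
```

For a file saved as UTF-8 without a BOM, such as utf8.txt from the question, Test-Utf8Bom .\utf8.txt should report False.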
Note: This answer is about Windows PowerShell (up to v5.1); PowerShell [Core, v6+], the cross-platform edition of PowerShell, now fortunately defaults to BOM-less UTF-8 on both in- and output.
Windows PowerShell, unlike the underlying .NET Framework[1], uses the following defaults:
on input: files without a BOM (byte-order mark) are assumed to be in the system's default encoding, which is the legacy Windows code page ("ANSI" code page: the active, culture-specific single-byte encoding, as configured via Control Panel).
on output: the > and >> redirection operators produce UTF-16 LE files by default (which do have - and need - a BOM).
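The input-side default is what garbles a BOM-less UTF-8 file: its bytes get decoded with the ANSI code page instead of UTF-8. The following sketch reproduces that misreading in memory; it assumes the ANSI code page is Windows-1252 (Western European), which may not match your system, and the variable names are only illustrative:

```powershell
# U+20AC (EURO SIGN), written as a code point so that this snippet does not
# depend on the encoding of the script file itself.
$euro = [string][char]0x20AC

# The three bytes a UTF-8 file stores for the euro sign: E2 82 AC.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($euro)

# Assumption: the system's ANSI code page is Windows-1252.
$ansi = [System.Text.Encoding]::GetEncoding(1252)

# Misreading the UTF-8 bytes as ANSI yields three characters instead of one.
$misread = $ansi.GetString($utf8Bytes)

# Decoding the same bytes as UTF-8 restores the original character.
$correct = [System.Text.Encoding]::UTF8.GetString($utf8Bytes)

# Show the bytes alongside the misread text.
'{0} -> {1}' -f (($utf8Bytes | ForEach-Object { $_.ToString('X2') }) -join ' '), $misread
```

Under that assumption, $misread is the three-character string â‚¬ (U+00E2, U+201A, U+00AC), while $correct is the single character €.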
File-consuming and -producing cmdlets do usually support an -Encoding parameter that lets you specify the encoding explicitly.
Prior to Windows PowerShell v5.1, using the underlying Out-File cmdlet explicitly was the only way to change the encoding.
In Windows PowerShell v5.1+, > and >> became effective aliases of Out-File, allowing you to change the encoding behavior of > and >> via the $PSDefaultParameterValues preference variable; e.g.: $PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'.
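As a rough end-to-end sketch of that setting, the following works in a scratch folder (the folder name utf8-demo and the variable names are my own, purely for illustration) and assumes Windows PowerShell v5.1. Putting the $PSDefaultParameterValues line in your $PROFILE script would apply it to every session rather than just the current one:

```powershell
# Work in a scratch folder so that no existing files are touched.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'utf8-demo'
New-Item -ItemType Directory -Force -Path $dir | Out-Null
$in  = Join-Path $dir 'utf8.txt'
$out = Join-Path $dir 'out.txt'

# Create a BOM-less UTF-8 input file containing the euro sign (bytes E2 82 AC).
[System.IO.File]::WriteAllBytes($in, [byte[]](0xE2, 0x82, 0xAC))

# Session-wide default for Out-File, and therefore for > and >> in v5.1+.
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'

# The input encoding still has to be given explicitly for a BOM-less file.
Get-Content -Encoding utf8 $in > $out

# Show the bytes that ended up in the output file.
([System.IO.File]::ReadAllBytes($out) | ForEach-Object { $_.ToString('X2') }) -join ' '
```

In Windows PowerShell v5.1 the byte dump should start with EF BB BF (the BOM), followed by E2 82 AC and a line break.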
For Windows PowerShell to handle UTF-8 properly, you must specify it as both the input and output encoding[2], but note that on output, PowerShell invariably adds a BOM to UTF-8 files.
Applied to your example:
Get-Content -Encoding utf8 .\utf8.txt | Out-File -Encoding utf8 out.txt
To create a UTF-8 file without a BOM in PowerShell, see this answer of mine.
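One common way to do this is to bypass the cmdlets and call .NET directly. This is a sketch under my own assumptions, not necessarily the approach of the linked answer; the function name Write-Utf8NoBom is hypothetical, and it expects at least one line of input:

```powershell
# Hypothetical helper (not a built-in cmdlet): writes lines as UTF-8 without a BOM.
function Write-Utf8NoBom {
    param(
        [string]   $Path,
        [string[]] $Lines
    )

    # Convert to a full path; .NET resolves relative paths against the process's
    # working directory, not PowerShell's current location.
    $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)

    # $false = do not emit the EF BB BF preamble.
    $utf8NoBom = New-Object System.Text.UTF8Encoding $false
    [System.IO.File]::WriteAllLines($fullPath, $Lines, $utf8NoBom)
}

# Usage, applied to the example from the question:
# Write-Utf8NoBom -Path .\out.txt -Lines (Get-Content -Encoding utf8 .\utf8.txt)
```

Because the encoding object is created with $false, the file begins directly with the text's own bytes rather than with EF BB BF.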
[1] .NET Framework uses (BOM-less) UTF-8 by default, both for in- and output.
This - intentional - difference in behavior between Windows PowerShell and the framework it is built on is unusual. The difference went away in PowerShell [Core] v6+: both .NET [Core] and PowerShell [Core] default to BOM-less UTF-8.
[2] Get-Content does, however, automatically recognize UTF-8 files with a BOM.