Download and install Notepad++, a powerful free text editor. Open the file you want to verify or fix in Notepad++. In the top menu, select Encoding > Convert to UTF-8 (the option without BOM). Save the file.
In the UTF-8 encoding, the BOM is not essential because, unlike in the UTF-16 or UTF-32 encodings, the bytes of a character have only one possible order, so there is no byte order to signal. A BOM may still occur in UTF-8 text, however, either as a by-product of an encoding conversion or because an editor added it.
The UTF-8 encoding without a BOM has the property that a document which contains only characters from the US-ASCII range is encoded byte-for-byte the same way as the same document encoded using the US-ASCII encoding. Such a document can be processed and understood when encoded either as UTF-8 or as US-ASCII.
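To check whether a given file actually starts with a BOM, you can compare its first three bytes against the UTF-8 BOM sequence EF BB BF. The following is only a minimal sketch: the function name Test-Utf8Bom is hypothetical (it is not a built-in cmdlet), and it reads the whole file into memory, which is fine for small files only. The last lines illustrate the ASCII property described above.

```powershell
# Hypothetical helper (not a built-in cmdlet): returns $true if the file
# starts with the UTF-8 BOM bytes 0xEF 0xBB 0xBF.
function Test-Utf8Bom {
    param([Parameter(Mandatory)] [string] $LiteralPath)

    # Resolve to a full path, because .NET's working directory usually
    # differs from PowerShell's.
    $fullPath = Convert-Path -LiteralPath $LiteralPath
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)
    return ($bytes.Length -ge 3 -and
            $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
}

# ASCII-only text yields identical bytes in US-ASCII and in BOM-less UTF-8.
$asciiBytes = [System.Text.Encoding]::ASCII.GetBytes('Hello')
$utf8Bytes  = (New-Object System.Text.UTF8Encoding $false).GetBytes('Hello')
$sameBytes  = [System.BitConverter]::ToString($asciiBytes) -eq
              [System.BitConverter]::ToString($utf8Bytes)
```

Calling Test-Utf8Bom -LiteralPath $MyPath before and after a conversion is a quick way to confirm that the BOM is really gone.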
Using .NET's UTF8Encoding class and passing $False to the constructor seems to work:
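$MyRawString = Get-Content -Raw $MyPath
# $False requests UTF-8 without a BOM.
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
# Pass a full path: .NET's working directory usually differs from PowerShell's.
# WriteAllText is used because WriteAllLines would append an extra newline to the raw string.
[System.IO.File]::WriteAllText($MyPath, $MyRawString, $Utf8NoBomEncoding)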
The proper way as of now is to use the solution recommended by @Roman Kuzmin in the comments on @M. Dudley's answer:
[IO.File]::WriteAllLines($filename, $content)
(I've also shortened it a bit by dropping the unnecessary System namespace qualifier; PowerShell resolves it automatically.)
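This works because File.WriteAllLines, when called without an encoding argument, writes UTF-8 without a BOM. Here is a small sketch to convince yourself; the file name is a made-up example, and the non-ASCII character is built from its code point so that the snippet does not depend on the encoding of the script file itself.

```powershell
# Hypothetical file name in the temp directory, used for illustration only.
# A full path is passed because .NET's working directory usually differs
# from PowerShell's.
$filename = Join-Path ([System.IO.Path]::GetTempPath()) 'utf8-nobom-demo.txt'
$content  = 'first line', ('caf' + [char]0xE9)

# No encoding argument: .NET defaults to UTF-8 without a BOM here.
[IO.File]::WriteAllLines($filename, $content)

# Inspect the first three bytes; a BOM would show up as EF-BB-BF.
$firstBytes = [IO.File]::ReadAllBytes($filename)[0..2]
[BitConverter]::ToString($firstBytes)   # 66-69-72, i.e. 'f', 'i', 'r'
```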
I figured this wouldn't be UTF, but I just found a pretty simple solution that seems to work...
Get-Content path/to/file.ext | out-file -encoding ASCII targetFile.ext
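For me this results in a UTF-8 file without a BOM regardless of the source format. Note that this only holds as long as the content is limited to US-ASCII characters: any non-ASCII character is replaced with ? in the output.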
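The difference between the two encodings becomes visible at the byte level. The following sketch uses .NET's encoding classes directly (rather than Out-File itself) to encode a string containing one non-ASCII character both ways; the sample string is an arbitrary choice.

```powershell
# 'e' with acute accent (U+00E9), built from its code point to keep this
# snippet independent of the script file's own encoding.
$text = 'caf' + [char]0xE9

$asciiBytes = [System.Text.Encoding]::ASCII.GetBytes($text)
$utf8Bytes  = (New-Object System.Text.UTF8Encoding $false).GetBytes($text)

[System.BitConverter]::ToString($asciiBytes)  # 63-61-66-3F     (0x3F is '?')
[System.BitConverter]::ToString($utf8Bytes)   # 63-61-66-C3-A9  (U+00E9 as two bytes)
```

So the ASCII route is only equivalent to BOM-less UTF-8 for files that contain nothing outside the US-ASCII range.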
Note: This answer applies to Windows PowerShell; by contrast, in the cross-platform PowerShell Core edition (v6+), UTF-8 without BOM is the default encoding, across all cmdlets.
In other words: If you're using PowerShell [Core] version 6 or higher, you get BOM-less UTF-8 files by default (which you can also explicitly request with -Encoding utf8 / -Encoding utf8NoBOM, whereas you get with-BOM encoding with -Encoding utf8BOM).
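If a script has to run on both editions, one option is to branch on the version number. The following is a rough sketch under that assumption; the function name Write-Utf8NoBomText and its parameters are hypothetical and not part of PowerShell.

```powershell
# Hypothetical helper: writes a string as BOM-less UTF-8 on either edition.
function Write-Utf8NoBomText {
    param(
        [Parameter(Mandatory)] [string] $LiteralPath,
        [Parameter(Mandatory)] [AllowEmptyString()] [string] $Text
    )

    if ($PSVersionTable.PSVersion.Major -ge 6) {
        # PowerShell (Core) 6+: utf8NoBOM is a valid -Encoding value.
        Set-Content -LiteralPath $LiteralPath -Value $Text -Encoding utf8NoBOM -NoNewline
    }
    else {
        # Windows PowerShell: -Encoding UTF8 would prepend a BOM, so call .NET
        # directly, with a full path (the target file need not exist yet).
        $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($LiteralPath)
        $utf8NoBom = New-Object System.Text.UTF8Encoding $false
        [System.IO.File]::WriteAllText($fullPath, $Text, $utf8NoBom)
    }
}
```

Both branches write the text as-is, without appending a trailing newline.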
If you're running Windows 10 and you're willing to switch to BOM-less UTF-8 encoding system-wide - which can have side effects - even Windows PowerShell can be made to use BOM-less UTF-8 consistently - see this answer.
To complement M. Dudley's own simple and pragmatic answer (and ForNeVeR's more concise reformulation):
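For convenience, here's the advanced function Out-FileUtf8NoBom, a pipeline-based alternative that mimics Out-File, which means:
* You can use it just like Out-File in a pipeline.
* Input objects that aren't strings are formatted via Out-String first, just like with Out-File.
* An additional -UseLF switch allows you to transform Windows-style CRLF newlines to Unix-style LF-only newlines.
Example: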
(Get-Content $MyPath) | Out-FileUtf8NoBom $MyPath # Add -UseLF for Unix newlines
Note how (Get-Content $MyPath) is enclosed in (...), which ensures that the entire file is opened, read in full, and closed before sending the result through the pipeline. This is necessary in order to be able to write back to the same file (update it in place).
Generally, though, this technique is not advisable for 2 reasons: (a) the whole file must fit into memory and (b) if the command is interrupted, data will be lost.
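One way to sidestep both problems is to stream the file line by line into a temporary file and replace the original only once the copy is complete. The following is a sketch under several assumptions: the function name Convert-FileToUtf8NoBom is hypothetical, the ".tmp" sibling path is assumed to be free, input without a BOM is assumed to already be UTF-8, and the output always ends with a newline in the platform's native style.

```powershell
# Hypothetical helper: re-encodes a file as BOM-less UTF-8 via a temporary file.
function Convert-FileToUtf8NoBom {
    param([Parameter(Mandatory)] [string] $LiteralPath)

    $fullPath  = Convert-Path -LiteralPath $LiteralPath
    $tempPath  = "$fullPath.tmp"   # assumption: this name is not in use
    $utf8NoBom = New-Object System.Text.UTF8Encoding $false

    # StreamReader honors a BOM if present and otherwise assumes UTF-8.
    $reader = New-Object System.IO.StreamReader $fullPath
    try {
        # $false = overwrite (do not append).
        $writer = New-Object System.IO.StreamWriter $tempPath, $false, $utf8NoBom
        try {
            # Copy one line at a time, so the file never has to fit into memory.
            while ($null -ne ($line = $reader.ReadLine())) {
                $writer.WriteLine($line)
            }
        }
        finally { $writer.Dispose() }
    }
    finally { $reader.Dispose() }

    # Replace the original only after the copy has been written completely.
    Move-Item -LiteralPath $tempPath -Destination $fullPath -Force
}
```

If the command is interrupted while copying, the original file is still intact and only the temporary file is left behind.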
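A note on memory use: as the help text of the function below points out, all pipeline input is buffered before writing starts, although the string representations are generated and written to the target file one by one.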
Source code of function Out-FileUtf8NoBom:
Note: The function is also available as an MIT-licensed Gist, and only it will be maintained going forward.
You can install it directly with the following command (while I can personally assure you that doing so is safe, you should always check the content of a script before directly executing it this way):
# Download and define the function.
irm https://gist.github.com/mklement0/8689b9b5123a9ba11df7214f82a673be/raw/Out-FileUtf8NoBom.ps1 | iex
function Out-FileUtf8NoBom {
<#
.SYNOPSIS
  Outputs to a UTF-8-encoded file *without a BOM* (byte-order mark).
.DESCRIPTION
  Mimics the most important aspects of Out-File:
  * Input objects are sent to Out-String first.
  * -Append allows you to append to an existing file, -NoClobber prevents
    overwriting of an existing file.
  * -Width allows you to specify the line width for the text representations
    of input objects that aren't strings.
  However, it is not a complete implementation of all Out-File parameters:
  * Only a literal output path is supported, and only as a parameter.
  * -Force is not supported.
  * Conversely, an extra -UseLF switch is supported for using LF-only newlines.
  Caveat: *All* pipeline input is buffered before writing output starts,
          but the string representations are generated and written to the target
          file one by one.
.NOTES
  The raison d'être for this advanced function is that Windows PowerShell
  lacks the ability to write UTF-8 files without a BOM: using -Encoding UTF8
  invariably prepends a BOM.
  Copyright (c) 2017, 2020 Michael Klement <[email protected]> (http://same2u.net),
  released under the [MIT license](https://spdx.org/licenses/MIT#licenseText).
#>
  [CmdletBinding()]
  param(
    [Parameter(Mandatory, Position=0)] [string] $LiteralPath,
    [switch] $Append,
    [switch] $NoClobber,
    [AllowNull()] [int] $Width,
    [switch] $UseLF,
    [Parameter(ValueFromPipeline)] $InputObject
  )
  #requires -version 3
  # Convert the input path to a full one, since .NET's working dir. usually
  # differs from PowerShell's.
  $dir = Split-Path -LiteralPath $LiteralPath
  if ($dir) { $dir = Convert-Path -ErrorAction Stop -LiteralPath $dir } else { $dir = $pwd.ProviderPath }
  $LiteralPath = [IO.Path]::Combine($dir, [IO.Path]::GetFileName($LiteralPath))
  # If -NoClobber was specified, throw an exception if the target file already
  # exists.
  if ($NoClobber -and (Test-Path -LiteralPath $LiteralPath)) {
    Throw [IO.IOException] "The file '$LiteralPath' already exists."
  }
  # Create a StreamWriter object.
  # Note that we take advantage of the fact that the StreamWriter class by default:
  # - uses UTF-8 encoding
  # - without a BOM.
  $sw = New-Object System.IO.StreamWriter $LiteralPath, $Append
  $htOutStringArgs = @{}
  if ($Width) {
    $htOutStringArgs += @{ Width = $Width }
  }
  # Note: By not using begin / process / end blocks, we're effectively running
  #       in the end block, which means that all pipeline input has already
  #       been collected in automatic variable $Input.
  #       We must use this approach, because using | Out-String individually
  #       in each iteration of a process block would format each input object
  #       with an individual header.
  try {
    $Input | Out-String -Stream @htOutStringArgs | % {
      if ($UseLF) {
        $sw.Write($_ + "`n")
      }
      else {
        $sw.WriteLine($_)
      }
    }
  } finally {
    $sw.Dispose()
  }
}
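The core mechanism the function relies on can be tried in isolation: a StreamWriter created without an explicit encoding writes UTF-8 without a BOM, and writing "`n" explicitly yields LF-only newlines. This is just a stripped-down sketch of that idea, not a replacement for the function; the file name is a made-up example.

```powershell
# Hypothetical file name in the temp directory, used for illustration only.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'streamwriter-demo.txt'

# $false = overwrite instead of append; no encoding argument means
# UTF-8 without a BOM.
$sw = New-Object System.IO.StreamWriter $path, $false
try {
    # Explicit "`n" instead of WriteLine() gives LF-only newlines.
    $sw.Write('line 1' + "`n")
    $sw.Write('line 2' + "`n")
}
finally {
    $sw.Dispose()
}

# No EF-BB-BF at the start, and 0A (LF) rather than 0D-0A (CRLF) as newline.
[System.BitConverter]::ToString([System.IO.File]::ReadAllBytes($path))
# 6C-69-6E-65-20-31-0A-6C-69-6E-65-20-32-0A
```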
Starting with version 6, PowerShell supports the UTF8NoBOM encoding for both Set-Content and Out-File, and even uses it as the default encoding.
So in the above example it should simply be like this:
$MyRawString | Out-File -Encoding UTF8NoBOM $MyPath