
PowerShell - ASCII encoding is changing special characters to question marks

I'm using the following PowerShell script to convert a string to XML and then export it to a file (done this way to keep the indenting):

# $xml holds the source XML as a string
[xml]$xmloutput = $xml
$sw = New-Object System.IO.StringWriter
$writer = New-Object System.Xml.XmlTextWriter($sw)
$writer.Formatting = [System.Xml.Formatting]::Indented
$xmloutput.WriteContentTo($writer)
# Make sure everything written so far has reached the StringWriter
$writer.Flush()
$sw.ToString() | Set-Content -Encoding 'ASCII' $filepath

The destination has to be ASCII-encoded due to a vendor restriction. The problem is that ASCII encoding turns special characters into question marks (for example, Ö becomes ?).

If I use UTF-8 encoding, the output looks fine. I've also tried saving as UTF-8 and then converting to ASCII, but that does the same thing (it writes a question mark):

[System.Io.File]::ReadAllText($filepath) | Out-File -FilePath $filepath -Encoding ASCII
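The question mark comes from the encoder, not from the XML code. .NET's Encoding.ASCII, which is presumably what -Encoding ASCII maps to in both commands above, replaces every character it cannot represent with '?' (byte 63), so converting a UTF-8 file to ASCII afterwards loses Ö in exactly the same way. A minimal sketch of that replacement on its own:

```powershell
# Build the test string from a code point so the script file's own encoding doesn't matter
$text = 'hell' + [char]0xD6   # 'hellÖ'

# Encoding.ASCII substitutes '?' for any character outside U+0000..U+007F
$bytes = [System.Text.Encoding]::ASCII.GetBytes($text)
$bytes[-1]                                        # 63, the byte for '?'
[System.Text.Encoding]::ASCII.GetString($bytes)   # hell?
```

Once the bytes have been written this way, the original character is gone; no later conversion can bring it back.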

If I try to replace the characters in the string before the conversion to XML (using a numeric character reference such as &#214;), it simply escapes the ampersand (to &amp;) and leaves the rest, making it useless.
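This happens because XML text content is escaped on output: a literal & in the string is written as &amp;, so a reference typed into the string ends up as plain text rather than as a character. A minimal sketch, using a throwaway <S> element chosen only for illustration:

```powershell
$doc = [xml]'<S/>'

# InnerText stores plain text; the '&' here is just an ordinary character
$doc.DocumentElement.InnerText = 'hell&#214;'

# On serialization the '&' is escaped, so the reference is no longer a reference
$doc.OuterXml   # <S>hell&amp;#214;</S>
```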

Is there any way to have PowerShell save those characters into the file correctly?

EDIT: I would like to see the special character itself in the output file, but if that isn't ASCII-compliant, I'd like to see its numeric character reference instead (in this example, &#214;).

I also don't want to see just an O; I need the actual character.

chazbot7 asked Oct 21 '25 04:10



1 Answer

All characters in an XML document are Unicode. However, a serialized XML document (the bytes in a file) has a document encoding. Characters that the document encoding cannot represent are written as character references, typically numeric ones in hexadecimal notation. The number is the character's Unicode code point.
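For Ö, the code point is U+00D6, which is 214 in decimal, so &#xD6; and &#214; refer to the same character. A minimal sketch of how such a reference is derived from a character; Get-XmlCharRef is a hypothetical helper written for this illustration, not a built-in cmdlet:

```powershell
function Get-XmlCharRef {
    param([string]$Text, [int]$Index = 0)
    # ConvertToUtf32 also combines a surrogate pair (characters above U+FFFF) into one code point
    $codePoint = [char]::ConvertToUtf32($Text, $Index)
    # Hexadecimal form (the style shown in the output below) and decimal form
    [pscustomobject]@{
        Hex     = '&#x{0:X};' -f $codePoint
        Decimal = '&#{0};' -f $codePoint
    }
}

Get-XmlCharRef ([string][char]0xD6)   # Hex = &#xD6;, Decimal = &#214;
```

Handling surrogate pairs matters: escaping each UTF-16 code unit separately would produce references to surrogate code points, which are not allowed in XML.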

It seems your partner's requirement is to use ASCII as the document encoding.

XmlDocument is a bit hard to work with directly, but an XmlWriter configured with XmlWriterSettings that specify the document encoding will work:

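$myString = 'hellÖ'

# Serialize a string to CLIXML just to get a sample XmlDocument
[xml]$myXml = [System.Management.Automation.PSSerializer]::Serialize($myString)

# The writer's Encoding determines which characters must be written as references
$settings = New-Object System.Xml.XmlWriterSettings
$settings.Encoding = [System.Text.Encoding]::ASCII
$settings.Indent = $true

$writer = [System.Xml.XmlWriter]::Create("./test.xml", $settings)
$myXml.Save($writer)
# Dispose flushes the writer and closes the file
$writer.Dispose()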

This writes an ASCII-encoded text file whose XML declaration states that the document encoding is ASCII, and it uses hexadecimal numeric character references for any content characters that can't be represented in ASCII:

<?xml version="1.0" encoding="us-ascii"?>
<Objs Version="1.1.0.1" xmlns="http://schemas.microsoft.com/powershell/2004/04">
  <S>hell&#xD6;</S>
</Objs>
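Applied to the script in the question, this means replacing the StringWriter and Set-Content steps with an XmlWriter that writes straight to the file. A minimal sketch, assuming $xml holds the XML string and $filepath the destination as in the question; the sample values at the top are placeholders so the block runs on its own:

```powershell
# Placeholders standing in for the question's $xml and $filepath
$xml = '<root><name>hell' + [char]0xD6 + '</name></root>'
$filepath = Join-Path ([System.IO.Path]::GetTempPath()) 'vendor-output.xml'

[xml]$xmloutput = $xml

$settings = New-Object System.Xml.XmlWriterSettings
$settings.Encoding = [System.Text.Encoding]::ASCII   # non-ASCII characters become &#x..; references
$settings.Indent = $true                             # keeps the indenting the original script wanted

$writer = [System.Xml.XmlWriter]::Create($filepath, $settings)
try {
    $xmloutput.Save($writer)
}
finally {
    $writer.Dispose()   # flushes and closes the file even if Save throws
}

Get-Content -Raw $filepath
```

Passing a full path also sidesteps a subtle issue: .NET resolves relative paths such as ./test.xml against the process working directory, which can differ from PowerShell's current location.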

In the Unicode C1 Controls and Latin-1 Supplement block, U+00D6 (&#xD6;) is Ö, LATIN CAPITAL LETTER O WITH DIAERESIS.
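Because &#xD6; is a character reference rather than literal text, a conforming XML parser reading the file gets Ö back. A minimal check, parsing the <S> line from the output above on its own (namespace dropped for brevity):

```powershell
$parsed = [xml]'<S>hell&#xD6;</S>'

# The parser resolves the reference to the actual character
$parsed.DocumentElement.InnerText                              # hellÖ
$parsed.DocumentElement.InnerText -ceq ('hell' + [char]0xD6)   # True
```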

Tom Blodget answered Oct 23 '25 18:10
