
using powershell to replace extended ascii character in a text file

Tags:

replace

powershell

character-encoding

I need to replace a hex 93 character with a "" string inside several CSV files. Below is the code that I'm using, but it is not working. I think the reason is that the hex value is greater than 7F (decimal 127). I've tried several other methods to no avail. Any help would be appreciated.

$q1 = [String](0x93 -as [char])
Get-ChildItem ".\*.csv" -Recurse | ForEach {
(Get-Content $_ | ForEach  { $_.replace($q1, '""') }) |
     Set-Content $_
}

Note: Attached is an image of the Format-Hex dump of my test file (https://i.stack.imgur.com/0viko.png). The first character is the one that I need to replace.


asked Sep 04 '18 21:09

HockChai Lim

1 Answer

In Windows PowerShell, the default character encoding when reading from and writing to files [1] is "ANSI", i.e., the legacy 8-bit code page implied by the active system locale.
(By contrast, PowerShell Core defaults to UTF-8.)

For instance, the code page associated with the system locale on a US-English system is 1252, i.e., Windows-1252, where code point 0x93 is the non-ASCII “ quotation mark.

However, once a text file's content has been read into memory, its characters are represented as UTF-16LE code units, i.e., as .NET [string] instances.

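As a Unicode character, “ has code point U+201C, which UTF-16LE represents as the single code unit 0x201c. By contrast, [char] 0x93 is the unrelated control character U+0093, which is why the original attempt never matches.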
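To see the mismatch concretely, here is a minimal sketch that decodes a Windows-1252 byte sequence by hand and searches it for both candidates. It assumes that code page 1252 is available via [Text.Encoding]::GetEncoding(); the variable names are made up for illustration.

```powershell
# Casting 0x93 to [char] yields Unicode code point U+0093 (a C1 control
# character), not the Windows-1252 quotation mark.
$naive  = [char] 0x93
$actual = [char] 0x201c   # the quotation mark as it exists in memory after decoding

# Simulate reading the bytes 0x93 'a' 'b' 'c' from a Windows-1252 file.
# Assumption: code page 1252 can be obtained in the current session.
$cp1252 = [Text.Encoding]::GetEncoding(1252)
$line   = $cp1252.GetString([byte[]] (0x93, 0x61, 0x62, 0x63))

$line.Contains([string] $naive)    # -> False
$line.Contains([string] $actual)   # -> True
```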

Therefore - because in memory all strings are UTF-16LE code units - what you need to replace is [char] 0x201c:

$q1 = [char] 0x201c  # “
Get-ChildItem *.csv -Recurse | ForEach-Object {
  (Get-Content $_.FullName) -replace $q1, '""' | Set-Content $_.FullName
}

Note that Set-Content too uses the default character encoding, so the rewritten files will use "ANSI" encoding too - use the -Encoding parameter to change the output encoding, if desired.
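As a rough illustration of the -Encoding parameter, the following sketch writes the replaced lines as UTF-8 and reads them back with the same encoding. The file name and sample lines are hypothetical; note that in Windows PowerShell -Encoding UTF8 writes a BOM.

```powershell
# Hypothetical demo file in the temp directory.
$file = Join-Path ([IO.Path]::GetTempPath()) 'encoding-demo.csv'
$q1   = [char] 0x201c

# Sample in-memory lines, standing in for what Get-Content would return.
$lines = "${q1}abc${q1},1", 'plain,2'

# Replace, then write with an explicit output encoding instead of the default.
($lines -replace $q1, '""') | Set-Content -LiteralPath $file -Encoding UTF8

# Read back with the matching encoding so the content is decoded correctly.
$result = Get-Content -LiteralPath $file -Encoding UTF8
$result[0]   # -> ""abc"",1

Remove-Item -LiteralPath $file
```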

Also note the (...) around the Get-Content call, which ensures that the input file is read into memory in full up front; that is what makes it possible to write back to the same file in the same pipeline.
While this approach is convenient, it bears a slight risk of data loss if writing back to the input file is interrupted before completion.
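If that risk matters, one option is to write to a temporary file first and only then replace the original. The following is a sketch of that idea, not part of the solution above: the function name Update-FileViaTemp, the .tmp suffix and the -Encoding pass-through are all assumptions made for illustration, and the final move narrows the window for data loss rather than guaranteeing atomicity.

```powershell
# Hypothetical helper: replaces U+201C with "" in one file via a temp file.
function Update-FileViaTemp {
  param(
    [Parameter(Mandatory)] [string] $Path,
    [string] $Encoding = 'Default'   # edition-specific default encoding
  )
  $q1  = [char] 0x201c
  $tmp = "$Path.tmp"                 # hypothetical temp-file naming scheme

  # Stream line by line into the temp file; the original stays untouched.
  Get-Content -LiteralPath $Path -Encoding $Encoding |
    ForEach-Object { $_ -replace $q1, '""' } |
    Set-Content -LiteralPath $tmp -Encoding $Encoding

  # Replace the original only once the temp file exists.
  if (Test-Path -LiteralPath $tmp) {
    Move-Item -LiteralPath $tmp -Destination $Path -Force
  }
}

# Usage, assuming CSV files under the current directory:
# Get-ChildItem *.csv -Recurse | ForEach-Object { Update-FileViaTemp -Path $_.FullName }
```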

Converting an "ANSI" code point to a Unicode code point

The following shows how an "ANSI" (8-bit) code point such as 0x93 can be converted to its equivalent UTF-16 code point, 0x201c:

# Convert an array of "ANSI" code points (1 byte each) to the UTF-16
# string they represent. 
# Note: In Windows PowerShell, [Text.Encoding]::Default contains
#       the "ANSI" encoding set by the system locale.
$str = [Text.Encoding]::Default.GetString([byte[]] 0x93) # -> '“'

# Get the UTF-16 code points of the characters making up the string.
$codePoints = [int[]] [char[]] $str

# Format the first and only code point as a hex. number.
'0x{0:x}' -f $codePoints[0]  # -> '0x201c'

[1] That is, when writing files with Set-Content; by contrast, Out-File / > creates UTF-16LE ("Unicode") files. The cmdlets in Windows PowerShell display a bewildering array of differing default encodings: see this answer. Fortunately, PowerShell Core now consistently defaults to (BOM-less) UTF-8.

answered Sep 22 '22 13:09

mklement0
