I need to replace a hex 93 character with a "" string inside several CSV files. Below is the code that I'm using, but it is not working. I think the reason is that the hex value is greater than 7F (decimal 127). I've tried several other methods to no avail. Any help would be appreciated.
$q1 = [String](0x93 -as [char])
Get-ChildItem ".\*.csv" -Recurse | ForEach {
  (Get-Content $_ | ForEach { $_.replace($q1, '""') }) |
    Set-Content $_
}
Note: Attached is an image of the Format-Hex dump of my test file. The first character is the one that I need to replace.
One way to replace text is to use the -replace operator. This PowerShell operator finds a string (interpreted as a regular expression) and replaces it with another. Using the example file contents, we can provide the search string foo with the replacement string bar, which turns every foo in the file contents into bar.
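As a minimal sketch of the -replace approach, the following writes a sample file and rewrites it in place. The file name replace-demo.txt and the contents foo foo baz are assumptions made up for illustration, not taken from the original example.

```powershell
# Hypothetical demo file in the temp folder (name and contents are assumptions).
$path = Join-Path ([IO.Path]::GetTempPath()) 'replace-demo.txt'
Set-Content -Path $path -Value 'foo foo baz'

# Read the file in full (note the parentheses), replace every match of the
# regular expression 'foo' with 'bar', and write the result back.
(Get-Content -Path $path) -replace 'foo', 'bar' | Set-Content -Path $path

# Read the rewritten file back for inspection.
$result = Get-Content -Path $path
$result   # bar bar baz
```

Because -replace uses regular expressions, characters such as . or $ in the search string would need escaping, for instance with [regex]::Escape().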
Using the Replace() Method
The Replace() method takes two arguments: the string to find and the string to replace the found text with. For example, calling it on the string hello, world to find hello and replace it with hi returns the final result hi, world.
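The hello / hi example can be sketched as follows; the variable names are placeholders chosen for illustration. The last two lines contrast the literal matching of Replace() with the regex matching of -replace.

```powershell
$greeting = 'hello, world'

# Replace() matches the search string literally and case-sensitively.
$result = $greeting.Replace('hello', 'hi')
$result   # hi, world

# Literal vs. regex matching: '.' is an ordinary character for Replace(),
# but matches any character for the -replace operator.
$literal = 'a.b'.Replace('.', '-')     # a-b
$regex   = 'a.b' -replace '.', '-'     # ---
```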
On a standard 101-key keyboard, special extended ASCII characters such as é or ß can be typed by holding the ALT key and typing the corresponding 4-digit code on the numeric keypad. For example, é is typed by holding the ALT key and typing 0233 on the keypad.
The extended ASCII characters include the binary values from 128 (1000 0000) through 255 (1111 1111). Unlike standard ASCII characters, there are multiple versions of the extended ASCII character set.
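To illustrate that the same byte means different things in different extended character sets, here is a small sketch that decodes the byte 0x93 with two code pages. The choice of code pages 1252 and 437 is an assumption made for the comparison; any other pair of 8-bit code pages could be substituted.

```powershell
# The single byte to decode.
$byte = [byte[]] 0x93

# Decode it with Windows-1252 and with the old IBM PC code page 437.
$cp1252 = [Text.Encoding]::GetEncoding(1252).GetString($byte)
$cp437  = [Text.Encoding]::GetEncoding(437).GetString($byte)

# Show the resulting UTF-16 code units as hex numbers.
$hex1252 = '0x{0:x4}' -f ([int] $cp1252[0])   # 0x201c (left double quotation mark)
$hex437  = '0x{0:x4}' -f ([int] $cp437[0])    # 0x00f4 (o with circumflex)
$hex1252
$hex437
```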
In Windows PowerShell, the default character encoding when reading from / writing to [1] files is "ANSI", i.e., the legacy 8-bit code page implied by the active system locale.
(By contrast, PowerShell Core defaults to UTF-8.)
For instance, the code page associated with the system locale on a US-English system is 1252, i.e., Windows-1252, where code point 0x93 is the non-ASCII “ quotation mark.
However, once a text file's content has been read into memory, a string's characters are represented as UTF-16LE code units, i.e., as .NET [string] instances.
As a Unicode character, “ has code point U+201c, expressed as 0x201c in UTF-16LE.
Therefore - because in memory all strings are UTF-16LE code units - what you need to replace is [char] 0x201c:
$q1 = [char] 0x201c # “
Get-ChildItem *.csv -Recurse | ForEach-Object {
  (Get-Content $_.FullName) -replace $q1, '""' | Set-Content $_.FullName
}
Note that Set-Content too uses the default character encoding, so the rewritten files will use "ANSI" encoding too - use the -Encoding parameter to change the output encoding, if desired.
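If you would rather not depend on the default encoding at all, one option is to name both encodings explicitly through .NET. The following is only a sketch under stated assumptions: it assumes the input really is Windows-1252, picks BOM-less UTF-8 as the output encoding, and uses a made-up demo file (encoding-demo.csv) instead of your real CSV files.

```powershell
# Assumptions: input is Windows-1252; desired output is UTF-8 without a BOM.
$ansi = [Text.Encoding]::GetEncoding(1252)
$utf8 = New-Object Text.UTF8Encoding $false

# Hypothetical demo file containing the bytes 0x93 'a' ',' 'b'.
$path = Join-Path ([IO.Path]::GetTempPath()) 'encoding-demo.csv'
$inputBytes = [byte[]] (0x93, 0x61, 0x2C, 0x62)
[IO.File]::WriteAllBytes($path, $inputBytes)

# Decode with the input encoding, replace the in-memory character U+201C,
# and encode the result with the output encoding.
$q1 = [string] [char] 0x201c
$text = [IO.File]::ReadAllText($path, $ansi)
$fixed = $text.Replace($q1, '""')
[IO.File]::WriteAllText($path, $fixed, $utf8)

# Inspect the bytes that were written: 0x22 0x22 0x61 0x2C 0x62.
$bytes = [IO.File]::ReadAllBytes($path)
($bytes | ForEach-Object { '0x{0:X2}' -f $_ }) -join ' '
```

Naming the encodings this way behaves the same in Windows PowerShell and PowerShell Core, whose defaults differ.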
Also note the (...) around the Get-Content call, which ensures that the input file is read into memory in full up front, which enables writing back to the same file in the same pipeline.
While this approach is convenient, note that it bears a slight risk of data loss if writing back to the input file is interrupted before completion.
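One way to reduce that risk, sketched here as an assumption rather than as part of the original answer, is to write the result to a temporary file first and only then move it over the input file. The file names are hypothetical, and -Encoding UTF8 is used throughout purely so that the demo behaves the same regardless of the system's default encoding.

```powershell
$q1 = [string] [char] 0x201c

# Hypothetical demo file and temporary output file.
$path = Join-Path ([IO.Path]::GetTempPath()) 'safe-demo.csv'
$tmp  = "$path.tmp"
Set-Content -Path $path -Value ($q1 + 'a,b') -Encoding UTF8

# Write the replaced content to the temporary file, not to the input file.
(Get-Content -Path $path -Encoding UTF8) -replace $q1, '""' |
  Set-Content -Path $tmp -Encoding UTF8

# Only after the write has completed, replace the original file.
Move-Item -Path $tmp -Destination $path -Force

$result = Get-Content -Path $path -Encoding UTF8
$result   # ""a,b
```

If the script is interrupted while the temporary file is being written, the original file is still intact.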
Converting an "ANSI" code point to a Unicode code point
The following shows how an "ANSI" (8-bit) code point such as 0x93 can be converted to its equivalent UTF-16 code point, 0x201c:
# Convert an array of "ANSI" code points (1 byte each) to the UTF-16
# string they represent.
# Note: In Windows PowerShell, [Text.Encoding]::Default contains
# the "ANSI" encoding set by the system locale.
$str = [Text.Encoding]::Default.GetString([byte[]] 0x93) # -> '“'
# Get the UTF-16 code points of the characters making up the string.
$codePoints = [int[]] [char[]] $str
# Format the first and only code point as a hex. number.
'0x{0:x}' -f $codePoints[0] # -> '0x201c'
[1] Writing files with Set-Content, that is; using Out-File / >, by contrast, creates UTF-16LE ("Unicode") files. The cmdlets in Windows PowerShell display a bewildering array of differing encodings: see this answer. Fortunately, PowerShell Core now consistently defaults to (BOM-less) UTF-8.