Do you know any way that I could programmatically, or via a script, transform a set of text files saved in ANSI character encoding to Unicode encoding?
I would like to do the same as I do when I open a file in Notepad and choose to save it as a Unicode file.
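This could work for you, but note that it grabs every item in the current folder (any subfolders will make Get-Content report errors), and it writes each result to a new file whose name has "u" appended, so notes.txt becomes notes.txtu: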
Get-ChildItem | Foreach-Object { $c = (Get-Content $_); `
Set-Content -Encoding UTF8 $c -Path ($_.name + "u") }
Same thing using aliases for brevity:
gci | %{ $c = (gc $_); sc -Encoding UTF8 $c -Path ($_.name + "u") }
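If you need control over how the ANSI source is decoded, one option is to bypass the cmdlets' default encodings and call .NET directly. The sketch below is only that: Convert-TextFileEncoding is a hypothetical name, and the source code page is something you must supply. It writes UTF-16 little-endian with a BOM, which is what Notepad's "Unicode" option produces.

```powershell
# Hypothetical helper: re-encode one text file with explicit source and target encodings.
function Convert-TextFileEncoding {
    param(
        # Pass full paths: .NET resolves relative paths against the process
        # working directory, not PowerShell's current location.
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Destination,
        # The code page the file was saved in (assumption: you know it),
        # e.g. [System.Text.Encoding]::GetEncoding(1252)
        [Parameter(Mandatory)] [System.Text.Encoding] $SourceEncoding
    )
    # Decode the whole file with the given ANSI code page
    $text = [System.IO.File]::ReadAllText($Path, $SourceEncoding)
    # Encoding.Unicode is UTF-16LE; WriteAllText emits its BOM (FF FE) first
    [System.IO.File]::WriteAllText($Destination, $text, [System.Text.Encoding]::Unicode)
}
```

Usage, keeping the naming scheme of the one-liner above and assuming code page 1252 (Western European); substitute your system's ANSI code page, which in Windows PowerShell is what [System.Text.Encoding]::Default returns:
Get-ChildItem -File -Filter *.txt | ForEach-Object { Convert-TextFileEncoding $_.FullName ($_.FullName + 'u') ([System.Text.Encoding]::GetEncoding(1252)) }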
Steven Murawski suggests using Out-File instead. The differences between the two cmdlets are:
- Out-File will attempt to format the input it receives.
- Out-File's default encoding is Unicode (UTF-16LE), whereas Set-Content uses the system's default ANSI code page. This applies to Windows PowerShell 5.1 and earlier; in PowerShell 6 and later both cmdlets default to UTF-8 without a BOM.
Here's an example, assuming the file test.txt doesn't exist in either case:
PS> [system.string] | Out-File test.txt
PS> Get-Content test.txt
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True String System.Object
# test.txt encoding is Unicode-based with BOM
PS> [system.string] | Set-Content test.txt
PS> Get-Content test.txt
System.String
# test.txt encoding is "ANSI" (Windows character set)
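To check what a cmdlet actually wrote, you can look at the first bytes of the file. The sketch below is a hypothetical helper (Get-TextFileBom is not a built-in cmdlet). It only recognizes BOMs; a file without one could be ANSI, plain ASCII, or BOM-less UTF-8, and the bytes alone don't tell you which.

```powershell
# Hypothetical helper: report which byte order mark, if any, a file starts with.
function Get-TextFileBom {
    param([Parameter(Mandatory)] [string] $Path)  # full path, since .NET is used below
    # Read at most 3 bytes, enough for the UTF-8 and UTF-16 BOMs
    $stream = [System.IO.File]::OpenRead($Path)
    try {
        $buffer = New-Object byte[] 3
        $count = $stream.Read($buffer, 0, 3)
    } finally {
        $stream.Dispose()
    }
    if ($count -ge 3 -and $buffer[0] -eq 0xEF -and $buffer[1] -eq 0xBB -and $buffer[2] -eq 0xBF) { return 'UTF-8' }
    # Note: a UTF-32LE BOM (FF FE 00 00) would also be reported as UTF-16LE here
    if ($count -ge 2 -and $buffer[0] -eq 0xFF -and $buffer[1] -eq 0xFE) { return 'UTF-16LE' }
    if ($count -ge 2 -and $buffer[0] -eq 0xFE -and $buffer[1] -eq 0xFF) { return 'UTF-16BE' }
    return 'none'
}
```

For the example above you would call it as Get-TextFileBom (Resolve-Path test.txt).ProviderPath after each write.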
In fact, if you don't need any specific Unicode encoding, you could just as well do the following to convert a text file to Unicode:
PS> Get-Content sourceASCII.txt > targetUnicode.txt
Out-File is a "redirection operator with optional parameters" of sorts.
The easiest way would be Get-Content 'path/to/text/file' | Out-File 'name/of/file'.
Out-File has an -Encoding parameter, whose default in Windows PowerShell is Unicode (UTF-16LE).
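If you want to script a batch of them, you could do something like this:
$files = Get-ChildItem 'directory/of/text/files' -File
foreach ($file in $files)
{
    # The parentheses make Get-Content read the whole file before Out-File
    # reopens the same path for writing; without them the file can end up empty.
    (Get-Content $file.FullName) | Out-File $file.FullName -Encoding Unicode
}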
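Overwriting the originals leaves no way back if the source encoding was guessed wrong. A minimal sketch that writes the converted copies to a separate folder instead follows; Convert-FolderToUnicode is a hypothetical name. It uses Set-Content rather than Out-File so the lines are written as-is without formatting, and it assumes Windows PowerShell 5.1, where Get-Content decodes BOM-less files with the ANSI code page (PowerShell 6+ decodes them as UTF-8).

```powershell
# Hypothetical helper: convert every file in $SourceDir into $OutDir as UTF-16LE,
# leaving the originals untouched.
function Convert-FolderToUnicode {
    param(
        [Parameter(Mandatory)] [string] $SourceDir,
        [Parameter(Mandatory)] [string] $OutDir
    )
    New-Item -ItemType Directory -Path $OutDir -Force | Out-Null
    # -File skips subfolders, which Get-Content cannot read
    foreach ($file in Get-ChildItem -LiteralPath $SourceDir -File) {
        # Decoding of BOM-less input depends on the PowerShell edition (see above)
        $lines = Get-Content -LiteralPath $file.FullName
        # -Encoding Unicode writes UTF-16LE with a BOM, like Notepad's "Unicode"
        Set-Content -LiteralPath (Join-Path $OutDir $file.Name) -Value $lines -Encoding Unicode
    }
}
```

For example: Convert-FolderToUnicode -SourceDir 'directory/of/text/files' -OutDir 'directory/of/converted/files'.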