I came across a little issue when dealing with CSV exports that contain mutated vowels like ä, ö, ü (German umlauts).
I simply export with:
Get-WinEvent -FilterHashtable @{Path=$_;ID=4627} -ErrorAction SilentlyContinue | Export-Csv -NoTypeInformation -Encoding Default -Force ("c:\temp\CSV_temp\" + $_.basename + ".csv")
which works fine; the ä, ö, ü appear correctly in my CSV file.
After that I do a little sorting with:
Get-ChildItem 'C:\temp\*.csv' |
ForEach-Object { Import-Csv $_.FullName } |
Sort-Object { [DateTime]::ParseExact($_.TimeCreated, $pattern, $culture) } |
Export-Csv 'C:\temp\merged.csv' -Encoding Default -NoTypeInformation -Force
I played around with all the encodings (ASCII, BigEndianUnicode, the various Unicode options) with no success.
How can I preserve the special characters ä, ö, ü and others when exporting and sorting?
Mathias R. Jessen provides the crucial pointer in a comment on the question: it is the Import-Csv call, not Export-Csv, that is the cause of the problem in your case:
Like Export-Csv, Import-Csv too needs to be passed -Encoding Default in order to properly process text files encoded with the system's active "ANSI" legacy code page, which is an 8-bit, single-byte character encoding such as Windows-1252.
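Applied to the merge-and-sort pipeline from the question, the fix is a single additional parameter on Import-Csv (a sketch; $pattern and $culture are assumed to be defined elsewhere in the original script):

# Same pipeline as in the question, with -Encoding Default added to Import-Csv
# so that the "ANSI"-encoded input files are decoded correctly on reading.
Get-ChildItem 'C:\temp\*.csv' |
  ForEach-Object { Import-Csv $_.FullName -Encoding Default } |
  Sort-Object { [DateTime]::ParseExact($_.TimeCreated, $pattern, $culture) } |
  Export-Csv 'C:\temp\merged.csv' -Encoding Default -NoTypeInformation -Force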
In Windows PowerShell, even though the generic text-file processing Get-Content / Set-Content cmdlet pair defaults to Default encoding (as the name suggests), regrettably and surprisingly, Import-Csv and Export-Csv do not.
Note that on reading, a default encoding is only assumed if the input file has no BOM (byte-order mark, a.k.a. Unicode signature: a magic byte sequence at the start of the file that unambiguously identifies the file's encoding).
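If you want to check whether a given file starts with a BOM, you can inspect its first bytes directly; the following is a minimal sketch for the UTF-8 BOM, i.e. the byte sequence 0xEF 0xBB 0xBF (the file name is just an example):

# Read the raw bytes and compare the first three to the UTF-8 BOM.
$bytes = [System.IO.File]::ReadAllBytes((Convert-Path .\t.csv))
$hasUtf8Bom = $bytes.Count -ge 3 -and
              $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
"UTF-8 BOM present: $hasUtf8Bom"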
Not only do Import-Csv and Export-Csv have defaults that differ from Get-Content / Set-Content, they individually have different defaults:
Import-Csv defaults to UTF-8.
Export-Csv defaults to ASCII(!), which means that any non-ASCII characters, such as ä, ö, ü, are transliterated to literal ? characters, resulting in loss of data.
By contrast, in PowerShell Core, the cross-platform edition built on .NET Core, the default encoding is (BOM-less) UTF-8, consistently, across all cmdlets, which greatly simplifies matters and makes it much easier to determine when you do need to use the -Encoding parameter.
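A quick way to see the PowerShell Core behavior for yourself (a minimal sketch to run in PowerShell Core / v6+; the file name is arbitrary):

# In PowerShell Core, both Export-Csv and Import-Csv default to BOM-less UTF-8,
# so no -Encoding parameter is needed to round-trip the umlauts.
[pscustomobject] @{ Column1 = 'aäöü' } | Export-Csv .\core.csv -NoTypeInformation
Import-Csv .\core.csv   # Column1 comes back as 'aäöü', intact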
Import-Csv / Export-Csv behavior
Import-Csv defaults to UTF-8:
# Sample CSV content.
$str = @'
Column1
aäöü
'@
# Write sample CSV file 't.csv' using UTF-8 encoding *without a BOM*
# (note that this cannot be done with standard Windows PowerShell cmdlets).
$null = New-Item -ItemType File t.csv -Force
[IO.File]::WriteAllLines((Convert-Path t.csv), $str)
# Use Import-Csv to read the file, which correctly preserves the UTF-8-encoded
# umlauts.
Import-Csv .\t.csv
The above yields:
Column1
-------
aäöü
As you can see, the umlauts were correctly preserved.
By contrast, had the file been "ANSI"-encoded ($str | Set-Content t.csv, with -Encoding Default implied), the umlauts would have gotten corrupted.
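A minimal sketch of that failure mode (Windows PowerShell; the file name t.ansi.csv is just an example):

# Set-Content writes the file using the system's "ANSI" code page by default.
$str | Set-Content t.ansi.csv
# Import-Csv assumes UTF-8, so the ANSI-encoded ä/ö/ü bytes are misread:
Import-Csv .\t.ansi.csv                    # umlauts come out garbled
# Passing -Encoding Default decodes the file correctly:
Import-Csv .\t.ansi.csv -Encoding Default  # umlauts intact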
Export-Csv defaults to ASCII (risk of data loss):
Building on the above example:
Import-Csv .\t.csv | Export-Csv .\t.new.csv
Get-Content .\t.new.csv
yields:
"Column1"
"a???"
As you can see, the umlauts were replaced by literal question marks (?).
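The remedy is to request a suitable encoding explicitly on export; a minimal sketch (UTF-8 shown here, though -Encoding Default also works if downstream consumers expect the ANSI code page):

# Explicitly request an encoding that can represent the umlauts.
# (In Windows PowerShell, -Encoding UTF8 writes UTF-8 *with* a BOM.)
Import-Csv .\t.csv | Export-Csv .\t.new.csv -Encoding UTF8
Get-Content .\t.new.csv   # the umlauts are now preserved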