
PowerShell Import-CSV to Export-CSV with type conversion

I get a bunch of tab-delimited files that I want to convert to comma-delimited files. I only need two columns from each file, "Date" and "Value1", so I drop the rest of the columns.

Date    Value1  Value2  Value3  Value4  Sensor
08.07.2010  115,28  115,45  115,45  115,28  100
07.07.2010  115,34  115,32  115,34  115,25  85
06.07.2010  115,23  115,74  115,74  115,20  203

This works as follows:

(Get-ChildItem -Path '*.txt').Name | ForEach-Object {
    Import-Csv -Path $_ -Delimiter "`t" |
    Select-Object -Property Date,Value1 |
    Export-Csv "out\$_"
}

Unfortunately, the source files use a comma as the decimal separator, so I get output files in which Value1 is interpreted as a string:

"Date","Value1"
"24.02.2017","30,18"
"23.02.2017","30,20"
"22.02.2017","30,18"
"21.02.2017","30,18"
"20.02.2017","30,17"

How can I set the data type of the column to numeric? Can this be done during Import-Csv?

Ale asked Apr 20 '26 23:04


1 Answer

Export-CSV always adding "..." around the output field values is not (in itself) a problem:

  • The CSV data format has no intrinsic concept of a data type, and double-quoting is solely used to enclose a field value (which is a syntactical requirement for field values with embedded , chars., for instance).

  • Excel, for instance, infers a column's data type solely from its content, whether that content is enclosed in double quotes or not.

    • The content interpretation is culture-aware (locale-aware), however.
  • By contrast, PowerShell's Import-Csv never interprets the data and returns all field values as strings.
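
If a real number is needed inside PowerShell, the string can be converted after import. The following is only a sketch, not part of the original answer: it assumes the values follow de-DE number formatting, uses inline sample lines in place of a file, and relies on a calculated property with [double]::Parse; the variable names are made up for illustration.

```powershell
# Sample lines in the question's layout (tab-delimited, decimal comma).
$lines = "Date`tValue1`tValue2",
         "08.07.2010`t115,28`t115,45",
         "07.07.2010`t115,34`t115,32"

# Assumption: the numbers are formatted according to de-DE (decimal comma).
$de = [cultureinfo]::GetCultureInfo('de-DE')

# The import itself yields strings only; the calculated property
# re-parses Value1 as a [double] using the assumed culture.
$rows = $lines | ConvertFrom-Csv -Delimiter "`t" |
  Select-Object -Property Date,
    @{ Name = 'Value1'; Expression = { [double]::Parse($_.Value1, $de) } }

$rows[0].Value1.GetType().Name   # Double
```

Import-Csv -Path ... -Delimiter "`t" can take the place of ConvertFrom-Csv when reading from a file. Note that the type exists only in memory: once the objects are exported to CSV again, the values are written as text.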

Depending on the active culture in the environment the CSV will be processed in, you have two options:

  • For a culture such as de-DE (Germany) in which , (comma) rather than . (dot, period) is used as the decimal mark, and ; rather than , is used as the list separator:

    • With the target culture in effect (verify with Get-Culture), use Export-Csv -UseCulture
      (-UseCulture has been available since at least v2 [1]). Thanks, LotPings.
    • Alternatively, use something like Export-Csv -Delimiter ([cultureinfo]::GetCultureInfo('de-DE').TextInfo.ListSeparator)
  • For an English-language culture (or any other culture that uses . as the decimal mark):

    • Custom-process the file and replace the , instances in numeric fields with . - see below.
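
As a rough illustration of the first option, the sketch below looks up the list separator of de-DE and uses it as the output delimiter while leaving the decimal commas untouched. It uses inline sample lines and ConvertTo-Csv in place of files and Export-Csv so that it is self-contained; the variable names are made up, and the separator a given runtime reports is worth printing and checking before relying on it.

```powershell
# Assumption: the CSV will be consumed in an environment that uses de-DE.
$de  = [cultureinfo]::GetCultureInfo('de-DE')
$sep = $de.TextInfo.ListSeparator   # print this to see what your runtime reports

# Sample lines in the question's layout (tab-delimited, decimal comma).
$lines = "Date`tValue1`tValue2",
         "08.07.2010`t115,28`t115,45"

# Keep only the two wanted columns and re-delimit with the culture's
# list separator; the field values themselves are passed through as-is.
$csv = $lines | ConvertFrom-Csv -Delimiter "`t" |
  Select-Object -Property Date, Value1 |
  ConvertTo-Csv -Delimiter $sep -NoTypeInformation

$csv
```

With files, the same pipeline would start with Import-Csv and end with Export-Csv -Delimiter $sep (or Export-Csv -UseCulture when the target culture is already in effect).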

To replace the , instances with ., the following command, which uses string manipulation only, will do:

Get-ChildItem -Path *.txt | ForEach-Object {
  Get-Content -LiteralPath $_.FullName | ForEach-Object {
    ($_ -split '\t')[0,1] -replace ',', '.' -join ','
  } |
    Set-Content -Encoding utf8 "out/$($_.Name)"
}

Note: For simplicity, the command assumes that the Date column values contain no instances of , and that none of the values require "..."-enclosing - which is reasonable in this specific scenario.

  • I've chosen UTF-8 as the output encoding above, because Set-Content defaults to the legacy, culture-specific "ANSI" code page - adjust as needed.
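
To see what the expression inside the inner ForEach-Object does, it can be applied to a single sample line. This is just an illustration with a made-up variable name; the operators are the ones used in the command above.

```powershell
# One data line from the question's sample (fields separated by tabs).
$line = "08.07.2010`t115,28`t115,45`t115,45`t115,28`t100"

# 1. -split '\t'        : split the line into fields on tab characters
# 2. [0,1]              : keep only the first two fields (Date, Value1)
# 3. -replace ',', '.'  : applied to each kept field, turns decimal commas into dots
# 4. -join ','          : glue the two fields back together with a comma
$result = ($line -split '\t')[0,1] -replace ',', '.' -join ','

$result   # 08.07.2010,115.28
```

The header line passes through the same steps unchanged apart from the delimiter, because Date and Value1 contain no commas.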

[1] Generally, if a parameter description in a cmdlet's help topic doesn't mention a specific version in which it was introduced, it implies that it's been around since v2. You can now browse older versions of the documentation - down to v3 - on GitHub - just type T or click Find file and start typing a cmdlet / conceptual help-topic name.

mklement0 answered Apr 24 '26 04:04


