I get a bunch of tab-delimited files and I want to convert them to comma-delimited files. I need only two columns from each file, "Date" and "Value1", so I drop the rest of the columns.
Date Value1 Value2 Value3 Value4 Sensor
08.07.2010 115,28 115,45 115,45 115,28 100
07.07.2010 115,34 115,32 115,34 115,25 85
06.07.2010 115,23 115,74 115,74 115,20 203
This works as follows:
(Get-ChildItem -Path '*.txt').Name | ForEach-Object {
    Import-Csv -Path $_ -Delimiter "`t" |
        Select-Object -Property Date,Value1 |
        Export-Csv "out\$_"
}
Unfortunately, the source files use a comma as the decimal separator, so I get files in which Value1 is interpreted as a string:
"Date","Value1"
"24.02.2017","30,18"
"23.02.2017","30,20"
"22.02.2017","30,18"
"21.02.2017","30,18"
"20.02.2017","30,17"
How can I set the data type of the column to numeric? During Import-CSV?
Export-CSV always adding "..." around the output field values is not (in itself) a problem:
The CSV data format has no intrinsic concept of a data type, and double-quoting is solely used to enclose a field value (which is a syntactical requirement for field values with embedded , chars., for instance).
Excel, for instance, infers a column's data type solely from its content, whether that content is enclosed in double quotes or not.
By contrast, PowerShell's Import-Csv never interprets the data and returns all field values as strings.
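To make the point about strings concrete, here is a minimal sketch of converting such a field to a number yourself. It assumes the source numbers follow de-DE conventions (comma as the decimal mark); the sample row and the variable names are illustrative only and not taken from the question's files.

```powershell
# Two sample lines shaped like the question's input (tab-delimited, comma as decimal mark).
$rows = @("Date`tValue1", "08.07.2010`t115,28" | ConvertFrom-Csv -Delimiter "`t")

# ConvertFrom-Csv (like Import-Csv) returns every field as a string.
$rows[0].Value1.GetType().Name                       # -> String

# Parse the text with a culture that uses ',' as its decimal mark (assumed: de-DE).
$de = [cultureinfo]::GetCultureInfo('de-DE')
$number = [double]::Parse($rows[0].Value1, $de)

# Formatting with the invariant culture always uses '.' as the decimal mark.
$number.ToString([cultureinfo]::InvariantCulture)    # -> 115.28
```

Note that the type information does not survive the export: once written back to CSV, the value is text again, and only its textual form (115,28 versus 115.28) decides how a consumer such as Excel interprets it.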
Depending on the active culture in the environment the CSV will be processed in, you have two options:
For a culture such as de-DE (Germany), in which , (comma) rather than . (dot, period) is used as the decimal mark, and ; rather than , is used as the list separator:
  Leave the numeric fields as they are, but use ; as the field separator.
    If the target culture is also the active culture (see Get-Culture), use Export-Csv -UseCulture; -UseCulture is available since (at least) v2[1].
    Otherwise, specify the delimiter explicitly, for example by deriving it from the target culture (thanks, LotPings): Export-Csv -Delimiter ([cultureinfo]::GetCultureInfo('de-DE').TextInfo.ListSeparator)
For an English-language culture (or any other culture that uses . as the decimal mark):
  Replace the , instances in numeric fields with . - see below.
To replace the , instances with ., the following command, which uses string manipulation only, will do:
Get-ChildItem -Path *.txt | ForEach-Object {
    Get-Content -LiteralPath $_.FullName | ForEach-Object {
        ($_ -split '\t')[0,1] -replace ',', '.' -join ','
    } |
        Set-Content -Encoding utf8 "out/$($_.Name)"
}
Note: For simplicity, the command assumes that the Date column values contain no instances of , and that none of the values require "..."-enclosing - which is reasonable in this specific scenario.
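If those assumptions do not hold (for example, when a field may need "..."-enclosing), a variant that leaves quoting to the CSV cmdlets could look like the sketch below. It is not the command from the answer: it assumes de-DE number formatting in the source, and it uses in-memory sample lines with ConvertFrom-Csv / ConvertTo-Csv so that it runs without any files. The variable names are illustrative.

```powershell
$de  = [cultureinfo]::GetCultureInfo('de-DE')   # assumed culture of the source numbers
$inv = [cultureinfo]::InvariantCulture          # formats with '.' as the decimal mark

# In-memory stand-in for one tab-delimited input file (sample data, not real files).
$lines = "Date`tValue1`tValue2",
         "08.07.2010`t115,28`t115,45",
         "07.07.2010`t115,34`t115,32"

$csv = $lines | ConvertFrom-Csv -Delimiter "`t" |
    Select-Object -Property Date,
        @{ Name = 'Value1'; Expression = { [double]::Parse($_.Value1, $de).ToString($inv) } } |
    ConvertTo-Csv -NoTypeInformation

$csv
# -> "Date","Value1"
# -> "08.07.2010","115.28"
# -> "07.07.2010","115.34"
```

With real files, the same pipeline should work inside the Get-ChildItem loop with Import-Csv -LiteralPath $_.FullName -Delimiter "`t" at the start and Export-Csv -NoTypeInformation "out\$($_.Name)" at the end. If the target culture uses , as the decimal mark, the calculated property can be dropped and Export-Csv -UseCulture or -Delimiter ';' used instead.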
The Set-Content call above passes -Encoding utf8 explicitly because, in Windows PowerShell, Set-Content defaults to the legacy, culture-specific "ANSI" code page - adjust as needed.
[1] Generally, if a parameter description in a cmdlet's help topic doesn't mention a specific version in which it was introduced, it implies that it's been around since v2. You can browse older versions of the documentation - down to v3 - on GitHub - just type T or click Find file and start typing a cmdlet / conceptual help-topic name.