
Remove Columns from CSV File Using Powershell

Tags: powershell, csv

I need to remove several columns from a CSV file without importing the CSV file in Powershell. Below is an example of my input CSV and what I hope the output CSV can look like.

Input.csv

A,1,2,3,4,5
B,6,7,8,9,10
C,11,12,13,14,15
D,15,16,17,18,19,20

Idealoutput.csv

A,3,5
B,8,10
C,13,15
D,17,20

I have tried doing this with the following code, but it gives me plenty of errors and says that I cannot use the "Delete" method this way (which I have done in the past). Any ideas?

$Workbook1 = $Excel.Workbooks.open($file.FullName) 
$header = $Workbook1.ActiveSheet.Range("A1:A68").EntireRow
$unneededcolumns1 = $Workbook1.ActiveSheet.Range("A1:O1").EntireColumn
$unneededcolumns2 = $Workbook1.ActiveSheet.Range("B1:K1").EntireColumn
$unneededcolumns3 = $Workbook1.ActiveSheet.Range("F1:I1").EntireColumn
$unneededcolumns4 = $Workbook1.ActiveSheet.Range("G1:I1").EntireColumn
$unneededcolumns5 = $Workbook1.ActiveSheet.Range("H1:O1").EntireColumn
$unneededcolumns6 = $Workbook1.ActiveSheet.Range("J1:AL1").EntireColumn
$unneededcolumns7 = $Workbook1.ActiveSheet.Range("K1").EntireColumn
$unneededcolumns8 = $Workbook1.ActiveSheet.Range("L1:AK1").EntireColumn
$unneededcolumns9 = $Workbook1.ActiveSheet.Range("F1:I1").EntireColumn
$unneededcolumns10 = $Workbook1.ActiveSheet.Range("M1:AB1").EntireColumn
$unneededcolumns11 = $Workbook1.ActiveSheet.Range("N1:X1").EntireColumn
$unneededcolumns12 = $Workbook1.ActiveSheet.Range("O1:BA1").EntireColumn
$unneededcolumns13 = $Workbook1.ActiveSheet.Range("P1:U1").EntireColumn
$header.Delete()
$unneededcolumns1.Delete()
$unneededcolumns2.Delete()
$unneededcolumns3.Delete()
$unneededcolumns4.Delete()
$unneededcolumns5.Delete()
$unneededcolumns6.Delete()
$unneededcolumns7.Delete()
$unneededcolumns8.Delete()
$unneededcolumns9.Delete()
$unneededcolumns10.Delete()
$unneededcolumns11.Delete()
$unneededcolumns12.Delete()
$unneededcolumns13.Delete()

$Workbook1.SaveAs("\\output.csv")
Casousadc asked Jan 08 '23 00:01


2 Answers

I am just going to add this anyway since I hope to convince you how easy it will be to avoid having to use Excel.

$source = "c:\temp\file.csv"
$destination = "C:\temp\newfile.csv"
(Import-CSV $source -Header 1,2,3,4,5,6 | 
    Select "1","4","6" | 
    ConvertTo-Csv -NoTypeInformation | 
    Select-Object -Skip 1) -replace '"' | Set-Content $destination

We assign arbitrary headers to the object, and that way we can call the 1st, 4th and 6th columns by position. Once exported, the file will have the following contents, which match what I think you want and not what you had in the question. Your last line had an extra value (20) on it, which I don't know if it was on purpose or not. Because only six headers are assigned, Import-CSV discards that seventh value.

A,3,5
B,8,10
C,13,15
D,17,19

If this is not viable I am really interested as to why.
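
If loading the whole file into objects is the concern, the same "keep columns by position" idea can be sketched as a line-by-line stream. This is only a rough sketch under a strong assumption: the file is plain comma-separated text with no quoted fields containing commas or line breaks. The function name Select-CsvColumn and its parameters are hypothetical, not part of PowerShell.

```powershell
# Hypothetical helper: copies only the given 1-based column positions,
# reading and writing one line at a time so the file is never fully in memory.
# Assumes simple CSV: no quoted fields with embedded commas or newlines.
function Select-CsvColumn {
    param(
        [Parameter(Mandatory)][string]$Path,         # full path to the input file
        [Parameter(Mandatory)][string]$Destination,  # full path to the output file
        [Parameter(Mandatory)][int[]]$Column         # 1-based positions to keep
    )

    $reader = $null
    $writer = $null
    try {
        $reader = New-Object System.IO.StreamReader($Path)
        $writer = New-Object System.IO.StreamWriter($Destination)

        while ($null -ne ($line = $reader.ReadLine())) {
            $fields = $line.Split(',')
            # A position past the end of a short line yields an empty field.
            $kept = foreach ($c in $Column) { $fields[$c - 1] }
            $writer.WriteLine($kept -join ',')
        }
    }
    finally {
        if ($writer) { $writer.Dispose() }
        if ($reader) { $reader.Dispose() }
    }
}
```

It could be called as Select-CsvColumn -Path 'C:\temp\file.csv' -Destination 'C:\temp\newfile.csv' -Column 1,4,6. Full paths are safer here because the .NET stream classes resolve relative paths against the process working directory, which may differ from the current PowerShell location.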

Excel Approach

Alright, so the file is enormous and Import-CSV is not a viable option. Keeping with your Excel idea, I came up with this. It takes a list of column indexes to keep and deletes every column that is not in that list.

Wait, you say, that won't work, since the column indexes change as you remove columns. Using the indices we want to keep, we get the inverse (the columns to delete) based on the UsedRange of the sheet. We then subtract from each of those column indexes a value equal to its array position. The reason is that by the time a column is actually deleted, its index has already been adjusted to account for the shift caused by the earlier deletions.

$file = "c:\temp\file.csv"
$ColumnsToKeep = 1,4,6

# Create the com object
$excel = New-Object -comobject Excel.Application
$excel.DisplayAlerts = $False
$excel.visible = $False

# Open the CSV File
$workbook = $excel.Workbooks.Open($file)
$sheet = $workbook.Sheets.Item(1)

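# Determine the number of columns in use
$maxColumns = $sheet.UsedRange.Columns.Count

# Columns to delete, in ascending order; @() keeps this an array even when only one column is removed
$ColumnsToRemove = @(Compare-Object $ColumnsToKeep (1..$maxColumns) | Where-Object{$_.SideIndicator -eq "=>"} | Select-Object -ExpandProperty InputObject | Sort-Object)
# Offset each index by its array position to account for the columns already deleted
0..($ColumnsToRemove.Count - 1) | %{$ColumnsToRemove[$_] = $ColumnsToRemove[$_] - $_}
$ColumnsToRemove  | ForEach-Object{
    [void]$sheet.Cells.Item(1,$_).EntireColumn.Delete()
}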

# Save the edited file (6 is the xlCSV file format)
$workbook.SaveAs("C:\temp\newfile.csv", 6)

# Close excel and release the com object.
$workbook.Close($true)
$excel.Quit()
[void][System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel)
Remove-Variable excel 

I was having issues with Excel remaining open even after reading up on the "correct" way to do it. The inner logic is what is important. Don't forget to change your paths as needed.
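
The index adjustment can be checked without Excel. The following is a small sketch that uses a hypothetical in-memory list as a stand-in for one worksheet row; the variable names mirror the script above, but the list itself is only an illustration.

```powershell
# Hypothetical stand-in for the worksheet: one row with six column values.
$columns = New-Object 'System.Collections.Generic.List[string]'
$columns.AddRange([string[]]@('A', '1', '2', '3', '4', '5'))
$ColumnsToKeep = 1, 4, 6

# 1-based positions of the columns to delete, ascending: 2, 3, 5
$ColumnsToRemove = @(1..$columns.Count | Where-Object { $_ -notin $ColumnsToKeep })

# Subtract each entry's array position to allow for earlier deletions: 2, 2, 3
$adjusted = for ($i = 0; $i -lt $ColumnsToRemove.Count; $i++) {
    $ColumnsToRemove[$i] - $i
}

foreach ($index in $adjusted) {
    # RemoveAt is 0-based, while the column positions are 1-based.
    $columns.RemoveAt($index - 1)
}

$result = $columns -join ','
$result   # A,3,5
```

Deleting positions 2, 3 and 5 directly would remove the wrong values after the first deletion; with the offsets applied, the remaining values are the original 1st, 4th and 6th columns.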

Matt answered Jan 14 '23 23:01


Here's a better approach that I use, though it's not the most performant on large files. Both variants below have been tested on 1GB files.

Powershell:

Import-Csv '.\inputfile.csv' |
  select ColumnName1,ColumnName2,ColumnName3 |
  Export-Csv -Path .\outputfile.csv -NoTypeInformation

https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/export-csv?view=powershell-5.1
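
To see what selecting by column name produces without touching any files, here is a small in-memory sketch. The column names Name, Skip and Score and the sample rows are made up for illustration; ConvertFrom-Csv and ConvertTo-Csv stand in for Import-Csv and Export-Csv.

```powershell
# Hypothetical sample data with a header row; 'Skip' is the column to drop.
$rows = @'
Name,Skip,Score
A,1,3
B,6,8
'@ | ConvertFrom-Csv

# Keep only the named columns and convert back to CSV text.
$csv = $rows | Select-Object Name, Score | ConvertTo-Csv -NoTypeInformation
$csv
# "Name","Score"
# "A","3"
# "B","8"
```

Note that every value comes back wrapped in double quotes by default.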

If you want to get rid of those pesky quotes that the tool adds, upgrade to Powershell 7.

Powershell 7+:

Import-Csv '.\inputfile.csv' |
  select ColumnName1,ColumnName2,ColumnName3 |
  Export-Csv -Path .\outputfile.csv -NoTypeInformation -UseQuotes Never

https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/export-csv?view=powershell-7

Michael Brown answered Jan 14 '23 22:01