
PowerShell: Import-CSV with no headers and remove partial duplicate lines

Tags:

powershell

csv

I have a log file that is formatted as a CSV with no headers. The first column is basically the unique identifier for the issues being recorded. There may be multiple lines with different details for the same issue identifier. I would like to remove lines where the first column is duplicated because I don't need the other data at this time.

I have fairly basic knowledge of PowerShell at this point, so I'm sure there's something simple I'm missing.

I'm sorry if this is a duplicate. I found questions that answer parts of this, but none that cover it as a whole.

So far, my best guess is:

Import-Csv $outFile | % { Select-Object -Index 1 -Unique } | Out-File $outFile -Append

But this gives me the error:

Import-Csv : The member "LB" is already present.
At C:\Users\jnurczyk\Desktop\Scratch\POImport\getPOImport.ps1:6 char:1
+ Import-Csv $outFile | % { Select-Object -InputObject $_ -Index 1 -Unique } | Out ...
+ ~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Import-Csv], ExtendedTypeSystemException
    + FullyQualifiedErrorId : AlreadyPresentPSMemberInfoInternalCollectionAdd,Microsoft.PowerShell.Commands.ImportCsvCommand

Joshua Nurczyk asked Dec 11 '13 17:12


2 Answers

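Because your data has no headers, you need to specify them with the -Header parameter of Import-Csv. Without it, Import-Csv treats the first row as the column names, and the error means the value "LB" appears more than once in that row. Then, to select only unique values of the first column, name that column in the Select-Object cmdlet. See code below: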

Import-Csv $outFile -Header A,B,C | Select-Object -Unique A

To clarify, the headers in my example are A, B, and C. This works if you know how many columns there are. If you supply fewer headers than there are columns, the remaining columns are dropped. If you supply more headers than there are columns, the extra properties are empty.
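As a rough sketch of what the command above produces: Select-Object -Unique A returns objects that carry only the A property, which fits the question since the other columns are not needed. If the first full row per identifier is wanted instead, grouping on A is one option. The sample rows and variable names below are hypothetical, and ConvertFrom-Csv stands in for Import-Csv so the example runs without a file.

```powershell
# Hypothetical sample data standing in for the header-less log file.
$lines = @(
    'PO100,LB,first detail'
    'PO100,LB,second detail'
    'PO200,EA,only detail'
)

# ConvertFrom-Csv accepts -Header in the same way Import-Csv does.
$rows = $lines | ConvertFrom-Csv -Header A,B,C

# Unique identifiers only: each result object has just the A property.
$ids = $rows | Select-Object -Property A -Unique

# First full row per identifier: group on A and keep each group's first member.
$firstRows = $rows | Group-Object -Property A | ForEach-Object { $_.Group[0] }

$ids
$firstRows
```

The order in which Group-Object emits its groups can differ between PowerShell versions, so pipe the result to Sort-Object if order matters.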

Benjamin Hubbard answered Nov 13 '22 17:11


Every time I look for a solution to this issue I run across this thread. However, the solution accepted here is more generic than I would like. The function below reads the header names from the first line of the file and appends a counter each time it sees the same name again: A, B, C, A1, D, A2, C1, etc.

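Function Import-CSVCustom ($csvTemp) {
    # Read only the first line, which holds the (possibly repeated) header names.
    $StreamReader = New-Object System.IO.StreamReader -ArgumentList $csvTemp
    [array]$Headers = $StreamReader.ReadLine() -split "," | % { "$_".Trim() } | ? { $_ }
    $StreamReader.Close()

    # Append a running count to each repeated name: A, B, A -> A, B, A1.
    $a = @{}
    $Headers = $Headers | % {
        if ($a.$_.Count) { "$_$($a.$_.Count)" } else { $_ }
        $a.$_ += @($_)
    }

    # With -Header, Import-Csv treats the file's first line as data, so skip it.
    Import-Csv $csvTemp -Header $Headers | Select-Object -Skip 1
}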
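
The renaming step can be tried on its own, without a file. This is a minimal sketch of the same idea using an explicit counter per name; the header list and the variable names ($names, $seen, $unique) are made up for illustration.

```powershell
# Hypothetical header names with repeats, standing in for a real header row.
$names = 'A','B','C','A','D','A','C'

$seen = @{}   # header name -> number of times it has appeared so far
$unique = foreach ($name in $names) {
    if ($seen.ContainsKey($name)) {
        "$name$($seen[$name])"   # repeat: append the running count
    } else {
        $name                    # first occurrence: keep as-is
    }
    $seen[$name] = 1 + $seen[$name]
}

$unique -join ', '   # A, B, C, A1, D, A2, C1
```

One caveat with this scheme: if the file already contains a column literally named A1, the generated name collides with it, so a different suffix may be needed for such data.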
user3818571 answered Nov 13 '22 18:11