 

Create/populate a csv file with Powershell

Tags: powershell, csv

I am having a little trouble creating/populating a csv file with powershell. I am new to powershell, so I may be missing some obvious things - please go easy on me. Here is the situation:

First I am creating an array(?) to act as my table

#Create output table with headers
$output = @()
$row = New-Object System.Object
$row | Add-Member -MemberType NoteProperty -Name "Example Header 1" -Value $null
$row | Add-Member -MemberType NoteProperty -Name "Example Header 2" -Value $null
$row | Add-Member -MemberType NoteProperty -Name "Example Header 3" -Value $null
$output += $row

I am writing it to a file using $output | Export-Csv new.csv -NoTypeInformation

This appears to make a csv file with the headers that I want. If there is a better way to do this, please let me know. The next step is where I am running into problems. I now need to programmatically populate the table with data. When importing existing csv files, I am able to access/modify data in the table like an array (i.e. $output[rowIndex]."Header Name" = "new data").

So I tried to add data to my newly created table. I wrote $output[0]."Example Header 1" = "Test Data". This works as I expected and populates the first row in the column with the specified header with "Test Data". However, I can ONLY access [0]. $output[1] and so on cause errors because I guess they do not exist. I tried using $output += $row again to add more rows, but it does not work at all and causes some strange errors to happen (if I write to a row, it writes to all rows, probably because it's all the same object).

So basically my question is, how can I create a csv file from scratch, add some headers to it, and then start writing to all the (unknown/variable number of) rows? I am sure there is a better way to do it, but like I said, I am very new to powershell. Ideally I would like to be able to access rows by index (0,1,2, etc) but I am open to whatever.


Basic solution (adapted from Martin Brandl's answer)

This basically reads data from one csv file and inserts it into another with new headers.

$csv = Import-Csv "MyCsv.csv"
$newCsv = @()
foreach($row in $csv) {
    $newCsv += [PSCustomObject]@{
        "New Column Header1" = $row."Original Column Header1"
        "New Column Header2" = $row."Original Column Header2"
    }
}
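
To write the new table back out to a file, the Export-Csv call from earlier can be reused (a minimal follow-up sketch; the output file name "NewCsv.csv" is just a placeholder):

# Hypothetical output path; -NoTypeInformation suppresses the "#TYPE ..." header line.
$newCsv | Export-Csv "NewCsv.csv" -NoTypeInformation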
asked Dec 03 '22 by Thomas

1 Answer

To complement Martin Brandl's helpful answer with an explanation of your symptoms (emphasis added):

I tried using $output += $row again to add more rows, but it does not work at all and causes some strange errors to happen (if I write to a row, it writes to all rows, probably because it's all the same object).

Indeed, that is what happened: In .NET terms, type (class) [pscustomobject] is a reference type rather than a value type - as evidenced by [pscustomobject].IsValueType returning $false.

If you add a given instance (object) of a reference type to an array multiple times, all such elements point to the very same instance.

Here's a brief demonstration:

$obj = [PSCustomObject] @{
    'Example Header 1' = $null
    'Example Header 2' = $null
}

$array = @()
foreach ($ndx in 1..2) {
  # By working with the original $obj every time, you
  # keep modifying the same instance's property values.
  $obj.'Example Header 1' = "h1-$ndx"
  $obj.'Example Header 2' = "h2-$ndx"
  # Adding $obj to an array does NOT create a COPY of $obj
  # but stores a REFERENCE directly to $obj in the array
  # (similar to storing a pointer in unmanaged languages such as C++).
  $array += $obj
}

# Output the array.
$array

This yields the following:

Example Header 1 Example Header 2
---------------- ----------------
h1-2             h2-2
h1-2             h2-2

As you can see, only the last values assigned to .Example Header 1 and .Example Header 2 took effect, because both array elements reference the very same object.

Martin's approach is the simplest way to solve this problem: create a new instance of the custom object in every iteration (casting a hashtable literal to [pscustomobject], as now shown in the question itself: $array += [pscustomobject] @{ ... }).

If you don't want to or cannot recreate your instances from scratch inside the loop, you have two basic choices:

  • Clone a template custom object in every loop iteration or simply use [pscustomobject] @{ ... } object creation inside the loop, which implicitly creates a new instance every time (a rough sketch of the cloning approach follows this list):

    • See this answer
  • PSv5+ alternative: Define a custom class and instantiate it in every loop iteration - see below.
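
As a rough sketch of the template-cloning approach (not part of the original answer; the question's headers are reused as property names), .psobject.Copy() produces a new, independent instance on every iteration:

# Template object whose property values are overwritten on each iteration.
$template = [PSCustomObject] @{
    'Example Header 1' = $null
    'Example Header 2' = $null
}

$array = @()
foreach ($ndx in 1..2) {
  # .psobject.Copy() returns a shallow COPY, i.e. a distinct new instance.
  $row = $template.psobject.Copy()
  $row.'Example Header 1' = "h1-$ndx"
  $row.'Example Header 2' = "h2-$ndx"
  $array += $row
}

$array  # now shows two distinct rows: h1-1/h2-1 and h1-2/h2-2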


Class-based Solution

In PSv5+, a custom class allows for an elegant solution that also performs better than creating instances in-loop using literal syntax.

# Define a custom class that represents the rows of the
# output CSV.
# Note: [object] is being used here as the properties' type.
#       In real life, you'd use more specific types such as [string]
#       or [int].
class CsvRow {
  [object] ${Example Header 1}
  [object] ${Example Header 2}
}

$array = @()
foreach ($ndx in 1..2) {
  # Instantiate the custom class.
  $rowObj = [CsvRow]::new()
  # Set the values.
  $rowObj.'Example Header 1' = "h1-$ndx"
  $rowObj.'Example Header 2' = "h2-$ndx"
  # Add the instance to the array.
  $array += $rowObj
}

# Output the array.
$array

Performance Considerations

Two factors determine performance:

  • How quickly the array is extended in each loop iteration:

    • Extending an array element by element with $array += ... is very convenient, but is slow and inefficient, because a new array must be created every time (arrays are fixed-size collections and cannot be directly extended).

    • For small iteration counts that may not matter, but the higher the number, the more performance will suffer, and at some point this approach becomes infeasible.

    • The next best solution is to use a [System.Collections.Generic.List[object]] instance to build the array instead - such lists are designed to be extended efficiently (a sketch of this approach follows the list below).

    • The best and simplest solution, however, is to let PowerShell collect the multiple outputs from a loop-like statement in an array automatically, simply by assigning the statement to a variable - see below.

  • How quickly the new object is instantiated in each loop iteration:

    • Instantiating the custom class is faster than creating an instance via a hashtable literal, but only if [CsvRow]::new() is used for instantiation; the functionally equivalent New-Object CsvRow is much slower, because it involves a cmdlet call.
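
Here is a minimal sketch (not part of the original answer) of the List-based alternative mentioned above, again reusing the question's headers:

# A generic list grows efficiently, unlike a fixed-size array.
$list = [System.Collections.Generic.List[object]]::new()

foreach ($ndx in 1..1000) {
  # Each iteration adds a NEW [pscustomobject] instance to the list.
  $list.Add([pscustomobject] @{
    'Example Header 1' = "h1-$ndx"
    'Example Header 2' = "h2-$ndx"
  })
}

# Export-Csv accepts the list just like an array.
$list | Export-Csv new.csv -NoTypeInformation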

The following variant of the custom-class solution uses implicit array creation to ensure acceptable performance even with higher iteration counts:

# Define the custom class.
class CsvRow {
  [object] ${Example Header 1}
  [object] ${Example Header 2}
}

# Determine the iteration count.
$count = 1000

# Loop and let PowerShell collect the outputs
# from all iterations implicitly in variable $array
[array] $array = foreach ($ndx in 1..$count) {
  # Instantiate the custom class.
  $rowObj = [CsvRow]::new()
  # Set the values.
  $rowObj.'Example Header 1' = "h1-$ndx"
  $rowObj.'Example Header 2' = "h2-$ndx"
  # Simply output the row object
  $rowObj
}

# Output the array.
$array

Note: The [array] type constraint is only needed if you need to ensure that $array is always an array; without it, if there happened to be just a single loop iteration and therefore a single output object, $array would store that object as-is, not wrapped in an array (this behavior is fundamental to PowerShell's pipeline).
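
A quick illustration of that single-output behavior (an assumed example, not part of the original answer):

# Without [array]: a single output object is stored as-is ...
$single = foreach ($ndx in 1..1) { [pscustomobject] @{ X = $ndx } }
$single.GetType().Name       # -> PSCustomObject

# ... whereas the [array] constraint wraps it in an array.
[array] $forced = foreach ($ndx in 1..1) { [pscustomobject] @{ X = $ndx } }
$forced.GetType().Name       # -> Object[]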

answered Dec 06 '22 by mklement0