I am having a little trouble creating/populating a csv file with powershell. I am new to powershell, so I may be missing some obvious things, so please go easy on me. Here is the situation:
First I am creating an array(?) to act as my table
#Create output table with headers
$output = @()
$row = New-Object System.Object
$row | Add-Member -MemberType NoteProperty -Name "Example Header 1" -Value $null
$row | Add-Member -MemberType NoteProperty -Name "Example Header 2" -Value $null
$row | Add-Member -MemberType NoteProperty -Name "Example Header 3" -Value $null
$output += $row
I am writing it to a file using
$output | Export-Csv new.csv -NoTypeInformation
This appears to make a csv file with the headers that I want. If there is a better way to do this, please let me know. The next step is where I am running into problems. I now need to programatically populate the table with data. When importing existing csv files, I am able to access/modify data in the table like an array (i.e. $output[rowIndex]."Header Name" = "new data"
).
So I tried to add data to my newly created table. I wrote $ouput[0]."Example Header 1" = "Test Data"
. This works as I expected and populates the first row in the column with the specified header with "Test Data". However, I can ONLY access [0]. $output[1]
and so on cause errors because I guess they do not exist. I tried using $output += $row
again to add more rows, but it does not work at all and causes some strange errors to happen (if I write to a row, it writes to all rows, probably because its all the same object).
So basically my question is, how can I create a csv file from scratch, add some headers to it, and then start writing to all the (unknown/variable number of) rows? I am sure there is a better way to do it, but like I said, I am very new to powershell. Ideally I would like to be able to access rows by index (0,1,2, etc) but I am open to whatever.
Basic solution (adapted from Martin Brandl's answer)
This basically reads data from one csv file, and inserts it into another with new specified headers.
$csv = Import-Csv "MyCsv.csv"
$newCsv = @()
foreach($row in $csv) {
$newCsv += [PSCustomObject]@{
"New Column Header1" = $row."Original Column Header1"
"New Column Header2" = $row."Original Column Header2"
}
}
To complement Martin Brandl's helpful answer with an explanation of your symptoms (emphasis added):
I tried using
$output += $row
again to add more rows, but it does not work at all and causes some strange errors to happen (if I write to a row, it writes to all rows, probably because it's all the same object).
Indeed, that is what happened: In .NET terms, type (class) [pscustomobject]
is a reference type rather than a value type - as evidenced by [pscustomobject].IsValueType
returning $false
.
If you add a given instance (object) of a reference type to an array multiple times, all such elements point to the very same instance.
Here's brief demonstration.
$obj = [PSCustomObject] @{
'Example Header 1' = $null
'Example Header 2' = $null
}
$array = @()
foreach ($ndx in 1..2) {
# By working with the original $obj every time, you
# keep modifying the same instance's property values.
$obj.'Example Header 1' = "h1-$ndx"
$obj.'Example Header 2' = "h2-$ndx"
# Adding $obj to an array does NOT create a COPY of $obj
# but stores a REFERENCE directly to $obj in the array
# (similar to storing a pointer in unmanaged languages such as C++).
$array += $obj
}
# Output the array.
$array
This yields the following:
Example Header 1 Example Header 2
---------------- ----------------
h1-2 h2-2
h1-2 h2-2
As you can see, only the last values assigned to .Example Header 1
and .Example Header 2
took effect, because both array elements reference the very same object.
Martin's approach is the simplest way to solve this problem: create a new instance of the custom object in every iteration (casting a hashtable literal to [pscustomobject]
, as now shown in the question itself: $array += [pscustomobject] @{ ... }
).
If you don't want to or cannot recreate your instances from scratch inside the loop, you have two basic choices:
Clone a template custom object in every loop iteration or simply use [pscustomobject] @{ ... }
object creation inside the loop, which implicitly creates a new instance every time:
PSv5+ alternative: Define a custom class and instantiate it in every loop iteration - see below.
In PSv5+, a custom class allows for an elegant solution that also performs better than creating instances in-loop using literal syntax.
# Define a custom class that represents the rows of the
# output CSV.
# Note: [object] is being used here as the properties' type.
# In real life, you'd use more specific types such as [string]
# or [int].
class CsvRow {
[object] ${Example Header 1}
[object] ${Example Header 2}
}
$array = @()
foreach ($ndx in 1..2) {
# Instantiate the custom class.
$rowObj = [CsvRow]::new()
# Set the values.
$rowObj.'Example Header 1' = "h1-$ndx"
$rowObj.'Example Header 2' = "h2-$ndx"
# Add the instance to the array.
$array += $rowObj
}
# Output the array.
$array
Two factors determine performance:
How quickly the array is extended in each loop iteration:
Extending an array element by element with $array += ...
is very convenient, but is slow and inefficient, because a new array must be created every time (arrays are fixed-size collections and cannot be directly extended).
For small iteration counts that may not matter, but the higher the number, the more performance will suffer, and at some point this approach becomes infeasible.
The next best solution is to use a [System.Collections.Generic.List[object]]
instance to build the array instead - such lists are designed to be efficiently extended.
The best and simplest solution, however, is to simply let PowerShell collect multiple outputs from a loop-like statement in an array, simply by assigning to a variable - see below.
How quickly the new object is instantiated in each loop iteration:
[CsvRow]::new()
is used for instantiation; the functionally equivalent New-Object CsvRow
is much slower, due to involving a cmdlet call.The following variant of the custom-class solution uses implicit array creation to ensure acceptable performance even with higher iteration counts:
# Define the custom class.
class CsvRow {
[object] ${Example Header 1}
[object] ${Example Header 2}
}
# Determine the iteration count.
$count = 1000
# Loop and let PowerShell collect the outputs
# from all iterations implicitly in variable $array
[array] $array = foreach ($ndx in 1..$count) {
# Instantiate the custom class.
$rowObj = [CsvRow]::new()
# Set the values.
$rowObj.'Example Header 1' = "h1-$ndx"
$rowObj.'Example Header 2' = "h2-$ndx"
# Simply output the row object
$rowObj
}
# Output the array.
$array
Note: The [array]
type constraint is only needed if you need to ensure that $ToWrite
is always an array; without it, if there happened to be just a single loop iteration and therefore output object, $ToWrite
would store that output object as-is, not wrapped in an array (this behavior is fundamental to PowerShell's pipeline).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With