I am collecting performance counters from NetApp Performance Manager (OPM) software. OPM saves only 30 days' worth of data in its MySQL database, so I have to run two queries to retrieve the data and then merge the two resulting CSV files to cover a month that has 31 days.
Both files look like this:
"Time","objid","cpuBusy","cifsOps","avgLatency"
"2016:06:04 00:04","72","50.6196","2069.11","7622.1"
"2016:06:04 00:09","72","30.2233","2075.94","7633.27"
"2016:06:04 00:14","72","35.2559","1980.64","8352.17"
When I merge the two CSV files with the code below, I get duplicate rows containing data from the same date/time.
@(Import-Csv au2004npsa003-mm-business.csv) + @(Import-Csv au2004npsa003-nn-business.csv) | export-csv joined.csv -NoTypeInformation
How can I merge the two CSV files without getting duplicate data?
I have tried select -unique; however, it gives just one row.
As for why Select-Object -Unique didn't work:
Select-Object -Unique, when given instances of reference types (other than strings), compares their .ToString() values in order to determine uniqueness. [pscustomobject] instances, such as the ones Import-Csv creates, regrettably return the empty string from their .ToString() method. Thus, all input objects compare the same, and only the first input object is ever returned.
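To see this for yourself, here's a minimal sketch with two hand-built objects (mirroring the Time and cpuBusy columns; not your actual data):
$rows = [pscustomobject] @{ Time = '2016:06:04 00:04'; cpuBusy = '50.6196' },
        [pscustomobject] @{ Time = '2016:06:04 00:09'; cpuBusy = '30.2233' }
$rows[0].ToString()            # the empty string, so all rows 'look' identical
$rows | Select-Object -Unique  # only the first object is returned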
S9uare's helpful Select-Object -Property * -Unique approach overcomes this problem by forcing all properties to be compared individually, but comes with a performance caveat: the input objects are effectively recreated, and comparing all property values is overkill in this case, because comparing Time values alone would suffice; with large input files, processing can take a long time.
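If you prefer to keep working with objects, one leaner alternative (my own sketch, not something attempted in the question) is to deduplicate on the Time column only, e.g. with Group-Object:
# Group the combined rows by the Time column and keep the first row per timestamp.
Import-Csv au2004npsa003-mm-business.csv, au2004npsa003-nn-business.csv |
Group-Object -Property Time |
ForEach-Object { $_.Group[0] } |
Export-Csv joined.csv -NoTypeInformation
Note that Group-Object still collects all rows in memory, but only the Time values are compared.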
Since the data at hand comes from CSV files, the performance problem can be alleviated with string processing, using Get-Content rather than Import-Csv:
Get-Content au2004npsa003-mm-business.csv, au2004npsa003-nn-business.csv |
Select-Object -Unique |
Set-Content -Encoding ASCII joined.csv
Note that I'm using -Encoding ASCII to mimic Export-Csv's default behavior; change as needed.
With input objects that are strings, Select-Object -Unique works as expected - and is faster.
Note, however, that with large input files you may run out of memory, given that Select-Object needs to build up an in-memory data structure containing all rows in order to determine uniqueness.
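If memory does become an issue, a streaming variant (again, my own sketch rather than part of the approach above; requires PowerShell v5+ for ::new()) is to track the lines already seen in a HashSet, so that only the distinct lines are held in memory:
# HashSet.Add() returns $true only for lines not seen before, so duplicate lines
# are dropped as they stream through the pipeline.
$seen = [System.Collections.Generic.HashSet[string]]::new()
Get-Content au2004npsa003-mm-business.csv, au2004npsa003-nn-business.csv |
Where-Object { $seen.Add($_) } |
Set-Content -Encoding ASCII joined.csv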