
Is there a faster way to parse an Excel document with PowerShell?

I'm interfacing with an MS Excel document via PowerShell. Each Excel document may have around 1000 rows of data.

Currently this script reads the Excel file and writes a value to the screen at a rate of about 1 record every 0.6 seconds. At first glance that seems extremely slow.

This is my first time reading an Excel file with PowerShell. Is this the norm? Is there a faster way for me to read and parse the Excel data?

Here is the script output (trimmed for readability):

PS P:\Powershell\ExcelInterfaceTest> .\WRIRMPTruckInterface.ps1 test.xlsx
3/20/2013 4:46:01 PM
---------------------------
2   078110
3   078108
4   078107
5   078109
<SNIP>
242   078338
243   078344
244   078347
245   078350
3/20/2013 4:48:33 PM
---------------------------
PS P:\Powershell\ExcelInterfaceTest>

Here is the PowerShell script:

########################################################################################################
# This is a common function I am using which will release excel objects
########################################################################################################
function Release-Ref ($ref) {
    ([System.Runtime.InteropServices.Marshal]::ReleaseComObject([System.__ComObject]$ref) -gt 0)
    [System.GC]::Collect()
    [System.GC]::WaitForPendingFinalizers()
}

########################################################################################################
# Variables
########################################################################################################

########################################################################################################
# Creating excel object
########################################################################################################
$objExcel = new-object -comobject excel.application 

# Set to false to not open the app on screen.
$objExcel.Visible = $False

########################################################################################################
# Directory location where we have our excel files
########################################################################################################
$ExcelFilesLocation = "C:/ShippingInterface/" + $args[0]

########################################################################################################
# Open our excel file
########################################################################################################
$UserWorkBook = $objExcel.Workbooks.Open($ExcelFilesLocation) 

########################################################################################################
# Item(n) refers to sheet n of the workbook: Item(2) below is sheet 2. To access sheet 10, change the code to Item(10)
########################################################################################################
$UserWorksheet = $UserWorkBook.Worksheets.Item(2)

########################################################################################################
# This is a counter that helps iterate through the loop. It is simply a row counter.
# The row count starts at 2 because the first row in my case is a header, so we don't need to read the header data
########################################################################################################
$intRow = 2

$a = Get-Date
write-host $a
write-host "---------------------------"

Do {

    # Reading the first column of the current row
    $TicketNumber = $UserWorksheet.Cells.Item($intRow, 1).Value()

    write-host $intRow " " $TicketNumber    

    $intRow++

} While ($UserWorksheet.Cells.Item($intRow,1).Value() -ne $null)

$a = Get-Date
write-host $a
write-host "---------------------------"

########################################################################################################
# Exiting the excel object
########################################################################################################
$objExcel.Quit()

########################################################################################################
#Release all the objects used above
########################################################################################################
$a = Release-Ref($UserWorksheet)
$a = Release-Ref($UserWorkBook) 
$a = Release-Ref($objExcel)

ProfessionalAmateur asked Mar 20 '13 23:03


1 Answer

In his blog entry "Speed Up Reading Excel Files in PowerShell", Robert M. Toups, Jr. explains that while loading the spreadsheet in PowerShell is fast, actually reading the Excel cells is very slow. On the other hand, PowerShell can read a text file very quickly, so his solution is to load the spreadsheet in PowerShell, use Excel’s native CSV export process to save it as a CSV file, then use PowerShell’s standard Import-Csv cmdlet to process the data. He reports that this made his import process up to 20 times faster.
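To make the second half of that approach concrete, here is a minimal sketch of what Import-Csv does with an exported file. It runs without Excel. The temporary file name and the column names (TicketNumber, Carrier) are hypothetical stand-ins, not taken from the blog entry or from the question's workbook.

```powershell
# Sketch: the CSV half of the approach, runnable without Excel.
# The file name and the column names (TicketNumber, Carrier) are hypothetical.
$csvFile = Join-Path ([System.IO.Path]::GetTempPath()) "import-csv-sketch.csv"

# Stand-in for the file that Excel's CSV export would produce.
@(
    'TicketNumber,Carrier'
    '078110,North'
    '078108,South'
    '078107,North'
) | Set-Content -Path $csvFile

# Import-Csv uses the first line as property names and returns one object per row.
$rows = @(Import-Csv -Path $csvFile)

foreach ($row in $rows) {
    # Values arrive as strings, so leading zeros in the CSV text are kept.
    Write-Output ("{0} {1}" -f $row.TicketNumber, $row.Carrier)
}

# Remove the temporary file again.
Remove-Item -Path $csvFile
```

Because every row comes back as an object, the loop needs no per-cell calls into Excel, which is where the original script spends its time.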

Leveraging Toups’ code, I created an Import-Excel function that lets you import spreadsheet data very easily. My code adds the capability to select a specific worksheet within an Excel workbook, rather than just using the default worksheet (i.e. the active sheet at the time you saved the file). If you omit the -SheetName parameter, it uses the default worksheet.

function Import-Excel([string]$FilePath, [string]$SheetName = "")
{
    $csvFile = Join-Path $env:temp ("{0}.csv" -f (Get-Item -path $FilePath).BaseName)
    if (Test-Path -path $csvFile) { Remove-Item -path $csvFile }

    # convert Excel file to CSV file
    $xlCSVType = 6 # SEE: http://msdn.microsoft.com/en-us/library/bb241279.aspx
    $excelObject = New-Object -ComObject Excel.Application  
    $excelObject.Visible = $false 
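    # Excel resolves relative paths against its own working directory, so pass a full path
    $workbookObject = $excelObject.Workbooks.Open((Resolve-Path -Path $FilePath).ProviderPath)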
    SetActiveSheet $workbookObject $SheetName | Out-Null
    $workbookObject.SaveAs($csvFile,$xlCSVType) 
    $workbookObject.Saved = $true
    $workbookObject.Close()

    # cleanup
    [System.Runtime.Interopservices.Marshal]::ReleaseComObject($workbookObject) |
        Out-Null
    $excelObject.Quit()
    [System.Runtime.Interopservices.Marshal]::ReleaseComObject($excelObject) |
        Out-Null
    [System.GC]::Collect()
    [System.GC]::WaitForPendingFinalizers()

    # now import and return the data 
    Import-Csv -path $csvFile
}

These supplemental functions are used by Import-Excel:

function FindSheet([Object]$workbook, [string]$name)
{
    $sheetNumber = 0
    for ($i=1; $i -le $workbook.Sheets.Count; $i++) {
        if ($name -eq $workbook.Sheets.Item($i).Name) { $sheetNumber = $i; break }
    }
    return $sheetNumber
}

function SetActiveSheet([Object]$workbook, [string]$name)
{
    if (!$name) { return }
    $sheetNumber = FindSheet $workbook $name
    if ($sheetNumber -gt 0) { $workbook.Worksheets.Item($sheetNumber).Activate() }
    return ($sheetNumber -gt 0)
}
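
A usage sketch, assuming the three functions above are loaded in the session and Excel is installed on the machine. The path reuses the folder and file name from the question. The sheet name "Sheet2" and the column header "TicketNumber" are hypothetical, so substitute the names from your own workbook.

```powershell
# Hypothetical call: the sheet name and the column header are assumptions.
$rows = Import-Excel -FilePath "C:\ShippingInterface\test.xlsx" -SheetName "Sheet2"

# Each row is an object whose properties are named after the header row.
foreach ($row in $rows) {
    Write-Host $row.TicketNumber
}
```

If no sheet has the requested name, SetActiveSheet returns $false. Import-Excel discards that result with Out-Null, so the active sheet is exported instead.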

Michael Sorens answered Nov 16 '22 03:11