Basic Powershell - batch convert Word Docx to PDF

I am trying to use PowerShell to do a batch conversion of Word Docx to PDF - using a script found on this site: http://blogs.technet.com/b/heyscriptingguy/archive/2013/03/24/weekend-scripter-convert-word-documents-to-pdf-files-with-powershell.aspx

# Acquire a list of DOCX files in a folder
$Files=GET-CHILDITEM "C:\docx2pdf\*.DOCX"
$Word=NEW-OBJECT -COMOBJECT WORD.APPLICATION

Foreach ($File in $Files) {
    # open a Word document, filename from the directory
    $Doc=$Word.Documents.Open($File.fullname)

    # Swap out DOCX with PDF in the Filename
    $Name=($Doc.Fullname).replace("docx","pdf")

    # Save this File as a PDF in Word 2010/2013
    $Doc.saveas([ref] $Name, [ref] 17)  
    $Doc.close()
}

And I keep on getting this error and can't figure out why:

PS C:\docx2pdf> .\docx2pdf.ps1
Exception calling "SaveAs" with "16" argument(s): "Command failed"
At C:\docx2pdf\docx2pdf.ps1:13 char:13
+     $Doc.saveas <<<< ([ref] $Name, [ref] 17)
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : DotNetMethodException

Any ideas?

Also, how would I need to change it to convert .doc files (not just .docx) as well, and to use the files in the same folder as the script?

Sorry - never done PowerShell scripting...

takabanana asked May 14 '13 02:05


4 Answers

This will work for doc as well as docx files.

$documents_path = 'c:\doc2pdf'

$word_app = New-Object -ComObject Word.Application

# This filter will find .doc as well as .docx documents
Get-ChildItem -Path $documents_path -Filter *.doc? | ForEach-Object {

    $document = $word_app.Documents.Open($_.FullName)

    $pdf_filename = "$($_.DirectoryName)\$($_.BaseName).pdf"

    $document.SaveAs([ref] $pdf_filename, [ref] 17)

    $document.Close()
}

$word_app.Quit()
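
One plausible reason the script in the question fails, offered as an assumption rather than something this thread confirms: `.replace("docx","pdf")` rewrites every occurrence of `docx` in the full path, including the one in the folder name `C:\docx2pdf`, so Word is asked to save into a folder that does not exist. The sketch below reproduces that with a hypothetical file name (`report.docx`) and shows `[System.IO.Path]::ChangeExtension` as one way to touch only the extension.

```powershell
# Hypothetical file name; only the folder C:\docx2pdf comes from the question.
$original = 'C:\docx2pdf\report.docx'

# String.Replace is case-sensitive and replaces every match, so the folder name changes too.
$broken = $original.Replace('docx', 'pdf')

# ChangeExtension only rewrites what follows the final dot.
$fixed = [System.IO.Path]::ChangeExtension($original, '.pdf')

"Replace:         $broken"   # Replace:         C:\pdf2pdf\report.pdf
"ChangeExtension: $fixed"    # ChangeExtension: C:\docx2pdf\report.pdf
```

Building the name from `DirectoryName` and `BaseName`, as the script above does, sidesteps the same problem.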

MFT answered Oct 25 '22 23:10


The answers above all fell short for me, as I was running a batch job converting around 70,000 Word documents this way. As it turns out, doing this repeatedly eventually makes Word crash, presumably due to memory issues (the error was a COMException that I didn't know how to interpret). My hack to keep it going was to kill and restart Word every 100 documents (an arbitrarily chosen number).

Additionally, when it did occasionally crash, it left behind malformed PDFs, generally 1-2 KB in size. So, when skipping already generated PDFs, I make sure they are larger than 3 KB. If you don't want to skip already generated PDFs, you can delete that if statement.

Excuse me if my code doesn't look good; I don't generally use Windows and this was a one-off hack. Here's the resulting code:

$Files=Get-ChildItem -path '.\path\to\docs' -recurse -include "*.doc*"

$counter = 0
$filesProcessed = 0
$Word = New-Object -ComObject Word.Application

Foreach ($File in $Files) {
    $Name="$(($File.FullName).substring(0, $File.FullName.lastIndexOf("."))).pdf"
    if ((Test-Path $Name) -And (Get-Item $Name).length -gt 3kb) {
        echo "skipping $($Name), already exists"
        continue
    }

    echo "$($filesProcessed): processing $($File.FullName)"
    $Doc = $Word.Documents.Open($File.FullName)
    $Doc.SaveAs($Name, 17)
    $Doc.Close()
    if ($counter -gt 100) {
        $counter = 0
        $Word.Quit()
        [System.Runtime.Interopservices.Marshal]::ReleaseComObject($Word)
        $Word = New-Object -ComObject Word.Application
    }
    $counter = $counter + 1
    $filesProcessed = $filesProcessed + 1
}

# Quit the Word instance that is still running after the last file
$Word.Quit()
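
The skip rule and the restart cadence described above can be pulled out into two small helpers, which makes them easy to check without launching Word. This is a minimal sketch: the function names `Test-PdfLooksComplete` and `Test-RestartDue` are hypothetical, and the 3 KB and 100-file defaults simply mirror the numbers in this answer.

```powershell
# Hypothetical helper: $true when the PDF exists and is larger than the size threshold.
function Test-PdfLooksComplete {
    param(
        [Parameter(Mandatory = $true)] [string] $PdfPath,
        [long] $MinimumBytes = 3kb
    )
    # A missing file can never count as complete.
    if (-not (Test-Path -LiteralPath $PdfPath -PathType Leaf)) { return $false }
    return ((Get-Item -LiteralPath $PdfPath).Length -gt $MinimumBytes)
}

# Hypothetical helper: $true on every Nth processed file, the point at which Word would be restarted.
function Test-RestartDue {
    param(
        [int] $ProcessedCount,
        [int] $Every = 100
    )
    return (($ProcessedCount -gt 0) -and (($ProcessedCount % $Every) -eq 0))
}
```

Inside the loop, `if (Test-PdfLooksComplete -PdfPath $Name) { continue }` would replace the inline size check, and `Test-RestartDue -ProcessedCount $filesProcessed` would decide when to quit and recreate the Word COM object.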

osdiab answered Oct 25 '22 21:10


None of the solutions posted here worked for me on Windows 8.1 (I'm using Office 365, by the way). My PowerShell somehow does not like the [ref] arguments; I don't know why, as I rarely use PowerShell.

This is the solution that worked for me:

$Files=Get-ChildItem 'C:\path\to\files\*.docx'

$Word = New-Object -ComObject Word.Application

Foreach ($File in $Files) {
    $Doc = $Word.Documents.Open($File.FullName)
    $Name=($Doc.FullName).replace('docx', 'pdf')
    $Doc.SaveAs($Name, 17)
    $Doc.Close()
}

# Quit Word once all files are converted
$Word.Quit()
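
The `.replace('docx', 'pdf')` call works for this folder, but it is case-sensitive, so it would leave a `.DOCX` or `.doc` name unchanged. As a hedged alternative, PowerShell's `-replace` operator takes a regular expression, so the match can be anchored to the end of the path; it is case-insensitive by default and, with the pattern below, covers both `.doc` and `.docx`. The sample paths are made up for illustration.

```powershell
# Hypothetical sample paths, including an upper-case extension and a folder containing "docx".
$paths = 'C:\docx2pdf\report.docx', 'C:\docx2pdf\LEGACY.DOC', 'C:\path\to\files\notes.docx'

# '\.docx?$' matches ".doc" or ".docx" only at the very end of the string.
$pdfNames = $paths | ForEach-Object { $_ -replace '\.docx?$', '.pdf' }

$pdfNames
# C:\docx2pdf\report.pdf
# C:\docx2pdf\LEGACY.pdf
# C:\path\to\files\notes.pdf
```

In the loop above this would read `$Name = $File.FullName -replace '\.docx?$', '.pdf'`.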

Honza Kalfus answered Oct 25 '22 21:10


This works for me (Word 2007):

$wdFormatPDF = 17
$word = New-Object -ComObject Word.Application
$word.visible = $false

$folderpath = Split-Path -parent $MyInvocation.MyCommand.Path

Get-ChildItem -path $folderpath -recurse -include "*.doc" | % {
    $path =  ($_.fullname).substring(0,($_.FullName).lastindexOf("."))
    $doc = $word.documents.open($_.fullname)
    $doc.saveas($path, $wdFormatPDF) 
    $doc.close()
}

$word.Quit()
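
On PowerShell 3.0 and later, the automatic variable `$PSScriptRoot` holds the folder of the running script, which is a shorter way to get the same value as `Split-Path -parent $MyInvocation.MyCommand.Path`. The sketch below is an illustration, not part of the original answer: `Get-ScriptFolder` is a hypothetical helper, and falling back to the current directory when no script file is involved is an assumption made for interactive use. It lists both `.doc` and `.docx` files sitting next to the script.

```powershell
# Hypothetical helper: folder of the running script, or the current directory as a fallback.
function Get-ScriptFolder {
    # $PSScriptRoot is empty when the code is pasted into an interactive session.
    if ($PSScriptRoot) { return $PSScriptRoot }
    return (Get-Location).Path
}

$folder = Get-ScriptFolder

# Keep only Word documents directly in that folder (-contains is case-insensitive).
$wordFiles = @(Get-ChildItem -LiteralPath $folder |
    Where-Object { -not $_.PSIsContainer -and ('.doc', '.docx' -contains $_.Extension) })

"Found $($wordFiles.Count) Word file(s) in $folder"
```

Each item in `$wordFiles` exposes `FullName`, so the list can be fed to the same `Documents.Open` and `SaveAs` loop used above.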

David Brabant answered Oct 25 '22 21:10