Loop over PDF files and convert them to Word documents

Tags: ms-word, pdf, vba

I am trying to use VBA, which I am pretty new to, to produce a series of Word documents from PDFs (which are not images). That is, I want to loop over several PDF files and save each one in MS Word format. In my experience, Word reads my PDF documents quite well: it preserves the layout of the PDF most of the time. I am not sure this is the right way to tackle the problem, so I would also welcome an alternative suggestion, using R if possible.

Anyway, here is the code, which I found here:

Sub convertToWord()

   Dim MyObj As Object, MySource As Object, file As Variant

   file = Dir("C:\Users\username\work_dir_example" & "*.pdf") 'pdf path

   Do While (file <> "")

   ChangeFileOpenDirectory "C:\Users\username\work_dir_example"

          Documents.Open Filename:=file, ConfirmConversions:=False, ReadOnly:= _
        False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate:= _
        "", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="", _
        Format:=wdOpenFormatAuto, XMLTransform:=""

    ChangeFileOpenDirectory "C:\Users\username\work_dir_example"

    ActiveDocument.SaveAs2 Filename:=Replace(file, ".pdf", ".docx"), FileFormat:=wdFormatXMLDocument _
        , LockComments:=False, Password:="", AddToRecentFiles:=True, _
        WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
         SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _
        False, CompatibilityMode:=15

    ActiveDocument.Close

     file = Dir

   Loop

End Sub

After pasting it into the developer's window, I save the code in a module, close the developer's window, click the "Macros" button, and run the "convertToWord" macro. I get the following error in a pop-up box: "Sub or Function not defined". How do I fix this? Earlier, for a reason that is no longer clear to me, I also got an error saying that the function ChangeFileOpenDirectory was not defined.

Update 27/08/2017

I changed the code to the following:

Sub convertToWord()

   Dim MyObj As Object, MySource As Object, file As Variant

   file = Dir("C:\Users\username\work_dir_example" & "*.pdf")

   ChDir "C:\Users\username\work_dir_example"

   Do While (file <> "")

        Documents.Open Filename:=file, ConfirmConversions:=False, ReadOnly:= _
        False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate:= _
        "", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="", _
        Format:=wdOpenFormatAuto, XMLTransform:=""

        ActiveDocument.SaveAs2 Filename:=Replace(file, ".pdf", ".docx"), FileFormat:=wdFormatXMLDocument _
        , LockComments:=False, Password:="", AddToRecentFiles:=True, _
        WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
         SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _
        False, CompatibilityMode:=15

    ActiveDocument.Close

     file = Dir

   Loop

End Sub

Now I do not get any error messages in a pop up box, but there is no output in my working directory. What might be wrong with it right now?

asked Aug 25 '17 at 22:08 by John Doe



2 Answers

Any language that can read PDF files and write Word documents (.docx files are zipped XML) can do this, but the conversion you like, the one Word performs when it opens a PDF, requires driving Word itself through its API. VBA is the easiest option.

The snippets you've posted (and my samples below) use early binding and enumerated constants, which means we need a reference to the Word object library. That is already set up for any code you write in a Word document, so create a new Word document and add the code to a standard module. (See this Excel tutorial if you need more details; the steps for our process are the same.)
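For comparison, here is a rough sketch of what the same calls look like with late binding, for example when the code lives in another Office host that has no reference to the Word object library. The procedure name and the example.pdf / example.docx file names are made up for illustration, and the constants are declared by hand because the Word enumerations are not available without the reference.

```vba
Sub ConvertOnePdfLateBound()
    'Numeric values of the Word constants, declared manually
    'because there is no reference to the Word object library
    Const wdAlertsNone As Long = 0
    Const wdFormatXMLDocument As Long = 12

    Dim wordApp As Object
    Dim doc As Object

    'Start a separate Word instance and keep it visible so that
    'any conversion prompt can be seen and answered
    Set wordApp = CreateObject("Word.Application")
    wordApp.Visible = True
    wordApp.DisplayAlerts = wdAlertsNone

    'Hypothetical file names: edit both paths before running
    Set doc = wordApp.Documents.Open("C:\Users\username\work_dir_example\example.pdf", False)
    doc.SaveAs2 "C:\Users\username\work_dir_example\example.docx", wdFormatXMLDocument
    doc.Close False

    wordApp.Quit
End Sub
```

Inside a Word document none of this is needed, which is why the samples in this answer simply use Documents.Open and the named constants directly.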

You can run your macro from the VB Editor (using the Run button) or from the normal document window (click the Macros button on the View tab in Word 2010-2016). Save your document as a DOCM file if you want to reuse the macro without setting up the code again.

Now for the code!

As stated in comments, your second snippet is valid if you just ensure that your folder paths end with a backslash "\" character. It's still not great code after you fix that, but that'll get you up and running.
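To see why the missing backslash gives no output and no error, it may help to print the pattern that Dir is actually asked to search for. This is only an illustrative sketch: the procedure name ShowSearchPattern is made up, and the folder is the placeholder path from the question.

```vba
Sub ShowSearchPattern()
    Dim folder As String
    folder = "C:\Users\username\work_dir_example"   'placeholder path from the question

    'Without the trailing backslash the folder name and the wildcard run together,
    'so Dir looks in C:\Users\username for files named work_dir_example*.pdf
    Debug.Print folder & "*.pdf"

    'With the backslash the wildcard applies to the files inside the folder
    If Right(folder, 1) <> "\" Then folder = folder & "\"
    Debug.Print folder & "*.pdf"
End Sub
```

The first pattern usually matches nothing, so Dir returns an empty string, the Do While loop never runs, and the macro finishes silently.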

I'll assume you want to go the extra mile and have a well-written version of this you could repurpose or expand upon later. For simplicity, we'll use two procedures: the main conversion and a procedure to suppress the PDF conversion warning dialog (controlled by the registry).

Main procedure:

Sub ConvertPDFsToWord2()
    Dim path As String
    'Manually edit path in the next line before running
    path = "C:\users\username\work_dir_example\"

    Dim file As String
    Dim doc As Word.Document
    Dim regValPDF As Integer
    Dim originalAlertLevel As WdAlertLevel

    'Generate string for getting all PDFs with Dir command
    'Check for terminal \
    If Right(path, 1) <> "\" Then path = path & "\"
    'Append file type with wildcard
    file = path & "*.pdf"

    'Get name of first PDF (blank string if no PDFs exist)
    file = Dir(file)

    originalAlertLevel = Application.DisplayAlerts
    Application.DisplayAlerts = wdAlertsNone

    'Default of 1 means "nothing to restore" in CleanUp
    regValPDF = 1
    'Jump to CleanUp on any error so the settings are always restored
    On Error GoTo CleanUp
    If file <> "" Then regValPDF = TogglePDFWarning(1)

    Do While file <> ""
        'Open method will automatically convert PDF for editing
        Set doc = Documents.Open(path & file, False)

        'Save and close document (extension swap ignores case)
        doc.SaveAs2 path & Replace(file, ".pdf", ".docx", , , vbTextCompare), _
                    FileFormat:=wdFormatDocumentDefault
        doc.Close False
        Set doc = Nothing

        'Get name of next PDF (blank string if no PDFs remain)
        file = Dir
    Loop

CleanUp:
    'Report the error, if any, before it is cleared below
    If Err.Number <> 0 Then MsgBox "Conversion stopped: " & Err.Description
    On Error Resume Next 'Ignore errors during cleanup
    If Not doc Is Nothing Then doc.Close False
    'Restore registry value, if necessary
    If regValPDF <> 1 Then TogglePDFWarning regValPDF
    Application.DisplayAlerts = originalAlertLevel

End Sub

Registry setting function:

Private Function TogglePDFWarning(newVal As Integer) As Integer
'This function reads and writes the registry value that controls
'the dialog displayed when Word opens (and converts) a PDF file
    Dim wShell As Object
    Dim regKey As String
    Dim regVal As Variant

    'setup shell object and string for key
    Set wShell = CreateObject("WScript.Shell")
    regKey = "HKCU\SOFTWARE\Microsoft\Office\" & _
             Application.Version & "\Word\Options\"

    'Get existing registry value, if any
    regVal = 0           'Stays 0 if the reg value does not exist
    On Error Resume Next 'Ignore error if reg value does not exist
    regVal = wShell.RegRead(regKey & "DisableConvertPdfWarning")
    On Error GoTo 0      'Break on errors after this point (also clears Err)

    wShell.RegWrite regKey & "DisableConvertPdfWarning", newVal, "REG_DWORD"

    'Return original setting / registry value (0 if it was missing)
    If regVal = 0 Then
        TogglePDFWarning = 0
    Else
        TogglePDFWarning = 1
    End If

End Function

answered Oct 20 '22 at 16:10 by AjimOthy


As others have stated, the problem seems to lie mostly with the path & file name. Here is the second version of the code you posted with some changes.

Unfortunately, a warning message pops up and setting DisplayAlerts to false will not suppress it. But if you click the "don't show this message again" checkbox the first time it pops up, then it will not continue to pop up for every file.

Sub convertToWord()

    Dim file        As String
    Dim path        As String

    path = "C:\Users\username\work_dir_example\"
    file = Dir(path & "*.pdf")

    Do While (file <> "")
        Documents.Open FileName:=path & file
        With ActiveDocument
            'Swap the extension on the file name only, ignoring case
            .SaveAs2 FileName:=path & Replace(file, ".pdf", ".docx", , , vbTextCompare), _
                     FileFormat:=wdFormatXMLDocument
            .Close
        End With
        file = Dir
    Loop

End Sub

answered Oct 20 '22 at 15:10 by J. Garth