I am trying to use VBA coding - which I am pretty new to - to obtain a series of .doc documents from PDFs (which are not images), that is, I am trying to loop over various PDF files and save them in MS Word format. My experience is that word reads pretty well the PDF documents that I have: word maintains the correct layout of the PDF file most of the time. I am not sure if this is the right choice to tackle this and I ask for an alternative suggestion -- using R, if possible.
Anyway, here it is the code which I found here:
Sub convertToWord()
Dim MyObj As Object, MySource As Object, file As Variant
file = Dir("C:\Users\username\work_dir_example" & "*.pdf") 'pdf path
Do While (file <> "")
ChangeFileOpenDirectory "C:\Users\username\work_dir_example"
Documents.Open Filename:=file, ConfirmConversions:=False, ReadOnly:= _
False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate:= _
"", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="", _
Format:=wdOpenFormatAuto, XMLTransform:=""
ChangeFileOpenDirectory "C:\Users\username\work_dir_example"
ActiveDocument.SaveAs2 Filename:=Replace(file, ".pdf", ".docx"), FileFormat:=wdFormatXMLDocument _
, LockComments:=False, Password:="", AddToRecentFiles:=True, _
WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _
False, CompatibilityMode:=15
ActiveDocument.Close
file = Dir
Loop
End Sub
After pasting it in the developer's window, I save the code in a module -> I close the developer's window -> I click on the "Macros" button -> I execute the "convertToWord" macro. I get the following error in a pop up box: "Sub or Function not defined". How do I fix this? Also, previously, for some reason that is not clear to me now, I got an error related to the function ChangeFileOpenDirectory
, which seemed not to be defined also.
Update 27/08/2017
I changed the code to the following:
Sub convertToWord()
Dim MyObj As Object, MySource As Object, file As Variant
file = Dir("C:\Users\username\work_dir_example" & "*.pdf")
ChDir "C:\Users\username\work_dir_example"
Do While (file <> "")
Documents.Open Filename:=file, ConfirmConversions:=False, ReadOnly:= _
False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate:= _
"", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="", _
Format:=wdOpenFormatAuto, XMLTransform:=""
ActiveDocument.SaveAs2 Filename:=Replace(file, ".pdf", ".docx"), FileFormat:=wdFormatXMLDocument _
, LockComments:=False, Password:="", AddToRecentFiles:=True, _
WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _
False, CompatibilityMode:=15
ActiveDocument.Close
file = Dir
Loop
End Sub
Now I do not get any error messages in a pop up box, but there is no output in my working directory. What might be wrong with it right now?
Step 1 Download and install Adobe Acrobat on your computer from the official website of Adobe. Step 2 Open a PDF file in Adobe Acrobat that you want to convert into Word without changing the format. Step 3 From the menu click on File and Export. Step 4 Now choose "Microsoft Word Document" as the text format.
If you want to learn how to add a PDF to Word and retain the ability to edit the PDF, click Insert > the arrow next to Object > Text from File. That will tell Word to create an editable version of the PDF and insert it into the document.
Any language that can read PDF files and write Word docs (which are XML) can do this, but the conversion you like (which Word does when the PDF is opened) will require using an API for the application itself. VBA is your easy option.
The snippets you've posted (and my samples below) use early binding and enumerated constants, which means we need a reference to the Word object library. That is already set up for any code you write in a Word document, so create a new Word document and add the code in a standard module. (See this Excel tutorial if you need more details, the steps for our process are the same).
You can run your macro from the VB Editor (using the Run button) or from the normal document window (click the Macros button on the View tab in Word 2010-2016). Save your document as a DOCM file if you want to reuse the macro without setting up the code again.
Now for the code!
As stated in comments, your second snippet is valid if you just ensure that your folder paths end with a backslash "\" character. It's still not great code after you fix that, but that'll get you up and running.
I'll assume you want to go the extra mile and have a well-written version of this you could repurpose or expand upon later. For simplicity, we'll use two procedures: the main conversion and a procedure to suppress the PDF conversion warning dialog (controlled by the registry).
Main procedure:
Sub ConvertPDFsToWord2()
Dim path As String
'Manually edit path in the next line before running
path = "C:\users\username\work_dir_example\"
Dim file As String
Dim doc As Word.Document
Dim regValPDF As Integer
Dim originalAlertLevel As WdAlertLevel
'Generate string for getting all PDFs with Dir command
'Check for terminal \
If Right(path, 1) <> "\" Then path = path & "\"
'Append file type with wildcard
file = path & "*.pdf"
'Get path for first PDF (blank string if no PDFs exist)
file = Dir(file)
originalAlertLevel = Application.DisplayAlerts
Application.DisplayAlerts = wdAlertsNone
If file <> "" Then regValPDF = TogglePDFWarning(1)
Do While file <> ""
'Open method will automatically convert PDF for editing
Set doc = Documents.Open(path & file, False)
'Save and close document
doc.SaveAs2 path & Replace(file, ".pdf", ".docx"), _
fileformat:=wdFormatDocumentDefault
doc.Close False
'Get path for next PDF (blank string if no PDFs remain)
file = Dir
Loop
CleanUp:
On Error Resume Next 'Ignore errors during cleanup
doc.Close False
'Restore registry value, if necessary
If regValPDF <> 1 Then TogglePDFWarning regValPDF
Application.DisplayAlerts = originalAlertLevel
End Sub
Registry setting function:
Private Function TogglePDFWarning(newVal As Integer) As Integer
'This function reads and writes the registry value that controls
'the dialog displayed when Word opens (and converts) a PDF file
Dim wShell As Object
Dim regKey As String
Dim regVal As Variant
'setup shell object and string for key
Set wShell = CreateObject("WScript.Shell")
regKey = "HKCU\SOFTWARE\Microsoft\Office\" & _
Application.Version & "\Word\Options\"
'Get existing registry value, if any
On Error Resume Next 'Ignore error if reg value does not exist
regVal = wShell.RegRead(regKey & "DisableConvertPdfWarning")
On Error GoTo 0 'Break on errors after this point
wShell.regwrite regKey & "DisableConvertPdfWarning", newVal, "REG_DWORD"
'Return original setting / registry value (0 if omitted)
If Err.Number <> 0 Or regVal = 0 Then
TogglePDFWarning = 0
Else
TogglePDFWarning = 1
End If
End Function
As others have stated, the problem seems to lie mostly with the path & file name. Here is the second version of the code you posted with some changes.
Unfortunately, a warning message pops up and setting DisplayAlerts to false will not suppress it. But if you click the "don't show this message again" checkbox the first time it pops up, then it will not continue to pop up for every file.
Sub convertToWord()
Dim MyObj As Object
Dim MySource As Object
Dim file As String
Dim path As String
path = "C:\Users\username\work_dir_example\"
file = Dir(path & "*.pdf")
Do While (file <> "")
Documents.Open FileName:=path & file
With ActiveDocument
.SaveAs2 FileName:=Replace(path & file, ".pdf", ".docx"), _
FileFormat:=wdFormatXMLDocument
.Close
End With
file = Dir
Loop
End Sub
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With