
Scrape a local HTML file

Tags: dom, excel, vba

I want to open a local HTML file and load it into an HTMLDocument so that I can scrape it into Excel. However, all the information I can find covers HTML pages on the web. For instance, this code works for www.bbc.co.uk but does not work for a local file:

Sub queryXMLlocal()
    Dim XMLPage As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument

    Debug.Print Application.ActiveWorkbook.Path

    XMLPage.Open "GET", "<filepath>\KOND.html", False
    XMLPage.send

    If XMLPage.Status <> 200 Then
        MsgBox "Problem" & vbNewLine & XMLPage.Status & " - " & XMLPage.statusText
        Exit Sub
    End If

End Sub

Alternatively, using

Sub GetHTMLDocument()

    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument

    IE.Visible = True
    IE.navigate "https://www.bbc.co.uk/"

    ' Wait while IE is loading...
    Do While IE.readyState <> READYSTATE_COMPLETE
    Loop

    Set HTMLDoc = IE.Document
End Sub

works, but when I point it at a local file I get the error:

"object invoked has disconnected from its client"

Can I just use HTMLDocument.open? I cannot get that to work either.

asked May 01 '26 03:05 by tom s


1 Answer

This is the function I usually use:

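Public Function GetHTMLFileContent(ByVal filePath As String) As HTMLDocument
    ' Requires a reference to the Microsoft HTML Object Library (for HTMLDocument).
    Dim fso As Object, hFile As Object, hString As String, html As New HTMLDocument
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set hFile = fso.OpenTextFile(filePath)

    ' ReadAll raises an error on an empty file, so only read when there is content.
    If Not hFile.AtEndOfStream Then hString = hFile.ReadAll()
    hFile.Close

    ' Parse the markup by assigning it to the body of an in-memory document.
    html.body.innerHTML = hString
    Set GetHTMLFileContent = html
End Function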
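
As a rough illustration of how the returned document might then be scraped into Excel, here is a minimal sketch. The procedure name ListLinksFromLocalFile and the example path are hypothetical, and it assumes GetHTMLFileContent is in the same project with a reference to the Microsoft HTML Object Library set. It simply writes each link's text and href into the active sheet.

```vba
' Hypothetical usage sketch: writes every <a> element found in a local HTML
' file to the active sheet (link text in column A, href in column B).
' Assumes GetHTMLFileContent is available in the same VBA project.
Public Sub ListLinksFromLocalFile(ByVal filePath As String)
    Dim doc As MSHTML.HTMLDocument
    Dim links As Object
    Dim i As Long

    ' Load and parse the local file without going through IE or XMLHTTP.
    Set doc = GetHTMLFileContent(filePath)

    ' Collect all anchor elements in the parsed document.
    Set links = doc.getElementsByTagName("a")

    For i = 0 To links.Length - 1
        ActiveSheet.Cells(i + 1, 1).Value = links.Item(i).innerText
        ActiveSheet.Cells(i + 1, 2).Value = links.Item(i).getAttribute("href")
    Next i
End Sub
```

It could be called from the Immediate window with something like ListLinksFromLocalFile "C:\Temp\example.html" (a placeholder path). The same pattern should work for tables or other elements by changing the tag name passed to getElementsByTagName.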
answered May 02 '26 22:05 by QHarr


