I have an Excel sheet with cells containing html. How can I batch convert them to plaintext? At the moment there are so many useless tags and styles. I want to write it from scratch but it will be far easier if I can get the plain text out.
I can write a script to convert html to plain text in PHP so if you can't think of a solution in VBA then maybe you can sugest how I might pass the cells data to a website and retrieve the data back.
Yes! I managed to solve my problem as well. Thanks everybody/
In my case, I had this sort of input:
<p>Lorem ipsum dolor sit amet.</p>
<p>Ut enim ad minim veniam.</p>
<p>Duis aute irure dolor in reprehenderit.</p>
And I did not want the result to be all jammed together without breaklines.
So I first splitted my input for every <p> tag into an array 'paragraphs', then for each element I used Tim's answer to get the text out of html (very sweet answer btw). 
In addition I concatenated each cleaned 'paragraph' with this breakline character Crh(10) for VBA/Excel. 
The final code is:
Public Function HtmlToText(ByVal sHTML As String) As String
    Dim oDoc As HTMLDocument
    Dim result As String
    Dim paragraphs() As String
    If IsNull(sHTML) Then
      HtmlToText = ""
      Exit Function
    End If
    result = ""
    paragraphs = Split(sHTML, "<p>")
    For Each paragraph In paragraphs
        Set oDoc = New HTMLDocument
        oDoc.body.innerHTML = paragraph
        result = result & Chr(10) & Chr(10) & oDoc.body.innerText
    Next paragraph
    HtmlToText = result
End Function
                        A very simple way to extract text is to scan the HTML character by character, and accumulate characters outside of angle brackets into a new string.
Function StripTags(ByVal html As String) As String
    Dim text As String
    Dim accumulating As Boolean
    Dim n As Integer
    Dim c As String
    text = ""
    accumulating = True
    n = 1
    Do While n <= Len(html)
        c = Mid(html, n, 1)
        If c = "<" Then
            accumulating = False
        ElseIf c = ">" Then
            accumulating = True
        Else
            If accumulating Then
                text = text & c
            End If
        End If
        n = n + 1
    Loop
    StripTags = text
End Function
This can leave lots of extraneous whitespace, but it will help in removing the tags.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With