I have an Excel sheet with cells containing html. How can I batch convert them to plaintext? At the moment there are so many useless tags and styles. I want to write it from scratch but it will be far easier if I can get the plain text out.
I can write a script to convert html to plain text in PHP so if you can't think of a solution in VBA then maybe you can sugest how I might pass the cells data to a website and retrieve the data back.
Yes! I managed to solve my problem as well. Thanks everybody/
In my case, I had this sort of input:
<p>Lorem ipsum dolor sit amet.</p>
<p>Ut enim ad minim veniam.</p>
<p>Duis aute irure dolor in reprehenderit.</p>
And I did not want the result to be all jammed together without breaklines.
So I first splitted my input for every <p>
tag into an array 'paragraphs', then for each element I used Tim's answer to get the text out of html (very sweet answer btw).
In addition I concatenated each cleaned 'paragraph' with this breakline character Crh(10)
for VBA/Excel.
The final code is:
Public Function HtmlToText(ByVal sHTML As String) As String
Dim oDoc As HTMLDocument
Dim result As String
Dim paragraphs() As String
If IsNull(sHTML) Then
HtmlToText = ""
Exit Function
End If
result = ""
paragraphs = Split(sHTML, "<p>")
For Each paragraph In paragraphs
Set oDoc = New HTMLDocument
oDoc.body.innerHTML = paragraph
result = result & Chr(10) & Chr(10) & oDoc.body.innerText
Next paragraph
HtmlToText = result
End Function
A very simple way to extract text is to scan the HTML character by character, and accumulate characters outside of angle brackets into a new string.
Function StripTags(ByVal html As String) As String
Dim text As String
Dim accumulating As Boolean
Dim n As Integer
Dim c As String
text = ""
accumulating = True
n = 1
Do While n <= Len(html)
c = Mid(html, n, 1)
If c = "<" Then
accumulating = False
ElseIf c = ">" Then
accumulating = True
Else
If accumulating Then
text = text & c
End If
End If
n = n + 1
Loop
StripTags = text
End Function
This can leave lots of extraneous whitespace, but it will help in removing the tags.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With