I have a number of Word documents that need to be converted to HTML. Each paragraph in the Word documents must be converted to its own <p> element.
After some tests with the Microsoft Office API's SaveAs method to convert the documents to HTML, I realized that paragraphs separated by manual line breaks (entered with "Shift-Enter") are not placed in separate <p> elements. Instead, they are grouped together in the same <p> element.
To separate them, I have been trying to replace the "Shift-Enter" line breaks with "Enter" paragraph breaks (carriage returns) before doing the conversion. However, I couldn't find a suitable way to do this replacement. I tried the WdLineEndingType parameter of the SaveAs method, but it does not seem to affect this issue.
For those looking to do this in MS Word itself: use Control-H (Find & Replace).
Find what: Special character: Manual Line Break (^l, lowercase L)
Replace with: Paragraph Mark (^p)
Replace All will process the whole document.
Edit: changed to lowercase characters.
The MS Word Office API provides a Find object on the Range object, which can search for and replace strings.
The following code replaces the manual line breaks ("^l") with paragraph marks ("^p").
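// oDoc is the already opened Word document; oMissing stands in for optional arguments.
object oMissing = System.Reflection.Missing.Value;
Range r = oDoc.Content;
r.WholeStory();
// Replace every manual line break (^l) with a paragraph mark (^p).
r.Find.Execute("^l", ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, "^p", WdReplace.wdReplaceAll);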
Then use SaveAs to convert the Word document to HTML; each line will be placed in its own <p> element.
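For reference, the whole sequence (open, replace, save as HTML) might look roughly like the sketch below. It assumes C# 4.0 or later with a reference to Microsoft.Office.Interop.Word, so that optional COM arguments can be omitted and named arguments used. The file paths and the class name are hypothetical, and wdFormatHTML is just one possible choice of output format.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

class LineBreakToParagraph
{
    static void Main()
    {
        // Hypothetical paths; replace with real files.
        string inputPath = @"C:\docs\input.docx";
        string outputPath = @"C:\docs\output.html";

        Word.Application app = new Word.Application();
        Word.Document doc = null;
        try
        {
            doc = app.Documents.Open(inputPath);

            // Replace every manual line break (^l) with a paragraph mark (^p).
            doc.Content.Find.Execute(
                FindText: "^l",
                ReplaceWith: "^p",
                Replace: Word.WdReplace.wdReplaceAll);

            // Write the modified document out as HTML.
            doc.SaveAs(outputPath, Word.WdSaveFormat.wdFormatHTML);
        }
        finally
        {
            // Close without saving again, then shut down the Word instance.
            // The casts avoid the method/event name ambiguity on Close and Quit.
            if (doc != null)
                ((Word._Document)doc).Close(Word.WdSaveOptions.wdDoNotSaveChanges);
            ((Word._Application)app).Quit();
        }
    }
}
```

Because the replacement runs on the in-memory document and the result is saved under a new name, the original .docx file should be left unchanged.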