Replace "Shift-Enter" line breaks with "Enter" in a Word document using the Microsoft Office API

I have a number of Word documents that will be converted to HTML. The paragraphs in the Word documents must be converted to <p> elements.

After testing the Microsoft Office API's SaveAs method to convert the documents to HTML, I realized that paragraphs separated by manual line breaks (entered with "Shift-Enter") are not placed in separate <p> elements; instead, they are grouped into the same <p> element.

To separate them, I have been trying to replace the "Shift-Enter" line breaks with "Enter" (carriage return) paragraph breaks before doing the conversion. However, I couldn't find a suitable way to do the replacement. I tried the WdLineEndingType parameter of the SaveAs method, but it does not seem to have any effect on this issue.

Kata asked Feb 05 '13 15:02


2 Answers

For those working directly in MS Word: use Control-H (Find & Replace).

Find what: Manual Line Break from the Special menu (^l, lowercase L)

Replace with: Paragraph Mark (^p)

Replace All will apply the change to the whole document.

Edit: changed to lowercase characters.

Alan Campbell answered Sep 29 '22 00:09


The MS Word Office API provides a Find object on the Range object, which can search for and replace strings.

The following code replaces the manual line breaks ("^l") with paragraph marks, i.e. carriage returns ("^p").

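// oDoc is the open Document; oMissing is an object holding System.Reflection.Missing.Value.
Range r = oDoc.Content;
r.WholeStory();
// Arguments in order: FindText, eight unused options, ReplaceWith, Replace.
r.Find.Execute("^l", ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, "^p", WdReplace.wdReplaceAll);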

Then use SaveAs to convert the Word document to HTML; it will properly place each line in its own <p> element.
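
For context, the replace-then-save steps above could be put together roughly as follows. This is only a sketch under several assumptions: Word is installed, the project references Microsoft.Office.Interop.Word, and the compiler is C# 4 or later (so optional COM arguments can be omitted and named). The class name LineBreakConverter, the method name ConvertToHtml, and both path parameters are hypothetical and not part of the original code; wdFormatHTML is one possible choice of output format.

```csharp
// Sketch only: needs Microsoft Word and a reference to Microsoft.Office.Interop.Word.
using Word = Microsoft.Office.Interop.Word;

static class LineBreakConverter   // hypothetical helper, not from the original answer
{
    // inputPath and outputPath are hypothetical full paths supplied by the caller.
    public static void ConvertToHtml(string inputPath, string outputPath)
    {
        var app = new Word.Application();
        Word.Document doc = null;
        try
        {
            doc = app.Documents.Open(inputPath);

            // Replace every manual line break (^l) with a paragraph mark (^p)
            // in the main text of the document.
            Word.Range range = doc.Content;
            range.Find.Execute(
                FindText: "^l",
                ReplaceWith: "^p",
                Replace: Word.WdReplace.wdReplaceAll);

            // Save as HTML under a new name; the source file on disk is not overwritten.
            doc.SaveAs(FileName: outputPath, FileFormat: Word.WdSaveFormat.wdFormatHTML);
        }
        finally
        {
            // Close without saving again, then shut down the Word instance.
            if (doc != null)
            {
                doc.Close(SaveChanges: Word.WdSaveOptions.wdDoNotSaveChanges);
            }
            app.Quit();
        }
    }
}
```

A call might look like LineBreakConverter.ConvertToHtml(@"C:\docs\in.docx", @"C:\docs\out.html"), where both paths are placeholders. Because the replacement is applied only to the in-memory document and the result is saved under a different name, the original file keeps its manual line breaks.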

Kata answered Sep 29 '22 01:09