 

Converting a Word document into usable HTML in PHP

Tags: php, ms-word

I have a set of Word documents that I want to publish using a PHP tool I've written. I copy and paste each Word document into a text box, and the PHP program saves it into MySQL. My problem comes from all the non-standard characters that Word documents contain, such as curly quotes and ellipses ("..."). At the moment I manually search and replace these characters (and also accented letters such as e-acute) with either plain text or HTML entities (&eacute; etc.). Is there a PHP function I can call that takes the output of a Word document, converts everything that should be an entity into an entity, and converts other symbols that don't display properly in Firefox into symbols that do?
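The manual search-and-replace described above can be scripted with PHP's strtr(). This is a minimal sketch, assuming the pasted text arrives as UTF-8; the function name plain_punctuation and its mapping table are illustrative only, not a complete list of the characters Word produces.

```php
<?php
// Hypothetical helper: replace Word's "smart" punctuation with plain
// ASCII. Assumes $text is UTF-8; accented letters such as é are left
// untouched. The map is illustrative, not exhaustive.
function plain_punctuation(string $text): string
{
    $map = [
        "\u{2018}" => "'",   // left single quotation mark
        "\u{2019}" => "'",   // right single quotation mark / apostrophe
        "\u{201C}" => '"',   // left double quotation mark
        "\u{201D}" => '"',   // right double quotation mark
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '--',  // em dash
        "\u{2026}" => '...', // horizontal ellipsis
    ];
    // strtr() with an array tries the longest keys first and never
    // re-scans text it has already replaced, so one pass is enough.
    return strtr($text, $map);
}
```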

Thanks!

asked Oct 13 '08 at 19:10 by Ben





2 Answers

This has served me well in the past:

// Convert every non-ASCII character in a UTF-8 string to an HTML entity
$str = mb_convert_encoding($str, 'HTML-ENTITIES', 'UTF-8');
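As of PHP 8.2, the 'HTML-ENTITIES' target of mb_convert_encoding() is deprecated. Here is a sketch of the same idea using mb_encode_numericentity(), which turns every code point from U+0080 upward into a decimal numeric entity. The wrapper name to_numeric_entities is hypothetical. Like the original line, it needs the mbstring extension, and it leaves &, < and > alone, so it is not a substitute for htmlspecialchars().

```php
<?php
// Hypothetical wrapper: encode every non-ASCII character of a UTF-8
// string as a decimal numeric entity (e.g. U+201C becomes &#8220;).
// ASCII characters, including & < >, are left unchanged.
function to_numeric_entities(string $str): string
{
    // convmap is [start, end, offset, mask]: cover U+0080..U+10FFFF,
    // no offset, and a mask that keeps the full code point.
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($str, $convmap, 'UTF-8');
}
```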
answered Oct 02 '22 at 15:10 by eyelidlessness



A better solution would be to make sure your database is set up to store UTF-8 text. UTF-8 can represent every Unicode character, so it covers all the "non-standard" characters you're talking about.
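A minimal sketch of the output side, assuming the text is stored and served as UTF-8 end to end. For MySQL through PDO, that typically means charset=utf8mb4 in the DSN plus utf8mb4 columns, and a "Content-Type: text/html; charset=UTF-8" header on the page. The DSN shown in the comment and the helper name render_utf8 are hypothetical. Once everything is UTF-8, only the HTML-special characters need escaping.

```php
<?php
// Hypothetical helper, assuming UTF-8 end to end, e.g. a PDO DSN like
// 'mysql:host=localhost;dbname=example;charset=utf8mb4' (hypothetical),
// utf8mb4 columns, and a "Content-Type: text/html; charset=UTF-8" header.
// Only & < > " ' are escaped; curly quotes, ellipses and accented
// letters pass through as UTF-8 and display as-is.
function render_utf8(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}
```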

Otherwise, if you really must convert these characters into HTML entities, use htmlentities().
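A sketch of that call, assuming UTF-8 input. Passing the charset explicitly avoids relying on the default, which was ISO-8859-1 before PHP 5.4. With ENT_HTML401, characters that have a named entity (é, curly quotes, the ellipsis) get one, and characters without a named entity are left as-is. The wrapper name to_named_entities is hypothetical.

```php
<?php
// Hypothetical wrapper around htmlentities() for UTF-8 input.
// ENT_QUOTES also escapes ' and "; ENT_HTML401 selects the HTML 4.01
// named-entity table (&eacute;, &ldquo;, &hellip;, ...).
function to_named_entities(string $text): string
{
    return htmlentities($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}
```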

answered Oct 02 '22 at 16:10 by Richard Turner
