I have a set of Word documents which I want to publish using a PHP tool I've written. I copy and paste the Word documents into a text box and then save them into MySQL using the PHP program. The problem I have arises from all the non-standard characters that Word documents contain, like curly quotes and ellipses ("..."). What I do at the moment is manually search and replace these kinds of things (and also foreign symbols such as e-acute) with either plain text or HTML entities (&eacute; etc.). Is there a function in PHP I can call that will take the text pasted from a Word document and convert everything that should be an entity into an entity, and turn any other symbols that don't display properly in Firefox into symbols that do?
Thanks!
If you are a Microsoft Word user, you can still edit HTML files in Word, just as you would any other text-based file. This lets you edit an HTML file directly without a more expensive web authoring tool.
Note that PHP is a server-side scripting language used to build dynamic, interactive web pages. PHP code is written with dedicated tools and editors, and MS Word is not among them, so Word has no PHP coding feature.
This has served me well in the past:
$str = mb_convert_encoding($str, 'HTML-ENTITIES', 'UTF-8');
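One caveat if you're on a newer PHP: the 'HTML-ENTITIES' pseudo-encoding in mbstring is deprecated as of PHP 8.2. A rough equivalent, offered as a sketch rather than a drop-in replacement, is mb_encode_numericentity(), which emits numeric references (&#233; rather than &eacute;). The helper name to_numeric_entities is made up for illustration, and it assumes the string is already valid UTF-8:

```php
<?php
// Hypothetical helper (not part of PHP): turn every non-ASCII character
// into a decimal numeric character reference. Assumes UTF-8 input.
function to_numeric_entities(string $str): string
{
    // Map format: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($str, $map, 'UTF-8');
}

// "café…" written with explicit code point escapes.
echo to_numeric_entities("caf\u{E9}\u{2026}"), "\n"; // caf&#233;&#8230;
```

ASCII characters, including <, > and &, pass through untouched, so any markup in the string is left alone.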
A better solution would be to ensure that your database is set up to store UTF-8. UTF-8 can represent every Unicode character, so it covers all the "non-standard" characters that you're talking about.
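To make that a bit more concrete, here is a minimal sketch of keeping the text UTF-8 from the form through to MySQL. The host, database name, credentials, the documents table and all three helper names are placeholders I've made up. utf8mb4 is MySQL's full UTF-8 character set (the older utf8 charset stores at most three bytes per character). The sketch also assumes that any input which isn't valid UTF-8 arrived as Windows-1252, which is only a guess about how your form is served:

```php
<?php
// Hypothetical helper: build a PDO DSN that asks MySQL to talk utf8mb4.
function utf8_dsn(string $host, string $dbname): string
{
    return "mysql:host={$host};dbname={$dbname};charset=utf8mb4";
}

// Hypothetical helper: leave valid UTF-8 alone, otherwise treat the bytes
// as Windows-1252 (an assumption about the source encoding) and convert.
function normalize_to_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}

// Hypothetical helper: store the text with a prepared statement.
// The table and column names are placeholders.
function save_document(PDO $pdo, string $text): void
{
    $stmt = $pdo->prepare('INSERT INTO documents (body) VALUES (:body)');
    $stmt->execute([':body' => normalize_to_utf8($text)]);
}

// Usage (needs a real server, so it is left commented out):
// $pdo = new PDO(utf8_dsn('localhost', 'docs'), 'user', 'secret');
// save_document($pdo, $_POST['body']);
```

The table's text columns need a utf8mb4 character set as well, and the page that displays the text should declare charset=utf-8, otherwise the browser may still guess the wrong encoding.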
Otherwise, if you really must convert these characters into HTML entities, use htmlentities().
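For what it's worth, a small sketch of that call on the kind of text Word produces. The sample string is invented, and the input is assumed to be UTF-8 already:

```php
<?php
// Curly quotes, e-acute and an ellipsis, written as code point escapes
// so the example does not depend on the file's own encoding.
$text = "\u{201C}Caf\u{E9}\u{201D}\u{2026} <b>bold</b>";

// ENT_QUOTES also converts single and double straight quotes.
$encoded = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $encoded, "\n";
// &ldquo;Caf&eacute;&rdquo;&hellip; &lt;b&gt;bold&lt;/b&gt;
```

Note that htmlentities() escapes <, > and & too, so any HTML tags in the pasted text come out as visible text rather than markup.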