 

Reading docx (Office Open XML) in PHP

I want to add a Word import function to our CMS. The only problem is that I cannot seem to find a good library for reading docx files (Word 2007).

Does anyone have any recommendations? The library should be able to extract the content of the document and basic styling such as italic, bold, and superscript.

Thanks for your help

RageZ asked Oct 01 '09 02:10


1 Answer

A docx file is really just a ZIP container for the document's XML. You should be able to unzip the docx file, go to the word folder inside, and open document.xml, which holds the actual text. Things like fonts and style definitions live in other XML files in the docx container, so you'll probably need to poke around a bit to figure out what is what and how it all matches up (I'd start by looking at the XML namespaces).

But yeah, unzip the file, then use SimpleXML to load it into something you can actually work with.
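
As a rough sketch of the unzip-then-SimpleXML approach, something like the following might work. It assumes the zip and SimpleXML extensions are enabled. The function names (readDocumentXml, extractRuns, isOn) and the shape of the returned array are made up for illustration. It only looks at runs that sit directly inside a paragraph and at formatting set directly on the run, so text inside hyperlinks or formatting inherited from a named style would need extra handling.

```php
<?php
// Illustrative sketch, not a complete docx reader.
// WordprocessingML main namespace used by word/document.xml.
const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

/** Pull word/document.xml out of the docx ZIP container. */
function readDocumentXml(string $docxPath): string
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath as a ZIP archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found in archive');
    }
    return $xml;
}

/** True if an on/off property such as <w:b/> is present and not switched off. */
function isOn(SimpleXMLElement $props, string $name): bool
{
    if (!isset($props->{$name})) {
        return false;
    }
    $val = (string) $props->{$name}->attributes(W_NS)->val;
    return !in_array($val, ['0', 'false', 'off'], true);
}

/** Return one array per paragraph, each holding runs with text and basic styling. */
function extractRuns(string $documentXml): array
{
    $doc = simplexml_load_string($documentXml);
    if ($doc === false) {
        throw new RuntimeException('document.xml is not well-formed XML');
    }
    $paragraphs = [];
    $body = $doc->children(W_NS)->body;
    foreach ($body->children(W_NS)->p as $p) {
        $runs = [];
        foreach ($p->children(W_NS)->r as $r) {
            $w = $r->children(W_NS);
            $text = '';
            foreach ($w->t as $t) {
                $text .= (string) $t;
            }
            if ($text === '') {
                continue; // run without text (e.g. a break or drawing)
            }
            $bold = $italic = false;
            $vertAlign = '';
            if (isset($w->rPr)) {
                // Run properties: <w:b/>, <w:i/>, <w:vertAlign w:val="superscript"/>
                $props = $w->rPr->children(W_NS);
                $bold = isOn($props, 'b');
                $italic = isOn($props, 'i');
                if (isset($props->vertAlign)) {
                    $vertAlign = (string) $props->vertAlign->attributes(W_NS)->val;
                }
            }
            $runs[] = [
                'text' => $text,
                'bold' => $bold,
                'italic' => $italic,
                'superscript' => $vertAlign === 'superscript',
            ];
        }
        $paragraphs[] = $runs;
    }
    return $paragraphs;
}
```

You would then call extractRuns(readDocumentXml('example.docx')), where example.docx is a placeholder path, and convert each run into whatever markup your CMS stores.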

Anthony answered Oct 20 '22 00:10