Is it possible to extract metadata from MS Office files and/or PDFs with PHP?

I have the following file types on my server:

.doc
.docx
.xls
.xlsx
.pdf

Is it possible (and if so, how) to extract the metadata from those files using PHP? I'm looking for things like author, keywords, title, etc.

In Office documents, it's the information stored with the document properties (File...Properties...Summary in Office 2003, Prepare...Properties in Office 2007).

In PDFs, it's the information found under Document Properties.

This is not on a Windows server.

asked Jan 19 '10 18:01

Jason

1 Answer

I managed to extract a lot of metadata using Xpdf on a Linux system a few years back. Nowadays, though, I would say Zend_Pdf is your best bet. I haven't used it myself, but it looks good and promises everything you need. It seems to have no library dependencies, either.
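To illustrate the Xpdf route, here is a minimal sketch that shells out to Xpdf's pdfinfo tool and parses its "Key: value" output. It assumes the pdfinfo binary is installed and on the PATH and that shell_exec() is enabled; the function names parsePdfinfoOutput and readPdfMetadata are made up for this example.

```php
<?php
// Parse the "Key: value" lines that pdfinfo prints into an associative array.
function parsePdfinfoOutput(string $output): array
{
    $meta = [];
    foreach (preg_split('/\R/', $output) as $line) {
        // Split on the first colon only; values such as dates contain colons too.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2 && trim($parts[0]) !== '') {
            $meta[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $meta;
}

// Hypothetical helper: run pdfinfo on a file and return its metadata.
// Assumes the pdfinfo binary is on the PATH and shell_exec() is enabled.
function readPdfMetadata(string $path): array
{
    $output = shell_exec('pdfinfo ' . escapeshellarg($path));
    return is_string($output) ? parsePdfinfoOutput($output) : [];
}
```

Something like readPdfMetadata('/path/to/file.pdf') would then return whatever keys pdfinfo reports for that file, typically including Title, Author and Keywords when the document sets them.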

For Word .doc files, if you don't find a better way, plug into an OpenOffice server instance or its command line and convert the files to ODT, which is a ZIP package of XML files and therefore parseable. It should also be possible to extract the metadata with a macro, but I don't know how much work that is. This OpenOffice Forum entry gives a ton of starting points for automated conversion.
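Once a file has been converted to ODT (the conversion step itself is not shown here), the metadata sits in a meta.xml entry inside the ODT's ZIP container. The following is a rough sketch of reading it with DOMXPath, assuming PHP's dom and zip extensions are available; parseOdtMeta and readOdtMetadata are hypothetical names, and only three fields are pulled out.

```php
<?php
// Extract a few metadata fields from the XML text of an ODT file's meta.xml.
function parseOdtMeta(string $xml): array
{
    $dom = new DOMDocument();
    if ($xml === '' || !@$dom->loadXML($xml)) {
        return [];
    }
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('office', 'urn:oasis:names:tc:opendocument:xmlns:office:1.0');
    $xpath->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');
    $xpath->registerNamespace('meta', 'urn:oasis:names:tc:opendocument:xmlns:meta:1.0');

    // ODF stores each keyword in its own meta:keyword element.
    $keywords = [];
    foreach ($xpath->query('/office:document-meta/office:meta/meta:keyword') as $node) {
        $keywords[] = $node->textContent;
    }

    return [
        'title'    => $xpath->evaluate('string(/office:document-meta/office:meta/dc:title)'),
        'author'   => $xpath->evaluate('string(/office:document-meta/office:meta/meta:initial-creator)'),
        'keywords' => $keywords,
    ];
}

// Hypothetical helper: read meta.xml through the zip:// stream wrapper
// (provided by the zip extension) and parse it.
function readOdtMetadata(string $path): array
{
    $xml = @file_get_contents('zip://' . $path . '#meta.xml');
    return $xml === false ? [] : parseOdtMeta($xml);
}
```

Whether the converted file actually carries the original author and keywords depends on the conversion filter, so it is worth checking the result on a sample document first.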

The ...X formats (.docx, .xlsx) are ZIP packages of XML files, so it should be easy to fetch the metadata from them. Alternatively, you should be able to use OpenOffice's conversion filters here as well, if they carry the metadata over.
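As a sketch of the XML route for .docx and .xlsx: both are ZIP packages, and the title, author and keywords normally live in the docProps/core.xml entry. The example below assumes PHP's zip and simplexml extensions are available; the function names parseCoreProperties and readOoxmlMetadata are invented for illustration, and only a handful of properties are read.

```php
<?php
// Extract common properties from the XML text of docProps/core.xml.
function parseCoreProperties(string $xml): array
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return [];
    }
    // Namespaces used by the core properties part.
    $dc = $doc->children('http://purl.org/dc/elements/1.1/');
    $cp = $doc->children('http://schemas.openxmlformats.org/package/2006/metadata/core-properties');

    // Missing elements come back as empty strings.
    return [
        'title'    => (string) $dc->title,
        'author'   => (string) $dc->creator,
        'subject'  => (string) $dc->subject,
        'keywords' => (string) $cp->keywords,
    ];
}

// Hypothetical helper: open a .docx or .xlsx file and read its core properties.
// Requires PHP's zip extension.
function readOoxmlMetadata(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return [];
    }
    $xml = $zip->getFromName('docProps/core.xml');
    $zip->close();
    return $xml === false ? [] : parseCoreProperties($xml);
}
```

The same readOoxmlMetadata() call should work for both .docx and .xlsx, since the core properties part has the same name and layout in each. The old binary .doc and .xls formats are not ZIP-based and need a different approach.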

answered Oct 25 '22 08:10

Pekka

Tags: php, pdf, metadata, ms-office
