Extract text from doc and docx

I would like to know how I can read the contents of a .doc or .docx file. I'm using a Linux VPS and PHP, but if there is a simpler solution in another language, please let me know, as long as it works on a Linux web server.

380

asked Apr 04 '11 15:04

Alexandre Mota

1 Answer

Here is a solution for getting the text out of .doc and .docx Word files.

How to extract text from Word files (.doc, .docx) in PHP

For .doc

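private function read_doc() {
    // Read the whole file as raw bytes; .doc is a binary format.
    $data = file_get_contents($this->filename);
    if ($data === false) {
        return false;
    }
    // Split on carriage returns, which end paragraphs in the text stream.
    $lines = explode(chr(0x0D), $data);
    $outtext = "";
    foreach ($lines as $thisline) {
        // Skip empty chunks and chunks containing NUL bytes (treated as binary data).
        if ($thisline === "" || strpos($thisline, chr(0x00)) !== false) {
            continue;
        }
        $outtext .= $thisline . " ";
    }
    // Keep only ASCII letters, digits, whitespace, and a few punctuation marks.
    $outtext = preg_replace('/[^a-zA-Z0-9\s,.\-@\/_()]/', "", $outtext);
    return $outtext;
}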
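To see what the read_doc() heuristic keeps and what it drops, here is a minimal standalone sketch that applies the same filtering steps to an in-memory byte string rather than a file. The function name filter_doc_bytes and the sample bytes are made up for illustration. Real .doc files are far messier, and any text that Word stored as UTF-16 has NUL bytes in every chunk, so this approach would discard it entirely.

```php
<?php
// Hypothetical helper: the same filtering steps as read_doc(), applied to a string.
function filter_doc_bytes(string $data): string
{
    $out = '';
    foreach (explode("\r", $data) as $chunk) {
        // Drop empty chunks and anything containing a NUL byte (treated as binary).
        if ($chunk === '' || strpos($chunk, "\0") !== false) {
            continue;
        }
        $out .= $chunk . ' ';
    }
    // Strip every character outside the small ASCII whitelist.
    return preg_replace('/[^a-zA-Z0-9\s,.\-@\/_()]/', '', $out);
}

// Made-up sample: a binary chunk, two text paragraphs, another binary chunk.
$sample = "\xD0\xCF\x11\xE0\x00\x00\rHello, world.\rSecond paragraph!\r\x00\x01";
echo filter_doc_bytes($sample);
```

Note that the "!" in the sample is removed along with the binary chunks, because the whitelist only allows a handful of punctuation characters. Accented letters and other non-ASCII text would be lost the same way.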

For .docx

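private function read_docx() {
    $content = '';

    // Note: the procedural zip_* functions are deprecated as of PHP 8.0.
    $zip = zip_open($this->filename);

    // zip_open() returns an integer error code on failure.
    if (!$zip || is_numeric($zip)) return false;

    while ($zip_entry = zip_read($zip)) {
        // Only the main document part holds the body text. Check the name
        // before opening so that skipped entries are never left open.
        if (zip_entry_name($zip_entry) != "word/document.xml") continue;

        if (zip_entry_open($zip, $zip_entry) == FALSE) continue;

        $content .= zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));

        zip_entry_close($zip_entry);
    }

    zip_close($zip);

    // Separate table cells with a space and paragraphs with a line break,
    // then drop the remaining XML tags.
    $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
    $content = str_replace('</w:r></w:p>', "\r\n", $content);
    $stripped_content = strip_tags($content);

    return $stripped_content;
}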
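On newer PHP versions the same idea can be written with the ZipArchive class in place of the procedural zip_* functions. The following is only a sketch under a few assumptions: the zip and dom extensions are enabled, the file uses the usual (transitional) WordprocessingML namespace, and docx_to_text is a made-up name. It swaps the str_replace/strip_tags step for DOM parsing, so it is a different technique from the method above, not a drop-in copy of it.

```php
<?php
// Sketch: extract paragraph text from a .docx using ZipArchive and DOMDocument.
// docx_to_text is a hypothetical name; it returns the text, or false on failure.
function docx_to_text(string $path)
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return false;
    }
    // The main document body lives in word/document.xml.
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        return false;
    }

    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        return false;
    }

    // Assumed namespace: the transitional WordprocessingML main namespace.
    $ns = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';
    $paragraphs = [];
    // Each w:p element is a paragraph; its w:t descendants hold the visible text.
    foreach ($dom->getElementsByTagNameNS($ns, 'p') as $p) {
        $text = '';
        foreach ($p->getElementsByTagNameNS($ns, 't') as $t) {
            $text .= $t->textContent;
        }
        $paragraphs[] = $text;
    }
    return implode("\n", $paragraphs);
}
```

A call would look like `$text = docx_to_text('/path/to/file.docx');`. Because the XML is parsed rather than stripped, entities such as `&amp;` come back decoded. Paragraphs nested inside other paragraphs (text boxes, for example) may appear twice, and headers, footers, and footnotes are stored in other parts of the archive and are not read here.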

185

answered Oct 05 '22 19:10

M Khalid Junaid

Tags: linux, php, vps, docx, doc