Word 2007 saves its documents in .docx format which is really a zip file with a bunch of stuff in it including an xml file with the document.
I want to be able to take a .docx file and drop it into a folder in my asp.net web app and have the code open the .docx file and render the (xml part of the) document as a web page.
I've been searching the web for more information on this but so far haven't found much. My questions are:
Thanks!
Try this post? I don't know but might be what you are looking for.
I wrote mammoth.js, which is a JavaScript library that converts docx files to HTML. If you want to do the rendering server-side in .NET, there is also a .NET version of Mammoth available on NuGet.
Mammoth tries to produce clean HTML by looking at semantic information -- for instance, mapping paragraph styles in Word (such as Heading 1
) to appropriate tags and style in HTML/CSS (such as <h1>
). If you want something that produces an exact visual copy, then Mammoth probably isn't for you. If you have something that's already well-structured and want to convert that to tidy HTML, Mammoth might do the trick.
Word 2007 has an API that you can use to convert to HTML. Here's a post that talks about it http://msdn.microsoft.com/en-us/magazine/cc163526.aspx. You can find documentation around the API, but I remember that there is a convert to HTML function in the API.
This code will helps to convert .docx
file to text
function read_file_docx($filename){
$striped_content = '';
$content = '';
if(!$filename || !file_exists($filename)) { echo "sucess";}else{ echo "not sucess";}
$zip = zip_open($filename);
if (!$zip || is_numeric($zip)) return false;
while ($zip_entry = zip_read($zip)) {
if (zip_entry_open($zip, $zip_entry) == FALSE) continue;
if (zip_entry_name($zip_entry) != "word/document.xml") continue;
$content .= zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));
zip_entry_close($zip_entry);
}// end while
zip_close($zip);
//echo $content;
//echo "<hr>";
//file_put_contents('1.xml', $content);
$content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
$content = str_replace('</w:r></w:p>', "\r\n", $content);
//header("Content-Type: plain/text");
$striped_content = strip_tags($content);
$striped_content = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t@\/\_\(\)]/","",$striped_content);
echo nl2br($striped_content);
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With