 

How can you programmatically (or with a tool) convert .MHT mhtml files to regular HTML and CSS files?


Many tools have a way to export a .MHT file. I want a way to convert that single file into a collection of files (an HTML file, the relevant images, and CSS files) that I could then upload to a web host and that all browsers could consume. Does anybody know of any tools, libraries, or algorithms to do this?

klumsy asked Apr 24 '13 22:04




2 Answers

Well, you can open the .MHT file in IE and then save it as a web page. I tested this with this page, and even though it looked odd in IE (it's IE after all), it saved and then opened fine in Chrome (as in, it looked like it should).

Barring that method, looking at the file itself, text blocks are saved in the file as-is, and all other content is saved in Base64. Each item of content is preceded by:

[Boundary]
Content-Type: [Mime Type]
Content-Transfer-Encoding: [Encoding Type]
Content-Location: [Full path of content]

Where [Mime Type], [Encoding Type], and [Full path of content] are variable. [Encoding Type] appears to be either base64 or quoted-printable. [Boundary] is defined in the beginning of the .MHT file like so:

From: <Saved by WebKit>
Subject: converter - How can you programmatically (or with a tool) convert .MHT mhtml
        files to regular HTML and CSS files? - Stack Overflow
Date: Fri, 9 May 2013 13:53:36 -0400
MIME-Version: 1.0
Content-Type: multipart/related;
    type="text/html";
    boundary="----=_NextPart_000_0C08_58653ABB.B67612B7"

Using that, you could make your own file parser if needed.
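As a rough illustration of such a parser: since the layout described above is ordinary MIME multipart, Python's standard `email` package can do the boundary splitting and the base64 / quoted-printable decoding. The sketch below is only that, a sketch. The function name `extract_mht`, the index-prefixed file naming, and the naive link rewriting (which only catches absolute URLs that appear verbatim in the HTML or CSS) are my own assumptions, not something the MHT format prescribes.

```python
import email
import mimetypes
import os
from urllib.parse import urlparse


def extract_mht(mht_bytes, out_dir):
    """Unpack an MHT archive into out_dir; return {Content-Location: local file name}.

    Hypothetical helper for illustration, not part of any library.
    """
    msg = email.message_from_bytes(mht_bytes)
    os.makedirs(out_dir, exist_ok=True)

    parts = []  # (location, local_name, content_type, decoded_bytes)
    for index, part in enumerate(msg.walk()):
        if part.is_multipart():
            continue  # container only; the real content is in its children
        location = (part.get("Content-Location") or "").strip()
        content_type = part.get_content_type()
        # decode=True undoes base64 / quoted-printable transfer encoding
        payload = part.get_payload(decode=True) or b""
        base = os.path.basename(urlparse(location).path) or "part"
        if not os.path.splitext(base)[1]:
            base += mimetypes.guess_extension(content_type) or ".bin"
        # Assumed naming scheme: prefix with the part index so that two
        # resources sharing a basename do not overwrite each other
        parts.append((location, f"{index:03d}_{base}", content_type, payload))

    mapping = {loc: name for loc, name, _, _ in parts if loc}
    # Replace longer URLs first so a URL that is a prefix of another
    # does not corrupt the longer one
    ordered = sorted(mapping.items(), key=lambda kv: -len(kv[0]))

    for location, name, content_type, payload in parts:
        if content_type in ("text/html", "text/css"):
            # Naive rewriting: swap each absolute URL for its local file name
            for loc, local in ordered:
                payload = payload.replace(loc.encode(), local.encode())
        with open(os.path.join(out_dir, name), "wb") as fh:
            fh.write(payload)
    return mapping
```

You would call it as `extract_mht(open("page.mht", "rb").read(), "out")` and then upload the contents of `out`. Relative URLs, URLs with HTML-escaped characters such as `&amp;`, and file names that are not valid on your filesystem are not handled here, so treat the output as a starting point.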

XNargaHuntress answered Sep 30 '22 19:09


Besides IE and MS Word, there's an open-source, cross-platform program called 'mht2html', first written in 2007 and last updated in 2016. It has both a GUI and a terminal interface.

  • Official website
  • Sourceforge Project

I haven't tested it yet but it seems to have received good reviews.

sahwar answered Sep 30 '22 18:09