
How can doc/docx files be converted to markdown or structured text?

Is there a program or workflow to convert .doc or .docx files to Markdown or similar text?

PS: Ideally, text set in a specific font (e.g. Consolas) in the MS Word document would be rendered as code: ```....```.

Lorenz Lo Sauer asked May 05 '13 09:05



1 Answer

Pandoc supports conversion from docx to markdown directly:

pandoc -f docx -t markdown foo.docx -o foo.markdown 

Several markdown formats are supported:

-t gfm (GitHub-Flavored Markdown)
-t markdown_mmd (MultiMarkdown)
-t markdown (pandoc’s extended Markdown)
-t markdown_strict (original unextended Markdown)
-t markdown_phpextra (PHP Markdown Extra)
-t commonmark (CommonMark Markdown)
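
If you have a whole folder of files to convert, the command above can be scripted. Here is a minimal Python sketch, assuming pandoc is installed and on your PATH. The function names `pandoc_command` and `convert_all` and the `.md` output suffix are my own choices for illustration, not part of pandoc:

```python
import subprocess
from pathlib import Path

# Output formats taken from the list above.
MARKDOWN_FORMATS = {
    "gfm",
    "markdown_mmd",
    "markdown",
    "markdown_strict",
    "markdown_phpextra",
    "commonmark",
}


def pandoc_command(src, target_format="markdown"):
    """Build the pandoc argument list for one .docx file (hypothetical helper)."""
    if target_format not in MARKDOWN_FORMATS:
        raise ValueError(f"unsupported target format: {target_format}")
    src = Path(src)
    dest = src.with_suffix(".md")  # assumption: write output next to the source
    return ["pandoc", "-f", "docx", "-t", target_format, str(src), "-o", str(dest)]


def convert_all(folder, target_format="gfm"):
    """Run pandoc on every .docx file directly inside `folder`."""
    for src in sorted(Path(folder).glob("*.docx")):
        # check=True raises CalledProcessError if pandoc exits with an error
        subprocess.run(pandoc_command(src, target_format), check=True)
```

Calling `convert_all("reports")` would then write a `.md` file next to each `.docx` in a (hypothetical) `reports` folder. Note that this only picks up `.docx` files; as far as I know pandoc does not read the older binary `.doc` format, so those would need to be saved as `.docx` first.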
massives answered Sep 20 '22 06:09