
What should I do if I put MS Office (e.g. .docx) or OpenOffice (e.g. .odt) documents into a Git repository?

Tags: git, github

I put several .docx, .txt and .pdf files into a Git repository. I can open, edit, and save the local .docx file; however, when I push it to GitHub and download it back to my computer, Word complains that it cannot open it.

To store .docx files on GitHub, are there any essential changes I should make to the Git settings?

Nick asked Jun 09 '15 09:06




1 Answer

Solution

Make a .gitattributes file in your working directory and add the following line to it:

*.docx    binary
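If you would rather script this step, the following is a minimal sketch that appends the rule to .gitattributes only when it is not already there. The helper name `ensure_attribute` and the temporary directory standing in for the working directory are assumptions made for illustration; the sketch does not call Git itself.

```python
from pathlib import Path
import tempfile


def ensure_attribute(repo_root: Path, rule: str) -> bool:
    """Append `rule` to .gitattributes unless it is already present.

    Returns True if the file was changed, False otherwise.
    """
    attributes = repo_root / ".gitattributes"
    existing = attributes.read_text().splitlines() if attributes.exists() else []
    # Compare with whitespace collapsed, so "*.docx    binary" equals "*.docx binary".
    if any(line.split() == rule.split() for line in existing):
        return False
    existing.append(rule)
    attributes.write_text("\n".join(existing) + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)  # stand-in for a real working directory
        print(ensure_attribute(root, "*.docx binary"))     # first call writes the rule
        print(ensure_attribute(root, "*.docx    binary"))  # second call changes nothing
```

Remember to commit the .gitattributes file so that the rule travels with the repository.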

Why not just set core.autocrlf=false?

This is useful too. But configuring .docx as a binary format solves not only this problem, but also potential merge issues.

What is the origin of this problem?

From http://git-scm.com/docs/gitattributes, section "Marking files as binary":

Git usually guesses correctly whether a blob contains text or binary data by examining the beginning of the contents. However, sometimes you may want to override its decision, either because a blob contains binary data later in the file, or because the content, while technically composed of text characters, is opaque to a human reader.

The .docx format is a ZIP archive containing XML and binary data, such as images.
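To see this package structure for yourself, here is a small sketch using Python's standard `zipfile` module. It builds a tiny archive in memory with a .docx-like layout; the member names and contents are illustrative stand-ins, not a valid Word document.

```python
import io
import zipfile


def build_package() -> bytes:
    """Build a tiny ZIP archive in memory with a .docx-like layout (illustrative only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", "<document>Hello</document>")
        archive.writestr("word/media/image1.png", bytes(range(256)))  # fake binary payload
    return buffer.getvalue()


def list_members(data: bytes) -> list[str]:
    """Return the member names of a ZIP archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


if __name__ == "__main__":
    package = build_package()
    print(package[:2])            # ZIP data begins with the bytes b"PK"
    print(list_members(package))  # XML and binary members side by side
```

Passing the bytes of a real .docx file to `list_members` should list its XML parts and any embedded media in the same way.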

Git treated your .docx as a text file rather than a binary one and replaced its line-ending characters. As a Microsoft-developed format, .docx probably contains CRLF sequences, which may have been replaced with LF when the file was committed, so the copy in the remote repository has LFs. When you downloaded the file directly from the remote, it still had LFs.

Git never replaces line-ending characters in a binary file, so even the copies in the remote repository keep their original CRLFs.
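The effect of that replacement on an archive can be sketched as follows. This only simulates the conversion with `bytes.replace`; it is not Git's actual code path, and the archive content is made up. The point is that the converted file is shorter and no longer byte-identical, which is enough to invalidate the offsets and checksums a ZIP reader depends on.

```python
import hashlib
import io
import zipfile


def build_archive() -> bytes:
    """Build a small in-memory ZIP whose payload contains CRLF line endings."""
    buffer = io.BytesIO()
    # ZIP_STORED keeps the payload uncompressed, so its CRLF bytes appear verbatim.
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        archive.writestr("word/document.xml", b"<w:p>one</w:p>\r\n<w:p>two</w:p>\r\n")
    return buffer.getvalue()


def simulate_crlf_to_lf(data: bytes) -> bytes:
    """Rough stand-in for CRLF-to-LF conversion applied to a file treated as text."""
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    original = build_archive()
    converted = simulate_crlf_to_lf(original)
    print(len(original) - len(converted))  # number of bytes lost
    same = hashlib.sha256(original).digest() == hashlib.sha256(converted).digest()
    print(same)  # the two versions no longer hash to the same value
```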

Applicable formats

This is applicable to any file format which is a zipped package with text and binary data. This includes:

  • OpenDocument: .odt, .ods, .odp and others.
  • OpenOffice.org XML: .sxw, .sxc, .sxi and others.
  • Open Packaging Conventions: .docx, .xlsx, .pptx and others.
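As a rough sketch, the list above can be turned into .gitattributes rules like this. Only the extensions named explicitly in the list are covered; the grouping dictionary and the function name are assumptions for illustration, and you would add further extensions yourself.

```python
# Extensions taken from the list above; "and others" is left for you to extend.
EXTENSIONS = {
    "OpenDocument": ["odt", "ods", "odp"],
    "OpenOffice.org XML": ["sxw", "sxc", "sxi"],
    "Open Packaging Conventions": ["docx", "xlsx", "pptx"],
}


def gitattributes_lines(groups: dict[str, list[str]]) -> list[str]:
    """Return one 'binary' rule per extension, with a comment line per format family."""
    lines = []
    for family, extensions in groups.items():
        lines.append(f"# {family}")  # lines starting with '#' are comments in .gitattributes
        lines.extend(f"*.{ext} binary" for ext in extensions)
    return lines


if __name__ == "__main__":
    print("\n".join(gitattributes_lines(EXTENSIONS)))
```

The printed lines can be pasted into the .gitattributes file alongside the *.docx rule from the solution.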
Nick Volynkin answered Nov 06 '22 00:11
