I have two questions:
Is there any way to view a .docx file on GitHub? We have uploaded all of our assignments to GitHub, but there is no way to view them in the browser. It would be nice if we could view those .docx files in the browser without downloading them.
How can I use git diff on the .docx file format? I tried to use catdoc but it didn't work for me. I think I have used git diff on Windows for the .doc format before, but it's not working for me on Mac.
Thanks a lot.
With Word Diff you can use Git's native diff functionality to quickly see what changed in a given iteration, or to compare different versions of the document over time. Git's content hashing additionally lets you check the integrity of each stored version.
You can run git diff HEAD to compare both staged and unstaged changes against your last commit. You can also run git diff <branch_name1> <branch_name2> to compare the tip of the first branch with the tip of the second.
GitHub is just a back-end repository store for Git data. Microsoft Word documents carry in-band markup along with the text, and Git treats them as binary files. This is true for both formats: .doc is a binary format, and .docx is a zip archive of XML parts.
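As a rough illustration of why Git reports such files as binary, the sketch below builds a tiny zip archive with .docx-style part names and inspects it. The helper names are hypothetical, the part contents are placeholders rather than a valid Word document, and the 8000-byte NUL check is only my approximation of Git's heuristic.

```python
import io
import zipfile


def make_docx_like_archive() -> bytes:
    """Build a tiny zip with .docx-style part names (placeholder contents,
    not a valid Word file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


def looks_binary(data: bytes, window: int = 8000) -> bool:
    """Approximation of Git's check (an assumption, not Git's code):
    a NUL byte near the start of the file means 'binary'."""
    return b"\x00" in data[:window]


data = make_docx_like_archive()
print(data[:4])            # zip local-file-header signature
print(looks_binary(data))  # the zip header itself contains NUL bytes
print(zipfile.ZipFile(io.BytesIO(data)).namelist())
```

The text of a real .docx lives in the word/document.xml part, which is why the answers in this thread convert the file to text before diffing.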
To save a copy of a document in .docx format from Word: choose "Save as", select the folder where you want to save the document, pick "Word document (.docx)" in the "Save as type" menu, and confirm. A copy of your file will be saved in .docx format.
In .gitattributes use:
*.docx diff=zip
In .git/config use:
[diff "zip"]
textconv = unzip -c -a
As a bonus my settings for old word/excel and new word/excel:
In .gitattributes use:
*.doc diff=word
*.xls diff=excel
*.xlsx diff=zip
*.docx diff=zip
In .git/config use:
[diff "word"]
textconv = strings
[diff "excel"]
textconv = strings
[diff "zip"]
textconv = unzip -c -a
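The unzip-based textconv prints the raw XML of every part in the archive, which can make diffs noisy. As a minimal sketch of an alternative, assuming the document text lives in word/document.xml, the hypothetical script below prints one line per paragraph. The function names are mine, and it ignores tables, footnotes, headers and tracked changes.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source) -> str:
    """Return one line per <w:p> paragraph; `source` is a path or file object."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W + "p"):
        # Join the text runs (<w:t>) that make up this paragraph
        paragraphs.append("".join(node.text or "" for node in para.iter(W + "t")))
    return "\n".join(paragraphs)


def sample_docx(paragraphs) -> io.BytesIO:
    """Build an in-memory stand-in for a .docx (illustrative only)."""
    ns = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    body = "".join(f"<w:p><w:r><w:t>{p}</w:t></w:r></w:p>" for p in paragraphs)
    xml = f'<w:document xmlns:w="{ns}"><w:body>{body}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


print(docx_to_text(sample_docx(["First paragraph", "Second paragraph"])))
```

To use something like this as a textconv, you would save it as an executable script that reads the path Git passes as its last argument and prints the result. That wiring is left out here.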
Answering your second question -
Usually when you try
git diff filename.docx
you will get output of the form -
Binary files a/filename.docx and b/filename.docx differ
Not very helpful. A perfect way around that is to use Pandoc.
Create or edit the file ~/.gitconfig (Linux, Mac) or "c:\Documents and Settings\user\.gitconfig" (Windows), or run git config --global --edit, and add:
[diff "pandoc"]
textconv=pandoc --to=markdown
prompt = false
[alias]
wdiff = diff --word-diff=color --unified=1
In your git controlled directory with .docx files, create or edit file .gitattributes (linux, Windows and Mac) to add
*.docx diff=pandoc
You can commit .gitattributes so that it stays for use in other computers, but you'll need to edit ~/.gitconfig in every new computer you want to use.
Now you can see a pretty coloured diff with the changes you have made to your .docx file since the last commit
git wdiff file.docx
More details can be found here.
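To show what a word-level diff does once the document has been converted to text, here is a small sketch using Python's difflib. It imitates the [-removed-] {+added+} markers of git diff --word-diff=plain. It is not Git's algorithm, and word_diff is a hypothetical helper.

```python
import difflib


def word_diff(old: str, new: str) -> str:
    """Mark removed words as [-...-] and added words as {+...+}."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
            continue
        if i2 > i1:  # words present only in the old text
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j2 > j1:  # words present only in the new text
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


print(word_diff("The quick brown fox", "The slow brown fox"))
# prints: The [-quick-] {+slow+} brown fox
```

Working word by word suits prose, where one edited word would otherwise mark a whole paragraph line as changed.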
The accepted solution (using strings / unzip ) didn't work very well for me on Linux Mint 19.3. The following seems to work pretty well for most doc/docx/rtf/xls files as well as their LibreOffice counterparts. Some of these might work on Windows via cygwin/git bash but I have not tested; if the packages I mention are not available in cygwin/git bash, then I would look for python/perl scripts that do the same conversion and substitute with those instead.
Install the required packages:
sudo apt install git pandoc catdoc odt2txt
Create the user-level Git attributes file:
mkdir -p ~/.config/git/ && touch ~/.config/git/attributes
(on Windows this should be mkdir "%USERPROFILE%\.config\git" && echo "" > "%USERPROFILE%\.config\git\attributes")
Add the following to that attributes file (or to the project-scoped ${projectDir}/.git/info/attributes as desired):
# handle windows *.reg files (utf-16 which git doesn't normally like)
*.reg diff=utf16
# handle misc common document formats
*.pdf diff=pdf2txt
*.rtf diff=catdoc
# handle libre/open document formats
*.ods diff=ods2txt
*.odp diff=odp2txt
*.odt diff=odt2txt
# handle older common ms document formats
# note: ppt did not work for me
*.doc diff=catdoc
*.ppt diff=catppt
*.xls diff=xls2csv
# handle newer zipped ms document formats
# note: pptx and xlsx did not work for me
*.docx diff=pandoc
*.pptx diff=pandoc
*.xlsx diff=pandoc
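A diff=name entry in an attributes file only gets a custom conversion if Git config has a matching [diff "name"] section, so it is easy for the two files to drift apart. The sketch below checks for that. The function names are hypothetical and the parsing is deliberately simple: it ignores Git's built-in driver names and config includes.

```python
import re


def attribute_drivers(attributes_text: str) -> set:
    """Collect driver names from 'pattern diff=name' lines."""
    drivers = set()
    for line in attributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        for token in line.split()[1:]:
            if token.startswith("diff="):
                drivers.add(token[len("diff="):])
    return drivers


def configured_drivers(config_text: str) -> set:
    """Collect names from [diff "name"] section headers."""
    pattern = r'^\s*\[diff\s+"([^"]+)"\]'
    return set(re.findall(pattern, config_text, flags=re.MULTILINE))


def missing_drivers(attributes_text: str, config_text: str) -> list:
    """Drivers referenced in attributes but not defined in config."""
    return sorted(attribute_drivers(attributes_text) - configured_drivers(config_text))


attributes = "*.docx diff=pandoc\n# comment\n*.odt diff=odt2txt\n*.rtf diff=catdoc\n"
config = '[diff "pandoc"]\ntextconv=pandoc --to=markdown\n[diff "catdoc"]\ntextconv = catdoc\n'
print(missing_drivers(attributes, config))
# prints: ['odt2txt']
```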
Add the following to your Git config (the user-level ~/.gitconfig or the project-scoped ${projectDir}/.git/config). Much of this is based on this article but altered based on my own testing.
[core]
autocrlf = false
[diff]
guitool = kdiff3
[diff "odp2txt"]
textconv = odp2txt
binary = true
[diff "odt2txt"]
textconv = odt2txt
binary = true
[diff "ods2txt"]
textconv = ods2txt
binary = true
[diff "catdoc"]
textconv = catdoc
binary = true
# note catppt did not work for me
[diff "catppt"]
textconv = catppt
binary = true
[diff "xls2csv"]
textconv = xls2csv
binary = true
[diff "xlsx2csv"]
textconv = xlsx2csv
binary = true
[diff "pandoc"]
textconv=pandoc --to=markdown
prompt = false
[diff "pdf2txt"]
textconv=pdf2txt
binary = true
[diff "utf16"]
textconv = iconv -c -f UTF-16LE -t ASCII
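For readers without iconv (for example on Windows), the utf16 driver's conversion can be approximated in a few lines of Python. This is a sketch that roughly mirrors iconv -c -f UTF-16LE -t ASCII by dropping anything that cannot be decoded or represented. The function name is hypothetical, and edge cases may differ from iconv.

```python
def utf16le_to_ascii(data: bytes) -> bytes:
    """Decode UTF-16LE and drop everything that is not plain ASCII."""
    text = data.decode("utf-16-le", errors="ignore")
    # The byte-order mark (U+FEFF) has no ASCII form, so it is dropped here too
    return text.encode("ascii", errors="ignore")


sample = "\ufeffWindows Registry Editor Version 5.00\r\n".encode("utf-16-le")
print(utf16le_to_ascii(sample))
# prints: b'Windows Registry Editor Version 5.00\r\n'
```

Note that non-ASCII characters in registry values are silently lost, which is fine for a quick diff but not for a faithful conversion.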
I was never able to successfully get diffs working for xlsx, ppt, or pptx even after downloading the latest version of pandoc from their github page. The docx conversion worked fine even with the super old version that is in the Mint/Ubuntu/Debian repos (v1.19.2.4 from 2016). For the xlsx/pptx samples I was using, I always got either "Invalid UTF-8 stream fatal" (old version) or "UTF-8 decoding error" (new version).
This could have been due to the sample files I was using (some samples from the web and some samples I created by converting LibreOffice documents), my system setup, the versions I was using or something else.
For completeness, after installing the newer pandoc, I was using:
$ uname -vipor
5.3.0-40-generic #32~18.04.1-Ubuntu SMP Mon Feb 3 14:05:59 UTC 2020 x86_64 x86_64 GNU/Linux
$ dpkg -l catdoc odt2txt pandoc git xlsx2csv|grep '^ii'
ii catdoc 1:0.95-4.1 amd64 text extractor for MS-Office files
ii git 1:2.17.1-1ubuntu0.5 amd64 fast, scalable, distributed revision control system
ii odt2txt 0.5-1build2 amd64 simple converter from OpenDocument Text to plain text
ii pandoc 2.9.2-1 amd64 general markup converter
ii xlsx2csv 0.20+20161027+git5785081-1 all convert xslx files to csv format
EDIT: I also tried using the package xlsx2csv for xlsx conversion instead of pandoc and had issues with that as well. It could be something to do with my samples, but since I am not doing anything special to create them, I would consider that a coverage gap / limitation of xlsx2csv/pandoc if so.