
Is it possible to diff PowerPoint files version-controlled with git?

I have some PowerPoint documents that I keep version-controlled with git. I want to see what the differences are between versions of a file. Text is most important; images and formatting not so much (at least not at this point).

nmz787 asked Aug 27 '15 21:08


3 Answers

I wrote this for use with git on the command-line (requires Python and the python-pptx library):

"""
Setup -- Add these lines to the following files:
--- .gitattributes
*.pptx diff=pptx

--- .gitconfig (or repo/.git/config, or your_user_home/.gitconfig); change the path to point to your local copy of the script
[diff "pptx"]
    binary = true
    textconv = python C:/Python27/Scripts/git-pptx-textconv.py

usage:
git diff your_powerpoint.pptx


Thanks to the python-pptx docs and this snippet:
http://python-pptx.readthedocs.org/en/latest/user/quickstart.html#extract-all-text-from-slides-in-presentation
"""

import sys
from pptx import Presentation


if __name__ == '__main__':
    if len(sys.argv) != 2:
        # Exit here; otherwise sys.argv[1] below raises IndexError.
        sys.exit("Usage: git-pptx-textconv file.pptx")

    path_to_presentation = sys.argv[1]

    prs = Presentation(path_to_presentation)

    # Convert left and right-hand quotes from Unicode to ASCII
    # found http://stackoverflow.com/questions/816285/where-is-pythons-best-ascii-for-this-unicode-database
    # go here if more power is needed  http://code.activestate.com/recipes/251871/
    # or here                          https://pypi.python.org/pypi/Unidecode/0.04.1
    punctuation = { 0x2018:0x27, 0x2019:0x27, 0x201C:0x22, 0x201D:0x22 }

    for slide in prs.slides:
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                par_text = ''
                for run in paragraph.runs:
                    s = run.text
                    # Flatten whitespace control characters so each
                    # paragraph stays on one output line.
                    for ch in ('\n', '\r', '\t'):
                        s = s.replace(ch, ' ')
                    # translate() returns a new string, so assign the result.
                    s = s.translate(punctuation)
                    par_text += s
                # Emit UTF-8 bytes regardless of the console encoding (Python 3).
                sys.stdout.buffer.write((par_text + '\n').encode('utf-8'))
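
One detail of the script worth isolating: str.translate returns a new string instead of changing the one it is called on, so the result has to be assigned back. A minimal sketch of the per-run cleanup as a standalone Python 3 helper follows; the names QUOTE_MAP and normalize_run_text are made up for illustration and are not part of python-pptx.

```python
# Map curly single/double quotes to ASCII ' and " (code point -> code point).
QUOTE_MAP = {0x2018: 0x27, 0x2019: 0x27, 0x201C: 0x22, 0x201D: 0x22}


def normalize_run_text(text):
    """Flatten newlines/tabs to spaces and straighten curly quotes."""
    for ch in ("\n", "\r", "\t"):
        text = text.replace(ch, " ")
    # translate() builds and returns a new string; keep the return value.
    return text.translate(QUOTE_MAP)
```

Keeping this step in its own function makes it easy to check the textconv output without needing a real .pptx file.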
nmz787 answered Nov 07 '22 11:11


I was unable to install python-pptx, as suggested by the accepted answer, so I looked for a Node.js solution, which may also work for the other file formats it can handle.

Install https://github.com/dbashford/textract (npm install --global textract).

Define the "textract" diff driver in your git config. On my Windows machine:

[diff "textract"]
    binary = true
    textconv=textract.cmd

Define in your .gitattributes that *.pptx files should use the "textract" diff driver:

*.pptx diff=textract

git diff happily.
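
If neither python-pptx nor textract can be installed, the same idea can be sketched with only the Python 3 standard library. This is a different technique from the two answers above, not what textract does internally: it reads the slide XML straight out of the .pptx zip archive and collects the text of the DrawingML `a:t` elements. The function name slide_texts is hypothetical, and slides are ordered by the number in their file name, which I am assuming is close enough to presentation order for diffing.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace; slide text runs are stored in <a:t> elements.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml$")


def slide_texts(pptx_file):
    """Yield (slide_file_number, [paragraph strings]) for a .pptx path or file object."""
    with zipfile.ZipFile(pptx_file) as archive:
        slides = []
        for name in archive.namelist():
            match = SLIDE_RE.match(name)
            if match:
                slides.append((int(match.group(1)), name))
        # Sort numerically so slide10 comes after slide2.
        for number, name in sorted(slides):
            root = ET.fromstring(archive.read(name))
            paragraphs = []
            for para in root.iter(A_NS + "p"):
                text = "".join(t.text or "" for t in para.iter(A_NS + "t"))
                if text:
                    paragraphs.append(text)
            yield number, paragraphs
```

To use it as a textconv command, a small wrapper would call slide_texts(sys.argv[1]) and print each paragraph on its own line.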

xverges answered Nov 07 '22 09:11


Not really. A PowerPoint file is essentially a zip archive of a folder full of files. Git will treat it as a binary file (because it is).

Maybe there's a third-party extension to do it, but I've never heard of one.
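
The zip-archive point is easy to check with Python's standard zipfile module. A minimal sketch, where list_members is a made-up helper name:

```python
import zipfile


def list_members(pptx_file):
    """Return the names of the files stored inside a .pptx (zip) archive."""
    if not zipfile.is_zipfile(pptx_file):
        raise ValueError("not a zip archive")
    with zipfile.ZipFile(pptx_file) as archive:
        return archive.namelist()
```

Calling list_members("your_powerpoint.pptx") on a real presentation should show entries such as the slide XML files under ppt/slides/.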

Zepplock answered Nov 07 '22 10:11