
How to perform better document version control on Excel files and SQL schema files

I am in charge of several Excel files and SQL schema files. How should I perform better document version control on these files?

I need to see what changed between versions of these files and keep every version for reference. Currently I append a timestamp to each file name, but this has proven inefficient.

Is there a way or good practice to do better document version control?

By the way, editors send me the files via email.

Marcus Thornton asked Jun 13 '13 09:06




2 Answers

The answer I have written here can be applied in this case. A tool called xls2txt can produce human-readable output from .xls files. In short, add this to your .gitattributes file:

*.xls diff=xls 

And in the .git/config:

[diff "xls"]
    binary = true
    textconv = /path/to/xls2txt

Of course, I'm sure you can find similar tools for other file types as well, making git diff a very useful tool for office documents. This is what I currently have in my global .gitconfig:

[diff "xls"]
    binary = true
    textconv = /usr/bin/py_xls2txt
[diff "pdf"]
    binary = true
    textconv = /usr/bin/pdf2txt
[diff "doc"]
    binary = true
    textconv = /usr/bin/catdoc
[diff "docx"]
    binary = true
    textconv = /usr/bin/docx2txt
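If no ready-made converter is installed for .xlsx, a small textconv helper can be written by hand. Git calls a textconv program with the path of the file as its only argument and diffs whatever the program prints. The following is only a minimal sketch using the Python standard library: the script name xlsx2txt.py and the function names are made up for illustration, it is not one of the tools listed above, and it prints cached cell values only (no formulas, no formatting).

```python
#!/usr/bin/env python3
"""Hypothetical textconv helper: print the cells of an .xlsx file as text."""
import sys
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the spreadsheet parts inside an .xlsx archive.
MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN}
TAG_C = "{%s}c" % MAIN   # a cell
TAG_T = "{%s}t" % MAIN   # a run of text


def shared_strings(archive):
    """Return the workbook's shared string table (empty if absent)."""
    if "xl/sharedStrings.xml" not in archive.namelist():
        return []
    root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
    # One <si> entry may split its text across several <t> runs.
    return ["".join(t.text or "" for t in si.iter(TAG_T))
            for si in root.findall("m:si", NS)]


def dump_xlsx(path):
    """Return one 'part!cell<TAB>value' line per cell that holds a value."""
    lines = []
    with zipfile.ZipFile(path) as archive:
        strings = shared_strings(archive)
        # Sheets are ordered by part name, not by tab name.
        sheets = sorted(n for n in archive.namelist()
                        if n.startswith("xl/worksheets/") and n.endswith(".xml"))
        for name in sheets:
            part = name.rsplit("/", 1)[-1]
            root = ET.fromstring(archive.read(name))
            for cell in root.iter(TAG_C):
                value = cell.findtext("m:v", default=None, namespaces=NS)
                kind = cell.get("t")
                if kind == "inlineStr":
                    text = "".join(t.text or "" for t in cell.iter(TAG_T))
                elif value is None:
                    continue              # empty cell
                elif kind == "s":
                    text = strings[int(value)]   # index into shared strings
                else:
                    text = value          # number, boolean or cached result
                lines.append("%s!%s\t%s" % (part, cell.get("r"), text))
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    print("\n".join(dump_xlsx(sys.argv[1])))
```

To try it, the script would be made executable and wired up the same way as the entries above: a line `*.xlsx diff=xlsx` in .gitattributes and a `[diff "xlsx"]` section whose textconv points at wherever the script was saved.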

The Pro Git book has a good chapter on the subject: 8.2 Customizing Git - Git Attributes

1615903 answered Sep 16 '22 19:09


Since you've tagged your question with git, I assume you are asking how to use Git for this.

SQL dumps are plain text files, so it makes perfect sense to track them with Git. Just create a repository and store them in it. When you get a new version of a file, simply overwrite the old one and commit. Git will work out the rest, and you'll be able to see modification dates, check out specific versions of the file, and compare different versions.
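If you already have a folder full of timestamped copies, the "overwrite and commit" routine can be replayed over them so the old versions end up in the history too. This is a rough sketch, not a finished tool. It assumes the copies are named like schema_20130613.sql (a stem, an underscore, an eight-digit date, then the extension), which is only a guess at your naming scheme, and that the target directory is already a Git repository with user.name and user.email configured. The function names are made up for illustration.

```python
import re
import shutil
import subprocess
from pathlib import Path

# Assumed naming scheme: <stem>_<YYYYMMDD><.ext>, e.g. schema_20130613.sql
STAMPED = re.compile(r"^(?P<stem>.+)_(?P<stamp>\d{8})(?P<ext>\.[^.]+)$")


def plan_import(filenames):
    """Return (stamp, original name, canonical name) tuples, oldest first."""
    plan = []
    for name in filenames:
        match = STAMPED.match(name)
        if match:  # files without a date stamp are ignored
            plan.append((match["stamp"], name, match["stem"] + match["ext"]))
    return sorted(plan)


def import_history(inbox, repo):
    """Replay each stamped file in `inbox` as one commit in the repo `repo`."""
    inbox, repo = Path(inbox), Path(repo)
    names = [p.name for p in inbox.iterdir() if p.is_file()]
    for stamp, original, canonical in plan_import(names):
        # Overwrite the tracked file with this version, then stage it.
        shutil.copyfile(inbox / original, repo / canonical)
        subprocess.run(["git", "add", "--", canonical], cwd=repo, check=True)
        # `git diff --cached --quiet` exits with 1 when something is staged.
        staged = subprocess.run(["git", "diff", "--cached", "--quiet"], cwd=repo)
        if staged.returncode == 1:
            message = "%s: version received %s" % (canonical, stamp)
            subprocess.run(["git", "commit", "-m", message], cwd=repo, check=True)
```

Calling `plan_import` on its own first is a cheap way to check that the file names are being parsed as expected before anything is committed. After the import, new files from email are handled the same way by hand: save over the tracked copy, then `git add` and `git commit`.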

The same is true for .xlsx files if you decompress them. An .xlsx file is a zipped-up directory of XML files (see How to properly assemble a valid xlsx file from its internal sub-components?). Git treats the archive itself as binary, but you can unzip the .xlsx and track changes to the individual XML files inside it.
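Unpacking can be scripted so it is done the same way every time. Here is a minimal sketch using only the Python standard library; the function name `unpack_xlsx` is made up for illustration. Re-indenting the XML parts is my own addition, on the assumption that the XML inside a workbook is often written as one long line, which would make line-based diffs hard to read.

```python
import zipfile
from pathlib import Path
from xml.dom import minidom


def unpack_xlsx(xlsx_path, out_dir):
    """Extract an .xlsx archive into out_dir and return the parts written."""
    out_dir = Path(out_dir)
    root = out_dir.resolve()
    written = []
    with zipfile.ZipFile(xlsx_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            target = out_dir / info.filename
            # Refuse entries that would land outside out_dir.
            if root not in target.resolve().parents:
                raise ValueError("unsafe path in archive: %s" % info.filename)
            target.parent.mkdir(parents=True, exist_ok=True)
            data = archive.read(info)
            if info.filename.endswith((".xml", ".rels")):
                # Re-indent XML parts so diffs show changes line by line.
                data = minidom.parseString(data).toprettyxml(
                    indent="  ", encoding="utf-8")
            target.write_bytes(data)
            written.append(info.filename)
    return sorted(written)
```

The output directory is what gets committed, either instead of or next to the original workbook. The sketch does not delete parts left over from an earlier unpack, so clearing the directory before each run keeps removed sheets from lingering in the repository.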

You could also store .xls files in the repository, but the .xls format is binary, so you can't get meaningful diffs from it. You'll still be able to see the modification history and check out specific versions.

kirelagin answered Sep 17 '22 19:09