
Version control for prose

It seems that someone must have done this already, but I cannot find the end product I'm looking for.

Using a version control system for text is laborious. You need newline characters at the end of each sentence, and even in the midst of long sentences. Looking at the git source, it seems that by changing a few routines that check for '\n', it should be possible to have git (or any other version control system) match '\n' or the pattern '\\.\s'. It is, however, a task that needs to be done meticulously, or I can see things breaking pretty badly.

Does anyone know of someone who has already done this? Or are there any other alternatives?

Thanks!

dgorur asked Oct 14 '11 22:10



1 Answer

Any version control system should be able to handle prose. The question is how efficiently it can do so.

The git diff command uses something like diff -u to display the differences between two versions of a file. If the file consists of text with very long lines (i.e., many characters between '\n' characters), then it might have some difficulty displaying the differences meaningfully; it might show two 5000-character lines with only a single character change.
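To make the effect concrete, here is a small sketch that uses Python's difflib (not git itself) to produce a unified diff of the same one-word edit, once with the paragraph stored as a single long line and once with one sentence per line. The sample sentences and the helper `changed_chars` are made up for illustration.

```python
import difflib


def changed_chars(a_lines, b_lines):
    """Count the characters on the '-' and '+' lines of a unified diff."""
    diff = list(difflib.unified_diff(a_lines, b_lines, lineterm=""))
    body = diff[2:]  # skip the '---' and '+++' file headers
    return sum(len(line) - 1 for line in body if line.startswith(("+", "-")))


old_sentences = [
    "The fox jumps over the lazy dog.",
    "The dog does not react.",
    "Nothing else happens.",
]
new_sentences = [
    "The fox jumps over the lazy dog.",
    "The dog does not move.",  # the only edit: "react" became "move"
    "Nothing else happens.",
]

# The same paragraph stored as one long line.
long_old = [" ".join(old_sentences)]
long_new = [" ".join(new_sentences)]

if __name__ == "__main__":
    print("one long line:    ", changed_chars(long_old, long_new), "characters shown as changed")
    print("sentence per line:", changed_chars(old_sentences, new_sentences), "characters shown as changed")
```

In the long-line case the whole paragraph appears twice in the diff, while the sentence-per-line case shows only the edited sentence.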

But that doesn't necessarily imply that that's how git stores the files. I'm not intimately familiar with git's internal storage format, but my understanding is that it does reasonably well with binary files, which could have many megabytes of data with no '\n' characters.

Note that some older version control systems (SCCS, RCS) probably do store differences between versions on a line-by-line basis. But even for such systems, at worst you'd be storing a full copy of each version plus some overhead. The system should still be able to work properly.

Note that git diff --word-diff should at least partially work around the problem of comparing versions.
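As a rough illustration of what a word-level comparison does, the sketch below diffs two strings word by word with Python's difflib and marks removals as [-...-] and additions as {+...+}, in the style of git's plain word-diff output. This is not git's implementation, and `word_diff` is a hypothetical helper that treats any whitespace-separated token as a word.

```python
import difflib


def word_diff(old: str, new: str) -> str:
    """Return a word-level diff with [-removed-] and {+added+} markers."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
            continue
        if i2 > i1:  # words present only in the old text ('delete' or 'replace')
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j2 > j1:  # words present only in the new text ('insert' or 'replace')
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


if __name__ == "__main__":
    print(word_diff("The dog does not react.", "The dog does not move."))
```

Because the comparison works on words rather than lines, a long paragraph with a single changed word produces a single marked word, regardless of where the line breaks fall.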

Keith Thompson answered Oct 16 '22 07:10