So I've built myself a super simple notetaking application using a relational database (if you're curious, I've used Excel VBA + MySQL). The app works fantastically for me as a replacement for Evernote, but I had this other feature idea: Could I implement version control/history for each individual note?
To be clear, I'm not talking about version control for the database's records or schema. I'm trying to build a user-facing (not developer-facing) interface for taking a note "back in time". This could be done quite easily by assigning a unique ID to each note “thread”, where the thread holds the running history of that note. If possible, though, I’d also like to compress this data as much as I can and store only the differences between versions.
So for example, if I have a note with body:
“This is the note body. It’s a super long text”
And I change it to:
“This is the note body. It’s a very long text”
I would like to not store all those character bytes all over again in the database, and instead somehow store only what changed (“super” -> “very”).
This is probably similar to how Git works, except I don’t need branching capabilities. Would anybody have any suggestions for algorithms on how to do this sort of thing? Thanks!
As a first choice, I would stick to storing and versioning the entire note as a whole, even if just one letter changed. That keeps things simple: there is no need to compute diffs on write or reconstruct the note on read. Storage is cheap, and MySQL performance will surely suffice for a small to medium amount of data.
[notes]
note_id  version  text
1        1        This is the note body. It’s a super long text
1        2        This is the note body. It’s a very long text
1        3        This is the note body. It’s really a very long text
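A minimal sketch of this whole-note approach, assuming Python's built-in sqlite3 as a stand-in for MySQL so it runs anywhere. The helper names save_version and load_version are made up for illustration, not part of any library:

```python
import sqlite3

# Assumption: sqlite3 replaces MySQL here purely so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notes ("
    " note_id INTEGER NOT NULL,"
    " version INTEGER NOT NULL,"
    " text TEXT NOT NULL,"
    " PRIMARY KEY (note_id, version))"
)


def save_version(note_id, text):
    """Store the full text as the next version and return its number."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM notes WHERE note_id = ?",
        (note_id,),
    ).fetchone()
    version = row[0] + 1
    conn.execute(
        "INSERT INTO notes (note_id, version, text) VALUES (?, ?, ?)",
        (note_id, version, text),
    )
    conn.commit()
    return version


def load_version(note_id, version=None):
    """Return the text of one version, or of the latest when version is None."""
    if version is None:
        row = conn.execute(
            "SELECT text FROM notes WHERE note_id = ?"
            " ORDER BY version DESC LIMIT 1",
            (note_id,),
        ).fetchone()
    else:
        row = conn.execute(
            "SELECT text FROM notes WHERE note_id = ? AND version = ?",
            (note_id, version),
        ).fetchone()
    return row[0] if row else None
```

Going "back in time" is then a single lookup by (note_id, version), with no reconstruction step.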
I would only consider the following options if you really expect a huge number of users and notes, or if you are doing this for educational purposes.
Instead of versioning a note as a whole, you can split it into chunks: paragraphs, sections, or any other unit you can distinguish.
[sections]
section_id  text
1           This is the note body.
2           It’s a super long text
3           It’s a very long text
4           It’s really a very long text
[notes]
note_id  version  position  section_id
1        1        1         1
1        1        2         2
1        2        1         1
1        2        2         3
1        3        1         1
1        3        2         4
Here notes and their versions reference specific sections at specific positions. Note how section_id = 1 gets reused in subsequent versions. This layout also allows a section to be reused across different notes.
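The two tables above could be wired together roughly as follows. This is only a sketch: sqlite3 again stands in for MySQL, the helper names are hypothetical, and how a note gets split into sections is left to the caller:

```python
import sqlite3

# Assumption: sqlite3 replaces MySQL here purely so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sections (
    section_id INTEGER PRIMARY KEY,
    text       TEXT NOT NULL UNIQUE
);
CREATE TABLE notes (
    note_id    INTEGER NOT NULL,
    version    INTEGER NOT NULL,
    position   INTEGER NOT NULL,
    section_id INTEGER NOT NULL REFERENCES sections(section_id),
    PRIMARY KEY (note_id, version, position)
);
""")


def section_id_for(text):
    """Reuse an existing section with identical text, or create a new one."""
    row = conn.execute(
        "SELECT section_id FROM sections WHERE text = ?", (text,)
    ).fetchone()
    if row:
        return row[0]
    return conn.execute(
        "INSERT INTO sections (text) VALUES (?)", (text,)
    ).lastrowid


def save_version(note_id, sections):
    """Store a non-empty list of section texts as the next version."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM notes WHERE note_id = ?",
        (note_id,),
    ).fetchone()
    version = row[0] + 1
    for position, text in enumerate(sections, start=1):
        conn.execute(
            "INSERT INTO notes VALUES (?, ?, ?, ?)",
            (note_id, version, position, section_id_for(text)),
        )
    conn.commit()
    return version


def load_version(note_id, version):
    """Return the section texts of one version, in position order."""
    rows = conn.execute(
        "SELECT s.text FROM notes n"
        " JOIN sections s ON s.section_id = n.section_id"
        " WHERE n.note_id = ? AND n.version = ?"
        " ORDER BY n.position",
        (note_id, version),
    ).fetchall()
    return [r[0] for r in rows]
```

Saving the three example versions this way should leave four rows in sections, since the unchanged first section is stored only once.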
Or, as you suggested, you could try to store diffs. For example, using unified diff:
[notes]
note_id  version  text_or_diff
1        1        This is the note body.
                  It’s a super long text
1        2        @@ -1,2 +1,2 @@
                   This is the note body.
                  -It’s a super long text
                  +It’s a very long text
1        3        @@ -1,2 +1,2 @@
                   This is the note body.
                  -It’s a very long text
                  +It’s really a very long text
Here, of course, each diff is longer than the actual text of the note, but with bigger notes this will be more efficient. As mentioned, it comes at a cost: when reading such a note, you need to load every version record up to the one you want and apply the diffs in order.
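To make the write and read costs concrete, here is a small sketch of storing and replaying deltas. Note that it does not use the unified diff format shown above: Python's standard difflib can produce unified diffs but has no function to apply them, so this swaps in a character-level delta built from difflib.SequenceMatcher opcodes and serialized as JSON. The function names and the delta layout are assumptions made up for illustration:

```python
import json
from difflib import SequenceMatcher


def make_delta(old, new):
    """Encode the changed spans needed to turn old into new as JSON.

    Each entry is [start, end, replacement]: replace old[start:end]
    with replacement. Unchanged spans are not stored.
    """
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            ops.append([i1, i2, new[j1:j2]])
    return json.dumps(ops)


def apply_delta(old, delta):
    """Rebuild the newer text from the older text and one delta."""
    out = []
    cursor = 0
    for start, end, replacement in json.loads(delta):
        out.append(old[cursor:start])  # copy the unchanged span
        out.append(replacement)        # insert the changed text
        cursor = end                   # skip the replaced span in old
    out.append(old[cursor:])
    return "".join(out)


def reconstruct(base, deltas):
    """Apply deltas in version order, starting from the full first version."""
    text = base
    for delta in deltas:
        text = apply_delta(text, delta)
    return text
```

With version 1 stored in full and later versions stored as deltas, reading version N means calling reconstruct with the first N - 1 deltas. SequenceMatcher can be slow on very large inputs, so this is meant as an illustration of the idea rather than a tuned implementation.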
From here you can explore various options and optimizations: