Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Applying a diff-patch to a string/file

For an offline-capable smartphone app, I'm creating a one-way text sync for Xml files. I'd like my server to send the delta/difference (e.g. a GNU diff-patch) to the target device.

This is the plan:

Time = 0
Server: has version_1 of Xml file (~800 kiB)
Client: has version_1 of Xml file (~800 kiB)

Time = 1
Server: has version_1 and version_2 of Xml file (each ~800 kiB)
        computes delta of these versions (=patch) (~10 kiB) 
        sends patch to Client (~10 kiB transferred)

Client: computes version_2 from version_1 and patch  <= this is the problem =>

Is there a Ruby library that can do this last step to apply a text patch to files/strings? The patch can be formatted as required by the library.

Thanks for your help!

(I'm using the Rhodes Cross-Platform Framework, which uses Ruby as programming language.)

like image 434
Felix Alcala Avatar asked Apr 16 '11 16:04

Felix Alcala


People also ask

How do I patch a diff file?

The "diff" tool calculates the differences between two text files. That difference is called a patch. You can apply a patch to another file using the "patch" tool. diff and patch are intended to be used on text files.

How do I use diff files?

Applying a DIFF File in the Command LineCopy the DIFF files to the root directory of your store. Open the terminal on the server or access the server remotely via SSH. Replace /path/to/cscart/root/directory with the actual path to the root directory of your store. Replace example.


2 Answers

Your first task is to choose a patch format. The hardest format for humans to read (IMHO) turns out to be the easiest format for software to apply: the ed(1) script. You can start off with a simple /usr/bin/diff -e old.xml new.xml to generate the patches; diff(1) will produce line-oriented patches but that should be fine to start with. The ed format looks like this:

36a
    <tr><td class="eg" style="background: #182349;">&nbsp;</td><td><tt>#182349</tt></td></tr>
.
34c
    <tr><td class="eg" style="background: #66ccff;">&nbsp;</td><td><tt>#xxxxxx</tt></td></tr>
.
20,23d

The numbers are line numbers, line number ranges are separated with commas. Then there are three single letter commands:

  • a: add the next block of text at this position.
  • c: change the text at this position to the following block. This is equivalent to a d followed by an a command.
  • d: delete these lines.

You'll also notice that the line numbers in the patch go from the bottom up so you don't have to worry about changes messing up the lines numbers in subsequent chunks of the patch. The actual chunks of text to be added or changed follow the commands as a sequence of lines terminated by a line with a single period (i.e. /^\.$/ or patch_line == '.' depending on your preference). In summary, the format looks like this:

[line-number-range][command]
[optional-argument-lines...]
[dot-terminator-if-there-are-arguments]

So, to apply an ed patch, all you need to do is load the target file into an array (one element per line), parse the patch using a simple state machine, call Array#insert to add new lines and Array#delete_at to remove them. Shouldn't take more than a couple dozen lines of Ruby to write the patcher and no library is needed.

If you can arrange your XML to come out like this:

<tag>
blah blah
</tag>
<other-tag x="y">
mumble mumble
</other>

rather than:

<tag>blah blah</tag><other-tag x="y">mumble mumble</other>

then the above simple line-oriented approach will work fine; the extra EOLs aren't going to cost much space so go for easy implementation to start.

There are Ruby libraries for producing diffs between two arrays (google "ruby algorithm::diff" to start). Combining a diff library with an XML parser will let you produce patches that are tag-based rather than line-based and this might suit you better. The important thing is the choice of patch formats, once you choose the ed format (and realize the wisdom of the patch working from the bottom to the top) then everything else pretty much falls into place with little effort.

like image 77
mu is too short Avatar answered Sep 27 '22 20:09

mu is too short


I know this question is almost five years old, but I'm going to post an answer anyway. When searching for how to make and apply patches for strings in Ruby, even now, I was unable to find any resources that answer this question satisfactorily. For that reason, I'll show how I solved this problem in my application.

Making Patches

I'm assuming you're using Linux, or else have access to the program diff through Cygwin. In that case, you can use the excellent Diffy gem to create ed script patches:

patch_text = Diffy::Diff.new(old_text, new_text, :diff => "-e").to_s

Applying Patches

Applying patches is not quite as straightforward. I opted to write my own algorithm, ask for improvements in Code Review, and finally settle on using the code below. This code is identical to 200_success's answer except for one change to improve its correctness.

require 'stringio'
def self.apply_patch(old_text, patch)
  text = old_text.split("\n")
  patch = StringIO.new(patch)
  current_line = 1

  while patch_line = patch.gets
    # Grab the command
    m = %r{\A(?:(\d+))?(?:,(\d+))?([acd]|s/\.//)\Z}.match(patch_line)
    raise ArgumentError.new("Invalid ed command: #{patch_line.chomp}") if m.nil?
    first_line = (m[1] || current_line).to_i
    last_line = (m[2] || first_line).to_i
    command = m[3]

    case command
    when "s/.//"
      (first_line..last_line).each { |i| text[i - 1].sub!(/./, '') }
    else
      if ['d', 'c'].include?(command)
        text[first_line - 1 .. last_line - 1] = []
      end
      if ['a', 'c'].include?(command)
        current_line = first_line - (command=='a' ? 0 : 1) # Adds are 0-indexed, but Changes and Deletes are 1-indexed
        while (patch_line = patch.gets) && (patch_line.chomp! != '.') && (patch_line != '.')
          text.insert(current_line, patch_line)
          current_line += 1
        end
      end
    end
  end
  text.join("\n")
end
like image 40
Kurt Tomlinson Avatar answered Sep 27 '22 21:09

Kurt Tomlinson