Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Update data only by difference between files (delta for java)

Tags:

java

delta

UPDATE: I solved the problem with a great external library - https://code.google.com/p/xdeltaencoder/. The way I did it is posted below as the accepted answer

Imagine I have two separate pcs who both have an identical byte[] A.

One of the pcs creates byte[] B, which is almost identical to byte[] A but is a 'newer' version.

For the second pc to update his copy of byte[] A into the latest version (byte[] B), I need to transmit the whole byte[] B to the second pc. If byte[] B is many GB's in size, this will take too long.

Is it possible to create a byte[] C that is the 'difference between' byte[] A and byte[] B? The requirements for byte[] C is that knowing byte[] A, it is possible to create byte[] B.

That way, I will only need to transmit byte[] C to the second PC, which in theory would be only a fraction of the size of byte[] B.

I am looking for a solution to this problem in Java.

Thankyou very much for any help you can provide :)

EDIT: The nature of the updates to the data in most circumstances is extra bytes being inserted into parts of the array. Ofcourse it is possible that some bytes will be changed or some bytes deleted. the byte[] itself represents a tree of the names of all the files/folders on a target pc. the byte[] is originally created by creating a tree of custom objects, marshalling them with JSON, and then compressing that data with a zip algorithm. I am struggling to create an algorithm that can intelligently create object c.

EDIT 2: Thankyou so much for all the help everyone here has given, and I am sorry for not being active for such a long time. I'm most probably going to try to get an external library to do the delta-encoding for me. A great part about this thread is that I now know what I want to achieve is called! I believe that when I find an appropriate solution I will post it and accept it so others can see as to how I solved my problem. Once again, thankyou very much for all your help.

like image 669
SuperMrBlob Avatar asked Jan 24 '14 11:01

SuperMrBlob


2 Answers

Using a collection of "change events" rather than sending the whole array

A solution to this would be to send a serialized object describing the change rather than the actual array all over again.

public class ChangePair implements Serializable{
    //glorified struct
    public final int index;
    public final  byte newValue;

    public ChangePair(int index, byte newValue) {
        this.index = index;
        this.newValue = newValue;
    }

    public static void main(String[] args){

        Collection<ChangePair> changes=new HashSet<ChangePair>();

        changes.add(new ChangePair(12,(byte)2));
        changes.add(new ChangePair(1206,(byte)3));

    }
}

Generating the "change events"

The most efficient method for achieving this would be to track changes as you go, but assuming thats not possible you can just brute force your way through, finding which values are different

public static Collection<ChangePair> generateChangeCollection(byte[] oldValues, byte[] newValues){
    //validation
    if (oldValues.length!=newValues.length){
        throw new RuntimeException("new and old arrays are differing lengths");
    }

    Collection<ChangePair> changes=new HashSet<ChangePair>();

    for(int i=0;i<oldValues.length;i++){
        if (oldValues[i]!=newValues[i]){
            //generate a change event
            changes.add(new ChangePair(i,newValues[i]));
        }
    }

    return changes;
}

Sending and recieving those change events

As per this answer regarding sending serialized objects over the internet you could then send your object using the following code

Collection<ChangePair> changes=generateChangeCollection(oldValues,newValues);

Socket s = new Socket("yourhostname", 1234);
ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream());
out.writeObject(objectToSend);
out.flush();

On the other end you would recieve the object

ServerSocket server = new ServerSocket(1234);
Socket s = server.accept();
ObjectInputStream in = new ObjectInputStream(s.getInputStream());
Collection<ChangePair> objectReceived = (Collection<ChangePair>) in.readObject();
//use Collection<ChangePair> to apply changes

Using those change events

This collection can then simply be used to modify the array of bytes on the other end

public static void useChangeCollection(byte[] oldValues, Collection<ChangePair> changeEvents){
    for(ChangePair changePair:changeEvents){
        oldValues[changePair.index]=changePair.newValue;
    }
}   
like image 197
Richard Tingle Avatar answered Nov 04 '22 21:11

Richard Tingle


Locally log the changes to the byte array, like a little version control system. In fact you could use a VCS to create patch files, send them to the other side and apply them to get an up-to-date file;

If you cannot log changes, you would need to double the array locally, or (not so 100% safe) use an array of checksums on blocks.

like image 38
Joop Eggen Avatar answered Nov 04 '22 20:11

Joop Eggen