Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Comparing two vectors of maps

Tags:

c++

diff

I've got two ways of fetching a bunch of data. The data is stored in a sorted vector<map<string, int> >.

I want to identify whether there are inconsistencies between the two vectors.

What I'm currently doing (pseudo-code):

for i in 0... min(length(vector1), length(vector2)):
    for (k, v) in vector1[i]:
        if v != vector2[i][k]:
            // report that k is bad for index i,
            // with vector1 having v, vector2 having vector2[i][k]

for i in 0... min(length(vector1), length(vector2)):
    for (k, v) in vector2[i]:
        if v != vector1[i][k]:
            // report that k is bad for index i,
            // with vector2 having v, vector1 having vector1[i][k]

This works in general, but breaks horribly if vector1 has a, b, c, d and vector2 has a, b, b1, c, d (it reports brokenness for b1, c, and d). I'm after an algorithm that tells me that there's an extra entry in vector2 compared to vector1.

I think I want to do something where when I encountered mismatches entries, I look at the next entries in the second vector, and if a match is found before the end of the second vector, store the index i of the entry found in the second vector, and move to matching the next entry in the first vector, beginning with vector2[i+1].

Is there a neater way of doing this? Some standard algorithm that I've not come across?

I'm working in C++, so C++ solutions are welcome, but solutions in any language or pseudo-code would also be great.

Example

Given the arbitrary map objects: a, b, c, d, e, f and g;

With vector1: a, b, d, e, f

and vector2: a, c, e, f

I want an algorithm that tells me either:

Extra b at index 1 of vector1, and vector2's c != vector1's d.

or (I'd view this as an effectively equivalent outcome)

vector1's b != vector2's c and extra d at index 2 of vector1

Edit

I ended up using std::set_difference, and then doing some matching on the diffs from both sets to work out which entries were similar but different, and which had entries completely absent from the other vector.

like image 578
Dominic Rodger Avatar asked Aug 12 '09 11:08

Dominic Rodger


2 Answers

Something like the std::mismatch algorithm

You could also use std::set_difference

like image 130
Glen Avatar answered Oct 29 '22 15:10

Glen


It sounds like you're looking for the diff algorithm. The idea is to identify the longest common subsequence of the two vectors (using map equality), then recurse down the non-common portions. Eventually you'll have an alternating list of vector sub-sequences that are identical, and sub-sequences that have no common elements. You can then easily produce whatever output you like from this.

Apply it to the two vectors, and there you go.

Note that since map comparison is expensive, if you can hash the maps (use a strong hash - collisions will result in incorrect output) and use the hashes for comparisons you'll save a lot of time.

Once you're down to the mismatched subsequences at the end, you'll have something like:

Input vectors: a b c d e f, a b c' d e f
Output:
   COMMON a b
   LEFT c
   RIGHT c'
   COMMON d e f

You can then individually compare the maps c and c' to figure out how they differ.

If you have a mutation and insertion next to each other, it gets more complex:

Input vectors: a b V W d e f, a b X Y d e f
Output:
   COMMON a b
   LEFT V W
   RIGHT X Y
   COMMON d e f

Determining whether to match V and W against X or Y (or not at all) is something you'll have to come up with a heuristic for.

Of course, if you don't care about how the content of the maps differ, then you can stop here, and you have the output you want.

like image 44
bdonlan Avatar answered Oct 29 '22 15:10

bdonlan