Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL based data diff: longest common subsequence

I'm looking for research papers or writings in applying Longest Common Subsquence algorithm to SQL tables for obtaining a data diff view. Other sugestions on how to resolve a table diff problem are also welcomed. The challenge being that SQL tables have this nasty habit of geting rather BIG and applying straightforward algorithms designed for text processing may result in a program that never ends...

so given a table Original:

Key  Content
1    This row is unchanged
2    This row is outdated
3    This row is wrong
4    This row is fine as it is

and the table New:

Key Content
1   This row was added
2   This row is unchanged
3   This row is right
4   This row is fine as it is
5   This row contains important additions

I need to find out the Diff:

+++ 1 This row was added
--- 2 This row is outdated
--- 3 This row is wrong
+++ 3 This row is right
+++ 5 This row contains important additions
like image 468
Remus Rusanu Avatar asked Jul 17 '10 00:07

Remus Rusanu


People also ask

Which method can be used to solve the longest common subsequence problem?

1. Which of the following methods can be used to solve the longest common subsequence problem? Explanation: Both recursion and dynamic programming can be used to solve the longest subsequence problem.

What is LCS data structure?

The longest common subsequence (LCS) is defined as the longest subsequence that is common to all the given sequences, provided that the elements of the subsequence are not required to occupy consecutive positions within the original sequences.

Which algorithm is used for longest common subsequence?

In this example, we have two strings X = BACDB and Y = BDCB to find the longest common subsequence. Following the algorithm LCS-Length-Table-Formulation (as stated above), we have calculated table C (shown on the left hand side) and table B (shown on the right hand side).

How do you calculate LCS?

1. Let's consider two sequences, X and Y , of length m and n that both end in the same element. To find their LCS, shorten each sequence by removing the last element, find the LCS of the shortened sequences, and that LCS append the removed element.


1 Answers

If you export your tabls into csv files, you can use http://sourceforge.net/projects/csvdiff/

Quote: csvdiff is a Perl script to diff/compare two csv files with the possibility to select the separator. Differences will be shown like: "Column XYZ in record 999" is different. After this, the actual and the expected result for this column will be shown.

like image 199
max muster Avatar answered Oct 25 '22 08:10

max muster