
Use C++ to compare changes in XML [closed]

I have two large XML files which have the same schema but different entries. Each day the entries change and I want to be able to find:

  • Entry appears in file A but not file B
  • Entry appears in file B but not file A
  • Entry appears in both file A and B

I'm new to programming and I'm having a hard time understanding an efficient way to approach this. Is using (trillions of) loops the key to this?

Example shortened XML file:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<site_entries>
<entry>
<id><![CDATA[946757316]]></id>
<url><![CDATA[http://www.site.co.uk/cgi-bin/tr.cgi?tid=752276]]></url>
<content><![CDATA[Specialized Dolce Sport 27 Speed]]></content>
<title><![CDATA[Bike]]></title>
<price><![CDATA[£600]]></price>
<date><![CDATA[01-AUG-13]]></date>
<display_reference><![CDATA[214683-50142933_370647]]></display_reference>
<location><![CDATA[City of London]]></location>
<category><![CDATA[Bike]]></category>
</entry>
<entry>
<id><![CDATA[90007316]]></id>
<url><![CDATA[http://www.site.co.uk/cgi-bin/tr.cgi?tid=70952276]]></url>
<content><![CDATA[Giant Sport Offroad Bike]]></content>
<title><![CDATA[Bike]]></title>
<price><![CDATA[£100]]></price>
<date><![CDATA[11-AUG-15]]></date>
<display_reference><![CDATA[2146433-50142933_370647]]></display_reference>
<location><![CDATA[City of London]]></location>
<category><![CDATA[Bike]]></category>
</entry>
</site_entries>

Edit: I can't rely on the entries being in the same order across the files.

Jimmy asked Mar 20 '26 09:03


1 Answer

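Here is an example of how this can work using pugixml. The idea is to collect the id of every entry into one std::set per file, then let the standard set algorithms compute the two differences and the intersection. No hand-written nested loops are needed, and because std::set keeps its elements sorted, the order of the entries in the files does not matter.

For the purposes of the test, the XML documents are held in std::istringstream objects; these can be replaced by std::ifstream objects to read from files.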

#include <set>
#include <string>
#include <sstream>
#include <iostream>
#include <iterator> // std::inserter
#include <algorithm>

#include "pugixml.hpp"

#define con(m) std::cout << m << '\n'
#define err(m) std::cerr << m << std::endl

std::istringstream iss_a(R"~(<?xml version="1.0" encoding="ISO-8859-1" ?>
<site_entries>
<entry>
<id><![CDATA[1]]></id>
</entry>
<entry>
<id><![CDATA[2]]></id>
</entry>
</site_entries>)~");

std::istringstream iss_b(R"~(<?xml version="1.0" encoding="ISO-8859-1" ?>
<site_entries>
<entry>
<id><![CDATA[2]]></id>
</entry>
<entry>
<id><![CDATA[3]]></id>
</entry>
</site_entries>)~");

using str_set = std::set<std::string>;

int main()
{
    pugi::xml_document doc;

    str_set a;
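    // use doc.load_file() in real code
    pugi::xml_parse_result res_a = doc.load(iss_a);
    if(!res_a)
    {
        err("failed to parse a: " << res_a.description());
        return 1;
    }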

    // fill set a with just the ids from file a
    for(auto&& node: doc.child("site_entries").children("entry"))
        a.emplace(node.child("id").text().as_string());

    str_set b;
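    pugi::xml_parse_result res_b = doc.load(iss_b);
    if(!res_b)
    {
        err("failed to parse b: " << res_b.description());
        return 1;
    }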

    // fill set b with just the ids from file b
    for(auto&& node: doc.child("site_entries").children("entry"))
        b.emplace(node.child("id").text().as_string());

    // now use the set algorithms from <algorithm>; they need sorted
    // input ranges, which std::set guarantees

    // ids that appear in a but not in b
    str_set b_from_a;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end()
        , std::inserter(b_from_a, b_from_a.begin()));

    // ids that appear in b but not in a
    str_set a_from_b;
    std::set_difference(b.begin(), b.end(), a.begin(), a.end()
        , std::inserter(a_from_b, a_from_b.begin()));

    // ids that appear in both a and b
    str_set a_and_b;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end()
        , std::inserter(a_and_b, a_and_b.begin()));

    for(auto&& v: a)
        con("a       : " << v);

    con("");

    for(auto&& v: b)
        con("b       : " << v);

    con("");

    for(auto&& v: b_from_a)
        con("b_from_a: " << v);

    con("");

    for(auto&& v: a_from_b)
        con("a_from_b: " << v);

    con("");

    for(auto&& v: a_and_b)
        con("a_and_b : " << v);

    con("");
}

Output:

a       : 1
a       : 2

b       : 2
b       : 3

b_from_a: 1

a_from_b: 3

a_and_b : 2
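
The three algorithm calls in the listing each walk both sets once. If the files are large, the same classification could probably be done in a single merge-style pass over the two sorted sets. The following is only a sketch of that idea using the standard library alone; id_diff and classify_ids are hypothetical names invented for this illustration and are not part of pugixml or of the code above.

```cpp
#include <set>
#include <string>

using str_set = std::set<std::string>;

// Hypothetical result type for this sketch.
struct id_diff
{
    str_set only_a; // ids present in a but not in b
    str_set only_b; // ids present in b but not in a
    str_set both;   // ids present in both a and b
};

// Hypothetical helper: classify every id in one pass over both sets.
// Relies on std::set iterating its elements in sorted order.
id_diff classify_ids(const str_set& a, const str_set& b)
{
    id_diff out;
    auto ia = a.begin();
    auto ib = b.begin();

    while(ia != a.end() && ib != b.end())
    {
        if(*ia < *ib)
        {
            // smallest remaining id exists only in a
            out.only_a.insert(out.only_a.end(), *ia);
            ++ia;
        }
        else if(*ib < *ia)
        {
            // smallest remaining id exists only in b
            out.only_b.insert(out.only_b.end(), *ib);
            ++ib;
        }
        else
        {
            // same id in both sets
            out.both.insert(out.both.end(), *ia);
            ++ia;
            ++ib;
        }
    }

    // whatever is left over exists in only one of the two sets
    out.only_a.insert(ia, a.end());
    out.only_b.insert(ib, b.end());
    return out;
}
```

Called with the sets a and b from the listing, classify_ids(a, b) would fill only_a, only_b and both with the same ids that b_from_a, a_from_b and a_and_b hold. The three separate calls are shorter to write and are usually fast enough, so this variant is only worth considering if profiling shows the comparison step matters.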

References:

std::set_difference

std::set_intersection

Galik answered Mar 21 '26 23:03


