I have two large XML files which have the same schema but different entries. Each day the entries change and I want to be able to find:
I'm new to programming and I'm having a hard time understanding an efficient way to approach this. Is using (trillions of) loops the key to this?
Example shortened XML file:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<site_entries>
<entry>
<id><![CDATA[946757316]]></id>
<url><![CDATA[http://www.site.co.uk/cgi-bin/tr.cgi?tid=752276]]></url>
<content><![CDATA[Specialized Dolce Sport 27 Speed]]></content>
<title><![CDATA[Bike]]></title>
<price><![CDATA[£600]]></price>
<date><![CDATA[01-AUG-13]]></date>
<display_reference><![CDATA[214683-50142933_370647]]></display_reference>
<location><![CDATA[City of London]]></location>
<category><![CDATA[Bike]]></category>
</entry>
<entry>
<id><![CDATA[90007316]]></id>
<url><![CDATA[http://www.site.co.uk/cgi-bin/tr.cgi?tid=70952276]]></url>
<content><![CDATA[Giant Sport Offroad Bike]]></content>
<title><![CDATA[Bike]]></title>
<price><![CDATA[£100]]></price>
<date><![CDATA[11-AUG-15]]></date>
<display_reference><![CDATA[2146433-50142933_370647]]></display_reference>
<location><![CDATA[City of London]]></location>
<category><![CDATA[Bike]]></category>
</entry>
</site_entries>
Edit: I can't rely on the entries being in the same order in both files.
Here is an example of how this can work using pugixml.
For the purposes of the test, the XML files are stored in std::istringstream objects; these can be replaced by std::ifstream objects to read from files.
#include <set>
#include <string>
#include <sstream>
#include <iostream>
#include <iterator> // std::inserter
#include <algorithm>
#include "pugixml.hpp"
#define con(m) std::cout << m << '\n'
#define err(m) std::cerr << m << std::endl
std::istringstream iss_a(R"~(<?xml version="1.0" encoding="ISO-8859-1" ?>
<site_entries>
<entry>
<id><![CDATA[1]]></id>
</entry>
<entry>
<id><![CDATA[2]]></id>
</entry>
</site_entries>)~");
std::istringstream iss_b(R"~(<?xml version="1.0" encoding="ISO-8859-1" ?>
<site_entries>
<entry>
<id><![CDATA[2]]></id>
</entry>
<entry>
<id><![CDATA[3]]></id>
</entry>
</site_entries>)~");
using str_set = std::set<std::string>;
int main()
{
    pugi::xml_document doc;

    str_set a;
    if(!doc.load(iss_a)) // use doc.load_file() in real code
    {
        err("could not parse file a");
        return 1;
    }

    // fill set a with just the ids from file a
    for(auto&& node: doc.child("site_entries").children("entry"))
        a.emplace(node.child("id").text().as_string());

    str_set b;
    if(!doc.load(iss_b))
    {
        err("could not parse file b");
        return 1;
    }

    // fill set b with just the ids from file b
    for(auto&& node: doc.child("site_entries").children("entry"))
        b.emplace(node.child("id").text().as_string());

    // now use the <algorithm> library

    // b subtracted from a: ids that are in a but not in b
    str_set b_from_a;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end()
        , std::inserter(b_from_a, b_from_a.begin()));

    // a subtracted from b: ids that are in b but not in a
    str_set a_from_b;
    std::set_difference(b.begin(), b.end(), a.begin(), a.end()
        , std::inserter(a_from_b, a_from_b.begin()));

    // ids that are in both a and b
    str_set a_and_b;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end()
        , std::inserter(a_and_b, a_and_b.begin()));

    for(auto&& v: a)
        con("a : " << v);
    con("");

    for(auto&& v: b)
        con("b : " << v);
    con("");

    for(auto&& v: b_from_a)
        con("b_from_a: " << v);
    con("");

    for(auto&& v: a_from_b)
        con("a_from_b: " << v);
    con("");

    for(auto&& v: a_and_b)
        con("a_and_b : " << v);
    con("");
}
Output:
a : 1
a : 2
b : 2
b : 3
b_from_a: 1
a_from_b: 3
a_and_b : 2
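On the question of "trillions of loops": no nested loops are needed. Filling a std::set costs O(n log n), and std::set_difference and std::set_intersection each walk both sorted sets once. If the three comparisons are needed in more than one place, they could be wrapped in small helpers. The following is only a sketch using the standard library; the names only_in and in_both are made up for illustration and are not part of pugixml or the standard library.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

using str_set = std::set<std::string>;

// Ids present in lhs but missing from rhs (hypothetical helper name).
str_set only_in(const str_set& lhs, const str_set& rhs)
{
    str_set out;
    std::set_difference(lhs.begin(), lhs.end(), rhs.begin(), rhs.end(),
                        std::inserter(out, out.begin()));
    return out;
}

// Ids present in both sets (hypothetical helper name).
str_set in_both(const str_set& lhs, const str_set& rhs)
{
    str_set out;
    std::set_intersection(lhs.begin(), lhs.end(), rhs.begin(), rhs.end(),
                          std::inserter(out, out.begin()));
    return out;
}
```

With the sets from the example, only_in(a, b) would hold the ids that exist only in file a, only_in(b, a) the ids that exist only in file b, and in_both(a, b) the ids common to both.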
References:
std::set_difference
std::set_intersection
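If sorted output is not needed, a hash set is a possible alternative to the sorted-set algorithms, and it also does not care about the order of the entries in either file. This is a different technique from the one shown in the answer, sketched here under the assumption that the ids have already been read into standard containers; missing_from is a hypothetical name chosen for illustration.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Returns the ids from candidates that do not occur in known.
// One pass over candidates, with an average O(1) lookup per id.
std::vector<std::string> missing_from(const std::vector<std::string>& candidates,
                                      const std::unordered_set<std::string>& known)
{
    std::vector<std::string> out;
    for (const auto& id : candidates)
        if (known.find(id) == known.end())
            out.push_back(id);
    return out;
}
```

Calling it once in each direction gives the ids that exist in only one of the two files, still without nesting one loop inside another.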