Normalizing XML Elements

Tags:

c#

xml

I have some XML documents that represent permutations of the members of, for example, 4 sets (A, B, C, D). Suppose that A={A1,A2}, B={B1}, C={C1,C2,C3} and D={D1,D2,D3}. The current XML is not normalized, because these members are combined irregularly within each answer. The "set" attribute gives the name of the set and "member" gives the member of that set. The XML looks like this:

<root>
    <phrase permutation=ABCD>
       <ans number=1>
           <word set=A member=A1/>
           <word set=A member=A2/>
           <word set=B member=B1/>
           <word set=C member=C1/>
           <word set=D member=D2/>
       </ans>
       <ans number=2>
           <word set=A member=A1/>
           <word set=B member=B1/>
           <word set=C member=C1/>
           <word set=C member=C2/>
           <word set=C member=C3/>
           <word set=D member=D1/>
           <word set=D member=D3/>
       </ans>
    </phrase>
</root>

I want each answer to contain exactly one permutation: it should start with exactly one member of A, end with exactly one member of D, and use exactly one member of each of B and C in between. For example, answer A1A2B1C1D2 should be split into A1B1C1D2 and A2B1C1D2, and answer A1B1C1C2C3D1D3 should be split into A1B1C1D1, A1B1C1D3, A1B1C2D1, A1B1C2D3, A1B1C3D1 and A1B1C3D3. The final XML should look like this:

<root>
    <phrase permutation=ABCD>
       <ans number=1>
           <word set=A member=A1/>
           <word set=B member=B1/>
           <word set=C member=C1/>
           <word set=D member=D2/>
       </ans>
       <ans number=2>
           <word set=A member=A2/>
           <word set=B member=B1/>
           <word set=C member=C1/>
           <word set=D member=D2/>
       </ans>
       <ans number=3>
           <word set=A member=A1/>
           <word set=B member=B1/>
           <word set=C member=C1/>
           <word set=D member=D1/>
       </ans>
       <ans number=4>
           <word set=A member=A1/>
           <word set=B member=B1/>
           <word set=C member=C1/>
           <word set=D member=D3/>
       </ans>
       <ans number=5>
           <word set=A member=A1/>
           <word set=B member=B1/>
           <word set=C member=C2/>
           <word set=D member=D1/>
       </ans>
       <ans number=6>
           <word set=A member=A1/>
           <word set=B member=B1/>
           <word set=C member=C2/>
           <word set=D member=D3/>
       </ans>
       <ans number=7>
           <word set=A member=A1/>
           <word set=B member=B1/>
           <word set=C member=C3/>
           <word set=D member=D1/>
       </ans>
       <ans number=8>
           <word set=A member=A1/>
           <word set=B member=B1/>
           <word set=C member=C3/>
           <word set=D member=D3/>
       </ans>
    </phrase>
</root>
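To make the splitting rule concrete, here is a minimal sketch that works on plain (set, member) string pairs instead of XML. AnswerSplitter and Split are hypothetical names, and the sketch assumes the words of each answer are listed in set order (A, then B, C, D). Every output combination takes exactly one member from each set, i.e. the result is the Cartesian product of the per-set member lists:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating the splitting rule on plain strings
// (not part of the original question or answer).
public static class AnswerSplitter
{
    // words: (set, member) pairs in document order, e.g. ("A", "A1").
    // Returns one member list per combination; sets stay in first-seen order.
    public static List<List<string>> Split(IEnumerable<(string Set, string Member)> words)
    {
        // Group members by set; GroupBy keeps the order in which sets first appear.
        var groups = words.GroupBy(w => w.Set, w => w.Member).ToList();

        // Start from one empty combination and extend it with each set in turn,
        // pairing every combination built so far with every member of the next set.
        IEnumerable<List<string>> combos = new[] { new List<string>() };
        foreach (var group in groups)
        {
            var members = group.ToList();
            combos = combos.SelectMany(
                combo => members,
                (combo, member) => new List<string>(combo) { member });
        }
        return combos.ToList();
    }
}
```

Splitting A1, A2, B1, C1, D2 this way yields A1B1C1D2 and A2B1C1D2, and splitting the second answer yields the six combinations in the same order as answers 3 to 8 above.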

I hope my question is clear and that you can help me. Thanks!

asked Dec 18 '13 at 00:12 by SMD


1 Answer

OK, first of all: please note that the attributes in your XML are unquoted, which makes it malformed, so .NET's standard XML APIs (such as XDocument.Parse) will throw an XmlException when reading it. I simply added the quotes before writing the solution below.
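If you would rather not add the quotes by hand, a hedged pre-processing sketch follows. AttributeQuoter is a hypothetical helper, not part of .NET, and its regular expression assumes that unquoted values contain no whitespace, quotes, '>' or '/', and that the pattern name=value never appears in element text. It handles the sample above but is not a general XML repair tool:

```csharp
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical pre-processing helper: wraps unquoted attribute values such as
// set=A or member=A1 in double quotes so that XDocument.Parse accepts the input.
// Assumption: unquoted values contain no whitespace, quotes, '>' or '/'.
public static class AttributeQuoter
{
    // Matches name=value where the value does not start with a quote;
    // already-quoted attributes (name="value") are left untouched.
    private static readonly Regex UnquotedAttribute =
        new Regex(@"(\w+)=([^\s""'>/]+)");

    public static string AddQuotes(string xml) =>
        UnquotedAttribute.Replace(xml, "$1=\"$2\"");

    // Quote first, then parse with the standard LINQ to XML parser.
    public static XDocument ParseLenient(string xml) =>
        XDocument.Parse(AddQuotes(xml));
}
```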

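using System.Linq;
using System.Xml.Linq;

var original = XDocument.Parse(/* your XML as string */);

// Work on a deep copy so the original answers can still be read while the copy is rebuilt.
var normalized = new XDocument(original);

foreach (var phraseNode in normalized.Root.Elements("phrase"))
{
   // Drop the existing <ans> children; they are regenerated below.
   phraseNode.Elements().Remove();
   int ansNo = 1;

   // Read the answers of the matching <phrase> from the untouched original.
   foreach (var answer in original.Root
                                  .Elements("phrase")
                                  .Single(p => p.Attribute("permutation").Value
                                             == phraseNode.Attribute("permutation").Value)
                                  .Elements("ans"))
   {
      // One group per set (A, B, C, D), in the order the sets first appear.
      var groupedWords = answer.Elements("word")
                               .GroupBy(w => w.Attribute("set").Value)
                               .ToArray();

      // Cartesian product of the groups: seed with one single-word sequence per
      // member of the first set, then cross join with each following set.
      // The constant keys (c => 1, w => 1) make Join match every pair.
      var newAnswers = groupedWords.Skip(1)
                                   .Aggregate(
                                     groupedWords[0].Select(w => Enumerable.Repeat(w, 1)),
                                     (combinations, newWords) =>
                                         combinations.Join(newWords,
                                                           c => 1,
                                                           w => 1,
                                                           (c, w) => c.Concat(new[] { w })));

      // Emit one numbered <ans> per combination, copying the <word> elements.
      foreach (var newAnswer in newAnswers)
      {
         var ansNode = new XElement("ans", new XAttribute("number", ansNo++));
         ansNode.Add(newAnswer.Select(w => new XElement(w)).ToArray());
         phraseNode.Add(ansNode);
      }
   }
}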

If you don't know LINQ to XML, this might be a bit intimidating at first; with some light reading or prior knowledge, the only relatively complex bit should be the code that generates the permutations (where the newAnswers variable is initialized). Aggregate walks over the sets one at a time, and the Join with constant key selectors (c => 1, w => 1) pairs every combination built so far with every word of the next set, i.e. it performs a cross join. You can either take this at face value or read a bit more on how LINQ joins work.
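The permutation-generating part can be explored in isolation on plain strings. CrossJoinDemo and its methods are hypothetical names, not from the answer: JoinWithConstantKey shows the constant-key Join, SelectManyCross gives the equivalent SelectMany form, and FoldGroups mirrors the answer's Aggregate call:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo of the cross-join trick used for newAnswers.
public static class CrossJoinDemo
{
    // Both key selectors return 1, so every left element matches every right element.
    public static List<string> JoinWithConstantKey(IEnumerable<string> left, IEnumerable<string> right) =>
        left.Join(right, l => 1, r => 1, (l, r) => l + r).ToList();

    // Equivalent cross product written with SelectMany.
    public static List<string> SelectManyCross(IEnumerable<string> left, IEnumerable<string> right) =>
        left.SelectMany(l => right, (l, r) => l + r).ToList();

    // Same shape as the answer: seed with one single-element sequence per member of
    // the first group, then append one member of each following group per step.
    public static List<string> FoldGroups(IReadOnlyList<IEnumerable<string>> groups) =>
        groups.Skip(1)
              .Aggregate(
                  groups[0].Select(m => Enumerable.Repeat(m, 1)),
                  (combinations, next) => combinations.Join(next, c => 1, m => 1,
                                                           (c, m) => c.Concat(new[] { m })))
              .Select(c => string.Concat(c))
              .ToList();
}
```

For {A1, A2} and {B1, B2}, both JoinWithConstantKey and SelectManyCross return A1B1, A1B2, A2B1, A2B2, because Join keeps the order of the outer sequence and, within it, the order of the inner one. SelectMany is arguably the more conventional way to spell a cross join in LINQ.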

Also, please note that this wasn't written with any heavy-duty optimizations in mind; in 99.99% of cases this hopefully shouldn't be an issue.
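If performance does become a concern, it helps to know how much output to expect: each ans element expands into the product of its per-set word counts, which for the sample is 2 × 1 × 1 × 1 = 2 plus 1 × 1 × 3 × 2 = 6, i.e. the 8 answers shown in the question. A minimal sketch for computing that count up front; ExpansionSize is a hypothetical helper, and it assumes quoted attributes and at least one word per ans:

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: predicts how many <ans> elements the normalization produces.
public static class ExpansionSize
{
    // Product of the number of <word> elements per set within one <ans>.
    // Assumes at least one <word>; an empty <ans> would report 1 here.
    public static long CountCombinations(XElement answer) =>
        answer.Elements("word")
              .GroupBy(w => (string)w.Attribute("set"))
              .Aggregate(1L, (product, group) => product * group.Count());

    // Total over every <ans> in the document.
    public static long CountAll(XDocument doc) =>
        doc.Descendants("ans").Sum(ans => CountCombinations(ans));
}
```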

answered Oct 13 '22 at 01:10 by decPL