
Count how many equal strings are in XML file

Tags: c#, xml

I'm wondering if there is a way to count how many equal strings are in an XML file. For example, this is the XML file:

<Root>
  <task>
    <sub1>test</sub1>
    <sub2>hello</sub2>
    <sub3>csharp</sub3>
  </task>
  <task>
    <sub1>test2</sub1>
    <sub2>hello2</sub2>
    <sub3>csharp2</sub3>
  </task>
  <task>
    <sub1>test3</sub1>
    <sub2>hello3</sub2>
    <sub3>csharp3</sub3>
  </task>
  <task>
    <sub1>test</sub1>
    <sub2>hello4</sub2>
    <sub3>csharp4</sub3>
  </task>
</Root>

As you can see, the InnerText value "test" appears twice. How can I count that? I tried something like

client["sub1"].InnerText.Count

but this counts the number of characters in the string, not the number of matching nodes.

Suggestions appreciated :)

EDIT: I parse the XML file using XmlDocument

roemel asked Jan 11 '23 15:01


2 Answers

Select elements you want to check (e.g. all sub elements of all tasks) and group them by value:

// xdoc is an XDocument, e.g. XDocument.Load(path_to_xml)
var result = xdoc.Root.Elements("task").SelectMany(t => t.Elements())
    .GroupBy(e => e.Value)
    .Select(g => new { Text = g.Key, Count = g.Count() });

Query syntax:

var xdoc = XDocument.Load(path_to_xml);
var result = from t in xdoc.Root.Elements("task")
             from e in t.Elements()
             group e by e.Value into g
             select new {
                  Text = g.Key,
                  Count = g.Count()
             };

With XPath:

// XPathSelectElements is an extension method; it needs "using System.Xml.XPath;"
var result = from e in xdoc.XPathSelectElements("//task/*")
             group e by e.Value into g
             select new {
                 Text = g.Key,
                 Count = g.Count()
             };

For your sample XML, the result will be:

[
  { Text: "test", Count: 2 },
  { Text: "hello", Count: 1 },
  { Text: "csharp", Count: 1 },
  { Text: "test2", Count: 1 },
  { Text: "hello2", Count: 1 },
  { Text: "csharp2", Count: 1 },
  { Text: "test3", Count: 1 },
  { Text: "hello3", Count: 1 },
  { Text: "csharp3", Count: 1 },
  { Text: "hello4", Count: 1 },
  { Text: "csharp4", Count: 1 }
]

You can filter the results by count if you want only the text values that occur more than once:

 result.Where(x => x.Count > 1)
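
Putting the grouping and the filter together, a minimal self-contained sketch might look like the following. The class name DuplicateText and the method name FindRepeated are made up for illustration, and the XML is assumed to be passed in as a string rather than loaded from a file:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DuplicateText
{
    // Hypothetical helper: returns every element text found directly under
    // a <task> element that occurs more than once, mapped to its count.
    public static Dictionary<string, int> FindRepeated(string xml)
    {
        return XDocument.Parse(xml)
            .Root.Elements("task")
            .SelectMany(t => t.Elements())      // all sub elements of all tasks
            .GroupBy(e => e.Value)              // group by text content
            .Where(g => g.Count() > 1)          // keep only repeated values
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

For the sample XML from the question, DuplicateText.FindRepeated(xml) should contain a single entry, "test" with a count of 2.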

Same query for XmlDocument:

var doc = new XmlDocument();
doc.Load(path_to_xml);
var result = from XmlNode n in doc.SelectNodes("//task/*")
             group n by n.InnerText into g
             select new {
                 Text = g.Key,
                 Count = g.Count()
             };
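
If you prefer to avoid LINQ with XmlDocument, the same counts can be collected with a plain loop and a dictionary. This is only a sketch of an alternative technique, not part of the query above; TextCounter and CountTexts are hypothetical names, and the XML is assumed to be available as a string:

```csharp
using System.Collections.Generic;
using System.Xml;

public static class TextCounter
{
    // Hypothetical helper: counts how often each InnerText occurs among
    // the child elements of all <task> nodes.
    public static Dictionary<string, int> CountTexts(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var counts = new Dictionary<string, int>();
        foreach (XmlNode node in doc.SelectNodes("//task/*"))
        {
            // current stays 0 when the text has not been seen yet
            counts.TryGetValue(node.InnerText, out int current);
            counts[node.InnerText] = current + 1;
        }
        return counts;
    }
}
```

With the sample XML, CountTexts(xml)["test"] should be 2 and every other entry 1.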

Sergey Berezovskiy answered Jan 16 '23 18:01


// sub1 is a child element of task, not an attribute, so use Element("sub1")
var dubs = XDocument.Parse(xml)
            .Descendants("task")
            .GroupBy(g => (string)g.Element("sub1"))
            .Where(g => g.Count() > 1)
            .Select(g => g.Key);
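
Since the question parses the file with XmlDocument and asks about one specific value, a rough sketch for counting only the sub1 nodes with a given text could look like this. Sub1Counter and CountSub1 are illustrative names, and the XML is assumed to be passed as a string:

```csharp
using System.Linq;
using System.Xml;

public static class Sub1Counter
{
    // Hypothetical helper: counts the <sub1> elements under <task>
    // whose InnerText equals the given text.
    public static int CountSub1(string xml, string text)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes("//task/sub1")
                  .Cast<XmlNode>()              // XmlNodeList is non-generic
                  .Count(n => n.InnerText == text);
    }
}
```

For the sample XML, CountSub1(xml, "test") should return 2.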

Sajeetharan answered Jan 16 '23 18:01