Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parse XML with LINQ to get child elements

<?xml version="1.0" standalone="yes"?>
<CompanyInfo>
     <Employee name="Jon" deptId="123">
      <Region name="West">
        <Area code="96" />
      </Region>
      <Region name="East">
        <Area code="88" />
      </Region>
     </Employee>
</CompanyInfo>  

public class Employee
{
    public string EmployeeName { get; set; }
    public string DeptId { get; set; }
    public List<string> RegionList {get; set;}
}

public class Region
{
    public string RegionName { get; set; }
    public string AreaCode { get; set; }
}

I am trying to read this XML data, so far I have tried this:

XDocument xml = XDocument.Load(@"C:\data.xml");
var xElement = xml.Element("CompanyInfo");
if (xElement != null)
    foreach (var child in xElement.Elements())
    {
        Console.WriteLine(child.Name);  
        foreach (var item in child.Attributes())
        {
            Console.WriteLine(item.Name + ": " + item.Value);
        }

        foreach (var childElement in child.Elements())
        {
            Console.WriteLine("--->" + childElement.Name);
            foreach (var ds in childElement.Attributes())
            {
                Console.WriteLine(ds.Name + ": " + ds.Value);
            }
            foreach (var element in childElement.Elements())
            {
                Console.WriteLine("------->" + element.Name);
                foreach (var ds in element.Attributes())
                {
                    Console.WriteLine(ds.Name + ": " + ds.Value);
                }
            }
        }                
    }

This enables me to get each node, its attribute name and value and so I can save these data into the relevant field in database, but this seems a long winded way and not flexible, for instance if the XML structure changes all those foreach statements needs revisiting, also it is difficult to filter the data this way, I need to write certain if statements to filter the data (e.g get employees from West only etc...)

I was looking for a more flexible way, using linq, something like this:

List<Employees> employees =
              (from employee in xml.Descendants("CompanyInfo")
               select new employee
               {
                   EmployeeName = employee.Element("employee").Value,
                   EmployeeDeptId = ?? get data,
                   RegionName = ?? get data,
                   AreaCode = ?? get data,,
               }).ToList<Employee>();

But I am not sure how I can get the values from the child nodes and apply the filtering (to get the certain employees only). Is this possible? Any help is appreciated.

Thanks

like image 667
03Usr Avatar asked Sep 19 '13 09:09

03Usr


2 Answers

var employees = (from e in xml.Root.Elements("Employee")
                 let r = e.Element("Region")
                 where (string)r.Attribute("name") == "West"
                 select new Employee
                 {
                     EmployeeName = (string)e.Attribute("employee"),
                     EmployeeDeptId = (string)e.Attribute("deptId"),
                     RegionName = (string)r.Attribute("name"),
                     AreaCode = (string)r.Element("Area").Attribute("code"),
                 }).ToList();

But it will still require query revision when XML file structure changes.

Edit

Query for multiple regions per employee:

var employees = (from e in xml.Root.Elements("Employee")
                 select new Employee
                 {
                     EmployeeName = (string)e.Attribute("employee"),
                     DeptId = (string)e.Attribute("deptId"),
                     RegionList = e.Elements("Region")
                                   .Select(r => new Region {
                                       RegionName = (string)r.Attribute("name"),
                                       AreaCode = (string)r.Element("Area").Attribute("code")
                                   }).ToList()
                 }).ToList();

You can then filter the list for employees from given region only:

var westEmployees = employees.Where(x => x.RegionList.Any(r => r.RegionName == "West")).ToList();
like image 82
MarcinJuraszek Avatar answered Nov 07 '22 09:11

MarcinJuraszek


You can track the structure:

from employee in xml
      .Element("CompanyInfo")       // must be root
      .Elements("Employee")         // only directly children of CompanyInfo

or less strictly

from employee in xml.Descendants("Employee")    // all employees at any level

And then get the information you want:

       select new Employee
       {
           EmployeeName = employee.Attribute("name").Value,
           EmployeeDeptId = employee.Attribute("deptId").Value,
           RegionName = employee.Element("Region").Attribute("name").Value,
           AreaCode = employee.Element("Region").Element("Area").Attribute("code").Value,
       }

And with the additional info about multiple regions, assuming a List<Region> Regions property:

       select new Employee
       {
           EmployeeName = employee.Attribute("name").Value,
           EmployeeDeptId = employee.Attribute("deptId").Value,
           //RegionName = employee.Element("Region").Attribute("name").Value,
           //AreaCode = employee.Element("Region").Element("Area").Attribute("code").Value,
           Regions = (from r in employee.Elements("Region") select new Region 
                      {
                         Name = r.Attribute("name").Value,
                         Code = r.Element("Area").Attribute("code").Value,
                      }).ToList();
       }
like image 27
Henk Holterman Avatar answered Nov 07 '22 09:11

Henk Holterman