Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it possible to Pivot data using LINQ?

People also ask

How do I pivot a DataTable in C#?

C# DataTable to Pivot DataTable:DataTable myDataTable = new DataTable(); myDataTable. Columns. AddRange(new DataColumn[3] { new DataColumn("Product"), new DataColumn("Year", typeof(int)), new DataColumn("Sales", typeof(int)) }); myDataTable. Rows.

What can LINQ be used for?

LINQ in C# is used to work with data access from sources such as objects, data sets, SQL Server, and XML. LINQ stands for Language Integrated Query. LINQ is a data querying API with SQL like query syntaxes. LINQ provides functions to query cached data from all kinds of data sources.

What kind of data can be queried with LINQ?

LINQ offers common syntax for querying any type of data source; for example, you can query an XML document in the same way as you query a SQL database, an ADO.NET dataset, an in-memory collection, or any other remote or local data source that you have chosen to connect to and access by using LINQ.


Something like this?

List<CustData> myList = GetCustData();

var query = myList
    .GroupBy(c => c.CustId)
    .Select(g => new {
        CustId = g.Key,
        Jan = g.Where(c => c.OrderDate.Month == 1).Sum(c => c.Qty),
        Feb = g.Where(c => c.OrderDate.Month == 2).Sum(c => c.Qty),
        March = g.Where(c => c.OrderDate.Month == 3).Sum(c => c.Qty)
    });

GroupBy in Linq does not work the same as SQL. In SQL, you get the key and aggregates (row/column shape). In Linq, you get the key and any elements as children of the key (hierarchical shape). To pivot, you must project the hierarchy back into a row/column form of your choosing.


I answered similar question using linq extension method:

// order s(ource) by OrderDate to have proper column ordering
var r = s.Pivot3(e => e.custID, e => e.OrderDate.ToString("MMM-yyyy")
    , lst => lst.Sum(e => e.Qty));
// order r(esult) by CustID

(+) generic implementation
(-) definitely slower than Amy B's

Can anyone improve my implementation (i.e. the method does the ordering of columns & rows)?


The neatest approach for this, I think, is to use a lookup:

var query =
    from c in myList
    group c by c.CustId into gcs
    let lookup = gcs.ToLookup(y => y.OrderDate.Month, y => y.Qty)
    select new
    {
        CustId = gcs.Key,
        Jan = lookup[1].Sum(),
        Feb = lookup[2].Sum(),
        Mar = lookup[3].Sum(),
    };

Here is a bit more generic way how to pivot data using LINQ:

IEnumerable<CustData> s;
var groupedData = s.ToLookup( 
        k => new ValueKey(
            k.CustID, // 1st dimension
            String.Format("{0}-{1}", k.OrderDate.Month, k.OrderDate.Year // 2nd dimension
        ) ) );
var rowKeys = groupedData.Select(g => (int)g.Key.DimKeys[0]).Distinct().OrderBy(k=>k);
var columnKeys = groupedData.Select(g => (string)g.Key.DimKeys[1]).Distinct().OrderBy(k=>k);
foreach (var row in rowKeys) {
    Console.Write("CustID {0}: ", row);
    foreach (var column in columnKeys) {
        Console.Write("{0:####} ", groupedData[new ValueKey(row,column)].Sum(r=>r.Qty) );
    }
    Console.WriteLine();
}

where ValueKey is a special class that represents multidimensional key:

public sealed class ValueKey {
    public readonly object[] DimKeys;
    public ValueKey(params object[] dimKeys) {
        DimKeys = dimKeys;
    }
    public override int GetHashCode() {
        if (DimKeys==null) return 0;
        int hashCode = DimKeys.Length;
        for (int i = 0; i < DimKeys.Length; i++) { 
            hashCode ^= DimKeys[i].GetHashCode();
        }
        return hashCode;
    }
    public override bool Equals(object obj) {
        if ( obj==null || !(obj is ValueKey))
            return false;
        var x = DimKeys;
        var y = ((ValueKey)obj).DimKeys;
        if (ReferenceEquals(x,y))
            return true;
        if (x.Length!=y.Length)
            return false;
        for (int i = 0; i < x.Length; i++) {
            if (!x[i].Equals(y[i]))
                return false;
        }
        return true;            
    }
}

This approach can be used for grouping by N-dimensions (n>2) and will work fine for rather small datasets. For large datasets (up to 1 mln of records and more) or for cases when pivot configuration cannot be hardcoded I've written special PivotData library (it is free):

var pvtData = new PivotData(new []{"CustID","OrderDate"}, new SumAggregatorFactory("Qty"));
pvtData.ProcessData(s, (o, f) => {
    var custData = (TT)o;
    switch (f) {
        case "CustID": return custData.CustID;
        case "OrderDate": 
        return String.Format("{0}-{1}", custData.OrderDate.Month, custData.OrderDate.Year);
        case "Qty": return custData.Qty;
    }
    return null;
} );
Console.WriteLine( pvtData[1, "1-2008"].Value );