Filter large list object on data from another large list: slow performance

Tags: performance, c#

I have two large lists of objects. The first (about 1,000,000 objects):

public class BaseItem
{
    public BaseItem()
    {

    }

    public double Fee { get; set; } = 0;

    public string Market { get; set; } = string.Empty;

    public string Traider { get; set; } = string.Empty;

    public DateTime DateUtc { get; set; } = new DateTime();
}

The second (about 20,000 objects):

public class TraiderItem
{
    public TraiderItem()
    {

    }

    public DateTime DateUtc { get; set; } = new DateTime();

    public string Market { get; set; } = string.Empty;

    public string Type { get; set; } = string.Empty;

    public double Price { get; set; } = 0;

    public double Amount { get; set; } = 0;

    public double Total { get; set; } = 0;

    public double Fee { get; set; } = 0;

    public string FeeCoin { get; set; } = string.Empty;
}

I need to find all TraiderItems for which a BaseItem exists with an equal DateUtc and a matching Fee (after rounding). Right now I am using the Any method:

traiderItemsInBase = traiderItems
    .Where(a => baseItems.Any(x => x.DateUtc == a.DateUtc &&
                                   Math.Round(x.Fee, 8) == Math.Round((double)a.Fee * 0.4, 8)))
    .ToList();

But this approach is very, very slow. Is there a way to make it more efficient? Is it possible to use a HashSet in this case?

Konstantin asked Nov 08 '18 09:11

2 Answers

First I thought of a solution with HashSet<> or Dictionary<>, but that doesn't really fit this use case. How about speeding it up by using more of your cores/threads with PLINQ's AsParallel()?

traiderItemsInBase = traiderItems.AsParallel()
    .Where(a => baseItems.Any(x => x.DateUtc == a.DateUtc &&
                              Math.Round(x.Fee, 8) == Math.Round((double)a.Fee * 0.4, 8)))
    .ToList();

This should scale pretty well, since the work happens entirely in memory rather than being bottlenecked by a database or other I/O. With 4 cores it should finish almost 4x faster.

fubo answered Nov 03 '22 03:11


You could precalculate the rounded fee on both collections, so it isn't recomputed on every comparison. You could also group the items by date if dates repeat a lot in the largest collection.
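Combining that precalculation with the HashSet idea from the question, here is a minimal sketch: build a set of (DateUtc, rounded fee) keys from the large list once, then each of the 20,000 traider items becomes a single O(1) lookup instead of a scan over 1,000,000 items, i.e. O(n + m) instead of O(n × m). The `BaseItem`/`TraiderItem` records below are trimmed stand-ins for the question's classes, and the sample data is invented for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    // Trimmed stand-ins for the question's classes (only the compared fields).
    record BaseItem(DateTime DateUtc, double Fee);
    record TraiderItem(DateTime DateUtc, double Fee);

    static void Main()
    {
        var d1 = new DateTime(2018, 11, 8);
        var d2 = new DateTime(2018, 11, 9);
        var baseItems = new List<BaseItem> { new(d1, 0.4), new(d2, 1.0) };
        var traiderItems = new List<TraiderItem> { new(d1, 1.0), new(d2, 1.0) };

        // Build the key set once: O(n) over the large list.
        // ValueTuple equality makes (DateTime, double) a valid HashSet key.
        var baseKeys = new HashSet<(DateTime, double)>(
            baseItems.Select(x => (x.DateUtc, Math.Round(x.Fee, 8))));

        // Each traider item is now one O(1) membership test.
        var traiderItemsInBase = traiderItems
            .Where(a => baseKeys.Contains((a.DateUtc, Math.Round(a.Fee * 0.4, 8))))
            .ToList();

        // Only the d1 item matches: 1.0 * 0.4 == 0.4 equals the base fee.
        Console.WriteLine(traiderItemsInBase.Count);
    }
}
```

Note that rounding to 8 decimal places before hashing is what makes exact key equality safe here; comparing raw doubles for equality in a hash key would be fragile.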

AccessViolation answered Nov 03 '22 05:11