
Is it possible to change the search method in LINQ?

Tags: c#, csv, linq

I have a CSV file with 30 000 lines. I have to select many values based on many conditions, so instead of many loops and "if"s I decided to use LINQ. I have written a class to read the CSV. It implements IEnumerable so it can be used with LINQ. This is my enumerator:

class CSVEnumerator : IEnumerator
{

    private CSVReader _csv;

    private int _index;

    public CSVEnumerator(CSVReader csv)
    {
        _csv = csv;
        _index = -1;
    }

    public void Reset(){_index = -1;}


    public object Current
    {
        get
        {
            return new CSVRow(_index,_csv);
        }
    }


    public bool MoveNext()
    {
        return ++_index < _csv.TotalRows;
    }

}

It works, but it's slow. Let's say I want to select the max value in column A for the rows with IDs between 100 and 150.

max  = (from CSVRow r in csv where r.ID > 100 && r.ID < 150 select r).Max(y=>y["A"]);

This works, but LINQ goes through all 30 000 rows instead of only the 49 that match. As I said, I could use a loop, but only in this example case; the real conditions are "brutal" :)

Is there any way to override how LINQ searches the collection? Something like: look at the query used on my enumerator, check whether any condition in the "where" clause contains a "row ID filter", and return different data based on that.

I don't want to copy part of the data to another array/collection, and the problem is not in my CSV reader. Accessing any row by ID is fast; the only problem is accessing all 30 000 of them. Any help appreciated :-)

asked Dec 31 '12 17:12 by Kryštof Hilar


2 Answers

If you wanted to use LINQ for this efficiently, you would need to use expression trees, in a way similar to (but much simpler than) what the various LINQ providers for SQL databases do. While doable, I think it would be quite a lot of code for such a simple task.
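To give a feel for what the expression-tree route involves, here is a minimal sketch of only the first step: reading the ID bounds out of a predicate. It assumes a hypothetical `Row` type with an `Id` property (standing in for CSVRow) and recognizes just one shape, `r.Id > a && r.Id < b` with literal constants. A real provider would also have to implement IQueryable<T> and IQueryProvider and cope with every other shape a query can take, which is where most of the code goes.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical stand-in for CSVRow; only the Id property matters here.
public class Row
{
    public int Id { get; set; }
}

public static class IdRangeExtractor
{
    // Tries to read exclusive bounds out of a predicate shaped exactly like
    // r => r.Id > lower && r.Id < upper, where both bounds are literal constants.
    public static bool TryGetRange(Expression<Func<Row, bool>> predicate,
                                   out int lower, out int upper)
    {
        lower = 0;
        upper = 0;

        // The body must be a short-circuiting "&&" of two comparisons.
        var and = predicate.Body as BinaryExpression;
        if (and == null || and.NodeType != ExpressionType.AndAlso)
            return false;

        return TryGetBound(and.Left, ExpressionType.GreaterThan, out lower)
            && TryGetBound(and.Right, ExpressionType.LessThan, out upper);
    }

    // Matches "<something>.Id <op> <int constant>" and returns the constant.
    private static bool TryGetBound(Expression expr, ExpressionType op, out int bound)
    {
        bound = 0;

        var cmp = expr as BinaryExpression;
        if (cmp == null || cmp.NodeType != op)
            return false;

        var member = cmp.Left as MemberExpression;
        var constant = cmp.Right as ConstantExpression;
        if (member == null || member.Member.Name != "Id")
            return false;
        if (constant == null || !(constant.Value is int))
            return false;

        bound = (int)constant.Value;
        return true;
    }
}
```

Calling `IdRangeExtractor.TryGetRange(r => r.Id > 100 && r.Id < 150, out lo, out hi)` succeeds with `lo == 100` and `hi == 150`. Bounds held in local variables show up as closure member accesses rather than constants, so this sketch rejects them; handling that case is part of the extra code mentioned above.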

Because of that, I think a better solution would be to use a separate method to select the rows you want (and then possibly use LINQ to work with the result).

Also, many operations that return collections (including your original code and my modification) can be simplified by using iterator methods.

So, your code could look something like this:

public static IEnumerable<CSVRow> GetRows(
    this CSVReader reader, int idGreaterThan, int idLessThan)
{
    for (int i = idGreaterThan + 1; i < idLessThan; i++)
    {
        yield return new CSVRow(i, reader); // same argument order as in the question's enumerator
    }
}

Here, it's an extension method for CSVReader, but another solution (e.g. an actual method on that class) might be more appropriate for you.

Your example would then look something like:

max = csvReader.GetRows(100, 150).Max(y => y["A"]);

(Also, I find it weird that when you have limits 100 and 150, you actually want rows between 101 and 149. But I'm assuming you have a reason for that, so I did the same.)
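For anyone who wants to try this without the original reader, here is a self-contained sketch. The `CSVReader` and `CSVRow` classes below are hypothetical stand-ins, not the asker's real types: row i simply holds the value i * 2 in every column, and a counter records how many row objects get created.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's reader: 30 000 rows, row i holds i * 2.
public class CSVReader
{
    public int RowsCreated;   // how many CSVRow objects have been constructed

    public int TotalRows { get { return 30000; } }

    public int GetValue(int row, string column) { return row * 2; }
}

// Hypothetical stand-in for the question's row type (index first, reader second).
public class CSVRow
{
    private readonly int _index;
    private readonly CSVReader _csv;

    public CSVRow(int index, CSVReader csv)
    {
        _index = index;
        _csv = csv;
        csv.RowsCreated++;
    }

    public int ID { get { return _index; } }

    public int this[string column] { get { return _csv.GetValue(_index, column); } }
}

public static class CSVReaderExtensions
{
    // Yields only the rows whose index lies strictly between the two bounds.
    public static IEnumerable<CSVRow> GetRows(
        this CSVReader reader, int idGreaterThan, int idLessThan)
    {
        for (int i = idGreaterThan + 1; i < idLessThan; i++)
        {
            yield return new CSVRow(i, reader);
        }
    }
}

public static class GetRowsDemo
{
    // Returns the max of column "A" over rows 101..149 of the stand-in reader.
    public static int MaxInWindow(CSVReader reader)
    {
        return reader.GetRows(100, 150).Max(y => y["A"]);
    }
}
```

With these stand-ins, `GetRowsDemo.MaxInWindow(new CSVReader())` returns 298 (row 149 times 2), and the reader's `RowsCreated` counter ends at 49, so none of the other rows are touched.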

answered Nov 14 '22 07:11 by svick


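As far as LINQ is concerned, r.ID is just another value to filter on, so the where clause has to test all 30k rows before Max sees the matches. If ID is the row index, which seems to be the case here, you can use Skip and Take so that enumeration stops after row 149 instead of running through all 30k rows. Skip(101).Take(49) selects rows 101 to 149, matching the exclusive bounds in the question, and Cast<CSVRow>() is needed if csv implements only the non-generic IEnumerable.

max = csv.Cast<CSVRow>().Skip(101).Take(49).Max(y => y["A"]);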
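A quick way to check that Skip and Take really stop early is to count how many items the source hands out. The sketch below uses a hypothetical iterator over plain row indexes instead of the real CSV reader, and it uses Skip(101).Take(49) so the window is rows 101 to 149, the same rows that `r.ID > 100 && r.ID < 150` selects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipTakeDemo
{
    // Number of indexes the source has produced since the last call to Window().
    public static int RowsVisited;

    // Hypothetical source: yields row indexes 0..29999 and counts each one it produces.
    public static IEnumerable<int> RowIndexes()
    {
        for (int i = 0; i < 30000; i++)
        {
            RowsVisited++;
            yield return i;
        }
    }

    // Returns the indexes in the window 101..149 (49 items).
    public static int[] Window()
    {
        RowsVisited = 0;
        return RowIndexes().Skip(101).Take(49).ToArray();
    }
}
```

After `SkipTakeDemo.Window()` the array holds 101 through 149, and `RowsVisited` stays at roughly 150 rather than 30 000. The skipped rows are still enumerated one by one, so this is cheap only when stepping past a row is cheap; a method that jumps straight to the first wanted index avoids even that.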
answered Nov 14 '22 09:11 by Doug Mitchell