
Processing large text file in C#

I have 4 GB+ text files (CSV format) and I want to process them using LINQ in C#.

After loading the CSV and converting each row to a class, I run a complex LINQ query.

The file is 4 GB, but the application uses about twice the file size in memory.

How can I process large files like this (run LINQ queries and produce a new result)?

Thanks

oguzh4n asked Jun 24 '11 07:06


3 Answers

Instead of loading the whole file into memory, you could read and process the file line by line.

using (var streamReader = new StreamReader(fileName))
{
    string line;
    while ((line = streamReader.ReadLine()) != null)
    {
        // analyze the line here
        // discard it if it does not match
    }
}
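
If you want to keep writing LINQ, the same streaming idea can be sketched with File.ReadLines (available from .NET 4.0), which yields lines lazily instead of loading the whole file. This is only a sketch under assumptions: the Order class, its two columns, the header row, and the method names are hypothetical, and the naive Split(',') assumes no quoted fields that contain commas.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical record type; replace with the columns of your own CSV.
public sealed class Order
{
    public string Customer { get; set; }
    public decimal Amount { get; set; }
}

public static class CsvStreaming
{
    // Lazily yields one Order per line; the file is never loaded as a whole.
    public static IEnumerable<Order> ReadOrders(string path)
    {
        return File.ReadLines(path)
            .Skip(1) // assumption: the first line is a header
            .Select(line => line.Split(','))
            .Where(fields => fields.Length == 2) // skip malformed lines
            .Select(fields => new Order
            {
                Customer = fields[0],
                Amount = decimal.Parse(fields[1], CultureInfo.InvariantCulture)
            });
    }

    // Aggregates while streaming; only the running totals are kept,
    // so memory grows with the number of distinct customers.
    public static Dictionary<string, decimal> TotalsByCustomer(string path, decimal minAmount)
    {
        var totals = new Dictionary<string, decimal>();
        foreach (var order in ReadOrders(path).Where(o => o.Amount >= minAmount))
        {
            decimal sum;
            totals.TryGetValue(order.Customer, out sum); // sum stays 0 if the key is missing
            totals[order.Customer] = sum + order.Amount;
        }
        return totals;
    }
}
```

Where, Select and Skip stream one element at a time, but operators such as OrderBy, GroupBy and ToList buffer their input, so applying them to the full sequence brings the memory problem back.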

[EDIT]

If you need to run complex queries against the data in the file, the right thing to do is to load the data into a database and let the DBMS take care of data retrieval and memory management.

Alex Aza answered Oct 08 '22 10:10


I think this one is a good way... CSV

Gans answered Oct 08 '22 09:10


If you are using .NET 4.0, you could use Clay and write a method that returns an IEnumerable, yielding the file line by line. That makes code like the following possible:

var query = from record in GetRecords("myFile.csv", new[] { "Foo", "Bar" }, new[] { "," })
            where record.Foo == "Baz"
            select new { MyRealBar = int.Parse(record.Bar) };

The method that projects the CSV into a sequence of Clay objects could be written like this:

private IEnumerable<dynamic> GetRecords(
    string filePath,
    IEnumerable<string> columnNames,
    string[] delimiter)
{
    if (!File.Exists(filePath))
        yield break;
    var columns = columnNames.ToArray();
    dynamic New = new ClayFactory();
    using (var streamReader = new StreamReader(filePath))
    {
        var columnLength = columns.Length;
        string line;
        while ((line = streamReader.ReadLine()) != null)
        {
            var record = New.Record();
            var fields = line.Split(delimiter, StringSplitOptions.None);
            if (fields.Length != columnLength)
                throw new InvalidOperationException(
                    "fields count does not match column count");
            for (int i = 0; i < columnLength; i++)
            {
                record[columns[i]] = fields[i];
            }
            yield return record;
        }
    }
}
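
If taking a dependency on Clay is not an option, the same streaming shape can be sketched with a plain dictionary per row. This is a swapped-in variant, not the Clay approach itself: the CsvRecords class name is hypothetical, fields are accessed as record["Foo"] rather than record.Foo, and a missing file surfaces as an exception during enumeration instead of an empty sequence.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvRecords
{
    // Streams the file and yields one dictionary per line,
    // keyed by the supplied column names.
    public static IEnumerable<Dictionary<string, string>> GetRecords(
        string filePath,
        IEnumerable<string> columnNames,
        string[] delimiter)
    {
        var columns = columnNames.ToArray();
        // File.ReadLines reads lazily, so only the current line is materialized.
        foreach (var line in File.ReadLines(filePath))
        {
            var fields = line.Split(delimiter, StringSplitOptions.None);
            if (fields.Length != columns.Length)
                throw new InvalidOperationException(
                    "fields count does not match column count");

            var record = new Dictionary<string, string>(columns.Length);
            for (int i = 0; i < columns.Length; i++)
                record[columns[i]] = fields[i];
            yield return record;
        }
    }
}
```

A query then filters with where record["Foo"] == "Baz" and projects with int.Parse(record["Bar"]), and it stays lazy as long as no buffering operator is applied.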

Rune FS answered Oct 08 '22 09:10