I have 4GB+ text files (csv format) and I want to process this file using linq in c#.
I run complex linq query after load csv and convert to class?
but file size is 4gb although application memory double size of file.
how can i process (linq and new result) large files?
Thanks
Instead of loading whole file into memory, you could read and process the file line-by-line.
using (var streamReader = new StreamReader(fileName))
{
string line;
while ((line = streamReader.ReadLine()) != null)
{
// analize line here
// throw it away if it does not match
}
}
[EDIT]
If you need to run complex queries against the data in the file, the right thing to do is to load the data to database and let DBMS to take care of data retrieval and memory management.
I think this one is good way... CSV
If you are using .NET 4.0 you could use Clay and then write a method that returns an IEnumerable line for line and that makes code like the below possible
from record in GetRecords("myFile.csv",new []{"Foo","Bar"},new[]{","})
where record.Foo == "Baz"
select new {MyRealBar = int.Parse(record.Bar)
the method to project the CSV into a sequence of Clay objects could be created like:
private IEnumerable<dynamic> GetRecords(
string filePath,
IEnumerable<string> columnNames,
string[] delimiter){
if (!File.Exists(filePath))
yield break;
var columns = columnNames.ToArray();
dynamic New = new ClayFactory();
using (var streamReader = new StreamReader(filePath)){
var columnLength = columns.Length;
string line;
while ((line = streamReader.ReadLine()) != null){
var record = New.Record();
var fields = line.Split(delimiter, StringSplitOptions.None);
if(fields.Length != columnLength)
throw new InvalidOperationException(
"fields count does not match column count");
for(int i = 0;i<columnLength;i++){
record[columns[i]] = fields[i];
}
yield return record;
}
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With