I feel embarrassed to ask this question as I feel like I should already know. However, given that I don't... I want to know how to read large files from disk into a database without getting an OutOfMemory exception. Specifically, I need to load CSV (or, really, tab-delimited) files.
I am experimenting with CSVReader, and specifically with this code sample, but I'm sure I'm doing it wrong. Some of the other code samples show how you can read streaming files of any size, which is pretty much what I want (only I need to read from disk), but I don't know what type of IDataReader I could create to allow this.
I am reading directly from disk, and my attempt to ensure I don't ever run out of memory by reading too much data at once is below. I can't help thinking that I should be able to use a BufferedFileReader or something similar, where I can point to the location of the file and specify a buffer size, and since CsvDataReader expects an IDataReader as its first parameter, it could just use that. Please show me the error of my ways, let me be rid of my GetData method with its arbitrary file-chunking mechanism, and help me out with this basic problem.
private void button3_Click(object sender, EventArgs e)
{
    totalNumberOfLinesInFile = GetNumberOfRecordsInFile();
    totalNumberOfLinesProcessed = 0;

    while (totalNumberOfLinesProcessed < totalNumberOfLinesInFile)
    {
        TextReader tr = GetData();
        using (CsvDataReader csvData = new CsvDataReader(tr, '\t'))
        {
            csvData.Settings.HasHeaders = false;
            csvData.Settings.SkipEmptyRecords = true;
            csvData.Settings.TrimWhitespace = true;

            for (int i = 0; i < 30; i++) // known number of columns for testing purposes
            {
                csvData.Columns.Add("varchar");
            }

            using (SqlBulkCopy bulkCopy = new SqlBulkCopy(@"Data Source=XPDEVVM\XPDEV;Initial Catalog=MyTest;Integrated Security=SSPI;"))
            {
                bulkCopy.DestinationTableName = "work.test";

                for (int i = 0; i < 30; i++)
                {
                    bulkCopy.ColumnMappings.Add(i, i); // map First to first_name
                }

                bulkCopy.WriteToServer(csvData);
            }
        }
    }
}
private TextReader GetData()
{
    StringBuilder result = new StringBuilder();
    int totalDataLines = 0;

    using (FileStream fs = new FileStream(pathToFile, FileMode.Open, System.IO.FileAccess.Read, FileShare.ReadWrite))
    {
        using (StreamReader sr = new StreamReader(fs))
        {
            string line = string.Empty;
            while ((line = sr.ReadLine()) != null)
            {
                if (line.StartsWith("D\t"))
                {
                    totalDataLines++;
                    if (totalDataLines < 100000) // Arbitrary method of restricting how much data is read at once.
                    {
                        result.AppendLine(line);
                    }
                }
            }
        }
    }

    totalNumberOfLinesProcessed += totalDataLines;
    return new StringReader(result.ToString());
}
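For reference, a minimal sketch of the streaming variant the question is reaching for: a StreamReader over the file is itself a TextReader, so it can be handed straight to CsvDataReader, and SqlBulkCopy pulls rows from it as it writes rather than the whole file being buffered in a StringBuilder first. The CsvDataReader and SqlBulkCopy calls simply mirror the question's own code; the BatchSize value is an arbitrary assumption, and the "D" record-type filtering from GetData is not reproduced here.

private void StreamWholeFileToServer()
{
    // StreamReader is a TextReader, so CsvDataReader can read the file lazily.
    using (StreamReader sr = new StreamReader(pathToFile))
    using (CsvDataReader csvData = new CsvDataReader(sr, '\t'))
    {
        csvData.Settings.HasHeaders = false;
        csvData.Settings.SkipEmptyRecords = true;
        csvData.Settings.TrimWhitespace = true;

        for (int i = 0; i < 30; i++)
        {
            csvData.Columns.Add("varchar");
        }

        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(@"Data Source=XPDEVVM\XPDEV;Initial Catalog=MyTest;Integrated Security=SSPI;"))
        {
            bulkCopy.DestinationTableName = "work.test";
            bulkCopy.BatchSize = 10000; // send rows to the server in batches rather than one huge shot (assumed value)

            for (int i = 0; i < 30; i++)
            {
                bulkCopy.ColumnMappings.Add(i, i);
            }

            bulkCopy.WriteToServer(csvData);
        }
    }
}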
Your code is actually reading all of the data from the file and holding it in a TextReader (in memory), and only then reading from that TextReader to send it to the server. If the data is large, the amount of data held in the TextReader causes the out-of-memory condition. Try it this way instead:
1) Read the data (each line) from the file.
2) Insert each line into the server as you go.
The out-of-memory problem is solved because only one record is in memory at a time while processing.
Pseudo code:
begin tran
while (data = FileReader.ReadLine())
{
    insert into Table [col0, col1, etc] values (data[0], data[1], etc)
}
end tran
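A minimal C# sketch of that pseudo code, assuming the connection string, tab delimiter and "D" record prefix from the question; the destination column names (col0, col1, col2) are placeholders and would need to match the real work.test schema.

private void ImportLineByLine()
{
    using (SqlConnection conn = new SqlConnection(@"Data Source=XPDEVVM\XPDEV;Initial Catalog=MyTest;Integrated Security=SSPI;"))
    {
        conn.Open();
        using (SqlTransaction tran = conn.BeginTransaction())
        using (StreamReader sr = new StreamReader(pathToFile))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                if (!line.StartsWith("D\t"))
                    continue; // keep only the data records, as in GetData()

                string[] fields = line.Split('\t'); // fields[0] is the "D" record-type marker

                // One parameterised INSERT per record; only this record is ever in memory.
                using (SqlCommand cmd = new SqlCommand(
                    "INSERT INTO work.test (col0, col1, col2) VALUES (@p0, @p1, @p2)", conn, tran))
                {
                    for (int i = 0; i < 3; i++)
                    {
                        cmd.Parameters.AddWithValue("@p" + i, fields[i + 1]);
                    }
                    cmd.ExecuteNonQuery();
                }
            }
            tran.Commit();
        }
    }
}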
Probably not the answer you're looking for, but this is what BULK INSERT was designed for.
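If the file is reachable from the SQL Server machine, a minimal sketch of driving BULK INSERT from the same C# code might look like the following. The terminators and the assumption that work.test already matches the file's 30 columns are guesses that would need adjusting, and BULK INSERT cannot by itself filter on the "D" record prefix.

private void BulkInsertFromTsql()
{
    // The path in pathToFile must be readable by the SQL Server service account, not just by this client.
    string sql = @"BULK INSERT work.test
                   FROM '" + pathToFile + @"'
                   WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n')";

    using (SqlConnection conn = new SqlConnection(@"Data Source=XPDEVVM\XPDEV;Initial Catalog=MyTest;Integrated Security=SSPI;"))
    using (SqlCommand cmd = new SqlCommand(sql, conn))
    {
        conn.Open();
        cmd.CommandTimeout = 0; // large loads can exceed the default 30-second timeout
        cmd.ExecuteNonQuery();
    }
}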