
Efficient way to read large tab delimited txt file?

Tags: c#, file-io, csv

I have a tab-delimited txt file with 500K records. I'm using the code below to read the data into a DataSet. With 50K records it works fine, but with 500K it throws "Exception of type 'System.OutOfMemoryException' was thrown."

What is a more efficient way to read large tab-delimited data, or how can I resolve this issue? Please give me an example.

public DataSet DataToDataSet(string fullpath, string file)
{
    string sql = "SELECT * FROM " + file; // Read all the data
    string connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + fullpath + ";"
                            + "Extended Properties=\"text;HDR=YES;FMT=Delimited\"";
    DataSet dataset = new DataSet(); // To hold the data

    // The using blocks dispose of the connection and the adapter, even if Fill throws
    using (OleDbConnection connection = new OleDbConnection(connectionString))
    using (OleDbDataAdapter ole = new OleDbDataAdapter(sql, connection))
    {
        ole.Fill(dataset); // Fill the dataset with all rows at once
    }
    return dataset;
}
Asked by Michael Born, May 18 '11 at 21:05



1 Answer

Use a streaming approach with TextFieldParser (in the Microsoft.VisualBasic.FileIO namespace). It reads the file one record at a time, so the whole file is never loaded into memory in one go.
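As a minimal sketch of that approach (not code from the original answer), the reader below yields one record at a time. The class name TabFileReader and the method name ReadRecords are made up for illustration, and the project is assumed to reference the Microsoft.VisualBasic assembly. Because the question's connection string uses HDR=YES, the first record returned is presumably the header row.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO; // assumes a reference to Microsoft.VisualBasic

// Hypothetical helper for illustration; the names are not from the original post.
public static class TabFileReader
{
    // Lazily yields one record (an array of field values) per line of the file.
    // Only the current record is held in memory, not the whole file.
    public static IEnumerable<string[]> ReadRecords(string path)
    {
        using (var parser = new TextFieldParser(path))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters("\t");              // tab-delimited input
            parser.HasFieldsEnclosedInQuotes = true; // tolerate quoted fields

            while (!parser.EndOfData)
            {
                yield return parser.ReadFields();
            }
        }
    }
}
```

Each record can then be processed inside a foreach loop over TabFileReader.ReadRecords(path), for example by writing it to a database in batches. Collecting every record into a DataSet or a list would bring the memory problem back, so this only helps if the rows are handled as they are read.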

Answered by Oded, Sep 17 '22 at 13:09