
How to read a csv file one line at a time and replace/edit certain lines as you go?

Tags: c#, .net, parsing, csv

I have a 60 GB CSV file that I need to modify. The customer wants some changes to the file's data, but I don't want to regenerate the file because that took 4 days.

How can I read the file, line by line (not loading it all into memory!), and make edits to those lines as I go, replacing certain values etc.?

richard asked Dec 21 '12 07:12



2 Answers

The process would be something like this:

  1. Open a StreamWriter to a temporary file.
  2. Open a StreamReader to the target file.
  3. For each line:
    1. Split the text into columns based on a delimiter.
    2. Check the columns for the values you want to replace, and replace them.
    3. Join the column values back together using your delimiter.
    4. Write the line to the temporary file.
  4. When you are finished, delete the target file, and move the temporary file to the target file path.

Note regarding steps 2 and 3.1: if you are confident in the structure of your file and it is simple enough, you can do all of this with the built-in classes as described (a sample follows). However, a CSV file can contain things that need extra care, such as a delimiter used literally inside a quoted column value. You can work through these cases yourself or use an existing CSV parsing library.
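To make the quoted-delimiter problem concrete, here is a minimal sketch of a quote-aware splitter. The `CsvLine` class and its `Split` method are hypothetical names used only for illustration, not part of .NET. It assumes each record fits on one line, so a quoted value containing a line break would still be cut apart by `ReadLine`.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class CsvLine
{
    // Splits one line on the delimiter, ignoring delimiters inside double quotes.
    // Quotes are kept in the returned fields, so string.Join rebuilds the original line.
    public static string[] Split(string line, char delimiter = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        var inQuotes = false;

        foreach (var c in line)
        {
            // An escaped quote ("") toggles the flag twice, which leaves the state unchanged.
            if (c == '"') inQuotes = !inQuotes;

            if (c == delimiter && !inQuotes)
            {
                // Unquoted delimiter: the current field is complete.
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // last field has no trailing delimiter
        return fields.ToArray();
    }
}
```

For the line `1,"Smith, John",x`, a plain `line.Split(',')` returns four pieces, while `CsvLine.Split(line)` returns three, with `"Smith, John"` kept as a single field.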


Basic example just using StreamReader and StreamWriter:

using System;
using System.IO;
using System.Text.RegularExpressions;

var sourcePath = @"C:\data.csv";
var delimiter = ",";
var firstLineContainsHeaders = true;
// The temp directory needs enough free space for a full second copy of the file.
var tempPath = Path.GetTempFileName();
var lineNumber = 0;

// Split on the delimiter only when the rest of the line has balanced quotes,
// i.e. when the delimiter is not inside a quoted value.
// Regex.Escape keeps delimiters such as "|" from being read as regex syntax.
var splitExpression = new Regex(Regex.Escape(delimiter) + @"(?=(?:[^""]|""[^""]*"")*$)");

using (var writer = new StreamWriter(tempPath))
using (var reader = new StreamReader(sourcePath))
{
    string line = null;
    string[] headers = null;
    if (firstLineContainsHeaders)
    {
        line = reader.ReadLine();
        lineNumber++;

        if (string.IsNullOrEmpty(line)) return; // file is empty; the source is left untouched

        headers = splitExpression.Split(line);

        writer.WriteLine(line); // write the original header to the temp file
    }

    while ((line = reader.ReadLine()) != null)
    {
        lineNumber++;

        var columns = splitExpression.Split(line);

        // If there are no headers, use the first data line as the reference,
        // so every line is checked for the same number of columns.
        if (headers == null) headers = new string[columns.Length];

        if (columns.Length != headers.Length)
            throw new InvalidOperationException(string.Format(
                "Line {0} has {1} columns; expected {2}.", lineNumber, columns.Length, headers.Length));

        // TODO: search and replace in columns
        // example: replace 'v' in the first column with '\/':
        // if (columns[0].Contains("v")) columns[0] = columns[0].Replace("v", @"\/");

        writer.WriteLine(string.Join(delimiter, columns));
    }
}

// Only reached when the whole file was processed without an exception.
File.Delete(sourcePath);
File.Move(tempPath, sourcePath);
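The final `File.Delete`/`File.Move` pair leaves a short window in which the original is gone and the new file is not yet in place. As a rough sketch of one alternative, `File.Replace` can swap the files and keep the old version as a backup. The `SafeSwap` class and the `.tmp`/`.bak` names in the usage note are hypothetical, and this assumes the temporary file was created on the same volume as the target, which `File.Replace` requires.

```csharp
using System;
using System.IO;

static class SafeSwap
{
    // Puts the contents of tempPath at targetPath and keeps the previous
    // version of targetPath at backupPath until the caller deletes it.
    public static void Swap(string tempPath, string targetPath, string backupPath)
    {
        // After this call tempPath no longer exists.
        // tempPath and targetPath must be on the same volume.
        File.Replace(tempPath, targetPath, backupPath);
    }
}
```

Usage might look like `SafeSwap.Swap(sourcePath + ".tmp", sourcePath, sourcePath + ".bak")`, with the writer pointed at `sourcePath + ".tmp"` instead of a path from `Path.GetTempFileName()`. Keeping the backup costs another 60 GB of disk until it is deleted.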
moribvndvs answered Oct 11 '22 20:10



Memory-mapped files are a feature introduced in .NET Framework 4 that can be used to edit large files. Read more at http://msdn.microsoft.com/en-us/library/dd997372.aspx, or search for "memory-mapped files".
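As a minimal sketch of what this could look like, the helper below overwrites bytes in place through a memory-mapped view. The `InPlaceEdit` class and `ReplaceByte` method are hypothetical names. The approach only suits edits that keep the file length unchanged (for example swapping one single-byte character for another); a replacement that makes a value longer or shorter still needs the rewrite-to-a-new-file approach. For a 60 GB file you would likely map one region at a time rather than the whole file.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

static class InPlaceEdit
{
    // Replaces every occurrence of oldByte with newByte in the region
    // [offset, offset + length) of the file. The file size does not change.
    // Returns the number of bytes that were overwritten.
    public static int ReplaceByte(string path, long offset, long length, byte oldByte, byte newByte)
    {
        var changed = 0;
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var view = mmf.CreateViewAccessor(offset, length))
        {
            for (long i = 0; i < length; i++)
            {
                if (view.ReadByte(i) == oldByte)
                {
                    view.Write(i, newByte); // written back to the file when the view is flushed or disposed
                    changed++;
                }
            }
        }
        return changed;
    }
}
```

A call such as `InPlaceEdit.ReplaceByte(path, 0, regionLength, (byte)';', (byte)',')` would turn semicolons into commas within the first `regionLength` bytes, where `regionLength` is whatever window size you choose.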

Junnan Wang answered Oct 11 '22 19:10
