
How can I optimize the performance of this regular expression?

I'm using a regular expression to replace commas that are not inside text-qualifying quotes with tabs. I'm running the regex on file content through a script task in SSIS. The file content is over 6000 lines long. I saw an example of using a regex on file content that looked like this:

String FileContent = ReadFile(FilePath, ErrInfo);        
Regex r = new Regex(@"(,)(?=(?:[^""]|""[^""]*"")*$)");
FileContent = r.Replace(FileContent, "\t");

That replace can understandably take its sweet time on a decent sized file.

Is there a more efficient way to run this regex? Would it be faster to read the file line by line and run the regex per line?

topwik asked Jan 29 '26 04:01


2 Answers

It seems you're trying to convert comma separated values (CSV) into tab separated values (TSV).

In this case, you should try to find a CSV library instead and read the fields with that library (and convert them to TSV if necessary).

Alternatively, you can check whether each line has quotes and use a simpler method accordingly.
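As a rough sketch of that second suggestion (not a full CSV parser), the helper below takes a fast path for lines without quotes and falls back to a single left-to-right scan otherwise. The names `CsvToTsvSketch` and `ConvertLine` are made up for illustration, and the code assumes that no quoted field contains a line break, so each line can be handled on its own.

```csharp
using System;
using System.Text;

public static class CsvToTsvSketch
{
    // Hypothetical helper: converts one CSV line to TSV, leaving quoted text untouched.
    // Assumption: quoted fields never span multiple lines.
    public static string ConvertLine(string line)
    {
        // Fast path: with no quotes on the line, every comma is a delimiter.
        if (line.IndexOf('"') < 0)
            return line.Replace(',', '\t');

        // Slow path: scan once, tracking whether we are inside quotes.
        var sb = new StringBuilder(line.Length);
        bool inQuotes = false;
        foreach (char c in line)
        {
            if (c == '"')
                inQuotes = !inQuotes; // an escaped quote ("") toggles twice, so the state is preserved

            sb.Append(c == ',' && !inQuotes ? '\t' : c);
        }
        return sb.ToString();
    }
}
```

This could be applied to each line returned by `File.ReadLines(FilePath)`. If the data can contain line breaks inside quoted fields, a real CSV library is the safer choice.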

Peter O. answered Jan 31 '26 18:01


The problem is the lookahead, which scans all the way to the end of the input at each comma, resulting in O(n²) complexity, which is noticeable on long inputs. You can get it done in a single pass by skipping over quoted strings while replacing:

Regex csvRegex = new Regex(@"
    (?<Quoted>
        ""                  # Open quotes
        (?:[^""]|"""")*     # not quotes, or two quotes (escaped)
        ""                  # Closing quotes
    )
    |                       # OR
    (?<Comma>,)             # A comma
    ",
RegexOptions.IgnorePatternWhitespace);
content = csvRegex.Replace(content,
                        match => match.Groups["Comma"].Success ? "\t" : match.Value);

Here we match free commas and quoted strings. The Replace method takes a callback that checks whether the match was a comma: if so, it is replaced with a tab; otherwise the quoted string is returned unchanged.
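To try the single-pass approach on a small input, here is a self-contained sketch. The class name `CsvRegexDemo` and method name `CommasToTabs` are illustrative only, and keeping the `Regex` in a static field is my assumption about how you might reuse it across calls.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CsvRegexDemo
{
    // Same idea as the pattern above: consume a whole quoted string, or a single free comma.
    private static readonly Regex CsvRegex = new Regex(@"
        (?<Quoted> "" (?:[^""]|"""")* "" )   # quoted string, doubled quotes are escapes
        |
        (?<Comma> , )                        # comma outside quotes
        ",
        RegexOptions.IgnorePatternWhitespace);

    // Hypothetical wrapper: replaces only the commas that are outside quoted strings.
    public static string CommasToTabs(string content)
    {
        return CsvRegex.Replace(content,
            match => match.Groups["Comma"].Success ? "\t" : match.Value);
    }
}
```

For example, `CsvRegexDemo.CommasToTabs("a,\"b,c\",d")` returns `a`, a tab, `"b,c"`, a tab, and `d`. Because the quoted branch also consumes line breaks, this version can run on the whole file content at once, including quoted fields that span lines.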

Kobi answered Jan 31 '26 19:01


