
Regex split while reading from file

I have a text file and I am reading it line by line.

I want to split each line on ','.

But commas inside double quotes should be skipped.

I have tried the following regex, but it is not working correctly.

How can I do this?

The contents of the file are:

"Mobile","Custom1","Custom2","Custom3","First Name"
"61402818083","service","in Portsmith","is","First Name"
"61402818083","service","in Parramatta Park","is","First Name"
"61402818083","services","in postcodes 3000, 4000","are","First Name"
"61402818083","services","in postcodes 3000, 4000, 5000","are","First Name"
"61402818083","services",,"are","First Name"

The regex is as follows

,(?=([^\"]*\"[^\"]*\")*[^\"]*$)

This regex outputs the following for line 5 of the file:

"61402818083"
,"First Name"
"services"
,"First Name"
"in postcodes 3000, 4000, 5000"
,"First Name"
"are"
"First Name"
"First Name"

The result should be as follows

"61402818083"
"services"
"in postcodes 3000, 4000, 5000"
"are"
"First Name"
Fahad Abid Janjua asked Aug 04 '15 03:08


2 Answers

Don't reinvent the wheel. It seems you're trying to parse a comma-separated file (even if the file extension isn't .csv). Try this:

using Microsoft.VisualBasic.FileIO;

using (TextFieldParser reader = new TextFieldParser(@"c:\yourpath\file.csv"))
{
    reader.TextFieldType = FieldType.Delimited;
    reader.SetDelimiters(",");
    while (!reader.EndOfData)
    {
        // Read and split the next line of the file
        string[] fields = reader.ReadFields();
        // fields now contains 5 elements (the enclosing quotes are stripped), e.g.
        // fields[0] = "61402818083"
        // fields[1] = "services"
        // fields[2] = "in postcodes 3000, 4000, 5000"
        // fields[3] = "are"
        // fields[4] = "First Name"
    }
}

Note

You need to add a reference to Microsoft.VisualBasic in your project; TextFieldParser lives in the Microsoft.VisualBasic.FileIO namespace.
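To try the parser without a file on disk, here is a minimal sketch that feeds two of the sample lines from the question through a StringReader. The class name ParserDemo and the helper ParseLine are made up for illustration, and the sketch assumes the project references Microsoft.VisualBasic. It also shows how the line with the empty third field comes back.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class ParserDemo
{
    // Hypothetical helper: parses one delimited line held in memory.
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Already the default; stated here so the intent is visible.
            parser.HasFieldsEnclosedInQuotes = true;
            return parser.ReadFields();
        }
    }

    public static void Main()
    {
        string[] samples =
        {
            "\"61402818083\",\"services\",\"in postcodes 3000, 4000, 5000\",\"are\",\"First Name\"",
            "\"61402818083\",\"services\",,\"are\",\"First Name\""
        };

        foreach (string sample in samples)
        {
            string[] fields = ParseLine(sample);
            // Brackets make the empty field in the second sample visible.
            Console.WriteLine(fields.Length + " fields: [" + string.Join("] [", fields) + "]");
        }
    }
}
```

Both lines should yield five fields. The commas inside the quoted postcode list stay part of one field, and the returned values no longer carry the surrounding quotes.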

davcs86 answered Nov 16 '22 08:11


using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string line = "\"61402818083\",\"services\",\"in postcodes 3000, 4000\",\"are\",\"First Name\"";
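        // Lazily match each double-quoted field, quotes included.
        // Unquoted or empty fields are not matched by this pattern.
        var reg = new Regex("\".*?\"");
        var matches = reg.Matches(line);
        foreach (Match item in matches)
        {
            Console.WriteLine(item.Value);
        }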
    }
}

OUTPUT:

"61402818083"
"services"
"in postcodes 3000, 4000"
"are"
"First Name"

https://dotnetfiddle.net/5GxxIo

One more possible solution:

using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string line = "\"61402818083\",\"services\",\"in postcodes 3000, 4000\",\"are\",\"First Name\"";
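        Console.WriteLine(line);
        // Each match is an optional leading comma followed by either a quoted
        // field (with "" as an escaped quote) or an unquoted run without commas.
        var reg = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
        var matches = reg.Matches(line);
        foreach (Match match in matches)
        {
            // Drop the separator comma that belongs to the match, not the field.
            Console.WriteLine(match.Value.TrimStart(','));
        }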
    }
}

https://dotnetfiddle.net/rRml2D
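As for why the pattern from the question produces the extra ,"First Name" entries: the question doesn't show the call, but assuming the line was split with Regex.Split, this matches its documented behavior. When the pattern contains capturing parentheses, the captured text is included in the returned array. Making the repeated group non-capturing with (?: ... ) should be enough. A minimal sketch follows; the class name SplitDemo and the helper SplitLine are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // The question's pattern with the repeated group made non-capturing,
    // so Regex.Split has no captured text to add to its result.
    private static readonly Regex CommaOutsideQuotes =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    // Hypothetical helper: splits on commas followed by an even number of quotes.
    public static string[] SplitLine(string line)
    {
        return CommaOutsideQuotes.Split(line);
    }

    public static void Main()
    {
        string line = "\"61402818083\",\"services\",\"in postcodes 3000, 4000, 5000\",\"are\",\"First Name\"";

        foreach (string field in SplitLine(line))
        {
            Console.WriteLine(field);
        }
    }
}
```

Unlike TextFieldParser, this keeps the surrounding quotes on each field. Passing RegexOptions.ExplicitCapture with the original pattern is another way to stop the unnamed group from capturing.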

Jenish Rabadiya answered Nov 16 '22 06:11