
Splitting a csv file with quotes as text-delimiter using String.split()

Tags: java, split, csv

I have a comma-separated file with many lines similar to the one below.

Sachin,,M,"Maths,Science,English",Need to improve in these subjects. 

Quotes are used to escape the comma delimiter inside a field that holds multiple values.

How do I split the above line on the comma delimiter using String.split(), if that is possible at all?

asked Apr 01 '13 06:04 by FarSh018



2 Answers

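    import java.util.Arrays;

    // Class name chosen for illustration so the snippet compiles on its own.
    public class SplitDemo {
        public static void main(String[] args) {
            String s = "Sachin,,M,\"Maths,Science,English\",Need to improve in these subjects.";
            // Split only on commas followed by an even number of double quotes.
            String[] splitted = s.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
            System.out.println(Arrays.toString(splitted));
        }
    }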

Output:

[Sachin, , M, "Maths,Science,English", Need to improve in these subjects.] 
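
To unpack the pattern above: the lookahead lets a comma match only when the rest of the line contains an even number of double quotes, which means the comma is not inside a quoted field. Below is a minimal sketch of the same idea, assuming well-formed input with balanced quotes. The names RegexCsvSplit and splitLine are made up for illustration. It compiles the pattern once instead of on every call, and passes a negative limit so that trailing empty fields are kept (plain split(regex) drops them).

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class RegexCsvSplit {

    // A comma followed by an even number of double quotes up to the end of the line.
    // The non-capturing group consumes one pair of quotes at a time.
    private static final Pattern COMMA_OUTSIDE_QUOTES =
            Pattern.compile(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    static String[] splitLine(String line) {
        // A negative limit keeps trailing empty fields.
        return COMMA_OUTSIDE_QUOTES.split(line, -1);
    }

    public static void main(String[] args) {
        String s = "Sachin,,M,\"Maths,Science,English\",Need to improve in these subjects.";
        System.out.println(Arrays.toString(splitLine(s)));
    }
}
```

Note that the quoted field keeps its surrounding quotes; removing them would need a separate step.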
answered Sep 22 '22 17:09 by Achintya Jha


Since your requirements are not all that complex, a custom method can do the job. In my test it ran over 20 times faster than the regular expression and produced the same result. The speedup varies with the data size and the number of rows parsed, and for more complicated problems a regular expression may still be needed.

    import java.util.ArrayList;
    import java.util.Arrays;

    public class SplitTest {

        public static void main(String[] args) {
            String s = "Sachin,,M,\"Maths,Science,English\",Need to improve in these subjects.";
            String[] splitted = null;

            // Measure the regular expression
            long startTime = System.nanoTime();
            for (int i = 0; i < 10; i++) {
                splitted = s.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
            }
            long endTime = System.nanoTime();

            System.out.println("Took: " + (endTime - startTime));
            System.out.println(Arrays.toString(splitted));
            System.out.println("");

            ArrayList<String> sw = null;

            // Measure the custom method
            startTime = System.nanoTime();
            for (int i = 0; i < 10; i++) {
                sw = customSplitSpecific(s);
            }
            endTime = System.nanoTime();

            System.out.println("Took: " + (endTime - startTime));
            System.out.println(sw);
        }

        public static ArrayList<String> customSplitSpecific(String s) {
            ArrayList<String> words = new ArrayList<String>();
            // True while the scan position is outside a quoted field.
            boolean notInsideQuotes = true;
            int start = 0;
            // Scan every character, including the last, so a trailing comma is not missed.
            for (int i = 0; i < s.length(); i++) {
                if (s.charAt(i) == ',' && notInsideQuotes) {
                    words.add(s.substring(start, i));
                    start = i + 1;
                } else if (s.charAt(i) == '"') {
                    notInsideQuotes = !notInsideQuotes;
                }
            }
            words.add(s.substring(start));
            return words;
        }
    }

On my own computer this produces (times in nanoseconds):

    Took: 6651100
    [Sachin, , M, "Maths,Science,English", Need to improve in these subjects.]

    Took: 224179
    [Sachin, , M, "Maths,Science,English", Need to improve in these subjects.]
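
The hand-written loop above can also be extended to drop the enclosing quotes and to treat a doubled quote ("") inside a quoted field as one literal quote, a convention many CSV files follow. This is only a sketch under those assumptions, not a full CSV parser: the names CsvLineSplitter and parseLine are hypothetical, and fields that span several lines are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
class CsvLineSplitter {

    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (insideQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote becomes one literal quote
                    i++;                 // skip the second quote of the pair
                } else {
                    insideQuotes = !insideQuotes; // opening or closing quote, not stored
                }
            } else if (c == ',' && !insideQuotes) {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field, possibly empty
        return fields;
    }

    public static void main(String[] args) {
        String s = "Sachin,,M,\"Maths,Science,English\",Need to improve in these subjects.";
        System.out.println(parseLine(s));
    }
}
```

Unlike the two versions above, the fourth field comes back as Maths,Science,English without the surrounding quotes. For anything more involved, a dedicated CSV library is likely a safer choice than either approach.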
answered Sep 21 '22 17:09 by Menelaos