How to split a CSV file into multiple chunks and read those chunks in parallel in Java code

Tags:

java

csv

I have a very big CSV file (1GB+) with 100,000 lines.

I need to write a Java program that parses each line of the CSV file to build the body of an HTTP request and then sends that request.

In other words, I need to send out 100,000 HTTP requests, one for each line of the CSV file. Doing this in a single thread would take a very long time.

I'd like to create 1,000 threads, each of which would i) read a line from the CSV file, ii) create an HTTP request whose body contains that line's content, and iii) send the HTTP request and receive the response.
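A minimal sketch of this pipeline, assuming Java 11+ (for `java.net.http`). It reads the file once on a single thread and hands each line to a fixed-size worker pool instead of starting raw threads; the endpoint URL, the `text/csv` header and all class and method names are hypothetical. The bounded queue with `CallerRunsPolicy` is one way (an assumption, not the only one) to keep the reader from pulling the whole 1GB file into memory ahead of the workers.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class CsvToHttp {

    // Hypothetical endpoint: replace with the real service URL.
    static final URI ENDPOINT = URI.create("http://localhost:8080/ingest");

    // Builds a POST request whose body is one CSV line (the Content-Type is an assumption).
    static HttpRequest buildRequest(String line) {
        return HttpRequest.newBuilder(ENDPOINT)
                .header("Content-Type", "text/csv")
                .POST(HttpRequest.BodyPublishers.ofString(line))
                .build();
    }

    // A sender that performs the real HTTP call; main() below does not use it.
    static Consumer<String> httpSender(HttpClient client) {
        return line -> {
            try {
                HttpResponse<String> resp =
                        client.send(buildRequest(line), HttpResponse.BodyHandlers.ofString());
                if (resp.statusCode() >= 400) {
                    System.err.println("HTTP " + resp.statusCode() + " for line: " + line);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    // Reads the file sequentially and submits one task per line to a fixed-size pool.
    // When the bounded queue is full, CallerRunsPolicy makes the reader thread run the
    // task itself, which throttles reading to the speed of the workers.
    static int processFile(Path csv, int threads, Consumer<String> sender)
            throws IOException, InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(threads * 2),
                new ThreadPoolExecutor.CallerRunsPolicy());
        int submitted = 0;
        try (BufferedReader reader = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                final String body = line;
                pool.execute(() -> sender.accept(body));
                submitted++;
            }
        } finally {
            // Wait for every queued request to finish before returning.
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
        return submitted;
    }

    public static void main(String[] args) throws Exception {
        // Demo: a small temp file and a counting sender instead of real HTTP calls.
        Path tmp = Files.createTempFile("demo", ".csv");
        Files.write(tmp, List.of("1,alice", "2,bob", "3,carol"));
        AtomicInteger sent = new AtomicInteger();
        int n = processFile(tmp, 4, line -> sent.incrementAndGet());
        System.out.println(n + " lines submitted, " + sent.get() + " handled");
        Files.delete(tmp);
    }
}
```

Running `main` prints `3 lines submitted, 3 handled`. For the real workload you would pass `httpSender(HttpClient.newHttpClient())` and tune the pool size; whether 1,000 concurrent requests actually helps depends on how much load the receiving server can absorb.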

To do this, I need to split the CSV file into 1,000 chunks, and no line may appear in more than one chunk.

What's the best way to do such a split?
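One common way to split a file into non-overlapping chunks without reading it all first is to pick evenly spaced byte offsets and then move each offset forward to just past the next newline. The sketch below assumes UTF-8 text and, importantly, that no quoted CSV field contains an embedded newline (RFC 4180 allows this; if your data has such fields, a boundary could land inside a record). Class and method names are hypothetical, and each chunk is assumed to fit in memory.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class CsvChunker {

    // Returns n + 1 byte offsets; chunk i is the half-open range [bounds[i], bounds[i + 1]).
    // Every interior boundary sits just after a '\n', so no line is split or repeated.
    static long[] lineAlignedBounds(Path csv, int n) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(csv.toFile(), "r")) {
            long size = raf.length();
            long[] bounds = new long[n + 1];
            bounds[n] = size;
            for (int i = 1; i < n; i++) {
                // Start at the evenly spaced offset, but never before the previous boundary.
                long pos = Math.max(size * i / n, bounds[i - 1]);
                raf.seek(pos);
                // Skip to the end of the current line (unbuffered, but only a few bytes per chunk).
                int b;
                do {
                    b = raf.read();
                } while (b != -1 && b != '\n');
                bounds[i] = raf.getFilePointer();
            }
            return bounds;
        }
    }

    // Reads one chunk and returns its lines; assumes the chunk is smaller than 2 GB.
    static List<String> readChunk(Path csv, long start, long end) throws IOException {
        byte[] buf = new byte[Math.toIntExact(end - start)];
        try (RandomAccessFile raf = new RandomAccessFile(csv.toFile(), "r")) {
            raf.seek(start);
            raf.readFully(buf);
        }
        // Decoding per chunk is safe: in UTF-8 the byte 0x0A never occurs inside a multi-byte character.
        String text = new String(buf, StandardCharsets.UTF_8);
        List<String> lines = new ArrayList<>();
        for (String line : text.split("\r?\n")) {
            // Skips blank lines, including the lone empty string an empty chunk produces.
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunks", ".csv");
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(i + ",value" + i);
        }
        Files.write(tmp, rows);
        long[] bounds = lineAlignedBounds(tmp, 3);
        for (int c = 0; c < 3; c++) {
            // In a real program each chunk would be handed to its own worker thread.
            System.out.println("chunk " + c + ": " + readChunk(tmp, bounds[c], bounds[c + 1]));
        }
        Files.delete(tmp);
    }
}
```

Because each boundary is only ever moved forward past a newline, and never placed before the previous one, chunks can come out uneven or even empty (for example, when a single line is longer than the nominal chunk size), but every line lands in exactly one chunk.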

JuliaLi asked Jun 19 '12 10:06





1 Answer

If you're looking to unzip and parse in the same operation, have a look at https://github.com/skjolber/unzip-csv.

ThomasRS answered Sep 20 '22 04:09
