
Split large csv file and keep header in each part

Tags: bash, split, csv

How can I split a large CSV file (~100GB) and preserve the header in each part?

For instance, split

h1 h2
a  aa
b  bb

into

h1 h2
a  aa

and

h1 h2
b  bb

asked May 23 '16 08:05 by echo



1 Answer

First, separate the header from the content. Capture only the header in a variable; the data rows are far too large (~100GB) to hold in a shell variable, so they should be streamed instead:

header=$(head -n 1 "$file")

Then stream the data rows (everything after the first line) straight into split:

tail -n +2 "$file" | split [options...] - [prefix]

In the options, specify the size of the chunks; the prefix for the names of the resulting files follows the -. The - itself must not be removed, as it tells split to read the data from stdin.
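As an illustration of the split step, here is a minimal sketch with concrete options. The sample file, the chunk size of 2 lines, and the part_ prefix are assumptions chosen for the demo, not values from the question; for a real file you would pick a much larger line count.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample input; substitute your own CSV file.
workdir=$(mktemp -d)
file="$workdir/input.csv"
printf 'h1 h2\na aa\nb bb\nc cc\n' > "$file"

# Skip the header line and split the remaining rows, 2 lines per chunk.
# "-" makes split read stdin; "$workdir/part_" is an illustrative prefix,
# so the chunks are named part_aa, part_ab, and so on.
tail -n +2 "$file" | split -l 2 - "$workdir/part_"

ls "$workdir"
```

Because the rows are piped directly from tail into split, nothing larger than a buffer is held in memory, which matters at the file size mentioned in the question.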

Then you can insert the header at the top of each file (this form of the command requires GNU sed):

sed -i "1i$header" "$splitOutputFile"

You should obviously do that last part in a for loop, but its exact code will depend on the prefix chosen for the split operation.
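The loop could look roughly like the sketch below. It assumes a hypothetical part_ prefix and a tiny sample file, and it swaps sed -i for printf plus cat so that it does not depend on GNU sed; each chunk is rewritten to a new .csv file with the header on top, and the headerless chunk is then deleted.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample input; substitute your own CSV file.
workdir=$(mktemp -d)
file="$workdir/input.csv"
printf 'h1 h2\na aa\nb bb\nc cc\n' > "$file"

# Keep the header, then split the data rows (2 lines per chunk for the demo).
header=$(head -n 1 "$file")
tail -n +2 "$file" | split -l 2 - "$workdir/part_"

# Prepend the header to every chunk produced with the "part_" prefix.
for part in "$workdir"/part_*; do
    { printf '%s\n' "$header"; cat "$part"; } > "$part.csv"
    rm "$part"
done

cat "$workdir/part_aa.csv"
```

Note that this writes every chunk a second time. That is acceptable for a sketch, but on a 100GB input it doubles the disk I/O, just as the sed -i approach does.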

answered Oct 28 '22 07:10 by Aaron