How can I split a large CSV file (~100 GB) and preserve the header in each part?
For instance, split
h1 h2
a aa
b bb
into
h1 h2
a aa
and
h1 h2
b bb
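First, separate the header from the content. Keep only the header in a variable; the data is far too large to hold in one (and an unquoted echo $data would collapse its newlines into spaces):
header=$(head -n 1 "$file")
Then stream the data straight into split:
tail -n +2 "$file" | split [options...] -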
In the options, specify the chunk size and the name pattern for the resulting files. Prefer a size in lines (-l) over a size in bytes (-b), since a byte count can cut a row in half. The trailing - must not be removed: it tells split to read the data from stdin.
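As a rough illustration of those options, here is a minimal sketch. The file name sample.csv, the prefix part_ and the chunk size of 2 lines are all made up for the example; for a real 100 GB file you would pick something like -l 1000000. It works in a temporary directory on a tiny stand-in file so it can be run as-is.

```shell
# Illustrative names only: sample.csv, part_ and the sizes are assumptions.
workdir=$(mktemp -d)
file="$workdir/sample.csv"

# Tiny stand-in for the real CSV: one header line, three data lines.
printf 'h1 h2\na aa\nb bb\nc cc\n' > "$file"

# Skip the header, then split the rest:
#   -l 2   at most 2 lines per chunk (never cuts a row in half)
#   -a 3   three-letter suffixes: part_aaa, part_aab, ...
#   -      read the data from stdin
tail -n +2 "$file" | split -l 2 -a 3 - "$workdir/part_"

ls "$workdir"
```

The chunks produced at this stage do not contain the header yet; it still has to be added to each of them.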
Then you can insert the header at the top of each file (the one-line 1i form and -i without a suffix are GNU sed syntax):
sed -i "1i$header" "$splitOutputFile"
You should obviously do that last part in a for loop, but its exact code will depend on the prefix chosen for the split operation.
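One possible shape for the whole thing, loop included, is sketched below. It is not the only way to do it: the names (sample.csv, the part_ prefix, one line per chunk) are invented for the example, and instead of sed -i it rebuilds each chunk with printf and cat, which avoids depending on GNU sed at the cost of writing each chunk a second time.

```shell
# Sketch only: file name, prefix and chunk size are assumptions.
workdir=$(mktemp -d)
file="$workdir/sample.csv"
prefix="$workdir/part_"

# Tiny stand-in for the real CSV (same data as the example in the question).
printf 'h1 h2\na aa\nb bb\n' > "$file"

# 1. Keep only the header in a variable.
header=$(head -n 1 "$file")

# 2. Stream everything after the header into split, one line per chunk here.
tail -n +2 "$file" | split -l 1 - "$prefix"

# 3. Prepend the header to every chunk.
#    The glob is expanded once, before the loop starts, so the new
#    .csv files are not picked up again.
for chunk in "$prefix"*; do
    { printf '%s\n' "$header"; cat "$chunk"; } > "$chunk.csv"
    rm "$chunk"
done

ls "$workdir"
```

With the sample data this should leave part_aa.csv and part_ab.csv next to sample.csv, each starting with the h1 h2 line.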