Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to split CSV files as per number of rows specified?

I've CSV file (around 10,000 rows ; each row having 300 columns) stored on LINUX server. I want to break this CSV file into 500 CSV files of 20 records each. (Each having same CSV header as present in original CSV)

Is there any linux command to help this conversion?

like image 968
Pawan Mude Avatar asked Dec 21 '13 16:12

Pawan Mude


People also ask

How does CSV separate rows?

A CSV file stores data in rows and the values in each row is separated with a separator, also known as a delimiter. Although the file is defined as Comma Separated Values, the delimiter could be anything. The most common delimiters are: a comma (,), a semicolon (;), a tab (\t), a space ( ) and a pipe (|).


3 Answers

Use the Linux split command:

split -l 20 file.txt new    

Split the file "file.txt" into files beginning with the name "new" each containing 20 lines of text each.

Type man split at the Unix prompt for more information. However you will have to first remove the header from file.txt (using the tail command, for example) and then add it back on to each of the split files.

like image 130
James King Avatar answered Oct 13 '22 22:10

James King


Made it into a function. You can now call splitCsv <Filename> [chunkSize]

splitCsv() {
    HEADER=$(head -1 $1)
    if [ -n "$2" ]; then
        CHUNK=$2
    else 
        CHUNK=1000
    fi
    tail -n +2 $1 | split -l $CHUNK - $1_split_
    for i in $1_split_*; do
        sed -i -e "1i$HEADER" "$i"
    done
}

Found on: http://edmondscommerce.github.io/linux/linux-split-file-eg-csv-and-keep-header-row.html

like image 98
Martin Dinov Avatar answered Oct 13 '22 23:10

Martin Dinov


This should work !!!

file_name = Name of the file you want to split.
10000 = Number of rows each split file would contain
file_part_ = Prefix of split file name (file_part_0,file_part_1,file_part_2..etc goes on)

split -d -l 10000 file_name.csv file_part_

like image 27
Soumyaansh Avatar answered Oct 13 '22 22:10

Soumyaansh