I have a CSV file (around 10,000 rows, each with 300 columns) stored on a Linux server. I want to break this CSV file into 500 CSV files of 20 records each, with each one having the same CSV header as the original.
Is there any Linux command to help with this conversion?
A CSV file stores data in rows, and the values in each row are separated by a separator, also known as a delimiter. Although the format is called Comma-Separated Values, the delimiter can be any character. The most common delimiters are a comma (,), a semicolon (;), a tab (\t), a space ( ) and a pipe (|).
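For illustration, here is the same three-column record written with a few of those delimiters (the column names are made up):

id,name,score
id;name;score
id|name|score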
Use the Linux split command:
split -l 20 file.txt new
This splits the file "file.txt" into files beginning with the name "new", each containing 20 lines of text.
Type man split at the Unix prompt for more information. However, you will first have to remove the header from file.txt (using the tail command, for example) and then add it back on to each of the split files.
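A minimal sketch of that approach, assuming GNU coreutils and that the original file is file.txt (the header.csv and data_ names are just examples):

# keep the header aside, split only the data rows, then prepend the header to each piece
head -n 1 file.txt > header.csv
tail -n +2 file.txt | split -l 20 - data_
for f in data_*; do
    cat header.csv "$f" > "$f.csv" && rm "$f"
done
rm header.csv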
Made it into a function. You can now call splitCsv <Filename> [chunkSize]
splitCsv() {
    # capture the header line so it can be re-added to every chunk
    HEADER=$(head -1 "$1")
    if [ -n "$2" ]; then
        CHUNK=$2
    else
        CHUNK=1000
    fi
    # split everything after the header into chunks named <file>_split_aa, _ab, ...
    tail -n +2 "$1" | split -l "$CHUNK" - "$1"_split_
    # prepend the saved header to each chunk (GNU sed syntax)
    for i in "$1"_split_*; do
        sed -i -e "1i$HEADER" "$i"
    done
}
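For example, assuming the function above is loaded in your shell and your file is named file.csv:

splitCsv file.csv 20

This should produce file.csv_split_aa, file.csv_split_ab, and so on, each starting with the original header.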
Found on: http://edmondscommerce.github.io/linux/linux-split-file-eg-csv-and-keep-header-row.html
This should work!

split -d -l 10000 file_name.csv file_part_

file_name = name of the file you want to split
10000 = number of rows each split file will contain
file_part_ = prefix of the split file names (file_part_00, file_part_01, file_part_02, and so on)
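To match the 20-record chunks asked about in the question, the same command would be (note that, unlike the function above, this does not copy the header row into each piece):

split -d -l 20 file_name.csv file_part_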