How to parse a CSV file in Bash?

Parsing CSV files under `bash`, using a loadable module

Conforming to RFC 4180, a string like this sample CSV row:

12,22.45,"Hello, ""man"".","A, b.",42

should be split as

 1  12
 2  22.45
 3  Hello, "man".
 4  A, b.
 5  42

Bash loadable builtins compiled from C

Under bash, you can create, edit, and use loadable modules compiled from C. Once loaded, they work like any other builtin. (You can find more information in the bash source tree.)

The current source tree (Oct 15 2021, bash V5.1-rc3) contains a bunch of samples:

accept        listen for and accept a remote network connection on a given port
asort         Sort arrays in-place
basename      Return non-directory portion of pathname.
cat           cat(1) replacement with no options - the way cat was intended.
csv           process one line of csv data and populate an indexed array.
dirname       Return directory portion of pathname.
fdflags       Change the flag associated with one of bash's open file descriptors.
finfo         Print file info.
head          Copy first part of files.
hello         Obligatory "Hello World" / sample loadable.
...
tee           Duplicate standard input.
template      Example template for loadable builtin.
truefalse     True and false builtins.
tty           Return terminal name.
uname         Print system information.
unlink        Remove a directory entry.
whoami        Print out username of current user.

There is a fully working CSV parser ready to use in the examples/loadables directory: csv.c.

On Debian GNU/Linux based systems, you may have to install the bash-builtins package:

apt install bash-builtins

Using loadable bash-builtins:

Once the package is installed, load the csv builtin with enable -f:

enable -f /usr/lib/bash/csv csv

From there, you can use csv as a bash builtin.
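The path /usr/lib/bash/csv is where the Debian bash-builtins package puts the module; other systems may install loadables elsewhere, or not at all. As a rough sketch, the following tries a few candidate directories and records whether the builtin could be enabled. The directory list is my assumption, and load_csv_builtin and csv_available are names made up for this example.

```bash
#!/usr/bin/env bash

# Try to enable the csv builtin from a few candidate directories.
# The directory list is an assumption; adjust it for your system.
load_csv_builtin() {
    local dir
    for dir in /usr/lib/bash /usr/local/lib/bash /usr/lib64/bash; do
        # Only call enable -f when the module file actually exists.
        if [[ -e $dir/csv ]] && enable -f "$dir/csv" csv 2>/dev/null; then
            return 0
        fi
    done
    return 1
}

if load_csv_builtin; then
    csv_available=yes
else
    csv_available=no
fi
printf 'csv builtin available: %s\n' "$csv_available"
```

A script can then branch on csv_available instead of failing on the first csv call.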

With my sample: 12,22.45,"Hello, ""man"".","A, b.",42

csv -a myArray '12,22.45,"Hello, ""man"".","A, b.",42'
printf "%s\n" "${myArray[@]}" | cat -n
     1      12
     2      22.45
     3      Hello, "man".
     4      A, b.
     5      42

Then, in a loop, processing a file:

while IFS= read -r line;do
    csv -a aVar "$line"
    printf "First two columns are: [ '%s' - '%s' ]\n" "${aVar[0]}" "${aVar[1]}"
done <myfile.csv

This approach is clearly quicker and more robust than any other combination of bash builtins or forking to an external binary.

Unfortunately, depending on your system, if your version of bash was compiled without loadable builtin support, this may not work.
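If the loadable is not available, one possible fallback is a small character-by-character state machine in pure bash. This is a different technique from csv.c, and much slower. It is only a sketch under these assumptions: bash 4.3 or later (for namerefs), one complete record per call, and well-formed input, since it does no validation. The function name csv_split is hypothetical.

```bash
#!/usr/bin/env bash

# csv_split ARRAYNAME RECORD
# Split one CSV record into the named indexed array.
# Handles double-quoted fields and "" as an escaped quote.
csv_split() {
    local -n _csv_out=$1
    local line=$2 field='' ch
    local -i i in_quotes=0 len=${#line}
    _csv_out=()
    for ((i = 0; i < len; i++)); do
        ch=${line:i:1}
        if ((in_quotes)); then
            if [[ $ch == '"' ]]; then
                if [[ ${line:i+1:1} == '"' ]]; then
                    field+='"'        # "" inside quotes is a literal quote
                    ((i += 1))        # skip the second quote
                else
                    in_quotes=0       # closing quote
                fi
            else
                field+=$ch            # commas and newlines are kept as-is
            fi
        elif [[ $ch == '"' ]]; then
            in_quotes=1               # opening quote
        elif [[ $ch == ',' ]]; then
            _csv_out+=("$field")      # field separator outside quotes
            field=''
        else
            field+=$ch
        fi
    done
    _csv_out+=("$field")              # last field has no trailing comma
}

csv_split fields '12,22.45,"Hello, ""man"".","A, b.",42'
printf '%s\n' "${fields[@]}"
```

With the sample row, this should print the same five fields listed at the top of this answer.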

Complete sample with multiline CSV fields

Here is a small sample file with 1 header line, 4 columns and 3 rows. Because two fields contain a newline, the file is 6 lines long.

Id,Name,Desc,Value
1234,Cpt1023,"Energy counter",34213
2343,Sns2123,"Temperatur sensor
to trigg for alarm",48.4
42,Eye1412,"Solar sensor ""Day /
Night""",12199.21

And a small script able to parse this file correctly:

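#!/bin/bash

enable -f /usr/lib/bash/csv csv

file="sample.csv"
exec {FD}<"$file"          # open the file on a new file descriptor

# First line: column names, used to build one output format line per column.
IFS= read -ru $FD line
csv -a headline "$line"
printf -v fieldfmt '%-8s: "%%q"\\n' "${headline[@]}"

while IFS= read -ru $FD line;do
    # While the row has fewer fields than the header, a quoted field is
    # assumed to continue on the next line: append it and parse again.
    while csv -a row "$line" ; ((${#row[@]}<${#headline[@]})) ;do
        IFS= read -ru $FD sline || break
        line+=$'\n'"$sline"
    done
    printf "$fieldfmt\\n" "${row[@]}"
done

exec {FD}<&-               # close the file descriptor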

This renders as follows. (I've used printf "%q" to represent non-printable characters such as newlines as $'\n'.)

Id      : "1234"
Name    : "Cpt1023"
Desc    : "Energy\ counter"
Value   : "34213"

Id      : "2343"
Name    : "Sns2123"
Desc    : "$'Temperatur sensor\nto trigg for alarm'"
Value   : "48.4"

Id      : "42"
Name    : "Eye1412"
Desc    : "$'Solar sensor "Day /\nNight"'"
Value   : "12199.21"
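Note that the field-count test in the script would likely miss a newline embedded in the last column, because the row already has as many fields as the header at that point. An alternative sketch, not taken from csv.c, is to treat a record as complete once it contains an even number of double quotes, which holds for well-formed RFC 4180 input because quotes only appear as delimiters or doubled. The names record_complete, pending and complete_records are hypothetical, and the input lines are made up.

```bash
#!/usr/bin/env bash

# Succeed when the text holds an even number of double quotes,
# i.e. no quoted field is still open at the end of the text.
record_complete() {
    local q='"'
    local quotes=${1//[^$q]/}     # keep only the double-quote characters
    (( ${#quotes} % 2 == 0 ))
}

pending=''
complete_records=()
while IFS= read -r line; do
    # Join physical lines with a newline until the quotes balance.
    pending+=${pending:+$'\n'}$line
    if record_complete "$pending"; then
        complete_records+=("$pending")
        pending=''
    fi
done <<'EOF'
1234,Cpt1023,"Energy counter",34213
42,Eye1412,12199.21,"Solar sensor ""Day /
Night"""
EOF

printf 'records: %d\n' "${#complete_records[@]}"   # prints: records: 2
```

Each element of complete_records can then be handed to csv -a as a single record.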

You can find a full working sample here: csvsample.sh.txt or csvsample.sh.
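Once headline and row are populated, it can be convenient to address fields by column name rather than by index. A minimal sketch, assuming bash 4.3 or later for namerefs and non-empty column names; fill_record and record are names invented for this example, and the arrays are filled by hand here to mimic what csv -a would produce for the sample file.

```bash
#!/usr/bin/env bash

declare -A record   # column name -> field value for the current row

# fill_record NAMES_ARRAY VALUES_ARRAY
# Pair each column name with the value at the same index.
fill_record() {
    local -n names_ref=$1 values_ref=$2
    local -i i
    record=()
    for ((i = 0; i < ${#names_ref[@]}; i++)); do
        record[${names_ref[i]}]=${values_ref[i]-}   # empty if the row is short
    done
}

# Arrays as csv -a would populate them (filled by hand for illustration).
headline=(Id Name Desc Value)
row=(2343 Sns2123 $'Temperatur sensor\nto trigg for alarm' 48.4)

fill_record headline row
printf '%s has value %s\n' "${record[Name]}" "${record[Value]}"
# prints: Sns2123 has value 48.4
```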

Warning:

Of course, parsing CSV this way is not perfect. It works for many simple CSV files, but take care with encoding and security. For example, this module cannot handle binary fields.

Carefully read the comments in the csv.c source code, and RFC 4180.
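One encoding detail worth checking: RFC 4180 specifies CRLF line endings, while read only strips the trailing newline, so a carriage return can remain at the end of each line and end up in the last field. I have not verified whether csv.c removes it, so a cautious sketch is to strip it before parsing. Be aware that applying this to continuation lines also turns a CRLF inside a multiline quoted field into a plain newline. The input below is made up.

```bash
#!/usr/bin/env bash

lines=()
while IFS= read -r line; do
    line=${line%$'\r'}        # drop one trailing carriage return, if any
    lines+=("$line")
done < <(printf 'Id,Name\r\n42,Eye1412\r\n')   # simulated CRLF file

printf '%q\n' "${lines[@]}"
```

Without the stripping step, printf %q would show the second line as $'42,Eye1412\r'.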

In addition to the answer from @Dennis Williamson, it may be helpful to skip the first line when it contains the header of the CSV:

{
  read -r    # discard the header line
  while IFS=, read -r col1 col2
  do
    echo "I got:$col1|$col2"
  done
} < myfile.csv
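Keep in mind that this IFS=, approach splits on every comma, including commas inside quoted fields, and leaves the quotes in place. A short illustration with a made-up input line:

```bash
#!/usr/bin/env bash

# The second field is quoted and contains a comma.
IFS=, read -r col1 col2 col3 <<< '12,"Hello, man",42'

printf '[%s] [%s] [%s]\n' "$col1" "$col2" "$col3"
# prints: [12] ["Hello] [ man",42]
```

So this is fine for files known to contain no quoted fields, and unsuitable otherwise.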

