Rigorous definition for CSV file reading/writing

I have written my own CSV reader/writer in C to store records in a character column in an ODBC database. Unfortunately, I have discovered many edge cases that trip up my implementation, and I have concluded that my real problem is that I have not rigorously defined the rules for CSV. I've read RFC 4180, but it seems incomplete and does not resolve these ambiguities.

For example, should "" be treated as an empty token or as a literal double quote? Do quotes match outside-in or left to right? What should I do with an input string that has unmatched single quotes? The real mess begins with nested tokens, where the escaped quotation characters double up at each level of nesting.

What I really need is a definitive CSV standard that I can implement in code. Every time I feel I have nailed every corner case, I find another one. I am sure this problem has been mulled over and solved many times by minds superior to mine: has anyone written a rigorous definition of CSV that I can implement? I realise C is not the ideal language here, but I don't have a choice of compiler at this stage, nor can I use a third-party library (unless it compiles as C90). Boost is not an option, as my compiler doesn't support C++. I have considered ditching CSV for XML, but it seems like overkill for storing a few tokens in a 256-character database record. Has anyone made a definitive CSV spec?
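One way to settle the questions above is to drop the idea of matching quote pairs altogether and scan each record once, left to right, with a small state machine. The following C90 sketch assumes a comma as the only delimiter, one record per NUL-terminated string, and lenient recovery rules of my own choosing. The names csv_split, put_byte and the state names are hypothetical, and the recovery rules are not taken from RFC 4180 or any other standard. Under these rules a field written as "" is an empty string, a "" pair inside a quoted field is one literal quotation mark, and an unmatched quotation mark never causes an error.

```c
#include <stddef.h>

/* Hypothetical CSV record splitter (C90). Rules chosen for this sketch:
 *  - fields are separated by ',' and the record ends at the NUL byte;
 *  - a field that starts with '"' is quoted: commas and newlines inside
 *    it are data, and a "" pair inside it stands for one '"';
 *  - lenient recovery (an assumption, not a standard): a '"' inside an
 *    unquoted field is kept as data, text after a closing quote is
 *    appended to the field, and an unterminated quoted field runs to
 *    the end of the record. */
enum csv_state { AT_FIELD_START, IN_UNQUOTED, IN_QUOTED, QUOTE_IN_QUOTED };

/* Append one byte to buf; returns 0 on success, -1 if buf is full. */
static int put_byte(char *buf, size_t bufsize, size_t *used, char c)
{
    if (*used >= bufsize)
        return -1;
    buf[(*used)++] = c;
    return 0;
}

/* Decodes `line` into NUL-terminated fields stored back to back in buf;
 * fields[i] points at field i. Returns the field count, or -1 if buf or
 * the fields array is too small. */
int csv_split(const char *line, char *buf, size_t bufsize,
              char **fields, int maxfields)
{
    enum csv_state state = AT_FIELD_START;
    size_t used = 0;
    int nfields = 0;
    int end_field;
    const char *p;
    char c;

    for (p = line; ; p++) {
        c = *p;
        end_field = 0;
        switch (state) {
        case AT_FIELD_START:
            if (nfields >= maxfields)
                return -1;
            fields[nfields] = buf + used;
            if (c == '"')
                state = IN_QUOTED;          /* opening quote is not data */
            else if (c == ',' || c == '\0')
                end_field = 1;              /* empty unquoted field */
            else if (put_byte(buf, bufsize, &used, c) != 0)
                return -1;
            else
                state = IN_UNQUOTED;
            break;
        case IN_UNQUOTED:                   /* a stray '"' here is data */
            if (c == ',' || c == '\0')
                end_field = 1;
            else if (put_byte(buf, bufsize, &used, c) != 0)
                return -1;
            break;
        case IN_QUOTED:
            if (c == '"')
                state = QUOTE_IN_QUOTED;    /* escape or closing quote? */
            else if (c == '\0')
                end_field = 1;              /* unterminated: keep the text */
            else if (put_byte(buf, bufsize, &used, c) != 0)
                return -1;                  /* ',' and newlines are data */
            break;
        case QUOTE_IN_QUOTED:
            if (c == ',' || c == '\0')
                end_field = 1;              /* it was the closing quote */
            else if (put_byte(buf, bufsize, &used, c) != 0)
                return -1;
            else                            /* "" -> one '"', stay quoted; */
                state = (c == '"') ? IN_QUOTED : IN_UNQUOTED; /* else junk */
            break;
        }
        if (end_field) {
            if (put_byte(buf, bufsize, &used, '\0') != 0)
                return -1;
            nfields++;
            state = AT_FIELD_START;
            if (c == '\0')
                return nfields;
        }
    }
}
```

Each call decodes exactly one level of quoting, so a field that itself holds an encoded record comes back with its quotes un-doubled once and can be passed to csv_split again. Whether the lenient recovery suits you is a design choice; a stricter reader could return an error from the same states instead.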

asked Jun 06 '13 03:06

Piers

1 Answer

There is no standard (see Wikipedia's article, in particular http://en.wikipedia.org/wiki/Comma-separated_values#Lack_of_a_standard), so in order to use CSV, you need to follow the general principle of being conservative in what you generate and liberal in what you accept. In particular:

- Do not use quotation marks for blank fields. Simply write an empty field (two adjacent delimiters, or a delimiter in the first or last position of the line).
- Quote any field containing a quotation mark, comma, or newline, and double every quotation mark inside a quoted field.
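The generation rules above can be sketched in C90 as follows. csv_quote_field and csv_write_record are hypothetical names; the sketch also treats a carriage return like a newline and doubles every quotation mark inside a quoted field (the RFC 4180 escape). It writes into a fixed-size buffer and fails rather than truncating, which matters when the result has to fit a 256-character column.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical CSV field encoder (C90) following "be conservative in
 * what you generate": an empty field is written as nothing, a field is
 * quoted only if it contains '"', ',', CR or LF, and each '"' inside a
 * quoted field is doubled. Writes a NUL-terminated result into dst and
 * returns its length, or -1 if dst is too small (dst is then unusable). */
int csv_quote_field(const char *src, char *dst, size_t dstsize)
{
    size_t used = 0;
    int needs_quotes;
    const char *p;

    assert(src != NULL && dst != NULL);
    if (dstsize == 0)
        return -1;
    needs_quotes = strpbrk(src, "\",\r\n") != NULL;

    /* Each write checks that one byte remains for the terminator. */
    if (needs_quotes) {
        if (used + 1 >= dstsize)
            return -1;
        dst[used++] = '"';
    }
    for (p = src; *p != '\0'; p++) {
        if (*p == '"') {                 /* emit the extra escaping quote */
            if (used + 1 >= dstsize)
                return -1;
            dst[used++] = '"';
        }
        if (used + 1 >= dstsize)
            return -1;
        dst[used++] = *p;
    }
    if (needs_quotes) {
        if (used + 1 >= dstsize)
            return -1;
        dst[used++] = '"';
    }
    dst[used] = '\0';
    return (int)used;
}

/* Hypothetical helper: encodes nfields fields and joins them with ','.
 * Returns the record length, or -1 if it does not fit in dstsize bytes. */
int csv_write_record(const char *const *fields, int nfields,
                     char *dst, size_t dstsize)
{
    size_t used = 0;
    int i, len;

    if (dstsize == 0)
        return -1;
    dst[0] = '\0';
    for (i = 0; i < nfields; i++) {
        if (i > 0) {
            if (used + 1 >= dstsize)
                return -1;
            dst[used++] = ',';
        }
        len = csv_quote_field(fields[i], dst + used, dstsize - used);
        if (len < 0)
            return -1;
        used += (size_t)len;
    }
    dst[used] = '\0';
    return (int)used;
}
```

Because quoting is applied one level at a time, encoding an already-encoded record as a field of another record doubles its quotes again: the record x,"say ""hi""" becomes "x,""say """"hi""""""" after one more level of nesting. The quote count grows geometrically with depth, so nesting eats into a 256-character limit quickly; with a buffer sized for the column plus its terminator, a -1 return tells you the record will not fit.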

answered Sep 23 '22 05:09

R.. GitHub STOP HELPING ICE
