CSV reader behavior with None and empty string

Tags:

python

string

csv

nonetype

I'd like to distinguish between None and empty strings ('') when going back and forth between Python data structure and csv representation using Python's csv module.

My issue is that when I run:

import csv, cStringIO

data = [['NULL/None value',None],
        ['empty string','']]

f = cStringIO.StringIO()
csv.writer(f).writerows(data)

f = cStringIO.StringIO(f.getvalue())
data2 = [e for e in csv.reader(f)]

print "input : ", data
print "output: ", data2

I get the following output:

input :  [['NULL/None value', None], ['empty string', '']]
output:  [['NULL/None value', ''], ['empty string', '']]
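On Python 3, where cStringIO and the print statement no longer exist, the same reproduction can be sketched with io.StringIO as the in-memory file. It shows the same thing: None and '' are written identically and both come back as ''.

```python
import csv
import io

data = [['NULL/None value', None],
        ['empty string', '']]

# Serialize: csv.writer writes None as an empty field, exactly like ''.
buf = io.StringIO()
csv.writer(buf).writerows(data)

# Deserialize: both fields come back as the empty string.
data2 = list(csv.reader(io.StringIO(buf.getvalue())))

print("input : ", data)
print("output: ", data2)
```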

Of course, I could preprocess data and postprocess data2 to distinguish None from empty strings, with something like:

data = [[d if d is not None else 'None' for d in row] for row in data]
data2 = [[d if d != 'None' else None for d in row] for row in data2]

But that would partly defeat the purpose of using the csv module (fast serialization/deserialization implemented in C, especially when dealing with large lists).

Is there a csv.Dialect or parameters to csv.writer and csv.reader that would enable them to distinguish between '' and None in this use-case?

If not, would there be interest in a patch to csv.writer to enable this kind of round trip? (Possibly a Dialect.None_translate_to parameter defaulting to '' to ensure backward compatibility.)


asked Jul 07 '12 22:07

user1509316

3 Answers

You could at least partially side-step what the csv module does by creating your own version of a singleton None-like class/value:

from __future__ import print_function
import csv


class NONE(object):
    ''' None-like class. '''
    def __repr__(self):  # csv.writer calls str(), which falls back to __repr__.
        return 'NONE'    # Unique string value to represent None.
    def __len__(self):   # Called to determine length and truthiness.
        return 0

NONE = NONE()  # Singleton instance of the class.


if __name__ == '__main__':

    try:
        from cStringIO import StringIO  # Python 2.
    except ImportError:
        from io import StringIO  # Python 3.

    data = [['None value', None], ['NONE value', NONE], ['empty string', '']]
    f = StringIO()
    csv.writer(f).writerows(data)

    f = StringIO(f.getvalue())
    print(" input:", data)
    print("output:", [e for e in csv.reader(f)])

Results:

 input: [['None value', None], ['NONE value', NONE], ['empty string', '']]
output: [['None value', ''], ['NONE value', 'NONE'], ['empty string', '']]

Using NONE instead of None would preserve enough information for you to be able to differentiate between it and any actual empty-string data values.
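Note that the marker comes back from csv.reader as the plain string 'NONE', so a small post-processing step is still needed on the read side. A minimal Python 3 sketch of the full round trip, assuming the text 'NONE' never occurs as genuine data (the class name _NoneMarker and the helper restore_none are hypothetical, chosen for illustration):

```python
import csv
import io


class _NoneMarker(object):
    """None-like singleton that csv.writer serializes as 'NONE'."""
    def __repr__(self):
        # csv.writer calls str() on non-string fields; str() falls back to __repr__.
        return 'NONE'

    def __len__(self):
        # Makes the marker falsy, like None.
        return 0


NONE = _NoneMarker()


def restore_none(rows, marker='NONE'):
    """Hypothetical helper: map the marker string back to None while reading."""
    for row in rows:
        yield [None if val == marker else val for val in row]


data = [['NONE value', NONE], ['empty string', '']]
buf = io.StringIO()
csv.writer(buf).writerows(data)

result = list(restore_none(csv.reader(io.StringIO(buf.getvalue()))))
print(result)
```

Because restore_none is a generator, it processes one row at a time and leaves the parsing itself to the C-implemented reader.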

Even better alternative…

You could use the same approach to implement a pair of relatively lightweight csv.reader and csv.writer “proxy” classes. Proxies are needed because you can't subclass the built-in csv reader and writer, which are written in C, but they add little overhead, since most of the processing is still performed by the underlying built-ins. They also make the None handling completely transparent, since it's all encapsulated within the proxies.

from __future__ import print_function
import csv


class csvProxyBase(object):
    _NONE = '<None>'  # Unique value representing None.


class csvWriter(csvProxyBase):
    def __init__(self, csvfile, *args, **kwargs):
        self.writer = csv.writer(csvfile, *args, **kwargs)

    def writerow(self, row):
        self.writer.writerow([self._NONE if val is None else val for val in row])

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)


class csvReader(csvProxyBase):
    def __init__(self, csvfile, *args, **kwargs):
        self.reader = csv.reader(csvfile, *args, **kwargs)

    def __iter__(self):
        return self

    def __next__(self):
        return [None if val == self._NONE else val for val in next(self.reader)]

    next = __next__  # Python 2.x compatibility.


if __name__ == '__main__':

    try:
        from cStringIO import StringIO  # Python 2.
    except ImportError:
        from io import StringIO  # Python 3.

    data = [['None value', None], ['empty string', '']]
    f = StringIO()
    csvWriter(f).writerows(data)

    f = StringIO(f.getvalue())
    print(" input:", data)
    print("output:", [e for e in csvReader(f)])

Results:

 input: [['None value', None], ['empty string', '']]
output: [['None value', None], ['empty string', '']]
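If classes feel heavier than needed, the same proxying idea can also be sketched with two generator functions. This is a variation on the proxies, not the code above; the names write_rows, read_rows and the sentinel parameter are hypothetical. As with the classes, any genuine field equal to the sentinel would also be read back as None, so the sketch assumes a value that never appears in the data.

```python
import csv
import io

SENTINEL = '<None>'  # Assumed never to appear as a real data value.


def write_rows(csvfile, rows, sentinel=SENTINEL, **fmtparams):
    """Write rows, replacing None with the sentinel string."""
    writer = csv.writer(csvfile, **fmtparams)
    for row in rows:
        writer.writerow([sentinel if val is None else val for val in row])


def read_rows(csvfile, sentinel=SENTINEL, **fmtparams):
    """Yield rows, turning the sentinel string back into None."""
    for row in csv.reader(csvfile, **fmtparams):
        yield [None if val == sentinel else val for val in row]


data = [['None value', None], ['empty string', '']]
buf = io.StringIO()
write_rows(buf, data, delimiter=';')  # Formatting parameters pass straight through.

result = list(read_rows(io.StringIO(buf.getvalue()), delimiter=';'))
print(result)
```

Passing formatting parameters such as delimiter through unchanged keeps the wrapper dialect-agnostic, the same way the proxy classes forward *args and **kwargs.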


answered Sep 20 '22 12:09

martineau

The documentation suggests that what you want is not possible:

To make it as easy as possible to interface with modules which implement the DB API, the value None is written as the empty string.

This is in the documentation for the writer class, suggesting it is true for all dialects and is an intrinsic limitation of the csv module.
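The documented behavior is easy to confirm. In this small Python 3 sketch (using io.StringIO as the file), a row with None in the middle and a row with '' in the middle serialize to the same text with the default dialect, so the reader has nothing left to tell them apart.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['a', None, 'b'])  # None in the middle
writer.writerow(['a', '', 'b'])    # empty string in the middle

lines = buf.getvalue().splitlines()
print(lines)
print(lines[0] == lines[1])
```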

I for one would support changing this (along with various other limitations of the csv module), but it may be that people would want to offload this sort of work into a different library, and keep the CSV module simple (or at least as simple as it is).

If you need more powerful file-reading capabilities, you might want to look at the CSV-reading functions in numpy, scipy, and pandas (e.g. numpy.genfromtxt, pandas.read_csv), which as I recall have more options for handling missing values.

answered Sep 18 '22 12:09

BrenBarn

I don't think it would be possible to do what you want with a mere dialect, but you could write your own wrapper around csv.reader/csv.writer. On the other hand, I still think that is overkill for this use case. Even if you want to catch more than just None, you probably just want str():

>>> data = [['NULL/None value',None],['empty string','']]
>>> i = cStringIO.StringIO()
>>> csv.writer(i).writerows(map(str,row) for row in data)
>>> print i.getvalue()
NULL/None value,None
empty string,
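Because csv.writer already turns non-string fields such as numbers into their string form, mapping str() over each row mainly changes how None is written (as 'None'). A Python 3 sketch of the full round trip follows; reading back still needs a mapping step, and it assumes no genuine field holds the text 'None'.

```python
import csv
import io

data = [['NULL/None value', None], ['empty string', '']]

buf = io.StringIO()
# str(None) == 'None', so None is now written as a visible marker.
csv.writer(buf).writerows(map(str, row) for row in data)

# On the way back, turn the 'None' text into None again.
rows = [[None if val == 'None' else val for val in row]
        for row in csv.reader(io.StringIO(buf.getvalue()))]
print(rows)
```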

answered Sep 19 '22 12:09

kojiro
