Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

csv.DictWriter delimiter set to space implies text in " "

Hi I noticed that while using DictWriter and delimiter=' ' instead of ',' the string are saved to file in "" while by use of comma without. How set things up to have strings without " "?

CODE

import csv
golds_two =[]
mydict ={}

with open ('fileA.csv','rU') as csvfile:
    wszystkie=csv.DictReader(csvfile,delimiter=',')
    for w in wszystkie:
        mydict[(w['URL']).split('/')[-1]]=w['MediaObject ID']


with open ('fileB.csv','rU') as csvfile:
    golds=csv.DictReader(csvfile,delimiter=';')
    for g in golds:
        g['MediaObject ID']=mydict[g['ID']]
        golds_two.append(g)


with open('fileC.csv','w') as F: 
        head_fields =golds_two[0].keys()
        head_fields.remove('ID')
        print head_fields
        head_fields=sorted(head_fields,reverse=True)
        csvdw = csv.DictWriter(F,delimiter=" ",fieldnames=head_fields)
        headers = dict( (n,n) for n in head_fields)

        for z in golds_two:
            z.pop('ID',None)
            print z
            csvdw.writerow(z)

fileA.csv:

MediaObject ID,URL
1152901,http://foo.bar/TRU716565.jpg
1152902,http://foo.bar/TRU716566.jpg

fileB.csv:

ID;Gold label
TRU716565.jpg;Identifable X
TRU716566.jpg;Non identfiable X

resulting fileC.csv:

1152901 "Identifable X"
1152902 "Non identfiable X"
like image 618
andilabs Avatar asked Aug 26 '13 10:08

andilabs


People also ask

What is the default delimiter for CSV writer?

delimiter specifies the character used to separate each field. The default is the comma ( ',' ). quotechar specifies the character used to surround fields that contain the delimiter character. The default is a double quote ( ' " ' ).

What does CSV Quote_none do?

QUOTE_NONE ), the csv module uses the quotechar (which defaults to " ) to quote field. The following listing changes the quote character from double quote ( " ) to a single quote ( ' ). In this case, the csv module uses the single quote ( ' ) instead of ( " ) to quote fields containing quotechar or delimiter.


1 Answers

Normally, to suppress the quoting of data fields in the csv file all you need to do is add a quoting=csv.QUOTE_NONE keyword argument to your csv.DictWriter() constructor call.

However, just doing that in this case will cause an exception to be raised because some of the data fields contain the delimiter character you're trying to use, ' ', and a different escapechar was not also defined. This is mentioned in the documentation where it says:

If escapechar is not set, the writer will raise csv.Error if any characters that require escaping are encountered.

This all makes total sense if you think about it — a csv file would be unparsable (i.e. invalid) if the data in the fields themselves contained unescaped delimiter characters because there would no way to tell the data from the delimiters—they'd be non-identifiable—if the results looked like this:

1152901 Identifable X
1152902 Non identfiable X

So changing the delimiter to "\t" will get rid of the quotes (as you already discovered), or alternatively you could define an escapechar="\\" (or whatever) in the constructor call and leave the delimiter set to " " — because either there would no longer be a conflict between the file's delimiter and the contents of its data fields or there would be way to escape them when they're encountered.

If you really want the results to be the way you stated for some unknown reason, you can always write the file out manually, I suppose.

Update: Here's one possible workaround which changes any space characters in the data fields to '\xa0' which is a non-breaking space on my system but displays the same as a regular space character.

with open('fileC.csv', 'w') as F:
    head_fields = golds_two[0].keys()
    head_fields.remove('ID')
    print(head_fields)
    head_fields = sorted(head_fields, reverse=True)
    csvdw = csv.DictWriter(F, delimiter=" ", fieldnames=head_fields,
                           quoting=csv.QUOTE_NONE)
    #headers = dict((n, n) for n in head_fields)

    for z in golds_two:
        z.pop('ID', None)
        print(z)
        for k,v in z.iteritems():
            z[k] = v.replace(' ', '\xa0')  # Convert space chars to non-breaking.
        csvdw.writerow(z)
like image 135
martineau Avatar answered Dec 08 '22 10:12

martineau