
writing fixed width, space delimited CSV output in Python

I would like to write a fixed width, space delimited and minimally quoted CSV file using Python's csv writer. An example of the output:

item1           item2  
"next item1"    "next item2"
anotheritem1    anotheritem2  

If I use

writer.writerow( ("{0:15s}".format(item1), "{0:15s}".format(item2)) )
...

then, with the space delimiter, the formatting breaks: quotes or escapes (depending on the csv.QUOTE_* constant) are added because of the trailing spaces in the formatted items:

"item1          " "item2          "
"next item1     " "next item2     "
"anotheritem1   " "anotheritem2   "

Of course, I could format everything myself:

outfile.write("{0:15s}{1:15s}\n".format(item1, item2))  # outfile: the underlying file object (assumed name)

but then there is not much point in using the csv writer. Also, I would have to sort out manually those cases when the space is embedded in the items and quoting/escaping should be used. In other words, it seems I would need a (non-existing) "QUOTE_ABSOLUTELYMINIMAL" csv constant that would act as the "QUOTE_MINIMAL" one but would also ignore trailing spaces.

Is there a way to achieve the "QUOTE_ABSOLUTELYMINIMAL" behaviour or another way to get a fixed width, space delimited CSV output using Python's CSV module?

The reason I want the fixed-width feature in a CSV file is better readability. The file will still be processed as CSV for both reading and writing, but the column structure makes it easier to read. Reading is not a problem, as the csv skipinitialspace option takes care of ignoring the extra spaces. To my surprise, writing seems to be a problem...
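For reference, the reading side described above might look like the following. This is only a minimal sketch: `sample` is a made-up in-memory stand-in for the real file, and its lines are assumed to carry no trailing spaces.

```python
import csv
import io

# Made-up sample in the layout from the question; no trailing spaces.
sample = (
    'item1           item2\n'
    '"next item1"    "next item2"\n'
    'anotheritem1    anotheritem2\n'
)

# skipinitialspace=True makes the reader ignore the padding spaces
# that follow each delimiter.
reader = csv.reader(io.StringIO(sample), delimiter=' ', skipinitialspace=True)
rows = list(reader)

for row in rows:
    print(row)
```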

EDIT: I conclude that this is impossible to achieve with the current csv module. It is not a built-in option, and I cannot see any reasonable way to achieve it manually, as there seems to be no way to make Python's csv writer emit extra delimiters without quoting or escaping them. Thus, I will probably have to write my own csv writer.
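One possible shape for such a hand-rolled writer is sketched below. It is not part of the csv module: `quote_if_needed`, `format_row` and the `width` parameter are hypothetical names chosen for illustration. The idea is to quote a field only when its content requires it, and to pad afterwards so the padding stays outside the quotes.

```python
def quote_if_needed(field, delimiter=' ', quotechar='"'):
    """Quote a field only when its content requires it (hypothetical helper)."""
    text = str(field)
    # An empty field must be quoted too, or it would vanish among the spaces.
    needs_quotes = text == '' or any(
        ch in text for ch in (delimiter, quotechar, '\r', '\n')
    )
    if needs_quotes:
        # Double embedded quote characters, as the csv module does
        # when doublequote is True.
        return quotechar + text.replace(quotechar, quotechar * 2) + quotechar
    return text


def format_row(row, width=15):
    """Return one fixed-width, space-delimited line (without a newline)."""
    cells = [quote_if_needed(field) for field in row]
    # Pad after quoting so the padding never ends up inside the quotes.
    # The last cell is left unpadded to avoid trailing spaces.
    padded = [cell.ljust(width) for cell in cells[:-1]] + cells[-1:]
    return ' '.join(padded)


rows = [
    ('item1', 'item2'),
    ('next item1', 'next item2'),
    ('anotheritem1', 'anotheritem2'),
]
lines = [format_row(row) for row in rows]
print('\n'.join(lines))
```

With `width=15`, each column occupies 16 characters including the delimiter. A quoted field longer than `width` simply pushes the next column to the right instead of being truncated.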

asked Apr 12 '11 16:04 by jvm




1 Answer

The basic problem you are running into is that CSV and fixed-width formats are opposing views of data storage, and making them work together is not common practice. Also, if you quote only the items that contain spaces, the alignment on those rows is thrown off:

testing     "rather hmm "
strange     "ways to    "
"store some " "csv data   "
testing     testing    

Reading that data back in gives wrong results as well:

'testing' 'rather hmm '
'strange' 'ways to    '
'store some ' 'csv data   '
'testing' 'testing' ''

Notice the extra field at the end of the last row. Given these problems, I would go with your example of

"item1          " "item2          "
"next item1     " "next item2     "
"anotheritem1   " "anotheritem2   "

which I find very readable, is easy to generate with the existing csv library, and gets correctly parsed when read back in. Here's the code I used to generate it:

import csv

class SpaceCsv(csv.Dialect):
    "csv format for exporting tables"
    delimiter = ' '          # a single space separates the fields
    doublequote = True
    escapechar = None
    lineterminator = '\n'
    quotechar = '"'
    skipinitialspace = True
    quoting = csv.QUOTE_MINIMAL
csv.register_dialect('space', SpaceCsv)

# every field is already padded to the same width
data = (
        ('testing    ', 'rather hmm '),
        ('strange    ', 'ways to    '),
        ('store some ', 'csv data   '),
        ('testing    ', 'testing    '),
        )

# newline='' lets the csv module control the line endings (Python 3)
with open(r'c:\tmp\fixed.csv', 'w', newline='') as temp:
    writer = csv.writer(temp, dialect='space')
    for row in data:
        writer.writerow(row)

You will, of course, need to have all your data padded to the same length, either before getting to the function that does all this, or in the function itself. Oh, and if you have numeric data you'll have to make padding allowances for that as well.
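The padding step could be sketched roughly as follows. `pad_row` and its `width` parameter are hypothetical names, and right-aligning numbers is just one possible convention. The output goes to an in-memory buffer here instead of a file.

```python
import csv
import io


def pad_row(row, width=11):
    """Pad every value to `width`: text left-aligned, numbers right-aligned."""
    padded = []
    for value in row:
        if isinstance(value, (int, float)):
            padded.append('{0:>{1}}'.format(value, width))
        else:
            padded.append('{0:<{1}}'.format(value, width))
    return padded


buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=' ', quoting=csv.QUOTE_MINIMAL,
                    lineterminator='\n')
writer.writerow(pad_row(('testing', 42)))
writer.writerow(pad_row(('store some', 3.5)))

output = buffer.getvalue()
print(output)
```

Note that with QUOTE_MINIMAL a value that fills the whole width and contains no space is written without quotes, which shifts that row by one character. Using csv.QUOTE_ALL instead would keep the quoting uniform.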

answered Sep 29 '22 14:09 by Ethan Furman