Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Keep double quotes in a text file using csv reader

Tags:

python

csv

reader

Hi I have a text file with string :

hello,"foo, bar"

i want to split it into a list as:

['hello', '"foo, bar"']

Is there a way I can acheive this ?

I am trying this as of now :

for line in sys.stdin: csv_file = StringIO.StringIO(line) csv_reader = csv.reader(csv_file)

I want them to split into two string i.e:

'hello' and '"foo, bar"'

like image 310
piyush121 Avatar asked Oct 30 '22 05:10

piyush121


2 Answers

Say you read a row from a CSV:

from StringIO import StringIO
import csv

infile = StringIO('hello,"foo, bar"')
reader = csv.reader(infile)
row = reader.next()  # row is ['hello', 'foo, bar']

The second value in the row is foo, bar instead of "foo, bar". This isn't some Python oddity, it's a reasonable interpretation of CSV syntax. The quotes probably weren't placed there to be part of a value, but rather to show that foo, bar is one value and shouldn't be split into foo and bar based on the comma (,). An alternative solution would be to escape the comma when creating the CSV file, so the line would look like:

hello,foo \,bar

So it's quite a strange request to want to keep those quotes. If we know more about your use case and the bigger picture we can help you better. What are you trying to achieve? Where does the input file come from? Is it really a CSV or is it some other syntax that looks similar? For example if you know that every line consists of two values separated by a comma, and the first value never contains a comma, then you can just split on the first comma:

print 'hello,"foo, bar"'.split(',', 1)  # => ['hello', '"foo, bar"']

But I doubt the input has such restrictions which is why things like quotes are needed to resolve ambiguities.

If you're trying to write to a CSV again, then the quotes will be recreated as you're doing so. They don't have to be there in the intermediate list:

outfile = StringIO()
writer = csv.writer(outfile)
writer.writerow(row)
print outfile.getvalue()

This will print

hello,"foo, bar"

You can customise the exact CSV output by setting a new dialect.

If you want to grab the individual values in the row with the appropriate quoting rules applied to them, it's possible, but it's a bit of a hack:

# We're going to write individual strings, so we don't want a line terminator
csv.register_dialect('no_line_terminator', lineterminator='')

def maybe_quote_string(s):
    out = StringIO()

    # writerow iterates over its argument, so don't give it a plain string
    # or it'll break it up into characters
    csv.writer(out, 'no_line_terminator').writerow([s])

    return out.getvalue()

print maybe_quote_string('foo, bar')
print map(maybe_quote_string, row)

The output is:

"foo, bar"
['hello', '"foo, bar"']

This is the closest I can come to answering your question. It's not really keeping the double quotes, rather it's removing them and adding them back with likely the same rules that put them there in the first place.

I'll say it again, you're probably headed down the wrong path with this question. Others will probably agree. That's why you're struggling to get good answers. What is the bigger problem that you're trying to solve? We can help you better to achieve that.

like image 104
Alex Hall Avatar answered Nov 15 '22 05:11

Alex Hall


Kinda depends upon you Use Case. If the only "s are there for values containing commas (e.g. "foo,bar"), then you can use CSV writer to put them back in.

import io
import csv

infile = io.StringIO('hello,"foo, bar"')
outfile = io.StringIO()
reader = csv.reader(infile)
for row in reader:
    inList = row
    break
print(inList)
# As an output string
writer = csv.writer(outfile)
writer.writerow(inList)
outList = outfile.getvalue().strip()
print(outList)
# As a List
outList = []
for i in range(len(inList)):
    outfile = io.StringIO()
    writer = csv.writer(outfile)
    writer.writerow([inList[i]])
    outList.append(outfile.getvalue().strip())
print(outList)

Output

['hello', 'foo, bar']
hello,"foo, bar"
['hello', '"foo, bar"']

However, if you have other, unnecessary "s that you want to preserve (e.g. '"hello","foo,bar",humbug') and all fields containing , will be correctly wrapped in "s then you could split the line on the , and look for 'broken' fields (start with " but don't end with ")

line = '"hello","foo, bar",humbug'
fields = line.split(',')
print(fields)
values = []
i = 0
while i < len(fields):
    # If a field doesn't start with a ", or starts and ends with "s
    if (fields[i][0] != '"') or (fields[i][-1] == '"'):
        values.append(fields[i])        # It's a stand alone value
        i += 1
        continue
    value = fields[i]           # A value that has been split
    i += 1
    while i < len(fields):
        value += ',' + fields[i]
        i += 1
        if value[-1] == '"':     # The last part would have ended in a "
            break
    values.append(value)
print(values)

Output

['"hello"', '"foo', ' bar"', 'humbug']
['"hello"', '"foo, bar"', 'humbug']
like image 33
Russell McDonell Avatar answered Nov 15 '22 05:11

Russell McDonell