Hi I have a text file with string :
hello,"foo, bar"
i want to split it into a list as:
['hello', '"foo, bar"']
Is there a way I can acheive this ?
I am trying this as of now :
for line in sys.stdin:
csv_file = StringIO.StringIO(line)
csv_reader = csv.reader(csv_file)
I want them to split into two string i.e:
'hello' and '"foo, bar"'
Say you read a row from a CSV:
from StringIO import StringIO
import csv
infile = StringIO('hello,"foo, bar"')
reader = csv.reader(infile)
row = reader.next() # row is ['hello', 'foo, bar']
The second value in the row is foo, bar
instead of "foo, bar"
. This isn't some Python oddity, it's a reasonable interpretation of CSV syntax. The quotes probably weren't placed there to be part of a value, but rather to show that foo, bar
is one value and shouldn't be split into foo
and bar
based on the comma (,
). An alternative solution would be to escape the comma when creating the CSV file, so the line would look like:
hello,foo \,bar
So it's quite a strange request to want to keep those quotes. If we know more about your use case and the bigger picture we can help you better. What are you trying to achieve? Where does the input file come from? Is it really a CSV or is it some other syntax that looks similar? For example if you know that every line consists of two values separated by a comma, and the first value never contains a comma, then you can just split on the first comma:
print 'hello,"foo, bar"'.split(',', 1) # => ['hello', '"foo, bar"']
But I doubt the input has such restrictions which is why things like quotes are needed to resolve ambiguities.
If you're trying to write to a CSV again, then the quotes will be recreated as you're doing so. They don't have to be there in the intermediate list:
outfile = StringIO()
writer = csv.writer(outfile)
writer.writerow(row)
print outfile.getvalue()
This will print
hello,"foo, bar"
You can customise the exact CSV output by setting a new dialect.
If you want to grab the individual values in the row with the appropriate quoting rules applied to them, it's possible, but it's a bit of a hack:
# We're going to write individual strings, so we don't want a line terminator
csv.register_dialect('no_line_terminator', lineterminator='')
def maybe_quote_string(s):
out = StringIO()
# writerow iterates over its argument, so don't give it a plain string
# or it'll break it up into characters
csv.writer(out, 'no_line_terminator').writerow([s])
return out.getvalue()
print maybe_quote_string('foo, bar')
print map(maybe_quote_string, row)
The output is:
"foo, bar"
['hello', '"foo, bar"']
This is the closest I can come to answering your question. It's not really keeping the double quotes, rather it's removing them and adding them back with likely the same rules that put them there in the first place.
I'll say it again, you're probably headed down the wrong path with this question. Others will probably agree. That's why you're struggling to get good answers. What is the bigger problem that you're trying to solve? We can help you better to achieve that.
Kinda depends upon you Use Case. If the only "s are there for values containing commas (e.g. "foo,bar"), then you can use CSV writer to put them back in.
import io
import csv
infile = io.StringIO('hello,"foo, bar"')
outfile = io.StringIO()
reader = csv.reader(infile)
for row in reader:
inList = row
break
print(inList)
# As an output string
writer = csv.writer(outfile)
writer.writerow(inList)
outList = outfile.getvalue().strip()
print(outList)
# As a List
outList = []
for i in range(len(inList)):
outfile = io.StringIO()
writer = csv.writer(outfile)
writer.writerow([inList[i]])
outList.append(outfile.getvalue().strip())
print(outList)
Output
['hello', 'foo, bar']
hello,"foo, bar"
['hello', '"foo, bar"']
However, if you have other, unnecessary "s that you want to preserve (e.g. '"hello","foo,bar",humbug') and all fields containing , will be correctly wrapped in "s then you could split the line on the , and look for 'broken' fields (start with " but don't end with ")
line = '"hello","foo, bar",humbug'
fields = line.split(',')
print(fields)
values = []
i = 0
while i < len(fields):
# If a field doesn't start with a ", or starts and ends with "s
if (fields[i][0] != '"') or (fields[i][-1] == '"'):
values.append(fields[i]) # It's a stand alone value
i += 1
continue
value = fields[i] # A value that has been split
i += 1
while i < len(fields):
value += ',' + fields[i]
i += 1
if value[-1] == '"': # The last part would have ended in a "
break
values.append(value)
print(values)
Output
['"hello"', '"foo', ' bar"', 'humbug']
['"hello"', '"foo, bar"', 'humbug']
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With