I have a file containing multiple entries. Each entry is of the following form:
"field1","field2","field3","field4","field5"
All of the fields are guaranteed not to contain any quotes; however, they can contain commas (,). The problem is that field4 can be split across multiple lines. For example, a file can look like:
"john","male US","done","Some sample text
across multiple lines. There
can be many lines of this","foo bar baz"
"jane","female UK","done","fields can have , in them","abc xyz"
I want to extract the fields using Python. If no field were split across multiple lines, this would be simple (see Extract string from between quotations), but I can't find a simple way to do it when a field spans multiple lines.
EDIT: There are actually five fields. Sorry for any confusion; the question has been edited to reflect this.
Use the re.findall() method to extract strings between quotes, e.g. my_list = re.findall(r'"([^"]*)"', my_str).
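A minimal sketch of that call on the sample data from the question (the `data` string below is just that sample pasted inline). Because `[^"]*` matches any run of non-quote characters, newlines included, a field spanning several lines is still captured as one match:

```python
import re

# Sample data from the question: field4 of the first entry spans three lines.
data = '''"john","male US","done","Some sample text
across multiple lines. There
can be many lines of this","foo bar baz"
"jane","female UK","done","fields can have , in them","abc xyz"
'''

# [^"]* matches any run of non-quote characters, including "\n",
# so multiline fields need no special handling.
fields = re.findall(r'"([^"]*)"', data)

print(len(fields))  # 10
print(fields[3])
```

Note that the result is one flat list of all fields in the file, not one list per entry.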
If you want to keep the quotes around the quoted tokens, use shlex.split(line, posix=False).
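A small sketch of the `posix` flag. shlex splits on whitespace the way a shell does, not on commas, so the input here is a hypothetical space-separated line rather than the comma-separated data from the question:

```python
import shlex

# Hypothetical whitespace-separated input; shlex does not split on commas.
line = 'say "hello world"'

# Default POSIX mode strips the quotes from the quoted token.
posix_tokens = shlex.split(line)                # ['say', 'hello world']
# posix=False keeps the surrounding quotes on the quoted token.
raw_tokens = shlex.split(line, posix=False)     # ['say', '"hello world"']

print(posix_tokens)
print(raw_tokens)
```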
You can also extract the strings between the quotation marks using the str.split() method and slicing.
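A sketch of the split-and-slice idea, relying on the question's guarantee that no field contains a double quote. Splitting on `"` alternates between text outside the quotes (even indices) and text inside them (odd indices), so `[1::2]` keeps only the field contents, multiline ones included:

```python
# Sample data from the question; assumes no field contains a double quote.
data = '''"john","male US","done","Some sample text
across multiple lines. There
can be many lines of this","foo bar baz"
"jane","female UK","done","fields can have , in them","abc xyz"
'''

# Even indices hold separators (',' or '\n'); odd indices hold field contents.
fields = data.split('"')[1::2]

for field in fields:
    print('-- ' + field)
```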
You can use str.index() to find where the quotes (") begin and end, e.g. temp.index('"') or temp.index("\"").
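One way to turn the str.index() idea into a loop, sketched with a hypothetical helper `quoted_fields`: find an opening quote, find the next quote after it, yield the text in between, and continue from there. It assumes quotes are always balanced; an unclosed quote makes the second index() call raise ValueError.

```python
def quoted_fields(text):
    """Yield the contents of each "..."-quoted field (hypothetical helper)."""
    pos = 0
    while True:
        try:
            start = text.index('"', pos)    # next opening quote
        except ValueError:
            return                          # no more quoted fields
        end = text.index('"', start + 1)    # matching closing quote
        yield text[start + 1:end]
        pos = end + 1


# Sample data from the question.
data = '''"john","male US","done","Some sample text
across multiple lines. There
can be many lines of this","foo bar baz"
"jane","female UK","done","fields can have , in them","abc xyz"
'''

fields = list(quoted_fields(data))
print(fields)
```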
I think the csv module can solve this problem; it correctly handles quoted fields that contain newlines:
import csv

# newline='' lets the csv module handle line breaks inside quoted fields itself
with open('infile', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        for field in row:
            print('-- {}'.format(field))
It yields:
-- john
-- male US
-- done
-- Some sample text
across multiple lines. There
can be many lines of this
-- foo bar baz
-- jane
-- female UK
-- done
-- fields can have , in them
-- abc xyz
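The csv approach also keeps each entry together: every row the reader returns is one entry's list of five fields. A self-contained sketch, with io.StringIO standing in for the real file (the `data` string is the sample from the question):

```python
import csv
import io

# Sample data from the question.
data = '''"john","male US","done","Some sample text
across multiple lines. There
can be many lines of this","foo bar baz"
"jane","female UK","done","fields can have , in them","abc xyz"
'''

# io.StringIO stands in for open('infile', newline='') so this runs without a file.
rows = list(csv.reader(io.StringIO(data)))

for row in rows:
    print(row)
```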
The answer from the question you linked worked for me:
import re

with open("test.txt") as f:
    text = f.read()

# Capture the text between each pair of double quotes; [^"] also matches newlines
string_list = re.findall(r'"([^"]*)"', text)
At this point, string_list contains your strings. These strings can contain line breaks, but since string_list is a list, you can clean them up one string at a time:
new_list = [s.replace("\n", " ") for s in string_list]
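re.findall() returns one flat list of fields for the whole file. A sketch for regrouping it into per-entry records, assuming (as the question's EDIT states) that every entry has exactly five fields; the `data` string is the sample from the question:

```python
import re

# Sample data from the question.
data = '''"john","male US","done","Some sample text
across multiple lines. There
can be many lines of this","foo bar baz"
"jane","female UK","done","fields can have , in them","abc xyz"
'''

fields = re.findall(r'"([^"]*)"', data)

# Assumption: every entry has exactly five fields, so chunk the flat list by 5.
records = [fields[i:i + 5] for i in range(0, len(fields), 5)]

# Replace line breaks inside each field with spaces.
records = [[field.replace("\n", " ") for field in record] for record in records]

for record in records:
    print(record)
```

If an entry could ever have a different number of fields, chunking by five would silently misalign the records; the csv module avoids that because it groups fields by row.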