I'm trying to move some of my processing work from R to Python. In R, I use read.table() to read REALLY messy CSV files, and it automagically splits the records correctly. For example, this record
391788,"HP Deskjet 3050 scanner always seems to break","<p>I'm running a Windows 7 64 blah blah blah........ake this work permanently?</p>
<p>Update: It might have something to do with my computer. It seems to work much better on another computer, windows 7 laptop. Not sure exactly what the deal is, but I'm still looking into it...</p>
","windows-7 printer hp"
is correctly separated into 4 columns, even though a single record can span many lines and there are commas all over the place. In R, I just do:
read.table(infile, header = FALSE, nrows=chunksize, sep=",", stringsAsFactors=FALSE)
Is there something in Python that can do this equally well?
Thanks!
You can use the csv module from the standard library, which handles quoted fields that contain commas and line breaks:
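from csv import reader

# newline="" lets the csv module handle line breaks inside quoted fields itself
with open("C:/text.txt", "r", newline="") as f:
    csv_reader = reader(f, quotechar='"')  # '"' is also the default quotechar
    for row in csv_reader:
        print(row)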
['391788', 'HP Deskjet 3050 scanner always seems to break', "<p>I'm running a Windows 7 64 blah blah blah........ake this work permanently?</p>\n\n<p>Update: It might have something to do with my computer. It seems to work much better on another computer, windows 7 laptop. Not sure exactly what the deal is, but I'm still looking into it...</p>\n", 'windows-7 printer hp']
len(row) == 4, so the record is split into the same 4 fields as in R.
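To check this without a file on disk, here is a minimal sketch that feeds a trimmed copy of the record from the question through csv.reader, using io.StringIO as a stand-in for open(). It also contrasts that with naive line splitting, and shows one possible way to mimic R's nrows=chunksize by pulling fixed-size batches of records with itertools.islice. The helper name read_chunks is hypothetical, not part of the csv module.

```python
import csv
import io
from itertools import islice

# Trimmed copy of the record from the question: the third field is quoted,
# spans several lines, and contains a comma.
SAMPLE = (
    '391788,"HP Deskjet 3050 scanner always seems to break",'
    '"<p>I\'m running a Windows 7 64 blah blah blah</p>\n'
    "\n"
    "<p>Update: It might have something to do with my computer. "
    "It seems to work much better on another computer, windows 7 laptop.</p>\n"
    '","windows-7 printer hp"\n'
)


def read_chunks(lines, chunksize):
    """Hypothetical helper: yield lists of at most `chunksize` parsed records,
    roughly like calling read.table(..., nrows=chunksize) repeatedly."""
    reader = csv.reader(lines)
    while True:
        chunk = list(islice(reader, chunksize))  # next batch of whole records
        if not chunk:
            return
        yield chunk


# csv.reader keeps the multi-line quoted field together: 1 record, 4 fields.
rows = list(csv.reader(io.StringIO(SAMPLE)))
print(len(rows), len(rows[0]))  # → 1 4

# Splitting on physical lines instead breaks the single record into 4 pieces.
naive = [line.split(",") for line in SAMPLE.splitlines()]
print(len(naive))  # → 4

# Five copies of the record read in batches of 2 records each.
chunks = list(read_chunks(io.StringIO(SAMPLE * 5), chunksize=2))
print([len(c) for c in chunks])  # → [2, 2, 1]
```

For a real file, the same read_chunks call would take the file object opened with newline="" instead of the io.StringIO instance.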