I'm trying to read in a text file that looks something like this:
Date, StartTime, EndTime
6/8/14, 1832, 1903
6/8/14, 1912, 1918
6/9/14, 1703, 1708
6/9/14, 1713, 1750
and this is what I have:
g = open('Observed_closure_info.txt', 'r')
closure_date=[]
closure_starttime=[]
closure_endtime=[]
file_data1 = g.readlines()
for line in file_data1[1:]:
    data1=line.split(', ')
    closure_date.append(str(data1[0]))
    closure_starttime.append(str(data1[1]))
    closure_endtime.append(str(data1[2]))
I did it this way for a previous file that was very similar to this one, and everything worked fine. However, this file isn't being read in properly. First, it gives me a "list index out of range" error on closure_starttime.append(str(data1[1])),
and when I print data1 or closure_date, I get something like
['\x006\x00/\x008\x00/\x001\x004\x00,\x00 \x001\x008\x003\x002\x00,\x00 \x001\x009\x000\x003\x00\r\x00\n']
I've tried rewriting the text file in case there was something corrupt about that particular file, and it still does the same thing. I'm not sure why because last time this worked fine.
Any suggestions? Thanks!
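This looks like a comma-separated file saved with UTF-16 encoding (hence the \x00 null bytes: UTF-16 stores each ASCII character as two bytes, one of which is \x00). Because of those extra bytes, the separator in the raw data is ',\x00 \x00' rather than ', ', so split(', ') returns the whole line as a single element and data1[1] raises "list index out of range". You'll have to decode the input from UTF-16, like so: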
import codecs
closure_date=[]
closure_starttime=[]
closure_endtime=[]
with codecs.open('Observed_closure_info.txt', 'r', 'utf-16-le') as g:
    next(g)  # skip header line (next(g) works in Python 2.6+ and 3; g.next() is Python 2 only)
    for line in g:
        date, start, end = line.strip().split(', ')
        closure_date.append(date)
        closure_starttime.append(start)
        closure_endtime.append(end)
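To see why the original split(', ') raised IndexError, here is a small self-contained demonstration (written for Python 3; the sample row is copied from the question, and encoding it as UTF-16-LE is an assumption based on the \x00 pattern in the printed output). The raw bytes never contain the two-byte separator ', ', so split returns one element; decoding first restores the three expected fields.

```python
# Sample row from the question, encoded the way a UTF-16-LE file stores it on disk.
raw = u'6/8/14, 1832, 1903\r\n'.encode('utf-16-le')

# Splitting the raw bytes on ', ' finds no match, because the file really
# contains ',\x00 \x00' between fields.
raw_fields = raw.split(b', ')
print(len(raw_fields))  # 1, so raw_fields[1] would raise IndexError

# Decoding first gives ordinary text, and the split behaves as expected.
fields = raw.decode('utf-16-le').strip().split(', ')
print(fields)  # ['6/8/14', '1832', '1903']
```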
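Try this, which reads the raw bytes, decodes the whole file from UTF-16 and only then splits it into lines:
with open('Observed_closure_info.txt', 'rb') as g:
    # Decoding line by line does not work here: readlines() on the raw bytes splits
    # after the first byte of each two-byte '\n', leaving a stray '\x00' at the
    # start of the next line (visible in the output above).
    file_data1 = g.read().decode('utf-16').splitlines()
closure_date=[]
closure_starttime=[]
closure_endtime=[]
for line in file_data1[1:]:
    data1 = [field.strip() for field in line.split(',')]
    closure_date.append(data1[0])
    closure_starttime.append(data1[1])
    closure_endtime.append(data1[2])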
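If you don't know in advance whether a file is UTF-16 (one plausible reason the earlier, similar file worked is that it was saved as plain ASCII/UTF-8), a hedged sketch is to check the first bytes for a UTF-16 byte-order mark before decoding. The helper name read_rows is hypothetical, and falling back to UTF-8 when no BOM is found is an assumption: a UTF-16 file saved without a BOM would not be detected this way. The usage part writes a small sample file to a temporary directory so the example runs on its own.

```python
import codecs
import os
import tempfile


def read_rows(path):
    """Hypothetical helper: return (dates, start_times, end_times) from a
    'Date, StartTime, EndTime' file, choosing UTF-16 only if the file starts
    with a UTF-16 byte-order mark and assuming UTF-8 otherwise."""
    with open(path, 'rb') as f:
        raw = f.read()
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode('utf-16')  # the 'utf-16' codec consumes the BOM
    else:
        text = raw.decode('utf-8-sig')  # also strips a UTF-8 BOM if present

    dates, starts, ends = [], [], []
    for line in text.splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue  # ignore blank lines
        date, start, end = [field.strip() for field in line.split(',')]
        dates.append(date)
        starts.append(start)
        ends.append(end)
    return dates, starts, ends


# Usage: write a UTF-16 sample (with BOM) and read it back.
sample = u'Date, StartTime, EndTime\r\n6/8/14, 1832, 1903\r\n6/8/14, 1912, 1918\r\n'
path = os.path.join(tempfile.mkdtemp(), 'Observed_closure_info.txt')
with open(path, 'wb') as f:
    f.write(sample.encode('utf-16'))  # Python's 'utf-16' encoder writes a BOM
print(read_rows(path))
```

Reading the whole file and decoding it in one step avoids the misaligned-line problem entirely; the BOM check is only a heuristic for choosing the encoding.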