
Python not properly reading in text file

I'm trying to read in a text file that looks something like this:

Date, StartTime, EndTime 
6/8/14, 1832, 1903
6/8/14, 1912, 1918
6/9/14, 1703, 1708
6/9/14, 1713, 1750

and this is what I have:

g = open('Observed_closure_info.txt', 'r')
closure_date=[]
closure_starttime=[]
closure_endtime=[]
file_data1 = g.readlines()
for line in file_data1[1:]:
    data1=line.split(', ')
    closure_date.append(str(data1[0]))
    closure_starttime.append(str(data1[1]))
    closure_endtime.append(str(data1[2]))

I did it this way for a previous file that was very similar to this one, and everything worked fine. However, this file isn't being read in properly. First, it raises "list index out of range" at closure_starttime.append(str(data1[1])), and when I print data1 or closure_date, I get something like:

['\x006\x00/\x008\x00/\x001\x004\x00,\x00 \x001\x008\x003\x002\x00,\x00 \x001\x009\x000\x003\x00\r\x00\n']

I've tried rewriting the text file in case that particular file was corrupt, and it still does the same thing. I'm not sure why, because this worked fine last time.

Any suggestions? Thanks!

python_amateur asked Jun 24 '15 16:06


2 Answers

This looks like a comma-separated file with UTF-16 encoding (hence the \x00 null bytes). You'll have to decode the input from UTF-16, like so:

import codecs

closure_date=[]
closure_starttime=[]
closure_endtime=[]
with codecs.open('Observed_closure_info.txt', 'r', 'utf-16-le') as g:
    next(g)  # skip header line (next(g) works on Python 2.6+ and 3; g.next() is Python 2 only)
    for line in g:
        date, start, end = line.strip().split(', ')
        closure_date.append(date)
        closure_starttime.append(start)
        closure_endtime.append(end)
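
On Python 3, the built-in open() accepts an encoding argument, so the same idea could be written without codecs. The following is a minimal sketch, not a definitive implementation. It assumes the file starts with a byte-order mark (files saved as "Unicode" by Windows tools often do); if it does not, 'utf-16-le' can be passed instead. The function name read_closures and the sample data are hypothetical, made up for illustration.

```python
import os
import tempfile


def read_closures(path, encoding="utf-16"):
    """Return (dates, start_times, end_times) as lists of strings."""
    dates, starts, ends = [], [], []
    # Text mode with an explicit encoding decodes the bytes before
    # the file object splits them into lines.
    with open(path, "r", encoding=encoding) as f:
        next(f, None)  # skip the header line
        for line in f:
            if not line.strip():
                continue  # ignore blank lines
            date, start, end = [field.strip() for field in line.split(",")]
            dates.append(date)
            starts.append(start)
            ends.append(end)
    return dates, starts, ends


# Build a small UTF-16 sample file (hypothetical data mirroring the question).
sample = "Date, StartTime, EndTime \r\n6/8/14, 1832, 1903\r\n6/8/14, 1912, 1918\r\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample.encode("utf-16"))  # the "utf-16" codec prepends a BOM

dates, starts, ends = read_closures(tmp.name)
os.remove(tmp.name)
print(dates, starts, ends)
```

Splitting on ',' and stripping each field, rather than splitting on ', ', is a design choice in this sketch so that stray spaces or line endings do not end up in the values.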
nneonneo answered Sep 18 '22 02:09


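Try this, decoding the whole file before splitting it into lines:

with open('Observed_closure_info.txt', 'rb') as g:
    # Decode first: splitting raw UTF-16 bytes on '\n' cuts characters in half.
    # The 'utf-16' codec uses the BOM, if present, to pick the byte order.
    text = g.read().decode('utf-16')
closure_date=[]
closure_starttime=[]
closure_endtime=[]
for line in text.splitlines()[1:]:  # [1:] skips the header line
    data1 = [field.strip() for field in line.split(',')]
    closure_date.append(str(data1[0]))
    closure_starttime.append(str(data1[1]))
    closure_endtime.append(str(data1[2]))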
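
To see why the question's loop produced lines starting with \x00 and then failed with "list index out of range", here is a small Python 3 sketch. It assumes the file is little-endian UTF-16, which is what the printed bytes suggest; the sample text is hypothetical and only mirrors the first two lines of the question's file.

```python
# Hypothetical sample mirroring the first two lines of the question's file.
text = "Date, StartTime, EndTime \r\n6/8/14, 1832, 1903\r\n"
raw = text.encode("utf-16-le")  # little-endian, no BOM

# Splitting the raw bytes on b"\n" cuts the two-byte newline (b"\n\x00")
# in half, so the leftover b"\x00" lands at the start of the next chunk.
broken = raw.split(b"\n")
second_chunk = broken[1]

# In the raw bytes a comma is followed by b"\x00", never by a space,
# so splitting on b", " finds nothing and index [1] does not exist.
fields_from_bytes = second_chunk.split(b", ")

# Decoding first and splitting afterwards keeps every character intact.
good = raw.decode("utf-16-le").splitlines()
fields_from_text = good[1].split(", ")

print(second_chunk[:5])
print(len(fields_from_bytes))
print(fields_from_text)
```

The takeaway is that a UTF-16 file has to be decoded before it is split into lines or fields, because both the newline and the separator occupy two bytes each.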
efirvida answered Sep 22 '22 02:09