
Read CSV from within Zip File

Tags: python, csv, zip

I have a directory of zip files (approximately 10,000 small files). Each contains a CSV file that I am trying to read and split into a number of different CSV files.

I managed to write the code below, which splits the CSV files from a directory of CSVs: it reads the first attribute of each row and, depending on its value, writes the row to the relevant CSV.

import csv
import os
import sys
import re
import glob

reader = csv.reader(open("C:/Projects/test.csv", "rb"), delimiter=',', quotechar='"')
write10 = csv.writer(open('ouput10.csv', 'w'), delimiter=',', lineterminator='\n', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
write15 = csv.writer(open('ouput15.csv', 'w'), delimiter=',', lineterminator='\n', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)


headings10=["RECORD_IDENTIFIER","CUSTODIAN_NAME","LOCAL_CUSTODIAN_NAME","PROCESS_DATE","VOLUME_NUMBER","ENTRY_DATE","TIME_STAMP","VERSION","FILE_TYPE"]
write10.writerow(headings10)

headings15=["RECORD_IDENTIFIER","CHANGE_TYPE","PRO_ORDER","USRN","STREET_DESCRIPTION","LOCALITY_NAME","TOWN_NAME","ADMINSTRATIVE_AREA","LANGUAGE"]
write15.writerow(headings15)


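# Route each row by its record identifier (first column)
for row in reader:
    record_type = row[0]  # avoid shadowing the built-in `type`
    if "10" in record_type:
        write10.writerow(row)
    elif "15" in record_type:
        write15.writerow(row)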

So I am now trying to read the Zip files rather than wasting time extracting them first.

This is what I have so far, after following as many tutorials as I could find:

import glob
import os
import csv
import zipfile
import StringIO

for name in glob.glob('C:/Projects/abase/*.zip'):
    base = os.path.basename(name)
    filename = os.path.splitext(base)[0]


datadirectory = 'C:/Projects/abase/'
dataFile = filename
archive = '.'.join([dataFile, 'zip'])
fullpath = ''.join([datadirectory, archive])
csv = '.'.join([dataFile, 'csv'])


filehandle = open(fullpath, 'rb')
zfile = zipfile.ZipFile(filehandle)
data = StringIO.StringIO(zfile.read(csv))
reader = csv.reader(data)

for row in reader:
    print row

However, an error gets thrown:

AttributeError: 'str' object has no attribute 'reader'

Hopefully someone can show me how to change my working CSV-reading code so that it reads from the Zip files.

Much appreciated

Tim

tjmgis asked Feb 18 '12 18:02


1 Answer

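Simple fix. Your local variable csv shadows the csv module: after csv = '.'.join([dataFile, 'csv']), the name csv refers to a string, so csv.reader(data) fails with the AttributeError you see. Just change the name of that variable, and indent the per-file code so it runs inside the for loop: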

import glob
import os
import csv
import zipfile
import StringIO

for name in glob.glob('C:/Projects/abase/*.zip'):
    base = os.path.basename(name)
    filename = os.path.splitext(base)[0]


    datadirectory = 'C:/Projects/abase/'
    dataFile = filename
    archive = '.'.join([dataFile, 'zip'])
    fullpath = ''.join([datadirectory, archive])
    csv_file = '.'.join([dataFile, 'csv'])  # renamed from `csv` so the csv module is no longer shadowed


    filehandle = open(fullpath, 'rb')
    zfile = zipfile.ZipFile(filehandle)
    data = StringIO.StringIO(zfile.read(csv_file))  # use the renamed variable here as well
    reader = csv.reader(data)

    for row in reader:
        print row
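
The code above is Python 2. On Python 3 the StringIO module is gone, and the same idea can be combined with the row-splitting logic from the question roughly as follows. This is only a sketch under a few assumptions: the function names iter_zip_csv_rows and split_zipped_csvs are made up for illustration, each archive is assumed to contain a member named after the archive (abc.zip holds abc.csv, as in the question), and the CSV data is assumed to be UTF-8. The heading rows from the question are left out to keep it short.

```python
import csv
import glob
import io
import os
import zipfile


def iter_zip_csv_rows(zip_path, member=None, encoding="utf-8"):
    """Yield rows of one CSV stored inside a zip archive, without extracting it.

    Hypothetical helper. `member` defaults to '<archive base name>.csv',
    which mirrors the naming used in the question; UTF-8 is an assumption.
    """
    if member is None:
        base = os.path.splitext(os.path.basename(zip_path))[0]
        member = base + ".csv"
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # ZipFile.open() gives a binary stream; csv.reader wants text.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            for row in csv.reader(text, delimiter=",", quotechar='"'):
                yield row


def split_zipped_csvs(pattern, out10_path, out15_path):
    """Route rows from every matching archive by their first column."""
    options = dict(delimiter=",", quotechar='"', lineterminator="\n",
                   quoting=csv.QUOTE_NONNUMERIC)
    with open(out10_path, "w", newline="") as f10, \
            open(out15_path, "w", newline="") as f15:
        # Checked in this order, like the if/elif chain in the question.
        writers = [("10", csv.writer(f10, **options)),
                   ("15", csv.writer(f15, **options))]
        for zip_path in sorted(glob.glob(pattern)):
            for row in iter_zip_csv_rows(zip_path):
                if not row:
                    continue  # skip blank lines
                for key, writer in writers:
                    if key in row[0]:
                        writer.writerow(row)
                        break
```

With the paths from the question, a call would look like split_zipped_csvs('C:/Projects/abase/*.zip', 'ouput10.csv', 'ouput15.csv'). Note that no variable here is called csv, so the module name stays intact.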
benesch answered Oct 05 '22 17:10