I have an Excel file containing about 400 worksheets, 375 of which I need to save out as CSV files. I've tried a VBA solution, but Excel has issues just opening this workbook.
I've created a python script to do just that. However, it rapidly consumes all available memory and pretty much stops working after 25 sheets are exported. Does anybody have a suggestion on how I might improve this code?
import openpyxl
import csv
import time
print(time.ctime())
importedfile = openpyxl.load_workbook(filename = "C:/Users/User/Desktop/Giant Workbook.xlsm", data_only = True, keep_vba = False)
tabnames = importedfile.get_sheet_names()
substring = "Keyword"
for num in tabnames:
if num.find(substring) > -1:
sheet=importedfile.get_sheet_by_name(num)
name = "C:/Users/User/Desktop/Test/" + num + ".csv"
with open(name, 'w', newline='') as file:
savefile = csv.writer(file)
for i in sheet.rows:
savefile.writerow([cell.value for cell in i])
file.close()
print(time.ctime())
Any help would be appreciated.
Thanks
EDIT: I'm using windows 7 and python 3.4.3. I'm also open to solutions in R, VBA, or SPSS.
Try using the read_only=True
property for load_workbook()
class, this causes the worksheets you get to be IterableWorksheet
, meaning you can only iterate over them: you cannot directly use column/row numbers to access cell values in it. This would provide near constant memory consumption
according to documentation .
Also, you do not need to close the file
; the with
statement would handle that for you.
Example -
import openpyxl
import csv
import time
print(time.ctime())
importedfile = openpyxl.load_workbook(filename = "C:/Users/User/Desktop/Giant Workbook.xlsm", read_only = True, keep_vba = False)
tabnames = importedfile.get_sheet_names()
substring = "Keyword"
for num in tabnames:
if num.find(substring) > -1:
sheet=importedfile.get_sheet_by_name(num)
name = "C:/Users/User/Desktop/Test/" + num + ".csv"
with open(name, 'w', newline='') as file:
savefile = csv.writer(file)
for i in sheet.rows:
savefile.writerow([cell.value for cell in i])
print(time.ctime())
From Documentation -
Sometimes, you will need to open or write extremely large XLSX files, and the common routines in openpyxl won’t be able to handle that load. Fortunately, there are two modes that enable you to read and write unlimited amounts of data with (near) constant memory consumption.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With