I have a somewhat large .xlsx file - 19 columns, 5185 rows. I want to open the file, read all the values in one column, do some stuff to those values, and then create a new column in the same workbook and write out the modified values. Thus, I need to be able to both read and write in the same file.
My original code did this:
from openpyxl import load_workbook, utils

def readExcel(doc):
    wb = load_workbook(generalpath + exppath + doc)
    ws = wb["Sheet1"]
    # iterate through the columns to find the correct one
    for col in ws.iter_cols(min_row=1, max_row=1):
        for mycell in col:
            if mycell.value == "PerceivedSound.RESP":
                origCol = mycell.column
    # get the column letter for the first empty column to output the new values
    newCol = utils.get_column_letter(ws.max_column + 1)
    # iterate through the rows to get the value from the original column,
    # do something to that value, and output it in the new column
    for myrow in range(2, ws.max_row + 1):
        myrow = str(myrow)
        # do some stuff to make the new value
        cleanedResp = doStuff(ws[origCol + myrow].value)
        ws[newCol + myrow] = cleanedResp
    wb.save(doc)
However, Python threw a memory error after row 3853 because the workbook was too big. The openpyxl docs say to use read-only mode (https://openpyxl.readthedocs.io/en/latest/optimized.html) to handle big workbooks. I'm now trying to use that; however, there seems to be no way to iterate through the columns once I add the read_only=True parameter:
def readExcel(doc):
    wb = load_workbook(generalpath + exppath + doc, read_only=True)
    ws = wb["Sheet1"]
    for col in ws.iter_cols(min_row=1, max_row=1):
        # etc.
Python throws this error: AttributeError: 'ReadOnlyWorksheet' object has no attribute 'iter_cols'
If I change the final line in the above snippet to:
for col in ws.columns:
Python throws the same kind of error: AttributeError: 'ReadOnlyWorksheet' object has no attribute 'columns'
Iterating over rows is fine (and is included in the documentation I linked above):
for row in ws.rows:
(no error)
This question asks about the same AttributeError, but the solution there is to remove read-only mode, which doesn't work for me because openpyxl can't read my entire workbook outside of read-only mode.
So: how do I iterate through columns in a large workbook?
And I haven't yet encountered this, but I will once I can iterate through the columns: how do I both read and write the same workbook, if said workbook is large?
Thanks!
If the worksheet only has around 100,000 cells (19 columns × 5,185 rows ≈ 98,500), then you shouldn't have any memory problems. You should probably investigate this further before working around it.
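As a quick sanity check of the size, here is a throwaway sketch reusing the path variables from the question; note that in read-only mode the reported dimensions come from the file's metadata and can be None if the file doesn't store them:

from openpyxl import load_workbook

wb = load_workbook(generalpath + exppath + doc, read_only=True)
ws = wb["Sheet1"]
print(ws.max_column, ws.max_row)  # expect roughly 19 and 5185 here
wb.close()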
iter_cols() is not available in read-only mode because it requires constant and very inefficient reparsing of the underlying XML file. It is, however, relatively easy to convert rows into columns from iter_rows() using zip:
def _iter_cols(self, min_col=None, max_col=None, min_row=None,
               max_row=None, values_only=False):
    yield from zip(*self.iter_rows(
        min_row=min_row, max_row=max_row,
        min_col=min_col, max_col=max_col, values_only=values_only))

import types

for sheet in workbook:
    sheet.iter_cols = types.MethodType(_iter_cols, sheet)
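For completeness, a minimal usage sketch of the patched method, assuming workbook was opened with read_only=True and patched as above, and openpyxl >= 2.6 for values_only; target_idx is just an illustrative name:

sheet = workbook["Sheet1"]

# locate the 1-based index of the column whose header matches
target_idx = None
header_cols = sheet.iter_cols(min_row=1, max_row=1, values_only=True)
for idx, col in enumerate(header_cols, start=1):
    if col[0] == "PerceivedSound.RESP":
        target_idx = idx
        break

# pull every value in that column (header included); note that each
# call transposes iter_rows(), so it reads the requested rows again
values = next(sheet.iter_cols(min_col=target_idx, max_col=target_idx,
                              values_only=True))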
According to the documentation, read-only mode only supports row-based reads (column reads are not implemented). But that's not hard to work around:
from openpyxl import load_workbook, Workbook

wb = load_workbook(generalpath + exppath + doc, read_only=True)
ws = wb["Sheet1"]

wb2 = Workbook(write_only=True)
ws2 = wb2.create_sheet()

# find what column I need
colcounter = 0
for row in ws.rows:
    for cell in row:
        if cell.value == "PerceivedSound.RESP":
            break
        colcounter += 1
    # cells are apparently linked to the parent workbook meta;
    # this will retain only values -- you'll need a custom
    # row constructor if you want to retain more
    row2 = [cell.value for cell in row]
    ws2.append(row2)  # preserve the first (header) row in the new file
    break  # stop after the first row

# now the data rows; start at row 2 so the header isn't written twice
for row in ws.iter_rows(min_row=2):
    row2 = [cell.value for cell in row]
    row2.append(doStuff(row2[colcounter]))
    ws2.append(row2)  # write a new row to the new wb

wb2.save('newfile.xlsx')

wb.close()
wb2.close()

# copy `newfile.xlsx` over `generalpath + exppath + doc`,
# e.g. with os.system, subprocess.Popen, or shutil.copy2()
You will not be able to write to the same workbook, but as shown above you can open a new workbook (in write-only mode), write to it, and then overwrite the old file with an OS-level copy.
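As a sketch of that last copy step, using shutil.copy2 (one of the options named in the comment above; os.replace would also work if you prefer a move to a copy):

import shutil

# overwrite the original workbook with the newly written file;
# the destination path reuses the variables from the question's code
shutil.copy2('newfile.xlsx', generalpath + exppath + doc)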