I have hundreds of XLSX files which all have columns containing long numeric account numbers. I need to automatically convert all of these files to CSV. This is trivial with tools like ssconvert. However, due to a bug feature in Excel and Libreoffice, long numeric fields will be displayed using scientific notation and this formatted number (not the underlying data) will be preserved if exported to CSV.
This means that any automated conversion to CSV will truncate account numbers, since the value 1240800388917 will be written to the CSV as 1.2408E+12 or 1240800000000, causing data corruption.
This is easy to fix by manually opening the Excel file and setting these columns to "text" format. However, it's a bit tedious to do this for hundreds of files, especially because many of these files have strange macros and formatting that make Libreoffice take several minutes to open each one (another reason why I'd like to convert them all to CSV in the first place).
What's the easiest way to use Python to automatically open each file and change an entire column's formatting to "text"? I see plenty of Python examples with how to read XLS/XLSX files, and in some cases write them, but I can find few guides on manipulating a column's default formatting.
Took me some trial and error and digging around in the code, but the solution turned out to be trivial.
from openpyxl import load_workbook
wb = load_workbook('myfile.xlsx')
ws = wb.active
for row in ws.rows:
row[col_index].number_format = row[col_index].style.number_format = '@'
wb.save('myfile-fixed.xlsx')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With