I have an Excel spreadsheet that has a field containing small %f.2 values, such as 1.2, 1.07, 2.3, etc and for some reason openpyxl is reading these cells as a 1900 date. I have seen a variety of times this issue has been raised, but usually these users are expecting a date and are getting a bogus date. I am expecting a value, usually x<10.0 and i am getting about 30-40% 'bad' data (read as timedate), while the other time this is read as a numeric value.
I am using iterators, so I do a simple ws.iter_rows() call to pull the data one row at a time. I have tried to 'cast' this to a previously created variable containing a numeric value, but this doesn't do much good.
Does anyone have a suggestion on how to overcome this sporadic issue. If this is a known bug, are there any known workarounds?
I find that if I store the file as a csv, and re-open it as csv, then re-store it as xlsx I will end up with a file that I can read correctly. While this helps debug the code, I need a solution that my customer can use without jumping through these hoops.
I would think that if the column was not formatted correctly it would apply to all elements, so having this occur intermittently is confusing.
import openpyxl
from openpyxl import load_workbook
# Source workbook - wb
wb = load_workbook(filename = r'C:\data\TEST.xlsx' , use_iterators = True)
ws = wb.get_sheet_by_name(name ='QuoteFile ')
for row in ws.iter_rows():
print(row[0].internal_value ,row[3].internal_value ,row[4].internal_value ,row[5].internal_value)
print('Done')
Here is my input as seen from an excel table
20015 2.13 1.2 08/01/11
20015 5.03 1.2 08/01/11
20015 5.03 1.2 08/01/11
20015 5.51 1.2 08/01/11
20015 8.13 1.2 08/01/11
20015 5.60 1.2 08/01/11
20015 5.03 1.2 08/01/11
20015 1.50 1.2 08/01/11
20015 1.50 1.2 08/01/11
20015 1.50 1.2 08/01/11
20015 1.50 1.2 08/01/11
20015 1.50 1.2 08/01/11
20015 1.50 1.2 08/01/11
Here is my output, you can see the first seven rows indicate the second field as a date from 1900, while rows 8-13 show the field correctly as a numeric field:
20015.0 1900-01-02 03:07:12 1.2 2011-08-01 00:00:00
20015.0 1900-01-05 00:43:12 1.2 2011-08-01 00:00:00
20015.0 1900-01-05 00:43:12 1.2 2011-08-01 00:00:00
20015.0 1900-01-05 12:14:24 1.2 2011-08-01 00:00:00
20015.0 1900-01-08 03:07:12 1.2 2011-08-01 00:00:00
20015.0 1900-01-05 14:24:00 1.2 2011-08-01 00:00:00
20015.0 1900-01-05 00:43:12 1.2 2011-08-01 00:00:00
20015.0 1.5 1.2 2011-08-01 00:00:00
20015.0 1.5 1.2 2011-08-01 00:00:00
20015.0 1.5 1.2 2011-08-01 00:00:00
20015.0 1.5 1.2 2011-08-01 00:00:00
20015.0 1.5 1.2 2011-08-01 00:00:00
20015.0 1.5 1.2 2011-08-01 00:00:00
using python 3.3 and openpyxl 1.6.2
Disclaimer: I have no idea how to work openpyxl. However, you mostly only need to worry about the datetime
module.
If you know which rows are supposed to be numbers, you could try code like this to convert an Excel date-format to a float, and ignore it if it's a number:
import datetime
import openpyxl
from openpyxl import load_workbook
# Source workbook - wb
wb = load_workbook(filename = r'C:\data\TEST.xlsx' , use_iterators=True)
ws = wb.get_sheet_by_name(name='QuoteFile ')
If val's a number, return it. Otherwise, take the difference between the datetime
and 1899-12-31 00:00:00. The way the datetimes work is they're internally a float,
being the number of days since the start of 1900. We get the number of seconds in
the delta (done through subtraction) and divide that by 86400 (the number of seconds
in a day).
def forcefloat(val):
"""If val's a number, return it. Otherwise, take the difference between the
datetime and 1899-12-31 00:00:00. The way the datetimes work is they're
internally a float, being the number of days since the start of 1900.
We get the number of seconds in the delta (done through subtraction)
and divide that by 86400 (the number of seconds in a day)."""
if isinstance(val, (int, float)):
return val
assert isinstance(val, datetime.datetime)
return (val - datetime.datetime(1899,12,31,0,0,0)).total_seconds() / 86400
for row in ws.iter_rows():
print(
row[0].internal_value,
forcefloat(row[3].internal_value),
row[4].internal_value,
row[5].internal_value,
)
print('Done')
Not exactly the most elegant solution, but it works.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With