Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python inserts a colon in a Decimal number from Access via pyodbc

I'm having the same problem as this guy and possibly this guy, but I am around to share some code and respond to questions!

I have some code in a batch job that reads fields from a Microsoft Access database via pyodbc and prepares the output for display.

Here is a snippet. Note the assert.

def format_currency(amount):
    if amount is None:
        return ""
    else:
        result = "$%.2f" % amount
        assert ":" not in result, (
            "That's weird. The value %r of class %s is represented as %s" %
             (amount, amount.__class__, result))
        return result

When I run it, it successfully processes 100,000 rows and then fails:

AssertionError: That's weird. The value Decimal('54871.0000') of class <class
'decimal.Decimal'> is represented as $54870.:0

Note the aberrant colon. It occurs rarely - about one time in 300,000 records.

When I try to isolate it, of course it works.

from decimal import Decimal
print "$%.2f" % Decimal('54871.0000')

$54871.00

The type of the field in Access is:

  • Data Type: Currency
  • Decimal Places: 2
  • Input Mask:
  • Default Value:
  • Validation Rule:
  • Text Align: General

My vague finger-pointing suspicion based on insufficient evidence: pyodbc is poking with the internals of Decimal, perhaps confused by an Access corruption. As @ecatmur points out:

':' is '9' + 1 in ASCII

Anyone seen this and solved it?

Versions:

  • Python 2.7.4
  • pyodbc 3.0.6 (latest)
  • Access 2010
  • Windows 7

Digging further:

The decimal module is implemented in Python. From my reading, the values are described by four attributes: _exp, _int, _sign, _is_special

Suspecting corruption, I printed out the values of these fields.

Surprisingly, for both the faulty and the working version, I get:

_exp: -4
_int: 548710000
_sign: 0
_is_special: False

That's weird.


In the decimal module, the __float__ function is defined fairly simply:

def __float__(self):
    """Float representation."""
    return float(str(self))

But when I do this with the bad data:

print "Str", str(amount)
print "Float", float(amount)

I get:

Str 54871.0000

Float 54870.:

The more I learn, the less weird it doesn't get.

like image 201
Oddthinking Avatar asked Nov 12 '22 05:11

Oddthinking


1 Answers

I was able to reproduce the error. I created an Access table [pyData]...

ID - AutoNumber
Amount - Currency (2 decimal places)

...and filled it with a million rows of random values between 50,000 and 60,000. When I ran my test script it failed here

30815 : $50638.91
30816 : $52423.28
30817 :

Traceback (most recent call last):
  File "C:\__tmp\pyOdbcTest.py", line 20, in <module>
    print row.ID, ":", format_currency(row.Amount)
  File "C:\__tmp\pyOdbcTest.py", line 10, in format_currency
    (amount, amount.__class__, result))
AssertionError: That's weird. The value Decimal('58510.0000') of class <class 'decimal.Decimal'> is represented as $5850:.00

I also tested that value (58510.00) and the one that failed for you (54871.00) as single rows in a separate table with the same structure, and they both failed. So we know that it's not a function of some leftover "junk" from an earlier ODBC call.

Thinking that it might be related to the number having a '1' followed by zeroes to the end of the number, I tried 55871.00, but that worked fine. 53871.00 worked fine, too. Changing the number back to 54871.00 revived the error.

I tried the same test using pypyodbc and got the same error. I was somewhat optimistic because pypyodbc includes a number of Access-specific features, so I thought that one of its users may have encountered this problem before, but apparently not.

Finally, I upsized my test table to SQL Server 2008 R2 Express and tried the same test using the {SQL Server Native Client 10.0} driver. The numbers that failed when read from Access ("Currency" column type) did not fail when read from the SQL Server table ("money" column type).

So, the best I can offer for an "answer" at the moment is:

It looks like it's either:

  • a bug in pyodbc (and pypyodbc, which appears to be quite closely related to pyodbc), or

  • a bug in the Microsoft Access ODBC Driver, or

  • an "unfortunate interaction" between the two (if the ODBC spec is loose enough that neither component is technically "wrong").

In any case it looks like you'll need to work around it, at least for now.

Edit

Since I had that big batch of numbers I decided to let the script keep running and see what other numbers might get formatted with a colon in them. The resulting list all seemed to be whole numbers (no pennies), so I ran another test with whole numbers between 1 and 100,000. I found 260 numbers that wound up with a colon in the formatted string:

1451.0000 -> $1450.:0
1701.0000 -> $1700.:0
1821.0000 -> $1820.:0
1951.0000 -> $1950.:0
2091.0000 -> $2090.:0
...
98621.0000 -> $98620.:0
98710.0000 -> $9870:.00
99871.0000 -> $99870.:0

I pasted the entire list here. Perhaps that might be helpful.

Edit - Problem Resolved (?)

My previous tests were run under Python version 2.7.3. I just upgraded Python to version 2.7.5 (Win 32-bit) with pyodbc still at version 3.0.6 and the problem seems to have gone away.

like image 123
Gord Thompson Avatar answered Nov 14 '22 21:11

Gord Thompson