I have a native library for which a natural interface would involve passing potentially large numbers. I anticipate about half being < 32 bits; another quarter < 64 bits; the next eighth < 128 bits - and so on, without a fixed length limit.
PyLong_FromUnsignedLongLong() and PyLong_AsUnsignedLongLong() would be suitable if I could constrain values to fit in a single register.
PyLong_FromString() overcomes this - but at the undesirable expense of requiring an intermediate representation. _PyLong_FromByteArray() and _PyLong_AsByteArray() mitigate this cost (by making this intermediate representation simple) but the leading underscore makes me wonder if this may lead to portability problems.
In longintrepr.h, I've found struct _longobject... which hints that it might be a way to interact directly with the internal representation... though an absence of detailed documentation about this structure remains a hurdle.
What approach will result in optimal throughput between Python and the library? Is there documentation I've overlooked?
The underscore prefix largely means the same thing in the C API as in normal Python: "this function is an implementation detail subject to change, so watch yourself if you use it". You're not forbidden to use such functions, and if it's the only way to achieve a particular goal (e.g. significant efficiency gains in your case), then it's fine to use the API as long as you are aware of the hazard.
If the _PyLong_FromByteArray API were truly private, it would be a static function and wouldn't be fully documented and exported in longobject.h. In fact, Tim Peters (a well-known Python core developer) explicitly blesses its use:
[Dan Christensen]
My student and I are writing a C extension that produces a large integer in binary which we'd like to convert to a python long. The number of bits can be a lot more than 32 or even 64. My student found the function _PyLong_FromByteArray in longobject.h which is exactly what we need, but the leading underscore makes me wary. Is it safe to use this function?
Python uses it internally, so it better be ;-)
Will it continue to exist in future versions of python?
No guarantees, and that's why it has a leading underscore: it's not an officially supported, externally documented, part of the advertised Python/C API. It so happens that I added that function, because Python needed some form of its functionality internally across different C modules. Making it an official part of the Python/C API would have been a lot more work (which I didn't have time for), and created an eternal new maintenance burden (which I'm not keen on regardless ;-)).
In practice, few people touch this part of Python's implementation, so I don't /expect/ it will go away, or even change, for years to come. The biggest insecurity I can think of offhand is that someone may launch a crusade to make some other byte-array <-> long interface "official" based on a different way of representing negative integers. But even then I expect the current unofficial functions to remain, since the 256's-complement representation remains necessary for the struct module's "q" format, and for the pickle module's protocol=2 long serialization format.
Or is there some other method we should use?
No. That's why these functions were invented to begin with ;-)
Here's the documentation (from Python 3.2.1):
/* _PyLong_FromByteArray:  View the n unsigned bytes as a binary integer in
   base 256, and return a Python long with the same numeric value.
   If n is 0, the integer is 0.  Else:
   If little_endian is 1/true, bytes[n-1] is the MSB and bytes[0] the LSB;
   else (little_endian is 0/false) bytes[0] is the MSB and bytes[n-1] the
   LSB.
   If is_signed is 0/false, view the bytes as a non-negative integer.
   If is_signed is 1/true, view the bytes as a 2's-complement integer,
   non-negative if bit 0x80 of the MSB is clear, negative if set.
   Error returns:
   + Return NULL with the appropriate exception set if there's not
     enough memory to create the Python long.
*/
PyAPI_FUNC(PyObject *) _PyLong_FromByteArray(
    const unsigned char* bytes, size_t n,
    int little_endian, int is_signed);
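To make the byte layout concrete, here is a minimal sketch of the native side of the conversion. It assumes the library stores a magnitude as an array of 64-bit limbs, least significant limb first; that layout and the helper names limbs_to_le_bytes and le_bytes_to_limbs are hypothetical and not part of any Python header. The buffer it produces has the form the comment above describes for little_endian = 1 and is_signed = 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write nlimbs 64-bit limbs (least significant limb
   first) into out[] as base-256 digits, least significant byte first.
   Shifts are used rather than memcpy so the result does not depend on the
   host's endianness. out must have room for 8 * nlimbs bytes.
   Returns the number of significant bytes (0 when the value is 0). */
size_t limbs_to_le_bytes(const uint64_t *limbs, size_t nlimbs,
                         unsigned char *out)
{
    size_t i, n;
    int b;

    assert((limbs != NULL && out != NULL) || nlimbs == 0);
    for (i = 0; i < nlimbs; i++) {
        for (b = 0; b < 8; b++) {
            out[8 * i + b] = (unsigned char)(limbs[i] >> (8 * b));
        }
    }
    /* Drop high-order zero bytes so the buffer is as short as possible. */
    n = 8 * nlimbs;
    while (n > 0 && out[n - 1] == 0) {
        n--;
    }
    return n;
}

/* Hypothetical inverse: rebuild limbs from n little-endian bytes.
   limbs must have room for (n + 7) / 8 entries. Returns the limb count. */
size_t le_bytes_to_limbs(const unsigned char *bytes, size_t n,
                         uint64_t *limbs)
{
    size_t nlimbs = (n + 7) / 8;
    size_t i;

    assert((bytes != NULL && limbs != NULL) || n == 0);
    for (i = 0; i < nlimbs; i++) {
        limbs[i] = 0;
    }
    for (i = 0; i < n; i++) {
        limbs[i / 8] |= (uint64_t)bytes[i] << (8 * (i % 8));
    }
    return nlimbs;
}
```

With a buffer prepared this way, the conversion itself would be a single call of the form _PyLong_FromByteArray(out, n, 1, 0). If is_signed = 1 were used instead, a non-negative value whose top byte has bit 0x80 set would need one extra 0x00 byte appended, since otherwise it would be read as negative.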
The main reason it's an "underscore-prefixed" API is that it depends on the implementation of the Python long as an array of words in a power-of-two base. This isn't likely to change, but since you're implementing an API on top of this, you can insulate your callers from changes in the Python API later on.