
Python ctypes and mutability

Tags: python, ctypes

I noticed that passing Python objects to native code with ctypes can break mutability expectations.

For example, if I have a C function like:

#include <stdio.h>

int print_and_mutate(char *str)
{
    str[0] = 'X';   /* writes through the pointer into the caller's memory */
    return printf("%s\n", str);
}

and I call it like this:

from ctypes import *
lib = cdll.LoadLibrary("foo.so")

s = b"asdf"
lib.print_and_mutate(s)

The value of s changed, and is now b"Xsdf".

The Python docs say: "You should be careful, however, not to pass them to functions expecting pointers to mutable memory."

Is this only because it breaks expectations of which types are immutable, or can something else break as a result? In other words, if I go in with the clear understanding that my original bytes object will change, even though normally bytes are immutable, is that OK or will I get some kind of nasty surprise later if I don't use create_string_buffer like I'm supposed to?
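For reference, the create_string_buffer approach mentioned above can be sketched without the shared library. This is only an illustration: ctypes.memset stands in for the call to print_and_mutate (an assumption made so the snippet runs without foo.so), and the variable names are hypothetical.

```python
import ctypes

original = b"asdf"

# create_string_buffer copies the bytes into a mutable ctypes array
# owned by this code; the original bytes object is never written to.
buf = ctypes.create_string_buffer(original)

# Stand-in for lib.print_and_mutate(buf): overwrite the first byte in place.
ctypes.memset(buf, ord("X"), 1)

print(buf.value)   # b'Xsdf'
print(original)    # b'asdf'
```

With the real library, buf would be passed in place of s, and buf.value read back afterwards.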

wrschneider asked Sep 19 '20 12:09




3 Answers

Python makes assumptions about immutable objects, so mutating them will definitely break things. Here's a concrete example:

>>> import ctypes as c
>>> import sys
>>> x = b'abc'          # immutable bytes object
>>> d = {x:123}         # Used as key in dictionary (keys must be hashable/immutable)
>>> d
{b'abc': 123}

Now build a mutable ctypes buffer over the memory of the immutable object. In CPython, id(x) is the memory address of the Python object and sys.getsizeof(x) returns the size of that object. PyBytes objects have some header overhead, but the bytes of the string sit at the end of the object.

>>> sys.getsizeof(x)
36
>>> px=(c.c_char*36).from_address(id(x))
>>> px.raw
b'\x02\x00\x00\x00\x00\x00\x00\x000\x8fq\x0b\xfc\x7f\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\xf0\x06\xe61\xeb\x00\x1b\xa9abc\x00'
>>> px.raw[-4:]  # last bytes of the object
b'abc\x00'
>>> px[-4]
b'a'
>>> px[-4] = b'y'  # Mutate the ctypes buffer, mutating the "immutable" string
>>> x              # Now it has a modified value.
b'ybc'
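The layout assumption used in this session can be checked without writing to anything. The following is a read-only sketch that relies on CPython-specific details (id() being an address, the payload sitting at the end of the object); it uses ctypes.string_at to copy the object's memory rather than aliasing it.

```python
import ctypes
import sys

# Build the bytes object at runtime rather than from a literal.
x = bytes(bytearray(b"abc"))

size = sys.getsizeof(x)               # header + payload + trailing NUL
raw = ctypes.string_at(id(x), size)   # read-only copy of the object's memory

# The last len(x) + 1 bytes should be the payload plus its NUL terminator.
tail = raw[-(len(x) + 1):]
print(size, tail)
```

Because raw is a copy, nothing here can modify x; the header bytes and the total size vary by platform and Python version.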

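Now try to access the key in the dictionary. Dictionaries locate keys in O(1) time by hash, but the entry was stored under the hash of the original, "immutable" value, which no longer matches the key's contents. The new value hashes to a different entry, and the old value finds the entry but fails the equality check, so the key can no longer be found by either value: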

>>> d           # Note that dictionary key changed, too.
{b'ybc': 123}
>>> d[b'ybc']   # Try to access the key
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: b'ybc'
>>> d[b'abc']   # Maybe original key will work? It hashes same as the original...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: b'abc'
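The same failure can be modelled safely in pure Python, without touching object memory. In this sketch FrozenLikeKey is a hypothetical class that caches its hash on first use, which is roughly what the session above relies on for bytes; it is an analogy, not CPython's actual implementation.

```python
class FrozenLikeKey:
    """Hypothetical stand-in for bytes: caches its hash on first use."""

    def __init__(self, value):
        self.value = value
        self._hash = None

    def __hash__(self):
        if self._hash is None:
            self._hash = hash(self.value)
        return self._hash

    def __eq__(self, other):
        return isinstance(other, FrozenLikeKey) and self.value == other.value


key = FrozenLikeKey(b"abc")
d = {key: 123}

key.value = b"ybc"   # simulate the in-place mutation; the cached hash is now stale

found_new = FrozenLikeKey(b"ybc") in d   # fresh hash differs from the stored hash
found_old = FrozenLikeKey(b"abc") in d   # hash matches, but the equality check fails
found_same = key in d                    # same object and same stale hash still match
print(found_new, found_old, found_same)  # False False True
```

The entry is still in the dictionary, but only the mutated object itself can reach it; any equal-looking key built afterwards cannot.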
Mark Tolonen answered Oct 23 '22 23:10


Various objects are interned by CPython and reused. Examples are small integers (-5 to 256), but also short strings and some literals. This behaviour is entirely implementation-defined and may freely change between releases. Changing such objects can trigger arbitrary behaviour, from nothing at all to entirely undefined behaviour.

That "original bytes object" is not yours, it is CPython's.
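A quick way to see this sharing is to compare identities of values produced independently. The sketch below assumes a current CPython build; the results are implementation details and may differ on other interpreters or future releases.

```python
# Two small integers produced by separate calls at runtime.
a = int("7")
b = int("7")
shared_int = a is b          # CPython hands back the same cached object

# Two empty bytes objects produced by separate calls.
empty_shared = bytes() is bytes()

print(shared_int, empty_shared)
```

If native code wrote into the memory of such a shared object, every other user of that object in the process would see the change.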

MisterMiyagi answered Oct 24 '22 01:10


It sounds like the closest you can get to UB in CPython.

While it may not be happening at the moment, CPython could give you a pointer to read-only memory, in which case the program would segfault.

Further, CPython could be sharing the string or subslices with other objects, and you would be modifying all of them.

Acorn answered Oct 23 '22 23:10