In an earlier version of Python (I don't remember which), calling gc.get_referrers
on an arbitrary interned string could be used to obtain a reference to the interned
dict, which could then be queried for its length.
But this no longer works in Python 2.7.5: gc.get_referrers(...) no longer includes the interned dict in the list it returns.
Is there any other way, in Python 2.7.5, to determine the number of interned strings? If so, how?
You can sort of do this, but all options are messy and full of caveats to the point of near-uselessness, so first, let's consider whether you really want to.
Interning a string doesn't prolong its lifetime. You don't have to worry about the interned dict growing forever, full of strings you don't need. Thus, string interning is unlikely to be an actual memory problem, and learning how many strings have been interned might be pretty useless.
If you still want to do this, let's go through your options.
The Right Way would probably be to use your own interning implementation... except that Python's lackluster weak reference support doesn't let you create weak references to strings. That means that if you try this approach, you're stuck either passing around your own weak-referenceable string wrappers or keeping interned strings alive forever. Both options are terrible.
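To make the first of those two bad options concrete, here is a minimal sketch of a private intern pool built on weakref.WeakValueDictionary. The names InternedStr and InternPool are made up for illustration and are not part of any library. The wrapper is a plain object rather than a str subclass, which is exactly the inconvenience described above: every caller has to pass the wrapper around and unwrap .value when it needs the real string.

```python
import weakref


class InternedStr(object):
    """Hypothetical weak-referenceable wrapper around a plain string."""

    # "__weakref__" must be listed explicitly when __slots__ is used,
    # otherwise instances cannot be weakly referenced.
    __slots__ = ("value", "__weakref__")

    def __init__(self, value):
        self.value = value


class InternPool(object):
    """Hypothetical intern table that does not keep its entries alive."""

    def __init__(self):
        # Values are held weakly: an entry vanishes once the last strong
        # reference to its wrapper is gone.
        self._pool = weakref.WeakValueDictionary()

    def intern(self, s):
        wrapper = self._pool.get(s)
        if wrapper is None:
            wrapper = InternedStr(s)
            self._pool[s] = wrapper
        return wrapper

    def __len__(self):
        # Number of strings currently interned in this pool.
        return len(self._pool)
```

With this in place, len(pool) gives the number of live interned entries, but only for strings that went through this pool, not for anything the interpreter interned on its own.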
There is actually a function that prints the information you're asking about... but it also de-interns everything. Its existence is an implementation detail, and it's only accessible through the C API, so we'll need to use ctypes.pythonapi to get at it.
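import ctypes

# Private CPython function: prints interned-string statistics, then de-interns everything.
_Py_ReleaseInternedStrings = ctypes.pythonapi._Py_ReleaseInternedStrings
_Py_ReleaseInternedStrings.argtypes = ()
_Py_ReleaseInternedStrings.restype = None  # the C function returns void
_Py_ReleaseInternedStrings()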
Output:
releasing 3461 interned strings
total size of all interned strings: 33685/0 mortal/immortal
The total sizes listed are sums of string lengths, so they don't include object headers or null terminators.
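If you want the numbers rather than the printed text, you would have to capture that output yourself and parse it. The sketch below covers only the parsing half, and it assumes the report has exactly the two-line shape shown above. The function name parse_release_report and the dictionary keys are invented for this example. Since the report is written by C code rather than through Python's sys.stdout or sys.stderr objects, capturing it would likely need file-descriptor-level redirection or a child process, which is not shown here.

```python
import re

# Patterns based on the two output lines shown above.
_COUNT_RE = re.compile(r"releasing (\d+) interned strings")
_SIZE_RE = re.compile(
    r"total size of all interned strings: (\d+)/(\d+) mortal/immortal"
)


def parse_release_report(text):
    """Extract the count and the two size totals from the captured report."""
    count = _COUNT_RE.search(text)
    sizes = _SIZE_RE.search(text)
    if count is None or sizes is None:
        raise ValueError("unrecognized report format")
    return {
        "count": int(count.group(1)),
        # Sums of string lengths only, without object headers.
        "mortal_size": int(sizes.group(1)),
        "immortal_size": int(sizes.group(2)),
    }


sample = (
    "releasing 3461 interned strings\n"
    "total size of all interned strings: 33685/0 mortal/immortal\n"
)
report = parse_release_report(sample)
```

Here report["count"] is the number of strings that were interned at the moment the function was called.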
You're probably not happy about having to release all interned strings every time you want to check how many there were. Unfortunately, Python doesn't expose the interned dict, even through the C API or through GC hooks. What else could you try? Well, moving on to even crazier options, there's the debugger.
ecatmur posted a crazy hack launching a GDB process in unattended mode and using a conditional breakpoint to get at errnomap, a very similar dict to the interned dict you'd like to access. This could be adapted to access the interned dict instead. It would be highly non-portable and extremely difficult to maintain.
Launching a debugger is also a terrible option. What else could you try? Well, you could always build your own custom build of Python. Download the source from python.org, add
PyObject *
AwfulHackToGetTheInternedDict(void)
{
    if (interned == NULL) {
        /* No interned dict yet. */
        Py_RETURN_NONE;
    }
    Py_INCREF(interned);
    return interned;
}
to Objects/stringobject.c, build, and install. You'll probably want to use a virtualenv to keep this separate from your normal Python interpreter. With this awful hack in place, you can do
import ctypes
AwfulHackToGetTheInternedDict = ctypes.pythonapi.AwfulHackToGetTheInternedDict
AwfulHackToGetTheInternedDict.argtypes = ()
AwfulHackToGetTheInternedDict.restype = ctypes.py_object
interned = AwfulHackToGetTheInternedDict()
to get the dict of all interned strings.
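From there, the count you asked about is just the length of that dict. As a rough sketch, the helper below wraps the lookup so that it degrades gracefully on a stock interpreter, where the AwfulHackToGetTheInternedDict symbol does not exist. The helper name count_interned_strings is made up here, and returning None for "not a patched build" is just one possible convention.

```python
import ctypes


def count_interned_strings():
    """Return the number of interned strings, or None on an unpatched build."""
    try:
        # Only present in a custom build that includes the C function above.
        get_dict = ctypes.pythonapi.AwfulHackToGetTheInternedDict
    except AttributeError:
        return None
    get_dict.argtypes = ()
    get_dict.restype = ctypes.py_object
    interned = get_dict()
    if interned is None:
        # The C function returns None when no interned dict exists yet.
        return 0
    return len(interned)
```

Calling count_interned_strings() on the patched interpreter gives the current size of the interned dict without releasing anything.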
So, those are your options, or at least, the options I've thought of. I also tried forcing the GC to track a string and then interning it to make the interned dict visible through the GC, but calling PyObject_GC_Track on a string caused a fatal error, so that doesn't work.