
How to determine the number of interned strings in Python 2.7.5?

In an earlier version of Python (I don't remember which), calling gc.get_referrers on an arbitrary interned string could be used to obtain a reference to the interned dict, which could then be queried for its length.

But this is no longer working in Python 2.7.5: gc.get_referrers(...) no longer includes the interned dict in the list it returns.

Is there any other way, in Python 2.7.5, to determine the number of interned strings? If so, how?

asked Oct 14 '16 09:10 by jchl


1 Answer

You can sort of do this, but all options are messy and full of caveats to the point of near-uselessness, so first, let's consider whether you really want to.

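In Python 2.7, interning a string with intern() doesn't prolong its lifetime: the interned dict's own references to the string don't count, so the string is dropped from the dict when its last outside reference goes away. You don't have to worry about the interned dict growing forever, full of strings you don't need. String interning is therefore unlikely to be an actual memory problem, and knowing how many strings have been interned is probably not very useful.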

If you still want to do this, let's go through your options.


The Right Way would probably be to use your own interning implementation... except that Python's lackluster weak reference support doesn't let you create weak references to strings. That means that if you try this approach, you're stuck either passing around your own weak-referenceable string wrappers or keeping interned strings alive forever. Both options are terrible.
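To make the trade-off concrete, here is a minimal sketch of the wrapper variant. StrBox and InternTable are hypothetical names invented for this illustration, not part of any library. The table only holds weak references, so an entry disappears once nothing else uses the wrapper, but every caller now has to pass StrBox objects around instead of plain strings.

```python
import weakref


class StrBox(object):
    """Hypothetical weak-referenceable wrapper around a plain string."""
    __slots__ = ("value", "__weakref__")

    def __init__(self, value):
        self.value = value


class InternTable(object):
    """Hypothetical do-it-yourself intern table that can report its size."""

    def __init__(self):
        # Values are held weakly, so the table never keeps a box alive.
        self._boxes = weakref.WeakValueDictionary()

    def intern(self, s):
        """Return the shared StrBox for s, creating it on first use."""
        box = self._boxes.get(s)
        if box is None:
            box = StrBox(s)
            self._boxes[s] = box
        return box

    def __len__(self):
        return len(self._boxes)
```

len(table) then answers the "how many" question directly, but only for strings that went through this table, not for strings interned by the interpreter itself.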


There is actually a function that prints the information you're asking about... but it also de-interns everything. Its existence is an implementation detail, and it's only accessible through the C API, so we'll need to use ctypes.pythonapi to get at it.

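import ctypes

# Private CPython 2 function: reports on the interned strings, then de-interns all of them.
_Py_ReleaseInternedStrings = ctypes.pythonapi._Py_ReleaseInternedStrings

# It takes no arguments and returns void.
_Py_ReleaseInternedStrings.argtypes = ()
_Py_ReleaseInternedStrings.restype = None

_Py_ReleaseInternedStrings()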

Output (printed to stderr):

releasing 3461 interned strings
total size of all interned strings: 33685/0 mortal/immortal

The total sizes listed are sums of string lengths, so they don't include object headers or null terminators.
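The count can be pulled out of that report programmatically. The sketch below is one possible approach, not something the interpreter offers. It assumes a POSIX system with os.fork() and a CPython 2 build that exports the symbol. It calls the function in a forked child whose stderr points at a pipe, so the parent keeps its interned strings. The names parse_release_output and count_interned_in_fork are made up for this example. Forking in a multi-threaded program brings its own hazards, so this is hardly less messy than the original.

```python
import ctypes
import os
import re


def parse_release_output(text):
    """Extract N from a 'releasing N interned strings' line, or return None."""
    match = re.search(r"releasing (\d+) interned strings", text)
    return int(match.group(1)) if match else None


def count_interned_in_fork():
    """Return the interned-string count, or None if it cannot be measured."""
    try:
        release = ctypes.pythonapi._Py_ReleaseInternedStrings
    except AttributeError:
        return None  # this interpreter does not export the symbol
    if not hasattr(os, "fork"):
        return None  # assumption: a POSIX fork() is available
    release.argtypes = ()
    release.restype = None

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send stderr (fd 2) into the pipe, release, exit immediately.
        try:
            os.close(read_fd)
            os.dup2(write_fd, 2)
            release()
        finally:
            os._exit(0)

    # Parent: read the child's report until the pipe is closed.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return parse_release_output(b"".join(chunks).decode("ascii", "replace"))
```

On an interpreter without the symbol, count_interned_in_fork() simply returns None.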


You're probably not happy about having to release all interned strings every time you want to check how many there were. Unfortunately, Python doesn't expose the interned dict, even through the C API or through GC hooks. What else could you try? Well, moving on to even crazier options, there's the debugger.

ecatmur posted a crazy hack launching a GDB process in unattended mode and using a conditional breakpoint to get at errnomap, a very similar dict to the interned dict you'd like to access. This could be adapted to access the interned dict instead. It would be highly non-portable and extremely difficult to maintain.


Launching a debugger is also a terrible option. What else could you try? Well, you could always build your own custom build of Python. Download the source from python.org, add

/* Return a new reference to the interned dict, or None if it doesn't exist yet. */
PyObject *
AwfulHackToGetTheInternedDict(void)
{
    if (interned == NULL) {
        /* No interned dict yet. */
        Py_RETURN_NONE;
    }
    Py_INCREF(interned);
    return interned;
}

to Objects/stringobject.c, somewhere below the declaration of the static interned variable, then build and install. You'll probably want to use a virtualenv to keep this separate from your normal Python interpreter. With this awful hack in place, you can do

import ctypes

AwfulHackToGetTheInternedDict = ctypes.pythonapi.AwfulHackToGetTheInternedDict

# No arguments; the return value is a regular Python object.
AwfulHackToGetTheInternedDict.argtypes = ()
AwfulHackToGetTheInternedDict.restype = ctypes.py_object

# This is None if nothing has been interned yet.
interned = AwfulHackToGetTheInternedDict()

to get the dict of all interned strings.
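Once you hold that dict, counting is ordinary dict work. A small sketch, where summarize_interned is a hypothetical helper name; it accepts any dict whose keys are strings, plus the None that the hack returns when no dict exists yet:

```python
def summarize_interned(interned):
    """Return (count, total_length) for a dict keyed by interned strings."""
    if interned is None:
        # The custom C hook returns None when there is no interned dict yet.
        return 0, 0
    keys = list(interned)  # snapshot the keys before measuring them
    return len(keys), sum(len(s) for s in keys)
```

As with the release function's report, the total is a sum of string lengths, so it excludes object headers and null terminators.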


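So, those are your options, or at least, the options I've thought of. I also tried forcing the GC to track a string and then interning it, so that the interned dict would become visible through the GC (a dict that contains only untracked objects such as strings is itself left untracked, which is presumably why gc.get_referrers no longer finds it). But calling PyObject_GC_Track on a string caused a fatal error, so that doesn't work.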

answered Sep 23 '22 20:09 by user2357112 supports Monica