Read the content of the string intern pool

I would like to enumerate the strings that are in the string intern pool.

That is to say, I want to get the list of all the instances s of string such that:

string.IsInterned(s) != null

Does anyone know if it's possible?
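
For reference, here is a minimal sketch of what that predicate reports. The class name and the use of a GUID as a string that nothing else has interned are assumptions made for illustration:

```csharp
using System;

static class InternDemo
{
    static void Main()
    {
        // Built at run time, so nothing has added it to the intern pool yet.
        string runtimeBuilt = Guid.NewGuid().ToString();
        Console.WriteLine(string.IsInterned(runtimeBuilt) != null);

        // Explicit interning adds the string to the pool.
        string.Intern(runtimeBuilt);
        Console.WriteLine(string.IsInterned(runtimeBuilt) != null);
    }
}
```

This should print False and then True, since the GUID string only enters the pool after the explicit string.Intern call.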

Benoit Blanchon asked Mar 04 '14 12:03


2 Answers

Thanks to the advice of @HansPassant, I managed to get the list of string literals in an assembly, which is extremely close to what I originally wanted.

You need to read the assembly metadata and enumerate the user strings. This can be done with these three methods of IMetaDataImport:

[ComImport, Guid("7DAC8207-D3AE-4C75-9B67-92801A497D44")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IMetaDataImport
{
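    void CloseEnum(IntPtr hEnum);

    // PreserveSig hands back the raw HRESULT as the return value.
    // SizeParamIndex is the zero-based index of the parameter that holds the buffer size.
    [PreserveSig]
    uint GetUserString(uint stk, [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 2)] char[] szString, uint cchString, out uint pchString);

    [PreserveSig]
    uint EnumUserStrings(ref IntPtr phEnum, [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 2)] uint[] rStrings, uint cmax, out uint pcStrings);

    // The interface also contains 62 other methods, omitted here. A working declaration
    // must list all of them (placeholders suffice) in native vtable order.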
}

To get the instance of IMetaDataImport, you need to get an IMetaDataDispenser:

[ComImport, Guid("809C652E-7396-11D2-9771-00A0C9B4D50C")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
[CoClass(typeof(CorMetaDataDispenser))]
interface IMetaDataDispenser
{
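    // Methods must appear in native vtable order, so the two unused ones
    // are declared as placeholders (the placeholder names are arbitrary).
    void DefineScope_Placeholder();

    // PreserveSig hands back the raw HRESULT as the return value.
    [PreserveSig]
    uint OpenScope([MarshalAs(UnmanagedType.LPWStr)] string szScope, uint dwOpenFlags, ref Guid riid, [MarshalAs(UnmanagedType.Interface)] out object ppIUnk);

    void OpenScopeOnMemory_Placeholder();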
}

[ComImport, Guid("E5CB7A31-7512-11D2-89CE-0080C792E5D8")]
class CorMetaDataDispenser
{
}

Here is how it goes:

var dispenser = new IMetaDataDispenser();
var metaDataImportGuid = new Guid("7DAC8207-D3AE-4C75-9B67-92801A497D44");

object scope;
var hr = dispenser.OpenScope(location, 0, ref metaDataImportGuid, out scope);

var metaDataImport = (IMetaDataImport)scope;

where location is the path to the assembly file.

After that, calling EnumUserStrings() and GetUserString() is straightforward.
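
As a rough sketch, the enumeration could look like the following. It assumes the complete IMetaDataImport declaration (every method, in vtable order) is in scope, and that passing a null buffer to GetUserString reports the required length, which is how the unmanaged metadata API usually behaves. The helper name ReadUserStrings and the batch size of 64 are arbitrary:

```csharp
using System;
using System.Collections.Generic;

static class UserStringReader
{
    // Hypothetical helper: collects every user string (string literal) in the opened scope.
    public static List<string> ReadUserStrings(IMetaDataImport import)
    {
        var result = new List<string>();
        var tokens = new uint[64];      // arbitrary batch size
        IntPtr hEnum = IntPtr.Zero;     // enumerator handle, created by the first call

        try
        {
            while (true)
            {
                uint count;
                import.EnumUserStrings(ref hEnum, tokens, (uint)tokens.Length, out count);
                if (count == 0)
                    break;              // no more tokens to read

                for (int i = 0; i < count; i++)
                {
                    // Assumption: a null buffer makes the call report the length only.
                    uint length;
                    import.GetUserString(tokens[i], null, 0, out length);

                    var buffer = new char[length];
                    if (length > 0)
                        import.GetUserString(tokens[i], buffer, length, out length);

                    result.Add(new string(buffer, 0, (int)length));
                }
            }
        }
        finally
        {
            if (hEnum != IntPtr.Zero)
                import.CloseEnum(hEnum);    // release the enumerator
        }

        return result;
    }
}
```

Calling UserStringReader.ReadUserStrings(metaDataImport) after OpenScope would then return the literals of the assembly at location. The loop relies on the returned count rather than on the HRESULT, so it stops as soon as a batch comes back empty.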

Here is a blog post with more detail, and a demo project on GitHub.

Benoit Blanchon answered Oct 18 '22 05:10


The SSCLI function that it's pointing to is:

STRINGREF *AppDomainStringLiteralMap::GetStringLiteral(EEStringData *pStringData)
{ 
    ... 
    DWORD dwHash = m_StringToEntryHashTable->GetHash(pStringData);
    if (m_StringToEntryHashTable->GetValue(pStringData, &Data, dwHash))
    {
        STRINGREF *pStrObj = NULL;
        pStrObj = ((StringLiteralEntry*)Data)->GetStringObject();
        _ASSERTE(!bAddIfNotFound || pStrObj);
        return pStrObj;
    }
    else { ... }

    return NULL; // Reaching this return means the string is not interned
}

If you manage to find the native address of m_StringToEntryHashTable, you can walk that hash table and enumerate the strings it contains.

fiinix answered Oct 18 '22 05:10