Pointer arithmetic in LLDB Python scripts

Question

I've been trying to create a custom data formatter for a custom string type in Xcode. The following code gets me the address of the first character in the string:

def MyStringSummary(valobj, internal_dict):
    data_pointer = valobj.GetChildMemberWithName('AllocatorInstance').GetChildMemberWithName('Data')
    print data_pointer.GetValue()

That prints out the pointer address. When I look at the contents of that address I can see the wide chars used to store the data, so I guess what I have to do is cast this pointer to wchar_t * and dereference it to get the first character. One of my first approaches was this:

if data_pointer.TypeIsPointerType():
    mychar = data_pointer.Dereference()
    print mychar.GetValue()
else:
    print "data_pointer is not a pointer!"

This confirmed that data_pointer is a pointer, but the Dereference() call doesn't seem to resolve anything: mychar.GetValue() just returns None. A second question: could I then loop, advancing the address in data_pointer by a fixed amount each time, dereferencing it to get the next character, and appending that character to the output string? If so, how would I do this?

EDIT:

To help clarify the problem, I'll post some info about the underlying data structure of the string. The definition is too long to post here (also it inherits most of what it does from a generic array base class) but I'll give some more details.

When looking at the memory that StringVar.AllocatorInstance.Data points to, I can see that we're using 16 bits for each character. All of the characters in the string I'm looking at fit in 8 bits, so each one is followed by 8 zero bits. This is what happens when I inspect it in the debugger:

(lldb) p (char*)(StringVar.AllocatorInstance.Data)
(char *) $4 = 0x10653360 "P"
(lldb) p (char*)(StringVar.AllocatorInstance.Data)+1
(char *) $6 = 0x10653361 ""
(lldb) p (char*)(StringVar.AllocatorInstance.Data)+2
(char *) $7 = 0x10653362 "a"

So I assume it only shows one character at a time because it treats the zero byte after each 8-bit character as a null terminator. However, when I cast to unsigned short I get this:

(lldb) p (unsigned short*)(StringVar.AllocatorInstance.Data)
(unsigned short *) $9 = 0x10653360
(lldb) p *(unsigned short*)(StringVar.AllocatorInstance.Data)
(wchar_t) $10 = 80
(lldb) p (char*)(unsigned short*)(StringVar.AllocatorInstance.Data)
(char *) $11 = 0x10653360 "P"
(lldb) p (char*)((unsigned short*)(StringVar.AllocatorInstance.Data)+1)
(char *) $14 = 0x10653362 "a"
(lldb) p (char*)((unsigned short*)(StringVar.AllocatorInstance.Data)+2)
(char *) $18 = 0x10653364 "r"

...so it looks like the cast to unsigned short is fine, as long as we cast each integer to a char. Any idea how I might try to put this in a Python data formatter?

Jason Molenda · Accepted Answer

Your Data looks like it is probably UTF-16. I made a quick C program that resembles the structure described in your question and played around a little in the interactive Python interpreter. I think this might be enough to point you in the right direction for writing your own formatter.

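#include <stdlib.h>
#include <string.h>

/* struct String, AllocateString() and FreeString() are not shown in the original answer. */
int main ()
{
    struct String *mystr = AllocateString();
    mystr->AllocatorInstance.len = 10;
    mystr->AllocatorInstance.Data = (void *) malloc (10);
    memset (mystr->AllocatorInstance.Data, 0, 10);
    /* Store one ASCII character in every other byte; the zero bytes in between
       make the buffer look like 16-bit characters. */
    ((char *)mystr->AllocatorInstance.Data)[0] = 'h';
    ((char *)mystr->AllocatorInstance.Data)[2] = 'e';
    ((char *)mystr->AllocatorInstance.Data)[4] = 'l';
    ((char *)mystr->AllocatorInstance.Data)[6] = 'l';
    ((char *)mystr->AllocatorInstance.Data)[8] = 'o';

    FreeString (mystr);
}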

Using the lldb.frame and lldb.process shortcuts (only valid in the interactive script interpreter), we can easily read Data into a Python string buffer:

>>> valobj = lldb.frame.FindVariable("mystr")
>>> address = valobj.GetChildMemberWithName('AllocatorInstance').GetChildMemberWithName('Data').GetValueAsUnsigned()
>>> size = valobj.GetChildMemberWithName('AllocatorInstance').GetChildMemberWithName('len').GetValueAsUnsigned()
>>> print address
4296016096
>>> print size
10
>>> err = lldb.SBError()
>>> print err
error: <NULL>
>>> membuf = lldb.process.ReadMemory (address, size, err)
>>> print err
success
>>> membuf
'h\x00e\x00l\x00l\x00o\x00'

From this point you can do any of the usual Python array-type things:

>>> for b in membuf:
...   print ord(b)
... 
104
0
101
0
108
0
108
0
111
0
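Since each character occupies two bytes, the same buffer can also be walked in 16-bit steps, which is the Python-side counterpart of the pointer arithmetic asked about in the question. The following is a minimal sketch, assuming the data is little-endian and using a hard-coded copy of the buffer read above. The helper name code_units is made up for this example.

```python
import struct

# Copy of the bytes read in the session above (assumed to be little-endian).
membuf = b'h\x00e\x00l\x00l\x00o\x00'

def code_units(buf):
    """Return the 16-bit values stored in buf, ignoring a trailing odd byte."""
    count = len(buf) // 2
    # '<' selects little-endian, 'H' is an unsigned 16-bit integer.
    return list(struct.unpack('<%dH' % count, buf[:count * 2]))

units = code_units(membuf)
print(units)  # [104, 101, 108, 108, 111]
```

Each value is one 16-bit unit, so no manual address stepping is needed once the whole buffer has been read.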

I'm not sure how you tell Python that this is UTF-16 and should be interpreted as wide characters; that's more a Python question than an lldb question. But I think your best bet here is not to use the SBValue methods, because your Data pointer has an uninformative type such as void * (as in my test program), and to use the SBProcess memory read method instead.
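On the Python side, decoding the buffer with the 'utf-16-le' codec should be enough, assuming the data really is little-endian UTF-16 as it appears to be here. Below is a minimal sketch of a summary function built that way, not a tested formatter for the real type. It assumes the struct has a len member holding the size of Data in bytes, as in the test program above; the real type may name or count this differently. The helper decode_utf16 is a name made up for this example. Because lldb.process is only valid interactively, the sketch gets the process from valobj.GetProcess() instead.

```python
def decode_utf16(buf):
    """Decode little-endian UTF-16 bytes, stopping at the first NUL character."""
    text = buf.decode('utf-16-le', 'replace')
    return text.split(u'\x00', 1)[0]

def MyStringSummary(valobj, internal_dict):
    """Summary provider sketch: read the raw bytes, then decode them."""
    import lldb  # only importable inside LLDB's script interpreter

    alloc = valobj.GetChildMemberWithName('AllocatorInstance')
    address = alloc.GetChildMemberWithName('Data').GetValueAsUnsigned()
    # Assumption: 'len' is the buffer size in bytes, as in the test program.
    size = alloc.GetChildMemberWithName('len').GetValueAsUnsigned()
    if address == 0 or size == 0:
        return '""'

    err = lldb.SBError()
    membuf = valobj.GetProcess().ReadMemory(address, size, err)
    if err.Fail():
        return '<error: %s>' % err.GetCString()
    return '"%s"' % decode_utf16(membuf)
```

To try it, load the file with command script import and attach the function with type summary add -F. With a hypothetical module file mystring.py and type name MyString, that would look like:

(lldb) command script import /path/to/mystring.py
(lldb) type summary add -F mystring.MyStringSummary MyString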

Tags: c++, python, pointers, xcode, lldb

benwad
