
ctypes in python crashes with memset

I am trying to erase a password string from memory, as suggested here.

I wrote this little snippet:

import ctypes, sys

def zerome(string):
    location = id(string) + 20
    size     = sys.getsizeof(string) - 20
    #memset =  ctypes.cdll.msvcrt.memset
    # For Linux, use the following. Change the 6 to whatever it is on your computer.
    print ctypes.string_at(location, size)
    memset =  ctypes.CDLL("libc.so.6").memset
    memset(location, 0, size)
    print "Clearing 0x%08x size %i bytes" % (location, size)
    print ctypes.string_at(location, size)

a = "asdasd"

zerome(a)

Oddly enough, this code works fine with IPython:

[7] oz123@yenitiny:~ $ ipython a.py 
Clearing 0x02275b84 size 23 bytes

But crashes with Python:

[8] oz123@yenitiny:~ $ python a.py 
Segmentation fault
[9] oz123@yenitiny:~ $

Any ideas why?

I tested on Debian Wheezy, with Python 2.7.3.

little update ...

The code works on CentOS 6.2 with Python 2.6.6. The code crashed on Debian with Python 2.6.8. I tried to work out why it works on CentOS and not on Debian. The only difference that came to mind immediately is that my Debian is multiarch, while CentOS is running on my older laptop with an i686 CPU.

Hence, I rebooted my CentOS laptop and loaded Debian Wheezy on it. The code works on Debian Wheezy when it is not multiarch. So I suspect my Debian configuration is somewhat problematic ...

asked Mar 23 '13 at 00:03 by oz123


1 Answer

ctypes has a memset function already, so you don't have to make a function pointer for the libc/msvcrt function. Also, 20 bytes is for common 32-bit platforms. On 64-bit systems it's probably 36 bytes. Here's the layout of a PyStringObject:

typedef struct {
    Py_ssize_t ob_refcnt;         // 4|8 bytes
    struct _typeobject *ob_type;  // 4|8 bytes
    Py_ssize_t ob_size;           // 4|8 bytes
    long ob_shash;                // 4|8 bytes (4 on 64-bit Windows)
    int ob_sstate;                // 4 bytes
    char ob_sval[1];
} PyStringObject; 

So it could be 5*4 = 20 bytes on a 32-bit system, 8*4 + 4 = 36 bytes on 64-bit Linux, or 8*3 + 4*2 = 32 bytes on 64-bit Windows. Since a string isn't tracked with a garbage collection header, you can use sys.getsizeof. In general if you don't want the GC header size included (in memory it's actually before the object's base address you get from id), then use the object's __sizeof__ method. At least that's a general rule in my experience.
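The arithmetic above can be checked with ctypes alone, without touching any live object. This is a minimal sketch: PyStringObjectHeader and sval_offset are names made up for illustration, and c_void_p stands in for the ob_type pointer. It only mirrors the fixed-size fields of the struct shown above and reports where ob_sval would start on the running interpreter.

```python
import ctypes

class PyStringObjectHeader(ctypes.Structure):
    # Hypothetical name: mirrors only the fixed-size fields of PyStringObject.
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),   # stand-in for struct _typeobject *
        ("ob_size", ctypes.c_ssize_t),
        ("ob_shash", ctypes.c_long),
        ("ob_sstate", ctypes.c_int),
    ]

def sval_offset():
    """Offset at which ob_sval would start: the end of ob_sstate."""
    return PyStringObjectHeader.ob_sstate.offset + ctypes.sizeof(ctypes.c_int)

print("pointer size:   %d" % ctypes.sizeof(ctypes.c_void_p))
print("ob_sval offset: %d" % sval_offset())
# sizeof() is rounded up to the struct's alignment, so it can be larger
# than the ob_sval offset (tail padding after ob_sstate).
print("sizeof(header): %d" % ctypes.sizeof(PyStringObjectHeader))
```

Note that sizeof of the ctypes structure may come out larger than the ob_sval offset, because ctypes pads the end of the structure to its alignment while the char array in the C struct starts immediately after ob_sstate.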

What you want is to simply subtract the buffer size from the object size. The string in CPython is null-terminated, so simply add 1 to its length to get the buffer size. For example:

>>> a = 'abcdef'
>>> bufsize = len(a) + 1
>>> offset = sys.getsizeof(a) - bufsize
>>> ctypes.memset(id(a) + offset, 0, bufsize)
3074822964L
>>> a
'\x00\x00\x00\x00\x00\x00'
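The session above is Python 2. As a rough sketch of the same size arithmetic on CPython 3, the closest analogue of the old str is bytes, which I am assuming keeps the same shape (a fixed header followed by a NUL-terminated buffer). The helper name bytes_buffer_address is made up for illustration, and the example only reads memory to confirm that the computed address is right.

```python
import ctypes
import sys

def bytes_buffer_address(b):
    """Address of the character buffer inside a CPython bytes object (assumed layout)."""
    if type(b) is not bytes:
        # Subclasses may carry extra per-instance data, which breaks the arithmetic.
        raise TypeError("expected bytes, not %s" % type(b).__name__)
    bufsize = len(b) + 1                 # payload plus the trailing NUL
    offset = sys.getsizeof(b) - bufsize  # everything before the buffer is header
    return id(b) + offset                # id() is the object's address in CPython

secret = bytes(bytearray(b"hunter2"))    # built at run time, not a shared constant
addr = bytes_buffer_address(secret)

# Read-only sanity check: the bytes at the computed address match the object.
recovered = ctypes.string_at(addr, len(secret))
print(recovered == secret)
```

Only once a check like this passes would it make sense to pass the same address and length to ctypes.memset.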

Edit

A better alternative is to define the PyStringObject structure. This makes it convenient to check ob_sstate. If it's greater than 0, that means the string is interned and the sane thing to do is raise an exception. Single-character strings are interned, along with string constants in code objects that consist of only ASCII letters and underscore, and also strings used internally by the interpreter for names (variable names, attributes).

from ctypes import *

class PyStringObject(Structure):
    _fields_ = [
      ('ob_refcnt', c_ssize_t),
      ('ob_type', py_object),
      ('ob_size', c_ssize_t),
      ('ob_shash', c_long),
      ('ob_sstate', c_int),
      # ob_sval varies in size
      # zero with memset is simpler
    ]

def zerostr(s):
    """zero a non-interned string"""
    if not isinstance(s, str):
        raise TypeError(
          "expected str object, not %s" % type(s).__name__)

    s_obj = PyStringObject.from_address(id(s))
    if s_obj.ob_sstate > 0:
        raise RuntimeError("cannot zero interned string")

    s_obj.ob_shash = -1  # not hashed yet
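    # ob_sval starts right after ob_sstate; sizeof(PyStringObject) would
    # include tail padding (40 instead of 36 on 64-bit Linux)
    offset = PyStringObject.ob_sstate.offset + sizeof(c_int)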
    memset(id(s) + offset, 0, len(s))

For example:

>>> s = 'abcd' # interned by code object
>>> zerostr(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 10, in zerostr
RuntimeError: cannot zero interned string

>>> s = raw_input() # not interned
abcd
>>> zerostr(s)
>>> s
'\x00\x00\x00\x00'
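To see why zeroing an interned string is refused, here is a small sketch of what interning does. The variable names are made up for illustration. It uses sys.intern on Python 3 and falls back to the intern builtin on Python 2.

```python
import sys

try:
    intern_str = sys.intern   # Python 3
except AttributeError:
    intern_str = intern       # Python 2 builtin

# Two equal strings built at run time are separate objects.
a = "".join(["pass", "word"])
b = "".join(["pass", "word"])
print(a == b, a is b)

# Interning maps both to one shared object, so zeroing that object
# would silently corrupt every other reference to the same text.
shared_a = intern_str(a)
shared_b = intern_str(b)
print(shared_a is shared_b)
```

An interned string may also be a dictionary key for a variable or attribute name inside the interpreter, which is another reason raising an exception is the sane choice.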
answered Nov 14 '22 at 23:11 by Eryk Sun