
Python String memory usage on FreeBSD

I'm observing a strange memory usage pattern with Python strings on FreeBSD. Consider the following session. The idea is to create a list of strings whose combined length is about 100MB of characters.

l = []
for i in xrange(100000):
    l.append(str(i) * (1000/len(str(i))))

This uses around 100MB of memory, as expected, and 'del l' frees it.

l = []
for i in xrange(20000):
    l.append(str(i) * (5000/len(str(i))))

This uses 165MB of memory. I really don't understand where the additional memory usage is coming from. [Both lists hold the same total number of characters.]
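As a quick sanity check that both loops hold about the same amount of string data, the totals can be computed without allocating any strings. This is only a sketch: `total_chars` is a name made up here, and `//` is used so the arithmetic matches what `/` does on integers in Python 2.

```python
def total_chars(count, target_len):
    """Sum the lengths of the strings the loops above would build."""
    total = 0
    for i in range(count):
        digits = len(str(i))
        # str(i) * (target_len // digits) has this many characters
        total += digits * (target_len // digits)
    return total

first = total_chars(100000, 1000)   # first snippet: 100000 strings of ~1000 chars
second = total_chars(20000, 5000)   # second snippet: 20000 strings of ~5000 chars
print(first, second)
```

Both totals come out just under 100,000,000 characters, so the difference in memory usage is not explained by the amount of data.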

This is Python 2.6.4 on FreeBSD 7.2. On Linux and Windows, both snippets use only around 100MB of memory.

Update: I'm measuring memory using 'ps aux', which can be run with os.system after the code snippets above. Also, the two snippets were executed separately.
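For readers who want a number from inside the process rather than from 'ps aux', one alternative (not what was used for the figures in this question) is the standard `resource` module on Unix. A minimal sketch follows, with the caveat that `ru_maxrss` is the peak resident set size, not the current one, and that its unit differs between platforms. The helper name `peak_rss` and the small test allocation are made up for illustration.

```python
import resource

def peak_rss():
    """Return the peak resident set size reported by getrusage.

    The unit is platform dependent (kilobytes on Linux and FreeBSD,
    bytes on macOS), so only compare values taken on the same machine.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss()
# Allocate about 10MB of 5000-character strings (a scaled-down stand-in
# for the loops in the question).
data = ["x" * 5000 for _ in range(2000)]
after = peak_rss()
print(before, after)
```

Because the value is a high-water mark, it never goes down after 'del', so it is only useful for seeing growth.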

Update2: It looks like FreeBSD's malloc rounds allocations up to powers of 2, so allocating 5KB actually allocates 8KB. I'm not sure, though.
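The guess in Update2 can be checked with simple arithmetic. The sketch below assumes every request is rounded up to the next power of two and ignores the per-object header of a Python string and any allocator bookkeeping, so it illustrates the hypothesis only and is not a description of how FreeBSD's allocator really behaves. `next_power_of_two` is a helper written for this example.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    power = 1
    while power < n:
        power *= 2
    return power

# (number of strings, approximate size of each string in bytes)
for count, size in [(100000, 1000), (20000, 5000)]:
    rounded = next_power_of_two(size)
    print(size, rounded, count * rounded)
```

Under that assumption, the 1000-byte strings take about 102MB in total and the 5000-byte strings about 164MB, which is close to the 100MB and 165MB observed.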

asked Mar 17 '11 17:03 by amit



1 Answer

In my opinion, this is probably caused by memory fragmentation. First of all, in CPython, memory chunks larger than about 256 bytes are allocated with malloc. For reference, see

Improving Python's Memory Allocator

For performance reasons, most memory allocators, such as malloc, return an aligned address. For example, you will never get an address like

0x00003

This address is not aligned to 4 bytes, and accessing memory through it would be very slow. Therefore, every address you get from malloc looks like

0x00000
0x00004
0x00008

and so on. The 4-byte alignment is only a common baseline; the actual alignment policy varies by OS.
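The alignment rule above can be written as a small piece of arithmetic. This is a sketch with made-up helper names (`align_up`, `is_aligned`); it assumes the alignment is a power of two, which is the usual case.

```python
def align_up(address, alignment):
    """Round address up to the next multiple of alignment (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)

def is_aligned(address, alignment):
    """True if address is a multiple of alignment."""
    return address % alignment == 0

for addr in (0x0, 0x3, 0x4, 0x5):
    # Print the address, whether it is 4-byte aligned, and the next aligned address.
    print(hex(addr), is_aligned(addr, 4), hex(align_up(addr, 4)))
```

With a 4-byte alignment, 0x3 is not aligned and rounds up to 0x4, while 0x4 is already aligned and stays where it is.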

The memory usage you are talking about is probably RSS (I'm not sure). On most operating systems, the virtual memory page size is 4K, so storing a 5000-byte chunk takes 2 pages. Here is an example illustrating how memory gets wasted. We assume 256-byte alignment here.

0x00000 {
...       chunk 1
0x01387 }
0x01388 {
...       fragment 1
0x013FF }
0x01400 {
...       chunk 2
0x02787 }
0x02788 {
...       fragment 2
0x027FF }
0x02800 {
...       chunk 3
0x03B87 }
0x03B88 {
...       fragment 3
0x03BFF }
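The layout above can be reproduced with a few lines of arithmetic. The sketch below uses the 5000-byte chunk size from the question and the 256-byte alignment and 4K page size assumed in this answer; the names (`layout`, `CHUNK_SIZE` and so on) are invented for the example, and real allocators keep extra bookkeeping that is not modelled here.

```python
CHUNK_SIZE = 5000   # bytes requested per string, as in the question
ALIGNMENT = 256     # alignment assumed in the example above
PAGE_SIZE = 4096    # assumed page size

def layout(num_chunks, chunk_size=CHUNK_SIZE, alignment=ALIGNMENT):
    """Return (start, end, gap) per chunk.

    end is exclusive; gap is the number of padding bytes between the
    end of the chunk and the next aligned start address.
    """
    rows = []
    start = 0
    for _ in range(num_chunks):
        end = start + chunk_size
        next_start = -(-end // alignment) * alignment  # round up to alignment
        rows.append((start, end, next_start - end))
        start = next_start
    return rows

for start, end, gap in layout(3):
    print(hex(start), hex(end), gap)

# Whole pages needed to hold one chunk (ceiling division).
pages_per_chunk = -(-CHUNK_SIZE // PAGE_SIZE)
print(pages_per_chunk)
```

Each chunk is followed by 120 unusable bytes under these assumptions, and one chunk spans 2 pages.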

As you can see, there are many fragments in memory. They can't be used, but they still occupy space in a page. I'm not sure what FreeBSD's alignment policy is, but I think the extra usage is caused by something like this. To use memory efficiently with Python, you can pre-allocate one big bytearray and pick a good chunk size to carve it into (you have to test to find the best size; it depends on the OS).
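One way the pre-allocated bytearray idea could look is sketched below. `ChunkPool` and its methods are hypothetical names made up for this illustration, the slot size of 5000 is just the figure from the question, and the best slot size still has to be found by testing on the target OS.

```python
class ChunkPool(object):
    """Fixed-size slots carved out of one pre-allocated bytearray."""

    def __init__(self, slot_size, slot_count):
        self.slot_size = slot_size
        self.slot_count = slot_count
        # One big allocation instead of many medium-sized ones.
        self.buffer = bytearray(slot_size * slot_count)
        self.lengths = [0] * slot_count

    def _check(self, index):
        if not 0 <= index < self.slot_count:
            raise IndexError("slot index out of range")

    def write(self, index, data):
        """Copy data (bytes) into slot number index."""
        self._check(index)
        if len(data) > self.slot_size:
            raise ValueError("data does not fit in one slot")
        start = index * self.slot_size
        # Same-length slice assignment: the buffer never grows.
        self.buffer[start:start + len(data)] = data
        self.lengths[index] = len(data)

    def read(self, index):
        """Return the bytes previously written to slot number index."""
        self._check(index)
        start = index * self.slot_size
        return bytes(self.buffer[start:start + self.lengths[index]])

pool = ChunkPool(slot_size=5000, slot_count=100)
pool.write(3, b"42" * 2500)
print(len(pool.read(3)), len(pool.buffer))
```

The trade-off is that every slot costs `slot_size` bytes whether it is full or not, so this only helps when the stored strings have similar lengths.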

answered Oct 03 '22 06:10 by Fang-Pen Lin
