
Getting memsize of shared Array space

tl;dr

require 'objspace'

ObjectSpace.memsize_of([0] * 1_000_000)
#=> 8000040
ObjectSpace.memsize_of(Array.new([0] * 1_000_000))
#=> 40

Where did it go?

Longer version

A whole bunch of stuff inside Array seems to have a concept of a "shared array" where the data block gets moved to a shared heap space. I'm aware that memsize_of makes it clear that it may be incomplete, but is there a (good?) way to analyze the allocation of these shared array blocks? They don't seem to be "objects" from the point of view of ObjectSpace.each_object. For the purposes of this memory profiler it would be nice to at least be able to track the overall size of the shared array heap space even if I can't trace it back to specific objects.

asked Sep 30 '16 at 04:09 by coderanger



1 Answer

Fortunately, rb_ary_memsize is a public (exported) function, so with a small hack you can override it:

#include <ruby.h>
#include <assert.h>

/* private macros from array.c */
#define ARY_OWNS_HEAP_P(a) (!FL_TEST((a), ELTS_SHARED|RARRAY_EMBED_FLAG))
#define ARY_SHARED_P(ary) \
    (assert(!FL_TEST((ary), ELTS_SHARED) || !FL_TEST((ary), RARRAY_EMBED_FLAG)), \
     FL_TEST((ary),ELTS_SHARED)!=0)

RUBY_FUNC_EXPORTED size_t
rb_ary_memsize(VALUE ary)
{
    if (ARY_OWNS_HEAP_P(ary)) {
        return RARRAY(ary)->as.heap.aux.capa * sizeof(VALUE);
    }
/* -------8<------8<------- */
    else if (ARY_SHARED_P(ary)){
        /* if it is a shared array, calculate size using length of shared root */
        return RARRAY_LEN(RARRAY(ary)->as.heap.aux.shared) * sizeof(VALUE);
    }
/* ------->8------>8------- */
    else {
        return 0;
    }
}

Compile it into a shared object:

gcc $(ruby -rrbconfig \
  -e'puts RbConfig::CONFIG.values_at("rubyhdrdir","rubyarchhdrdir").map{|d| " -I#{d}"}.join') \
  -Wall -fpic -shared -o ary_memsize_hack.so ary_memsize_hack.c

Then load it into the process with LD_PRELOAD, replacing the original function:

LD_PRELOAD="$(pwd)/ary_memsize_hack.so" ruby -robjspace \
  -e 'p ObjectSpace.memsize_of([0] * 1_000_000); 
      p ObjectSpace.memsize_of(Array.new([0] * 1_000_000))'

It produces the desired output:

8000040
8000040
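
To see from plain Ruby why the unpatched interpreter reports 40 for the second array, the JSON emitted by ObjectSpace.dump can be compared before and after the copy is made. This is only a minimal sketch and not part of the hack above: the helper names array_info and shared_copy_report are made up for illustration, and it assumes an MRI build where Array.new(source) shares the source's buffer and where the dump includes a "shared" key for such arrays. On such a build the source array itself is likely to show up as shared after the copy, since its buffer is handed over to a hidden root.

```ruby
require 'objspace'
require 'json'

# Hypothetical helper: parse the JSON that ObjectSpace.dump emits for one object.
def array_info(ary)
  JSON.parse(ObjectSpace.dump(ary))
end

# Hypothetical helper: dump an array before and after Array.new wraps it.
def shared_copy_report(length = 1_000_000)
  source = [0] * length
  before = array_info(source)   # taken while `source` still owns its buffer
  copy   = Array.new(source)    # assumption: MRI shares the buffer here
  { 'source before copy' => before,
    'source after copy'  => array_info(source),
    'copy'               => array_info(copy) }
end

shared_copy_report.each do |label, info|
  # "shared" only appears in the dump when the flag is set, hence the default.
  puts format('%-20s memsize=%-10d shared=%s',
              label, info['memsize'], info.fetch('shared', false))
end
```

This only tells you which visible arrays are shared. It does not reach the hidden root that actually holds the buffer, which is what the C hack above is for.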

UPDATE: The rb_ary_memsize function, which is in charge of estimating array size, only does so for arrays that own their heap buffer (i.e. neither shared nor embedded) and returns zero otherwise. In general this makes sense: if you sum the sizes of all arrays in the application, the total should match the memory actually allocated, whereas with my patch the contents of a shared array are counted once for every array that shares them. I guess the main problem is how the wrapping array is constructed on the Ruby side: the reference to the inner array is lost, so it is not reachable from application code and looks uncountable. My patch only demonstrates how to reach the root of the shared array to expose its size; I don't think it should be integrated upstream in any way. Embedded arrays have a similar problem: rb_ary_memsize also returns 0 for them, so the reported memsize is just the 40 bytes of the object slot itself, which may not be what the application expects to see:

require 'objspace'

puts ObjectSpace.dump([1])
#=> {"address":"0x000008033f9bd8", "type":"ARRAY", "class":"0x000008029f9a98", "length":1, 
#    "embedded":true, "memsize":40, "flags":{"wb_protected":true}}
puts ObjectSpace.dump([1, 2])
#=> {"address":"0x000008033f9b38", "type":"ARRAY", "class":"0x000008029f9a98", "length":2, 
#    "embedded":true, "memsize":40, "flags":{"wb_protected":true}}
puts ObjectSpace.dump([1, 2, 3])
#=> {"address":"0x000008033f9ac0", "type":"ARRAY", "class":"0x000008029f9a98", "length":3, 
#    "embedded":true, "memsize":40, "flags":{"wb_protected":true}}
puts ObjectSpace.dump([1, 2, 3, 4])
#=> {"address":"0x000008033f9a48", "type":"ARRAY", "class":"0x000008029f9a98", "length":4,
#    "memsize":72, "flags":{"wb_protected":true}}
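
Returning to the double counting described in the update, the arithmetic can be sketched in plain Ruby. The helper name double_count_estimate and its keyword arguments are hypothetical. The "patched" figure is not measured: it is computed as length times pointer width per view, under the assumption that sizeof(VALUE) equals the native pointer width and that every view shares the source's buffer. It ignores the per-object slot overhead and the source array itself.

```ruby
require 'objspace'

# Hypothetical helper: contrast what the running interpreter reports for a set
# of views with what a "size of the shared root per view" rule would add up to.
def double_count_estimate(copies: 3, length: 100_000)
  source = Array.new(length, 0)
  # Each view is expected (on MRI) to share source's buffer rather than copy it.
  views  = Array.new(copies) { Array.new(source) }
  # Native pointer width in bytes, assumed equal to sizeof(VALUE).
  word   = [0].pack('J').bytesize
  {
    buffer_bytes:     length * word,                               # the one real buffer
    stock_reported:   views.sum { |v| ObjectSpace.memsize_of(v) }, # whatever this interpreter reports
    patched_estimate: views.size * length * word                   # same buffer counted once per view
  }
end

p double_count_estimate
```

With three views the estimate is three times the size of the single buffer that actually exists, which is why summing patched sizes across all arrays would overstate memory use.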
answered Oct 20 '22 at 01:10 by avsej