
numpy -- Transform non-contiguous data to contiguous data in place

Tags: python, numpy

Consider the following code:

import numpy as np
a = np.zeros(50)
a[10:20:2] = 1
b = c = a[10:40:4]
print(b.flags)  # You'll see that b and c are neither C_CONTIGUOUS nor F_CONTIGUOUS

My question:

Is there a way (with only a reference to b) to make both b and c contiguous? It is completely fine if np.may_share_memory(b,a) returns False after this operation.

np.ascontiguousarray and np.asfortranarray come close, but they don't quite work because they return a new array instead of changing the object that b and c refer to.
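To make the limitation concrete, here is a minimal sketch using the arrays from the question. The name d is introduced only for illustration; it shows that the contiguous result is a separate object and that c is left untouched.

```python
import numpy as np

a = np.zeros(50)
a[10:20:2] = 1
b = c = a[10:40:4]

# d is a hypothetical name for the contiguous copy
d = np.ascontiguousarray(b)

print(d.flags['C_CONTIGUOUS'])    # True: the copy is contiguous
print(d is b)                     # False: a new array was returned
print(c.flags['C_CONTIGUOUS'])    # False: c is still the strided view
print(np.may_share_memory(c, a))  # True: c still points into a
```

Rebinding b to d would fix b, but any other name bound to the old view (here c) keeps referring to the non-contiguous data.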


My use case is that I have very large 3D fields stored in a subclass of a numpy.ndarray. In order to save memory, I would like to chop those fields down to the portion of the domain that I am actually interested in processing:

a = a[ix1:ix2,iy1:iy2,iz1:iz2]

Slicing for the subclass is somewhat more restricted than slicing of ndarray objects, but this should work and it will "do the right thing" -- the various custom meta-data attached on the subclass will be transformed/preserved as expected. Unfortunately, since this returns a view, numpy won't free the big array afterward so I don't actually save any memory here.
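The effect can be seen on a plain ndarray. The sketch below uses a hypothetical 100x100x100 array named big rather than my real fields; it shows that the sliced view holds a reference to the full buffer through its base attribute, so dropping the original name does not release the memory.

```python
import numpy as np

big = np.zeros((100, 100, 100))    # 8,000,000 bytes of float64
sub = big[10:20, 10:20, 10:20]     # basic slicing returns a view

print(sub.nbytes)                  # 8000: the view itself describes little data
print(sub.base is big)             # True: the view references the big array

del big                            # removes the name only
print(sub.base.nbytes)             # 8000000: the full buffer is still alive
```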

To be completely clear, I'm looking to accomplish 2 things:

  • Preserve the metadata on my class instance. Slicing will work, but I'm not sure about other forms of copying.
  • Make it so that the original array is free to be garbage collected.
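As a rough illustration of both goals, the sketch below assumes a hypothetical subclass named Field that carries a single units attribute through __array_finalize__; the real subclass is more involved and may behave differently. Under that assumption, slicing followed by .copy() keeps the subclass and its attribute while producing an array that owns its own, smaller buffer. This is not an in-place operation: it only helps once every reference to the old array has been dropped.

```python
import numpy as np

class Field(np.ndarray):
    """Hypothetical ndarray subclass carrying one metadata attribute."""

    def __new__(cls, data, units=None):
        obj = np.asarray(data).view(cls)
        obj.units = units
        return obj

    def __array_finalize__(self, obj):
        # Called for views and copies; inherit the metadata from the source
        if obj is None:
            return
        self.units = getattr(obj, 'units', None)

field = Field(np.zeros((20, 20, 20)), units='m/s')

# Slice first (a view), then copy so the result owns its data
sub = field[2:6, 2:6, 2:6].copy()

print(type(sub).__name__, sub.units)
print(sub.flags['OWNDATA'], np.shares_memory(sub, field))
```

If field is then rebound to sub and no other references remain, the large buffer becomes eligible for garbage collection.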
mgilson asked Mar 15 '13 00:03




1 Answer

According to Alex Martelli:

"The only really reliable way to ensure that a large but temporary use of memory DOES return all resources to the system when it's done, is to have that use happen in a subprocess, which does the memory-hungry work then terminates."

However, the following appears to free at least some of the memory. Warning: my way of measuring free memory is Linux-specific:

import time
import numpy as np

def free_memory():
    """
    Return free memory available, including buffer and cached memory
    """
    total = 0
    with open('/proc/meminfo', 'r') as f:
        for line in f:
            line = line.strip()
            if any(line.startswith(field) for field in ('MemFree', 'Buffers', 'Cached')):
                field, amount, unit = line.split()
                amount = int(amount)
                if unit != 'kB':
                    raise ValueError(
                        'Unknown unit {u!r} in /proc/meminfo'.format(u=unit))
                total += amount
    return total

def gen_change_in_memory():
    """
    https://stackoverflow.com/a/14446011/190597 (unutbu)
    """
    f = free_memory()
    diff = 0
    while True:
        yield diff
        f2 = free_memory()
        diff = f - f2
        f = f2
# Each call returns the drop in free memory (KiB) since the previous call
change_in_memory = gen_change_in_memory().__next__

Before allocating the large array:

print(change_in_memory())
# 0

a = np.zeros(500000)
a[10:20:2] = 1
b = c = a[10:40:4]

After allocating the large array:

print(change_in_memory())
# 3844 # KiB

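# Copy the strided values to the front of a's buffer
a[:len(b)] = b
# Rebind b to the contiguous leading block
b = a[:len(b)]
# Shrink a's buffer in place; refcheck=False skips the reference check
a.resize(len(b), refcheck=False)
time.sleep(1)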

Free memory increases after resizing:

print(change_in_memory())
# -3708 # KiB
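The same idea can be checked without /proc/meminfo by looking at the array's own flags and size. The following is a minimal, platform-independent sketch of the steps above, not a guarantee about what the operating system reclaims. It assumes a owns its data, and it takes the new reference b only after the resize, because refcheck=False lets the buffer be reallocated while older views (such as c in the question) still point at the previous memory and must not be used afterwards.

```python
import numpy as np

a = np.zeros(500000)
a[10:20:2] = 1
n = len(a[10:40:4])            # 8 elements in the strided selection

a[:n] = a[10:40:4]             # move the wanted values to the front of the buffer
a.resize(n, refcheck=False)    # shrink the buffer in place
b = a                          # take new references only after the resize

print(b.flags['C_CONTIGUOUS'], b.flags['OWNDATA'], b.nbytes)
# True True 64
```

Whether the freed bytes actually go back to the system depends on the allocator, which is why the measurements above report only an approximate change.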
unutbu answered Sep 19 '22 12:09
