
Numpy: Fix array with rows of different lengths by filling the empty elements with zeros

The functionality I am looking for looks something like this:

    data = np.array([[1, 2, 3, 4],
                     [2, 3, 1],
                     [5, 5, 5, 5],
                     [1, 1]])

    result = fix(data)
    print(result)

    [[ 1.  2.  3.  4.]
     [ 2.  3.  1.  0.]
     [ 5.  5.  5.  5.]
     [ 1.  1.  0.  0.]]

The data arrays I'm working with are very large, so I would appreciate the most efficient solution.

Edit: The data is read in from disk as a Python list of lists.

asked Aug 16 '15 at 17:08 by user2909415




2 Answers

This could be one approach -

    import numpy as np

    def numpy_fillna(data):
        # data is expected to be a NumPy array (it must have a .dtype)
        # Get lengths of each row of data
        lens = np.array([len(i) for i in data])

        # Mask of valid places in each row
        mask = np.arange(lens.max()) < lens[:,None]

        # Setup output array and put elements from data into masked positions
        out = np.zeros(mask.shape, dtype=data.dtype)
        out[mask] = np.concatenate(data)
        return out

Sample input, output -

    In [222]: # Input object dtype array
         ...: data = np.array([[1, 2, 3, 4],
         ...:                  [2, 3, 1],
         ...:                  [5, 5, 5, 5, 8, 9, 5],
         ...:                  [1, 1]])

    In [223]: numpy_fillna(data)
    Out[223]:
    array([[1, 2, 3, 4, 0, 0, 0],
           [2, 3, 1, 0, 0, 0, 0],
           [5, 5, 5, 5, 8, 9, 5],
           [1, 1, 0, 0, 0, 0, 0]], dtype=object)
answered Sep 29 '22 at 06:09 by Divakar
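The question's edit says the data arrives as a Python list of lists, which has no `.dtype` attribute, and the desired output is a float array. The following is a minimal sketch of how the masking idea above might be adapted to that case. It is not part of the original answer: `pad_rows`, `fill`, and `dtype` are hypothetical names chosen for illustration, and the sketch assumes the input contains at least one row.

```python
import numpy as np

def pad_rows(rows, fill=0.0, dtype=float):
    """Pad variable-length rows into a 2-D array (hypothetical helper)."""
    # Length of each row; assumes rows is non-empty
    lens = np.array([len(r) for r in rows])

    # mask[i, j] is True where row i actually has a j-th element
    mask = np.arange(lens.max()) < lens[:, None]

    # Start from an array full of the fill value, with an explicit dtype
    out = np.full(mask.shape, fill, dtype=dtype)

    # Boolean-mask assignment fills the True positions in row-major order,
    # which matches the order of the concatenated rows
    out[mask] = np.concatenate(rows)
    return out

data = [[1, 2, 3, 4],
        [2, 3, 1],
        [5, 5, 5, 5],
        [1, 1]]

result = pad_rows(data)
print(result)
```

Passing the dtype explicitly avoids the `dtype=object` result seen in the sample output above, since the output array no longer inherits its type from an object array of lists.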


You could use pandas instead of numpy:

    In [1]: import pandas as pd

    In [2]: df = pd.DataFrame([[1, 2, 3, 4],
       ...:                    [2, 3, 1],
       ...:                    [5, 5, 5, 5],
       ...:                    [1, 1]], dtype=float)

    In [3]: df.fillna(0.0).values
    Out[3]:
    array([[ 1.,  2.,  3.,  4.],
           [ 2.,  3.,  1.,  0.],
           [ 5.,  5.,  5.,  5.],
           [ 1.,  1.,  0.,  0.]])
answered Sep 29 '22 at 08:09 by Eastsun
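To make the intermediate step of the pandas approach visible, here is a small sketch using the question's data. It relies on the behaviour the answer's output implies, namely that `pd.DataFrame` pads short rows with NaN, which `fillna` then replaces. The variable names are illustrative, and `to_numpy()` is used here in place of `.values`.

```python
import pandas as pd

data = [[1, 2, 3, 4],
        [2, 3, 1],
        [5, 5, 5, 5],
        [1, 1]]

# Short rows are padded with NaN when the DataFrame is built
df = pd.DataFrame(data, dtype=float)

# Count the padded cells before they are replaced
n_missing = int(df.isna().sum().sum())

# Replace the NaN padding with zeros and convert to a NumPy array
result = df.fillna(0.0).to_numpy()

print(n_missing)
print(result)
```

Because the padding value is NaN, this route yields a float array; whether it is faster than the masking approach for very large inputs would need to be measured on the actual data.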