The functionality I am looking for looks something like this:
    data = np.array([[1, 2, 3, 4],
                     [2, 3, 1],
                     [5, 5, 5, 5],
                     [1, 1]])

    result = fix(data)
    print(result)
    [[ 1.  2.  3.  4.]
     [ 2.  3.  1.  0.]
     [ 5.  5.  5.  5.]
     [ 1.  1.  0.  0.]]

The data arrays I'm working with are very large, so I would appreciate the most efficient solution.
Edit: The data is read in from disk as a Python list of lists.
The zeros() function returns a new array of given shape and type, filled with zeros. Its shape argument gives the shape of the new array, e.g., (2, 3) or 2; its dtype argument gives the desired data-type for the array.
The numpy.all() function tests whether all array elements along the given axis evaluate to True.
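As a small illustration of the axis behaviour described above, here is a sketch on a made-up 2x3 array (the array `m` and the variable names are only for demonstration):

```python
import numpy as np

# Hypothetical sample array; 0 is the only element that evaluates to False.
m = np.array([[1, 2, 0],
              [4, 5, 6]])

whole = np.all(m)            # tests every element of the array
per_row = np.all(m, axis=1)  # one result per row
per_col = np.all(m, axis=0)  # one result per column

print(whole)    # False
print(per_row)  # [False  True]
print(per_col)  # [ True  True False]
```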
To initialize your NumPy array with zeros, use the function np.zeros(shape), where shape is a tuple that defines the shape of your desired array. For example, np.zeros((3,)) defines a one-dimensional array with three "0" elements, i.e., [0 0 0].
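A minimal sketch of np.zeros with one- and two-dimensional shapes; the variable names `a` and `b` are just for illustration. Note that the default dtype is float64, so an integer array needs an explicit dtype.

```python
import numpy as np

# One-dimensional array with three zero elements (float64 by default).
a = np.zeros((3,))

# Two rows, three columns, with an explicit integer dtype.
b = np.zeros((2, 3), dtype=int)

print(a.shape, a.dtype)  # (3,) float64
print(b.shape)           # (2, 3)
```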
NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a new array and delete the original. The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory.
This could be one approach -
    import numpy as np

    def numpy_fillna(data):
        # Get lengths of each row of data
        lens = np.array([len(i) for i in data])

        # Mask of valid places in each row
        mask = np.arange(lens.max()) < lens[:,None]

        # Setup output array and put elements from data into masked positions
        out = np.zeros(mask.shape, dtype=data.dtype)
        out[mask] = np.concatenate(data)
        return out
Sample input, output -
    In [222]: # Input object dtype array
         ...: data = np.array([[1, 2, 3, 4],
         ...:                  [2, 3, 1],
         ...:                  [5, 5, 5, 5, 8 ,9 ,5],
         ...:                  [1, 1]], dtype=object)

    In [223]: numpy_fillna(data)
    Out[223]:
    array([[1, 2, 3, 4, 0, 0, 0],
           [2, 3, 1, 0, 0, 0, 0],
           [5, 5, 5, 5, 8, 9, 5],
           [1, 1, 0, 0, 0, 0, 0]], dtype=object)
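Since the question says the data arrives as a plain Python list of lists and the desired output is float, a variant of the same masking idea that skips the object-dtype array might look like the sketch below. The name `pad_rows` and its `fill` and `dtype` parameters are hypothetical, not part of the answer above, and the sketch assumes at least one row is present.

```python
import numpy as np

def pad_rows(rows, fill=0.0, dtype=float):
    """Pad a ragged list of lists into a 2-D array (hypothetical helper)."""
    # Length of each row; assumes rows is non-empty.
    lens = np.array([len(r) for r in rows])

    # True where a row has a real value, False where padding is needed.
    mask = np.arange(lens.max()) < lens[:, None]

    # Start from an array of fill values, then drop the data into place.
    out = np.full(mask.shape, fill, dtype=dtype)
    out[mask] = np.concatenate(rows)
    return out

rows = [[1, 2, 3, 4], [2, 3, 1], [5, 5, 5, 5], [1, 1]]
result = pad_rows(rows)
print(result)
```

Passing the dtype explicitly avoids relying on `data.dtype`, which a plain list does not have.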
You could use pandas instead of numpy:
    In [1]: import pandas as pd

    In [2]: df = pd.DataFrame([[1, 2, 3, 4],
       ...:                    [2, 3, 1],
       ...:                    [5, 5, 5, 5],
       ...:                    [1, 1]], dtype=float)

    In [3]: df.fillna(0.0).values
    Out[3]:
    array([[ 1.,  2.,  3.,  4.],
           [ 2.,  3.,  1.,  0.],
           [ 5.,  5.,  5.,  5.],
           [ 1.,  1.,  0.,  0.]])