Convert Python sequence to NumPy array, filling missing values

The implicit conversion of a Python sequence of variable-length lists into a NumPy array causes the array to be of type object.

    v = [[1], [1, 2]]
    np.array(v)
    >>> array([[1], [1, 2]], dtype=object)

Trying to force another type will cause an exception:

    np.array(v, dtype=np.int32)
    ValueError: setting an array element with a sequence.

What is the most efficient way to get a dense NumPy array of type int32, by filling the "missing" values with a given placeholder?

From my sample sequence v, I would like to get something like this if 0 is the placeholder:

    array([[1, 0],
           [1, 2]], dtype=int32)

asked Jul 27 '16 17:07

2 Answers

You can use itertools.zip_longest:

    import itertools
    np.array(list(itertools.zip_longest(*v, fillvalue=0))).T

    Out:
    array([[1, 0],
           [1, 2]])

Note: For Python 2, it is itertools.izip_longest.
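The question asks for an int32 result, which the one-liner above does not set explicitly. A minimal sketch of how that could be done, assuming Python 3; the helper name pad_with_zip_longest and its fill and dtype parameters are made up for illustration and are not part of the original answer:

```python
import itertools

import numpy as np


def pad_with_zip_longest(rows, fill=0, dtype=np.int32):
    """Hypothetical helper: pad ragged rows with `fill` and return a 2-D array."""
    # zip_longest(*rows) walks the input column by column, substituting
    # `fill` wherever a row has already run out of elements.
    columns = list(itertools.zip_longest(*rows, fillvalue=fill))
    # The array built from the columns is transposed so that every input
    # row is a row of the result again.
    return np.array(columns, dtype=dtype).T


v = [[1], [1, 2]]
padded = pad_with_zip_longest(v)
print(padded)
print(padded.dtype)
```

The transpose returns a view rather than a C-contiguous array; wrapping the result in np.ascontiguousarray is one option if later code needs contiguous memory.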

answered Sep 28 '22 04:09

ayhan

Here's an almost* vectorized, boolean-indexing based approach that I have used in several other posts:

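    import numpy as np

    def boolean_indexing(v):
        # Length of each sub-list (the only Python-level loop)
        lens = np.array([len(item) for item in v])
        # mask[i, j] is True where row i has an element at column j
        mask = lens[:,None] > np.arange(lens.max())
        # Start from zeros (the placeholder), then fill the valid slots
        out = np.zeros(mask.shape,dtype=int)
        out[mask] = np.concatenate(v)
        return out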

Sample run

    In [27]: v
    Out[27]: [[1], [1, 2], [3, 6, 7, 8, 9], [4]]

    In [28]: out
    Out[28]:
    array([[1, 0, 0, 0, 0],
           [1, 2, 0, 0, 0],
           [3, 6, 7, 8, 9],
           [4, 0, 0, 0, 0]])

*Please note that this is described as almost vectorized because the only looping performed here is at the start, where we get the lengths of the list elements. That part is not computationally demanding, so it should have minimal effect on the total runtime.
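To make the role of the mask more concrete, here is a sketch of the same idea with an arbitrary placeholder and an explicit dtype. The name pad_with_mask and the fill and dtype parameters are assumptions added for illustration; the function also returns the mask so it can be inspected:

```python
import numpy as np


def pad_with_mask(rows, fill=0, dtype=np.int32):
    """Hypothetical variant of boolean_indexing with a configurable placeholder."""
    lens = np.array([len(r) for r in rows])
    # Broadcasting a column of lengths against a row of column indices gives
    # mask[i, j] == True exactly where row i has an element at column j.
    mask = lens[:, None] > np.arange(lens.max())
    # Pre-fill everything with the placeholder, then overwrite the valid slots.
    out = np.full(mask.shape, fill, dtype=dtype)
    # Boolean assignment fills the True positions in row-major order, which
    # matches the order of the flattened input.
    out[mask] = np.concatenate(rows)
    return out, mask


v = [[1], [1, 2], [3, 6, 7, 8, 9], [4]]
padded, mask = pad_with_mask(v, fill=-1)
print(mask.astype(int))
print(padded)
```

Like the original function, this sketch assumes the input holds at least one sub-list, since lens.max() fails on an empty array.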

Runtime test

In this section I time the DataFrame-based solution by @Alberto Garcia-Raboso and the itertools-based solution by @ayhan, as they seem to scale well, together with the boolean-indexing based one from this post, on a relatively larger dataset with three levels of size variation across the list elements.

Case #1 : Larger size variation

    In [44]: v = [[1], [1,2,4,8,4],[6,7,3,6,7,8,9,3,6,4,8,3,2,4,5,6,6,8,7,9,3,6,4]]

    In [45]: v = v*1000

    In [46]: %timeit pd.DataFrame(v).fillna(0).values.astype(np.int32)
    100 loops, best of 3: 9.82 ms per loop

    In [47]: %timeit np.array(list(itertools.izip_longest(*v, fillvalue=0))).T
    100 loops, best of 3: 5.11 ms per loop

    In [48]: %timeit boolean_indexing(v)
    100 loops, best of 3: 6.88 ms per loop

Case #2 : Lesser size variation

    In [49]: v = [[1], [1,2,4,8,4],[6,7,3,6,7,8]]

    In [50]: v = v*1000

    In [51]: %timeit pd.DataFrame(v).fillna(0).values.astype(np.int32)
    100 loops, best of 3: 3.12 ms per loop

    In [52]: %timeit np.array(list(itertools.izip_longest(*v, fillvalue=0))).T
    1000 loops, best of 3: 1.55 ms per loop

    In [53]: %timeit boolean_indexing(v)
    100 loops, best of 3: 5 ms per loop

Case #3 : Larger number of elements (100 max) per list element

    In [139]: # Setup inputs
         ...: N = 10000 # Number of elems in list
         ...: maxn = 100 # Max. size of a list element
         ...: lens = np.random.randint(0,maxn,(N))
         ...: v = [list(np.random.randint(0,9,(L))) for L in lens]
         ...:

    In [140]: %timeit pd.DataFrame(v).fillna(0).values.astype(np.int32)
    1 loops, best of 3: 292 ms per loop

    In [141]: %timeit np.array(list(itertools.izip_longest(*v, fillvalue=0))).T
    1 loops, best of 3: 264 ms per loop

    In [142]: %timeit boolean_indexing(v)
    10 loops, best of 3: 95.7 ms per loop

To me, it seems there's no clear winner; the choice would have to be made on a case-by-case basis!
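The timings above were taken on Python 2 with izip_longest, so the numbers may not carry over to a current setup. A rough sketch for repeating the comparison on Python 3 with the standard timeit module follows; the helper names zip_longest_pad and best_time, the input sizes, and the repeat counts are all assumptions, and the pandas variant is left out to keep the sketch free of extra dependencies:

```python
import itertools
import timeit

import numpy as np


def zip_longest_pad(v):
    # Python 3 spelling of the itertools-based answer.
    return np.array(list(itertools.zip_longest(*v, fillvalue=0))).T


def boolean_indexing(v):
    # Same logic as the boolean-indexing answer.
    lens = np.array([len(item) for item in v])
    mask = lens[:, None] > np.arange(lens.max())
    out = np.zeros(mask.shape, dtype=int)
    out[mask] = np.concatenate(v)
    return out


def best_time(func, v, number=3, repeat=3):
    # Best per-call time in seconds over `repeat` runs of `number` calls each.
    return min(timeit.repeat(lambda: func(v), number=number, repeat=repeat)) / number


# Hypothetical input: 1000 sub-lists with between 1 and 99 elements each.
rng = np.random.default_rng(0)
lens = rng.integers(1, 100, size=1000)
v = [rng.integers(0, 9, size=n).tolist() for n in lens]

timings = {f.__name__: best_time(f, v) for f in (zip_longest_pad, boolean_indexing)}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```

Both functions should return the same padded array for the same input, so a quick np.array_equal check before timing is a cheap way to confirm the comparison is fair.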

answered Sep 28 '22 03:09

Divakar


Tags: python, arrays, numpy, sequence, variable-length-array

Marco Ancona
