
Pandas/Numpy Get matrix from column of arrays

Tags: python, pandas

I have a pandas dataframe with a column of lists.

df:

    inputs
0   [1, 2, 3]
1   [4, 5, 6]
2   [7, 8, 9]
3   [10, 11, 12]

I need the matrix

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

Is there an efficient way to do this?

Note: When I try df.inputs.as_matrix() the output is

array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=object)

which has shape (4,), not (4,3) as desired.

AS_Butler asked Feb 05 '17 15:02



1 Answer

You can convert the column to a list and then pass it to np.array. If all the lists in the column have the same length, this produces a 2D array:

import numpy as np

arr = np.array(df.inputs.tolist())

#array([[ 1,  2,  3],
#       [ 4,  5,  6],
#       [ 7,  8,  9],
#       [10, 11, 12]])

arr.shape
# (4, 3)
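
For anyone who wants to try this without the original data, here is a minimal, self-contained sketch. It assumes the frame looks like the one in the question (a single `inputs` column holding equal-length lists) and uses `to_numpy()` only to show what the raw column looks like before conversion.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (assumed layout).
df = pd.DataFrame({"inputs": [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]})

# The raw column is a 1-D object array whose elements are Python lists.
as_objects = df["inputs"].to_numpy()

# Going through a plain list of lists lets NumPy infer the 2-D shape.
arr = np.array(df["inputs"].tolist())

print(as_objects.shape, as_objects.dtype)
print(arr.shape)
```

The conversion only yields a regular 2D array when every list has the same length, as noted above.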

Another option, suggested in a comment by @piRSquared, is to use .values to get the underlying numpy array first and then convert that to a list. On the example given this is about twice as fast, although the absolute difference is only a few microseconds:

%timeit df.inputs.values.tolist()
# 100000 loops, best of 3: 5.52 µs per loop

%timeit df.inputs.tolist()
# 100000 loops, best of 3: 11.5 µs per loop
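
The `%timeit` lines above only run in IPython. As a rough sketch for a plain Python script, the comparison could be reproduced with the standard `timeit` module. The frame is rebuilt from the question, and the variable names and repetition count are arbitrary choices made here for illustration. Timings will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Rebuild the example frame from the question (assumed layout).
df = pd.DataFrame({"inputs": [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]})
col = df["inputs"]

# Both routes produce the same 2-D array.
via_series = np.array(col.tolist())
via_values = np.array(col.values.tolist())

# Time only the list conversion, as in the %timeit lines above.
t_series = timeit.timeit(lambda: col.tolist(), number=10_000)
t_values = timeit.timeit(lambda: col.values.tolist(), number=10_000)
print(f"Series.tolist(): {t_series:.4f} s, .values.tolist(): {t_values:.4f} s")
```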
Psidom answered Dec 06 '22 13:12
