
Pandas series to 2d array

Tags: python, pandas

I used the answer from "Put a 2d Array into a Pandas Series" to store a 2D NumPy array in a pandas Series. In short:

a = np.zeros((5,2))
s = pd.Series(list(a))

Now, what is the cheapest way to convert that pandas Series back to a 2D array? If I try s.values, I get a 1D array of arrays with object dtype.

So far I have tried np.vstack(s.values), but that copies the data.
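To make the two observations above concrete, here is a minimal sketch, assuming only that numpy and pandas are installed. It inspects what s.values holds and checks whether the result of np.vstack is independent of the original array. The name stacked is illustrative and not part of the original question.

```python
import numpy as np
import pandas as pd

a = np.zeros((5, 2))
s = pd.Series(list(a))

# s.values is a 1D object array: one Python object (a row array) per element.
print(s.values.dtype, s.values.shape)

# np.vstack builds a new contiguous 2D float array from those row objects.
stacked = np.vstack(s.values)
print(stacked.dtype, stacked.shape)

# The stacked result does not share memory with the original array `a`,
# which confirms that the data was copied.
print(np.shares_memory(stacked, a))
```

Because the rows are stored as separate objects, any conversion to a single contiguous 2D array has to gather them into new memory, so some copy is hard to avoid with this storage layout.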

crayxt asked Feb 04 '23 01:02


1 Answer

I believe you need:

a = np.array(s.values.tolist())
print(a)
[[ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]]
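As a quick sanity check of the conversion above, the following sketch round-trips a small array and confirms that the result is a regular 2D float array rather than an object array. The variable name restored is hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

a = np.zeros((5, 2))
s = pd.Series(list(a))

# Convert the Series of row arrays back into one 2D array.
restored = np.array(s.values.tolist())

print(restored.dtype, restored.shape)  # float64 (5, 2)
print(np.array_equal(restored, a))     # True
```

This assumes every row in the Series has the same length; the sketch does not cover rows of differing lengths.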

a = np.zeros((50000,2))
s = pd.Series(list(a))

In [131]: %timeit (np.vstack(s.values))
10 loops, best of 3: 107 ms per loop

In [132]: %timeit (np.array(s.values.tolist()))
10 loops, best of 3: 19.7 ms per loop

In [133]: %timeit (np.array(s.tolist()))
100 loops, best of 3: 19.6 ms per loop

But with the transposed shape (2 rows of 50,000 elements) the difference is small, and the timings may be distorted by caching:

a = np.zeros((2,50000))
s = pd.Series(list(a))
#print (s)

In [159]: %timeit (np.vstack(s.values))
The slowest run took 23.31 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 55.7 µs per loop

In [160]: %timeit (np.array(s.values.tolist()))
The slowest run took 7.20 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 49.8 µs per loop

In [161]: %timeit (np.array(s.tolist()))
The slowest run took 7.31 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 62.6 µs per loop
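The measurements above were taken with IPython's %timeit. For readers who want to repeat the comparison in a plain script, here is a rough sketch using the standard-library timeit module. The helper bench and its number parameter are hypothetical names introduced for this example, and the absolute numbers will differ by machine and by NumPy/pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def bench(shape, number=3):
    """Return the best-of-3 time in seconds per call for each conversion."""
    a = np.zeros(shape)
    s = pd.Series(list(a))
    candidates = {
        "np.vstack(s.values)": lambda: np.vstack(s.values),
        "np.array(s.values.tolist())": lambda: np.array(s.values.tolist()),
        "np.array(s.tolist())": lambda: np.array(s.tolist()),
    }
    results = {}
    for label, fn in candidates.items():
        # timeit.repeat returns total seconds for `number` calls, per repeat.
        best = min(timeit.repeat(fn, number=number, repeat=3))
        results[label] = best / number
    return results


# Compare many short rows against few long rows, as in the answer.
for shape in [(50000, 2), (2, 50000)]:
    print(shape)
    for label, seconds in bench(shape).items():
        print(f"  {label}: {seconds * 1e3:.3f} ms per call")
```

Running both shapes side by side makes it easier to see whether the gap between the methods depends on the number of rows, as the timings above suggest.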
jezrael answered Feb 06 '23 16:02