I have a 2D NumPy array that I would like to put in a pandas Series (not a DataFrame):
>>> import pandas as pd
>>> import numpy as np
>>> a = np.zeros((5, 2))
>>> a
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])
But this throws an error:
>>> s = pd.Series(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/miniconda/envs/pyspark/lib/python3.4/site-packages/pandas/core/series.py", line 227, in __init__
    raise_cast_failure=True)
  File "/miniconda/envs/pyspark/lib/python3.4/site-packages/pandas/core/series.py", line 2920, in _sanitize_array
    raise Exception('Data must be 1-dimensional')
Exception: Data must be 1-dimensional
It is possible with a hack:
>>> s = pd.Series(map(lambda x:[x], a)).apply(lambda x:x[0])
>>> s
0    [0.0, 0.0]
1    [0.0, 0.0]
2    [0.0, 0.0]
3    [0.0, 0.0]
4    [0.0, 0.0]
Is there a better way?
A pandas Series is very similar to a 1-dimensional NumPy array, and a Series can be created directly from a NumPy array. To do this, import NumPy. Because NumPy is a dependency of pandas, it is installed along with pandas and does not need to be installed separately.
How do you convert an array to a DataFrame in Python? To convert an array to a DataFrame, you need 1) a NumPy array (e.g., np_array), and 2) the pd.DataFrame() constructor, like this: df = pd.DataFrame(np_array, columns=['Column1', 'Column2']).
A NumPy array can be converted into a pandas Series by passing it to the pandas.Series() constructor. Example 1:
import numpy as np
import pandas as pd
array = np.array([10, 20, 1, 2, 3, 4, 5, 6, 7])
print("Numpy array is :")
display(array)
Let’s see how to create a pandas Series from an array. Method #1: Create a Series from an array without an index. Because no index is passed, the index defaults to range(n), where n is the length of the array.
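As a minimal sketch of the default-index case described above (the variable names arr and s are illustrative only), a 1-D array can be handed straight to the constructor:

```python
import numpy as np
import pandas as pd

# A 1-D array is accepted directly by the Series constructor.
arr = np.array([10, 20, 1, 2, 3, 4, 5, 6, 7])
s = pd.Series(arr)

# With no index argument, pandas assigns a default index 0..n-1.
print(s.index)
print(s.tolist())
```

This only covers the 1-D case; a 2-D array passed the same way raises the error shown in the question.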
Write a pandas program to convert a given Series to an array. Sample solution:
import pandas as pd
import numpy as np
s1 = pd.Series(['100', '200', 'python', '300.12', '400'])
print("Original Data Series:")
print(s1)
print("Series to an array")
a = np.array(s1.values.tolist())
print(a)
A pandas Series is a one-dimensional labelled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). Unlike a Python list, a Series has a single dtype. Values of mixed types can still be stored, but only under the generic object dtype.
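To make the point about data types concrete, here is a small sketch (the names ints and mixed are made up for illustration) that inspects the dtype pandas picks for homogeneous and mixed values:

```python
import pandas as pd

# Homogeneous integers get an integer dtype.
ints = pd.Series([1, 2, 3])
print(ints.dtype)

# Mixed values are still accepted, but the Series falls back to
# the generic object dtype, which stores arbitrary Python objects.
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)
```

The object dtype is also what you get when each element is a list or an array, which is why the 2D-array-in-a-Series results in this thread show dtype: object.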
Well, you can use the numpy.ndarray.tolist method, like so:
>>> a = np.zeros((5,2))
>>> a
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])
>>> a.tolist()
[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
>>> pd.Series(a.tolist())
0    [0.0, 0.0]
1    [0.0, 0.0]
2    [0.0, 0.0]
3    [0.0, 0.0]
4    [0.0, 0.0]
dtype: object
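If you later need the 2D array back, one possible approach (a sketch, not the only way) is to turn the Series of lists into a list of lists again and let NumPy rebuild the array. The name restored is just for illustration:

```python
import numpy as np
import pandas as pd

a = np.zeros((5, 2))
s = pd.Series(a.tolist())  # one Python list per row, dtype object

# Rebuild the 2-D array from the rows stored in the Series.
restored = np.array(s.tolist())
print(restored.shape)
```

This assumes every row has the same length; ragged rows would not come back as a regular 2D array.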
EDIT:
A faster way to accomplish a similar result is to simply do pd.Series(list(a)). This will make a Series of NumPy arrays instead of Python lists, so it should be faster than a.tolist(), which returns a list of Python lists.
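The difference between the two variants is easiest to see by checking what each element of the resulting Series actually is. A small sketch (s_arrays and s_lists are illustrative names):

```python
import numpy as np
import pandas as pd

a = np.arange(10.0).reshape(5, 2)

# list(a) iterates over the first axis, so each element is a 1-D ndarray.
s_arrays = pd.Series(list(a))

# a.tolist() converts all the way down to nested Python lists of floats.
s_lists = pd.Series(a.tolist())

print(type(s_arrays.iloc[0]))
print(type(s_lists.iloc[0]))
```

Which one you want depends on what you do next: ndarray elements keep NumPy operations available per row, while plain lists are ordinary Python objects.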
pd.Series(list(a)) is consistently slower than pd.Series(a.tolist()). I tested sizes from 500,000 to 20,000,000 rows.
a = np.ones((500000,2))
showing only 1,000,000 rows:
%timeit pd.Series(list(a))
1 loop, best of 3: 301 ms per loop
%timeit pd.Series(a.tolist())
1 loop, best of 3: 261 ms per loop
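For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module. The helper best_of is hypothetical (not part of pandas or NumPy), and the numbers will vary with your machine and library versions, so treat the result as indicative only:

```python
import timeit

import numpy as np
import pandas as pd

a = np.ones((500000, 2))


def best_of(func, repeat=3):
    """Hypothetical helper: best wall-clock time of `repeat` single runs."""
    return min(timeit.repeat(func, number=1, repeat=repeat))


t_list = best_of(lambda: pd.Series(list(a)))
t_tolist = best_of(lambda: pd.Series(a.tolist()))

print("pd.Series(list(a)):    %.0f ms" % (t_list * 1000))
print("pd.Series(a.tolist()): %.0f ms" % (t_tolist * 1000))
```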