I have a NumPy array of size 31x36 and I want to transform it into a pandas DataFrame in order to process it. I am trying to convert it using the following code:
pd.DataFrame(data=matrix,
index=np.array(range(1, 31)),
columns=np.array(range(1, 36)))
However, I am receiving the following error:
ValueError: Shape of passed values is (36, 31), indices imply (35, 30)
How can I solve the issue and transform it properly?
Create a DataFrame from a NumPy ndarray: Since a DataFrame is similar to a 2D NumPy array, we can create one from a NumPy ndarray. The input array must have at most two dimensions; passing an array with three or more dimensions raises a ValueError. If you pass a raw NumPy ndarray, the index and column names start at 0 by default.
The Pandas DataFrame Object: The next fundamental structure in Pandas is the DataFrame. Like the Series object, the DataFrame can be thought of either as a generalization of a NumPy array, or as a specialization of a Python dictionary.
To convert a NumPy array to a pandas DataFrame, use the pandas.DataFrame() constructor.
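As a minimal sketch of the points above (the array names are illustrative and not from the question): a 2D array converts directly and receives 0-based labels, while a 3D array is rejected with a ValueError.

```python
import numpy as np
import pandas as pd

# Illustrative 2x3 array; any 2D ndarray behaves the same way.
arr_2d = np.arange(6).reshape(2, 3)
df = pd.DataFrame(arr_2d)

# With no index/columns given, both sets of labels start at 0.
index_labels = list(df.index)
column_labels = list(df.columns)
print(index_labels, column_labels)

# A 3D array cannot be laid out as rows and columns.
arr_3d = np.zeros((2, 3, 4))
try:
    pd.DataFrame(arr_3d)
    raised = False
except ValueError:
    raised = True
print(raised)
```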
You get an error because the end argument in range(start, end) is exclusive. You have a couple of options to account for this:
Just use df = pd.DataFrame(matrix). The pd.DataFrame constructor adds integer indices implicitly.
matrix.shape gives a tuple of row and column count, so you need not specify them manually. For example:
df = pd.DataFrame(matrix, index=range(matrix.shape[0]),
columns=range(matrix.shape[1]))
If you need to start at 1, remember to add 1:
df = pd.DataFrame(matrix, index=range(1, matrix.shape[0] + 1),
columns=range(1, matrix.shape[1] + 1))
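To see the off-by-one concretely, here is a small sketch. It assumes a zero-filled 31x36 array as a stand-in for the question's matrix, since the real data is not shown.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's array (assumption: only the shape matters here).
matrix = np.zeros((31, 36))

# range(start, end) stops before end, so these are each one label short.
print(len(range(1, 31)), len(range(1, 36)))  # 30 35

# Deriving the bounds from matrix.shape avoids hard-coding the sizes.
n_rows, n_cols = matrix.shape
df = pd.DataFrame(matrix,
                  index=range(1, n_rows + 1),
                  columns=range(1, n_cols + 1))

print(df.shape)                       # (31, 36)
print(df.index[0], df.index[-1])      # 1 31
print(df.columns[0], df.columns[-1])  # 1 36
```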
As to why what you tried failed: the ranges are off by 1. It should be:
pd.DataFrame(data=matrix,
index=np.array(range(1, 32)),
columns=np.array(range(1, 37)))
This is because the last value isn't included in the range.
Actually, looking at what you're doing, you could've just done:
pd.DataFrame(data=matrix,
index=np.arange(1, 32),
columns=np.arange(1, 37))
Or in pure pandas:
pd.DataFrame(data=matrix,
index=pd.RangeIndex(range(1, 32)),
columns=pd.RangeIndex(range(1, 37)))
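The variants above differ only in how the labels are built. As a rough check (the variable names are illustrative), the following sketch confirms that a plain range, np.arange, and pd.RangeIndex all yield the same 31 labels:

```python
import numpy as np
import pandas as pd

# Three ways to build the row labels 1..31.
from_range = pd.Index(range(1, 32))
from_arange = pd.Index(np.arange(1, 32))
from_rangeindex = pd.RangeIndex(1, 32)

# Compare the label values themselves rather than the Index types.
labels = list(from_range)
same_as_arange = labels == list(from_arange)
same_as_rangeindex = labels == list(from_rangeindex)
print(len(labels), same_as_arange, same_as_rangeindex)
```

pd.RangeIndex stores only start, stop, and step rather than materialising every label, which is why it is what pandas generates by default.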
Also, if you don't specify the index and columns params, an auto-generated index and columns are created, which start from 0. It's unclear why you need them to start from 1.
You could also have not passed the index and column params and just modified them after construction:
In[9]:
df = pd.DataFrame(adaption)
df.columns = df.columns+1
df.index = df.index + 1
df
Out[9]:
          1         2         3         4         5         6
1 -2.219072 -1.637188  0.497752 -1.486244  1.702908  0.331697
2 -0.586996  0.040052  1.021568  0.783492 -1.263685 -0.192921
3 -0.605922  0.856685 -0.592779 -0.584826  1.196066  0.724332
4 -0.226160 -0.734373 -0.849138  0.776883 -0.160852  0.403073
5 -0.081573 -1.805827 -0.755215 -0.324553 -0.150827 -0.102148