Convert numpy array to pandas dataframe

I have a numpy array of size 31x36 and I want to convert it into a pandas DataFrame in order to process it. I am trying to convert it using the following code:

pd.DataFrame(data=matrix,
          index=np.array(range(1, 31)),
          columns=np.array(range(1, 36)))

However, I am receiving the following error:

ValueError: Shape of passed values is (36, 31), indices imply (35, 30)

How can I solve the issue and transform it properly?

732

asked May 31 '18 12:05

konstantin

2 Answers

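You get an error because the end argument in range(start, end) is exclusive: range(1, 31) produces only 30 labels (1 through 30), but your array has 31 rows, and likewise range(1, 36) produces 35 labels for 36 columns. You have a couple of options to account for this: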

Don't pass index and columns

Just use df = pd.DataFrame(matrix). The pd.DataFrame constructor then adds integer row and column labels implicitly, starting from 0.
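As a quick check of this first option, the sketch below builds a DataFrame without any labels. The zero-filled matrix is only a stand-in for the array in the question, which is not shown.

```python
import numpy as np
import pandas as pd

# Stand-in for the 31x36 array from the question (values are arbitrary).
matrix = np.zeros((31, 36))

df = pd.DataFrame(matrix)

# With no labels passed, both axes get 0-based integer labels.
print(df.shape)        # (31, 36)
print(df.index[0])     # 0
print(df.columns[-1])  # 35
```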

Pass in the shape of the array

matrix.shape gives a tuple of row and column count, so you need not specify them manually. For example:

df = pd.DataFrame(matrix, index=range(matrix.shape[0]),
                          columns=range(matrix.shape[1]))

If you need to start at 1, remember to add 1:

df = pd.DataFrame(matrix, index=range(1, matrix.shape[0] + 1),
                          columns=range(1, matrix.shape[1] + 1))
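The shape-based approach can be wrapped in a small helper so the off-by-one adjustment lives in one place. This is only a sketch: to_labeled_frame and its start parameter are hypothetical names, not part of pandas, and the matrix is a stand-in for the asker's data.

```python
import numpy as np
import pandas as pd


def to_labeled_frame(arr, start=1):
    """Wrap a 2-D array in a DataFrame whose labels begin at `start`."""
    n_rows, n_cols = arr.shape
    return pd.DataFrame(
        arr,
        # range's end is exclusive, so add `start` to the count.
        index=range(start, n_rows + start),
        columns=range(start, n_cols + start),
    )


# Stand-in for the 31x36 array from the question.
matrix = np.arange(31 * 36).reshape(31, 36)
df = to_labeled_frame(matrix)

print(df.index[0], df.index[-1])      # 1 31
print(df.columns[0], df.columns[-1])  # 1 36
```

Because the labels are derived from arr.shape, the same call works unchanged if the array's dimensions change later.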

180

answered Oct 02 '22 06:10

jpp

As to why what you tried failed: the ranges are off by 1. For a 31x36 array they should be:

pd.DataFrame(data=matrix,
          index=np.array(range(1, 32)),
          columns=np.array(range(1, 37)))

This is because the end value isn't included in the range, so range(1, 32) yields the 31 labels 1 through 31.
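The off-by-one can be seen directly by counting the labels each range produces. The following is a minimal sketch with a zero-filled stand-in array; the exact wording of the exception message depends on the pandas version, so it is printed rather than assumed.

```python
import numpy as np
import pandas as pd

# Stand-in for the 31x36 array from the question.
matrix = np.zeros((31, 36))

print(len(range(1, 31)))  # 30 labels: one too few for 31 rows
print(len(range(1, 32)))  # 31 labels: matches the row count

error = None
try:
    # Same label ranges as in the question: 30 row labels, 35 column labels.
    pd.DataFrame(matrix, index=range(1, 31), columns=range(1, 36))
except ValueError as exc:
    error = exc
    print(exc)
```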

Looking at what you're doing, you could also have used np.arange directly instead of wrapping range in np.array:

pd.DataFrame(data=matrix,
          index=np.arange(1, 32),
          columns=np.arange(1, 37))

Or in pure pandas:

pd.DataFrame(data=matrix,
          index=pd.RangeIndex(1, 32),
          columns=pd.RangeIndex(1, 37))
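The three ways of building the labels used in this answer should give the same label values. A small sketch to confirm that for the row labels, where n_rows is taken from the question:

```python
import numpy as np
import pandas as pd

n_rows = 31  # row count from the question

from_range = pd.Index(np.array(range(1, n_rows + 1)))
from_arange = pd.Index(np.arange(1, n_rows + 1))
from_rangeindex = pd.RangeIndex(1, n_rows + 1)

# All three hold the labels 1..31.
print(list(from_range) == list(from_arange) == list(from_rangeindex))  # True
print(type(from_rangeindex).__name__)  # RangeIndex
```

One difference worth knowing: a RangeIndex stores only its start, stop and step rather than every label, so it is the lighter choice for plain consecutive integers.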

Also, if you don't specify the index and columns parameters, an index and columns are generated automatically, starting from 0. It's unclear why you need them to start from 1.

You could also skip the index and columns parameters and shift the labels after construction (adaption here is a small 5x6 example array):

In[9]:
df = pd.DataFrame(adaption)
df.columns = df.columns + 1
df.index = df.index + 1
df

Out[9]: 
          1         2         3         4         5         6
1 -2.219072 -1.637188  0.497752 -1.486244  1.702908  0.331697
2 -0.586996  0.040052  1.021568  0.783492 -1.263685 -0.192921
3 -0.605922  0.856685 -0.592779 -0.584826  1.196066  0.724332
4 -0.226160 -0.734373 -0.849138  0.776883 -0.160852  0.403073
5 -0.081573 -1.805827 -0.755215 -0.324553 -0.150827 -0.102148
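The adaption array is not defined in the answer, so the session above cannot be run as is. Here is a self-contained sketch of the same idea, using a random 5x6 array as a hypothetical stand-in (the values will differ from the output shown above):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `adaption`: any 5x6 numeric array works.
rng = np.random.default_rng(0)
adaption = rng.standard_normal((5, 6))

df = pd.DataFrame(adaption)

# Shift the auto-generated 0-based labels so they start at 1.
df.index = df.index + 1
df.columns = df.columns + 1

print(df.index.tolist())    # [1, 2, 3, 4, 5]
print(df.columns.tolist())  # [1, 2, 3, 4, 5, 6]
```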

answered Oct 02 '22 07:10

EdChum


Tags:

python

pandas

numpy