
Creating a Pandas DataFrame from a Numpy array: How do I specify the index column and column headers?

I have a NumPy array built from a list of lists, representing a two-dimensional array with row labels and column names, as shown below:

data = array([['','Col1','Col2'],['Row1',1,2],['Row2',3,4]]) 

I'd like the resulting DataFrame to have Row1 and Row2 as index values, and Col1 and Col2 as header values.

I can specify the index as follows:

df = pd.DataFrame(data,index=data[:,0])

However, I am unsure how best to assign the column headers.

user3132783 asked Dec 24 '13 15:12



People also ask

How do I create a specific column as index in pandas?

To use an existing column of a pandas DataFrame as its index, call the set_index() method. For example, to make the column "Year" the index, write df.set_index("Year"). By default, set_index() returns a new DataFrame and leaves the original unchanged.
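As a minimal sketch of that behaviour (the sample data and the "Sales" column are made up purely for illustration), something like the following could be used:

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({"Year": [2019, 2020, 2021], "Sales": [10, 12, 15]})

# set_index returns a new DataFrame; df itself is left unchanged.
indexed = df.set_index("Year")

print(indexed.index.tolist())  # [2019, 2020, 2021]
print(list(indexed.columns))   # ['Sales']
print(list(df.columns))        # ['Year', 'Sales']
```

If the result should replace the original, reassigning it (df = df.set_index("Year")) is usually the simplest option.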

Can we create DataFrame from NumPy arrays?

Since a DataFrame is similar to a 2D NumPy array, we can create one from a NumPy ndarray. The input array must be one- or two-dimensional; passing an array with three or more dimensions raises a ValueError. If you pass a raw NumPy ndarray without specifying labels, the index and column names default to integers starting at 0.
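The default labels and the dimensionality check can be seen in a short sketch; the array contents here are arbitrary and the variable names are chosen only for this example:

```python
import numpy as np
import pandas as pd

# A plain 2D array with no labels attached.
arr = np.arange(6).reshape(2, 3)
df_default = pd.DataFrame(arr)

# Without index/columns arguments, both axes are numbered from 0.
print(df_default.index.tolist())    # [0, 1]
print(df_default.columns.tolist())  # [0, 1, 2]

# A 3D array is rejected by the constructor.
try:
    pd.DataFrame(np.zeros((2, 2, 2)))
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```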


1 Answer

You need to pass data, index and columns to the DataFrame constructor, as in:

>>> pd.DataFrame(data=data[1:,1:],    # values
...              index=data[1:,0],    # 1st column as index
...              columns=data[0,1:])  # 1st row as the column names

Edit: as @joris notes in a comment, every element of data is stored as a string, so you may need to change the values above to np.int_(data[1:,1:]) to get the correct data type.
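Putting the pieces together, a self-contained sketch might look like the following. It uses astype(int) for the conversion, which is a swapped-in alternative to the np.int_ call mentioned above, and the variable names values and df are chosen only for this example:

```python
import numpy as np
import pandas as pd

# The array from the question. Mixing strings and integers makes
# NumPy store every element as a string.
data = np.array([['', 'Col1', 'Col2'],
                 ['Row1', 1, 2],
                 ['Row2', 3, 4]])
print(repr(data[1, 1]))  # a string, not an integer

# Slice off the labels and convert the remaining block to integers.
values = data[1:, 1:].astype(int)

df = pd.DataFrame(data=values,
                  index=data[1:, 0],    # first column as the index
                  columns=data[0, 1:])  # first row as the column names

print(df)
print(df.loc['Row2', 'Col1'])  # 3
```

Without the conversion, the DataFrame would hold the strings '1', '2', '3' and '4', so arithmetic such as df.sum() would concatenate text rather than add numbers.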

behzad.nouri answered Sep 22 '22 08:09
