Python Pandas Data frame creation

Question

I tried to create a data frame df using the below code :

import numpy as np
import pandas as pd
index = [0,1,2,3,4,5]
s = pd.Series([1,2,3,4,5,6],index= index)
t = pd.Series([2,4,6,8,10,12],index= index)
df = pd.DataFrame(s,columns = ["MUL1"])
df["MUL2"] =t

print df


   MUL1  MUL2
0     1     2
1     2     4
2     3     6
3     4     8
4     5    10
5     6    12

While trying to create the same data frame using the below syntax, I am getting a wierd output.

df = pd.DataFrame([s,t],columns = ["MUL1","MUL2"])

print df

   MUL1  MUL2
0   NaN   NaN
1   NaN   NaN

Please explain why the NaN is being displayed in the dataframe when both the Series are non empty and why only two rows are getting displayed and no the rest.

Also provide the correct way to create the data frame same as has been mentioned above by using the columns argument in the pandas DataFrame method.

jezrael · Accepted Answer

If remove columns argument get:

df = pd.DataFrame([s,t])

print (df)
   0  1  2  3   4   5
0  1  2  3  4   5   6
1  2  4  6  8  10  12

Then define columns - if columns not exist get NaNs column:

df = pd.DataFrame([s,t], columns=[0,'MUL2'])

print (df)
     0  MUL2
0  1.0   NaN
1  2.0   NaN

Better is use dictionary:

df = pd.DataFrame({'MUL1':s,'MUL2':t})

print (df)
   MUL1  MUL2
0     1     2
1     2     4
2     3     6
3     4     8
4     5    10
5     6    12

And if need change columns order add columns parameter:

df = pd.DataFrame({'MUL1':s,'MUL2':t}, columns=['MUL2','MUL1'])

print (df)
   MUL2  MUL1
0     2     1
1     4     2
2     6     3
3     8     4
4    10     5
5    12     6

More information is in dataframe documentation.

Another solution by concat - DataFrame constructor is not necessary:

df = pd.concat([s,t], axis=1, keys=['MUL1','MUL2'])

print (df)
   MUL1  MUL2
0     1     2
1     2     4
2     3     6
3     4     8
4     5    10
5     6    12

Divakar · Answer

One of the correct ways would be to stack the array data from the input list holding those series into columns -

In [161]: pd.DataFrame(np.c_[s,t],columns = ["MUL1","MUL2"])
Out[161]: 
   MUL1  MUL2
0     1     2
1     2     4
2     3     6
3     4     8
4     5    10
5     6    12

Behind the scenes, the stacking creates a 2D array, which is then converted to a dataframe. Here's what the stacked array looks like -

In [162]: np.c_[s,t]
Out[162]: 
array([[ 1,  2],
       [ 2,  4],
       [ 3,  6],
       [ 4,  8],
       [ 5, 10],
       [ 6, 12]])

Leo103 · Answer

A pandas.DataFrame takes in the parameter data that can be of type ndarray, iterable, dict, or dataframe.
If you pass in a list it will assume each member is a row. Example:

a = [1,2,3]
b = [2,4,6]

df = pd.DataFrame([a, b], columns = ["Col1","Col2", "Col3"])

# output 1:
   Col1  Col2  Col3
0     1     2     3
1     2     4     6

You are getting NaN because it expects index = [0,1] but you are giving [0,1,2,3,4,5]
To get the shape you want, first transpose the data:

data = np.array([a, b]).transpose()

How to create a pandas dataframe

import pandas as pd

a = [1,2,3]
b = [2,4,6]

df = pd.DataFrame(dict(Col1=a, Col2=b))

Output:

   Col1  Col2
0     1     2
1     2     4
2     3     6

Python Pandas Data frame creation

Tags:

pandas

dataframe

numpy

python-2.7

Sarvagya Dubey

3 Answers

jezrael

Divakar

How to create a pandas dataframe

Leo103

Recent Activity

Donate For Us

Python Pandas Data frame creation

Tags:

pandas

dataframe

numpy

python-2.7

Sarvagya Dubey

3 Answers

jezrael

Divakar

How to create a pandas dataframe

Leo103

Related questions

Recent Activity

Donate For Us