
Creating a pandas DataFrame from columns of other DataFrames with similar indexes

I have two DataFrames, df1 and df2, with the same column names ['a','b','c'], both indexed by dates. The two date indexes can share some values. I would like to create a DataFrame df3 containing only column 'c' from each, renamed 'df1' and 'df2' respectively, and with the correct date index. My problem is that I cannot work out how to merge the indexes properly.

    import numpy as np
    import pandas as pd

    df1 = pd.DataFrame(np.random.randn(5, 3),
                       index=pd.date_range('01/02/2014', periods=5, freq='D'),
                       columns=['a', 'b', 'c'])
    df2 = pd.DataFrame(np.random.randn(8, 3),
                       index=pd.date_range('01/01/2014', periods=8, freq='D'),
                       columns=['a', 'b', 'c'])

    df1
                       a         b         c
    2014-01-02  0.580550  0.480814  1.135899
    2014-01-03 -1.961033  0.546013  1.093204
    2014-01-04  2.063441 -0.627297  2.035373
    2014-01-05  0.319570  0.058588  0.350060
    2014-01-06  1.318068 -0.802209 -0.939962

    df2
                       a         b         c
    2014-01-01  0.772482  0.899337  0.808630
    2014-01-02  0.518431 -1.582113  0.323425
    2014-01-03  0.112109  1.056705 -1.355067
    2014-01-04  0.767257 -2.311014  0.340701
    2014-01-05  0.794281 -1.954858  0.200922
    2014-01-06  0.156088  0.718658 -1.030077
    2014-01-07  1.621059  0.106656 -0.472080
    2014-01-08 -2.061138 -2.023157  0.257151

The df3 DataFrame should have the following form:

    df3
                     df1       df2
    2014-01-01       NaN  0.808630
    2014-01-02  1.135899  0.323425
    2014-01-03  1.093204 -1.355067
    2014-01-04  2.035373  0.340701
    2014-01-05  0.350060  0.200922
    2014-01-06 -0.939962 -1.030077
    2014-01-07       NaN -0.472080
    2014-01-08       NaN  0.257151

The df1 column should contain NaN wherever a date exists only in df2, since the date index of df2 is wider. (In this example, I would get NaN for the following dates: 2014-01-01, 2014-01-07 and 2014-01-08.)

Thanks for your help.

user3153467 asked Jan 20 '14 10:01



1 Answer

You can use concat:

    In [11]: pd.concat([df1['c'], df2['c']], axis=1, keys=['df1', 'df2'])
    Out[11]:
                     df1       df2
    2014-01-01       NaN -0.978535
    2014-01-02 -0.106510 -0.519239
    2014-01-03 -0.846100 -0.313153
    2014-01-04 -0.014253 -1.040702
    2014-01-05  0.315156 -0.329967
    2014-01-06 -0.510577 -0.940901
    2014-01-07       NaN -0.024608
    2014-01-08       NaN -1.791899

    [8 rows x 2 columns]
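Because the frames in the question are random, the values above will differ on every run. As a rough, reproducible sketch of the same call, the snippet below swaps in deterministic stand-in data (the np.arange values are an assumption for illustration, not the question's data) and checks which dates come out as NaN in the 'df1' column.

```python
import numpy as np
import pandas as pd

# Deterministic stand-ins for the question's random frames (illustrative values only).
df1 = pd.DataFrame(np.arange(15.0).reshape(5, 3),
                   index=pd.date_range('2014-01-02', periods=5, freq='D'),
                   columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.arange(24.0).reshape(8, 3) * 10,
                   index=pd.date_range('2014-01-01', periods=8, freq='D'),
                   columns=['a', 'b', 'c'])

# df['c'] is a Series; keys= supplies the column names of the result.
df3 = pd.concat([df1['c'], df2['c']], axis=1, keys=['df1', 'df2'])

# Dates that exist only in df2 have no value to align with, so 'df1' is NaN there.
missing = df3.index[df3['df1'].isna()].strftime('%Y-%m-%d').tolist()

print(df3)
print(missing)
```

The keys argument is what renames the two 'c' columns here; without it both columns would keep the name 'c'.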

The axis argument determines the way the DataFrames are stacked:

    df1 = pd.DataFrame([1, 2, 3])
    df2 = pd.DataFrame(['a', 'b', 'c'])

    pd.concat([df1, df2], axis=0)
       0
    0  1
    1  2
    2  3
    0  a
    1  b
    2  c

    pd.concat([df1, df2], axis=1)
       0  0
    0  1  a
    1  2  b
    2  3  c
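With axis=1, rows are matched by index label rather than by position, which is where the NaN values in the question's result come from. The sketch below uses two made-up Series (s1 and s2 are hypothetical names, not from the question) to contrast the default join='outer', which keeps every date, with join='inner', which keeps only dates present in both.

```python
import pandas as pd

# Hypothetical example data: s2 covers one more date than s1.
s1 = pd.Series([1.0, 2.0, 3.0],
               index=pd.date_range('2014-01-02', periods=3, freq='D'))
s2 = pd.Series([10.0, 20.0, 30.0, 40.0],
               index=pd.date_range('2014-01-01', periods=4, freq='D'))

# Default join='outer': union of both indexes, NaN where a date is missing.
outer = pd.concat([s1, s2], axis=1, keys=['df1', 'df2'])

# join='inner': intersection of the indexes, so no NaN is introduced.
inner = pd.concat([s1, s2], axis=1, keys=['df1', 'df2'], join='inner')

print(outer)
print(inner)
```

The outer join is the one that matches the df3 layout asked for in the question; the inner join is only useful if rows with missing values should be dropped.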
Andy Hayden answered Oct 05 '22 02:10
