
Pandas concat: ValueError: Shape of passed values is blah, indices imply blah2

Tags: python, pandas

I'm trying to merge a dataframe and a series (pandas 0.14.1). The series should form a new column, with some NAs (since the index values of the series are a subset of the index values of the dataframe).

This works for a toy example, but not with my data (detailed below).

Example:

    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame(np.random.randn(6, 4), columns=['A', 'B', 'C', 'D'],
                       index=pd.date_range('1/1/2011', periods=6, freq='D'))
    df1

                       A         B         C         D
    2011-01-01 -0.487926  0.439190  0.194810  0.333896
    2011-01-02  1.708024  0.237587 -0.958100  1.418285
    2011-01-03 -1.228805  1.266068 -1.755050 -1.476395
    2011-01-04 -0.554705  1.342504  0.245934  0.955521
    2011-01-05 -0.351260 -0.798270  0.820535 -0.597322
    2011-01-06  0.132924  0.501027 -1.139487  1.107873

    s1 = pd.Series(np.random.randn(3), name='foo',
                   index=pd.date_range('1/1/2011', periods=3, freq='2D'))
    s1

    2011-01-01   -1.660578
    2011-01-03   -0.209688
    2011-01-05    0.546146
    Freq: 2D, Name: foo, dtype: float64

    pd.concat([df1, s1], axis=1)

                       A         B         C         D       foo
    2011-01-01 -0.487926  0.439190  0.194810  0.333896 -1.660578
    2011-01-02  1.708024  0.237587 -0.958100  1.418285       NaN
    2011-01-03 -1.228805  1.266068 -1.755050 -1.476395 -0.209688
    2011-01-04 -0.554705  1.342504  0.245934  0.955521       NaN
    2011-01-05 -0.351260 -0.798270  0.820535 -0.597322  0.546146
    2011-01-06  0.132924  0.501027 -1.139487  1.107873       NaN

The situation with my real data (see below) seems basically identical: concatenating a series with a DatetimeIndex whose values are a subset of the dataframe's. But it raises the ValueError in the title, with blah = (5, 286) and blah2 = (5, 276). Why doesn't it work?

    In [187]: df.head()
    Out[188]:
                             high       low     loc_h     loc_l
    time
    2014-01-01 17:00:00  1.376235  1.375945  1.376235  1.375945
    2014-01-01 17:01:00  1.376005  1.375775       NaN       NaN
    2014-01-01 17:02:00  1.375795  1.375445       NaN  1.375445
    2014-01-01 17:03:00  1.375625  1.375515       NaN       NaN
    2014-01-01 17:04:00  1.375585  1.375585       NaN       NaN

    In [186]: df.index
    Out[186]:
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2014-01-01 17:00:00, ..., 2014-01-01 21:30:00]
    Length: 271, Freq: None, Timezone: None

    In [189]: hl.head()
    Out[189]:
    2014-01-01 17:00:00    1.376090
    2014-01-01 17:02:00    1.375445
    2014-01-01 17:05:00    1.376195
    2014-01-01 17:10:00    1.375385
    2014-01-01 17:12:00    1.376115
    dtype: float64

    In [187]: hl.index
    Out[187]:
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2014-01-01 17:00:00, ..., 2014-01-01 21:30:00]
    Length: 89, Freq: None, Timezone: None

    In: pd.concat([df, hl], axis=1)
    Out: [stack trace]
    ValueError: Shape of passed values is (5, 286), indices imply (5, 276)
birone asked Dec 31 '14 09:12



1 Answer

I had a similar problem (join worked, but concat failed).

Check for duplicate index values in both the dataframe and the series (e.g. df1.index.is_unique and s1.index.is_unique should both be True).
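As a rough illustration of that check, the sketch below builds a small dataframe whose index repeats one timestamp and then inspects it. The data and the names idx, dup_mask and dup_rows are made up for the example; they are not taken from the question.

```python
import pandas as pd

# Hypothetical data: a minute-level index in which 17:01 appears twice.
idx = pd.to_datetime([
    "2014-01-01 17:00",
    "2014-01-01 17:01",
    "2014-01-01 17:01",
    "2014-01-01 17:02",
])
df = pd.DataFrame({"high": [1.3762, 1.3760, 1.3761, 1.3758]}, index=idx)

# False as soon as any index label occurs more than once.
index_is_unique = df.index.is_unique

# Boolean mask over the index; keep=False flags every occurrence
# of a repeated label, not just the later ones.
dup_mask = df.index.duplicated(keep=False)
dup_rows = df[dup_mask]

print(index_is_unique)
print(dup_rows)
```

Listing the offending rows this way makes it easier to decide whether the repeats are genuine duplicates or distinct observations that happen to share a timestamp.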

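Removing the duplicate index values should resolve it. Note that df.drop_duplicates(inplace=True) compares row values rather than index labels, so it can leave repeated index labels in place (and can drop rows that merely share the same values). Filtering on the index itself, e.g. df = df[~df.index.duplicated(keep='first')], or one of the methods here https://stackoverflow.com/a/34297689/7163376, targets the duplicates directly.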
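A minimal sketch of removing repeated index labels before concatenating, assuming it is acceptable to keep only the first row for each label. The data here is invented for illustration; only the names df and hl echo the question.

```python
import pandas as pd

# Hypothetical dataframe whose index repeats the 17:01 timestamp.
df = pd.DataFrame(
    {"high": [1.3762, 1.3760, 1.3761, 1.3758]},
    index=pd.to_datetime([
        "2014-01-01 17:00",
        "2014-01-01 17:01",
        "2014-01-01 17:01",
        "2014-01-01 17:02",
    ]),
)

# Hypothetical series indexed by a subset of the dataframe's timestamps.
hl = pd.Series(
    [1.3761, 1.3755],
    index=pd.to_datetime(["2014-01-01 17:00", "2014-01-01 17:02"]),
    name="hl",
)

# Keep the first row for each index label and drop later repeats.
df_unique = df[~df.index.duplicated(keep="first")]
hl_unique = hl[~hl.index.duplicated(keep="first")]

# With unique indexes on both sides, concat aligns on the labels and
# fills NaN where the series has no value.
merged = pd.concat([df_unique, hl_unique], axis=1)
print(merged)
```

Whether keep='first' is the right choice depends on the data; aggregating the repeated rows (for example with a groupby on the index) may be more appropriate when the duplicates carry different values.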

lmart999 answered Oct 06 '22 01:10
