I am trying to analyze the GISETTE dataset from a feature selection challenge.
When I try to concatenate the training DataFrame with the label Series, following the pandas examples, it
throws:
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'
Code:
import pandas as pd
trainData = pd.read_table(filepath_or_buffer='GISETTE/gisette_train.data'
,delim_whitespace=True
,header=None
,names=['AA','AB','AC','AD','AE','AF','AG','AH','AI','AJ','AK','AL','AM','AN','AO','AP','AQ','AR','AS','AT','AU','AV','AW','AX','AY','AZ','BA','BB','BC','BD','BE','BF','BG','BH','BI','BJ','BK','BL','BM','BN','BO','BP','BQ','BR','BS','BT','BU','BV','BW','BX','BY','BZ','CA','CB','CC','CD','CE','CF','CG','CH','CI','CJ','CK','CL','CM','CN','CO','CP','CQ','CR','CS','CT','CU','CV','CW','CX','CY','CZ','DA','DB','DC','DD','DE','DF','DG','DH','DI','DJ','DK','DL','DM','DN','DO','DP','DQ','DR','DS','DT','DU','DV','DW','DX','DY','DZ','EA','EB','EC','ED','EE','EF','EG','EH','EI','EJ','EK','EL','EM','EN','EO','EP','EQ','ER','ES','ET','EU','EV','EW','EX','EY','EZ','FA','FB','FC','FD','FE','FF','FG','FH','FI','FJ','FK','FL','FM','FN','FO','FP','FQ','FR','FS','FT','FU','FV','FW','FX','FY','FZ','GA','GB','GC','GD','GE','GF','GG','GH','GI','GJ','GK','GL','GM','GN','GO','GP','GQ','GR','GS','GT','GU','GV','GW','GX','GY','GZ','HA','HB','HC','HD','HE','HF','HG','HH','HI','HJ','HK','HL','HM','HN','HO','HP','HQ','HR','HS','HT','HU','HV','HW','HX','HY','HZ','IA','IB','IC','ID','IE','IF','IG','IH','II','IJ','IK','IL','IM','IN','IO','IP','IQ','IR','IS','IT','IU','IV','IW','IX','IY','IZ','JA','JB','JC','JD','JE','JF','JG','JH','JI','JJ','JK','JL','JM','JN','JO','JP','JQ','JR','JS','JT','JU','JV','JW','JX','JY','JZ','KA','KB','KC','KD','KE','KF','KG','KH','KI','KJ','KK','KL','KM','KN','KO','KP','KQ','KR','KS','KT','KU','KV','KW','KX','KY','KZ','LA','LB','LC','LD','LE','LF','LG','LH','LI','LJ','LK','LL','LM','LN','LO','LP','LQ','LR','LS','LT','LU','LV','LW','LX','LY','LZ','MA','MB','MC','MD','ME','MF','MG','MH','MI','MJ','MK','ML','MM','MN','MO','MP','MQ','MR','MS','MT','MU','MV','MW','MX','MY','MZ','NA','NB','NC','ND','NE','NF','NG','NH','NI','NJ','NK','NL','NM','NN','NO','NP','NQ','NR','NS','NT','NU','NV','NW','NX','NY','NZ','OA','OB','OC','OD','OE','OF','OG','OH','OI','OJ','OK','OL','OM','ON','OO','OP','OQ','OR','OS','OT','OU','OV','OW','OX','OY','OZ','PA','PB','PC','PD','PE','PF','PG','PH',
'PI','PJ','PK','PL','PM','PN','PO','PP','PQ','PR','PS','PT','PU','PV','PW','PX','PY','PZ','QA','QB','QC','QD','QE','QF','QG','QH','QI','QJ','QK','QL','QM','QN','QO','QP','QQ','QR','QS','QT','QU','QV','QW','QX','QY','QZ','RA','RB','RC','RD','RE','RF','RG','RH','RI','RJ','RK','RL','RM','RN','RO','RP','RQ','RR','RS','RT','RU','RV','RW','RX','RY','RZ','SA','SB','SC','SD','SE','SF','SG','SH','SI','SJ','SK','SL','SM','SN','SO','SP','SQ','SR','SS','ST','SU','SV','SW','SX','SY','SZ','TA','TB','TC','TD','TE','TF'])
# print 'finished with train data'
trainLabel = pd.read_table(filepath_or_buffer='GISETTE/gisette_train.labels'
,squeeze=True
,names=['label']
,delim_whitespace=True
,header=None)
trainData.info()
# outputs
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 6000 entries
Columns: 500 entries, AA to TF
dtypes: int64(500)None
trainLabel.describe()
# outputs
count 6000.000000
mean 0.000000
std 1.000083
min -1.000000
25% -1.000000
50% 0.000000
75% 1.000000
max 1.000000
dtype: float64
readyToTrain = pd.concat([trainData, trainLabel], axis=1)
Full stack trace:
File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 717, in concat
verify_integrity=verify_integrity)
File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 848, in __init__
self.new_axes = self._get_new_axes()
File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 898, in _get_new_axes
new_axes[i] = self._get_comb_axis(i)
File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 924, in _get_comb_axis
return _get_combined_index(all_indexes, intersect=self.intersect)
File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 3991, in _get_combined_index
union = _union_indexes(indexes)
File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 4017, in _union_indexes
result = result.union(other)
File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 3753, in union
uniq_tuples = lib.fast_unique_multiple([self.values, other.values])
File "lib.pyx", line 366, in pandas.lib.fast_unique_multiple (pandas\lib.c:8378)
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'
Edit: I installed pandas from the binary at lfd.uci.edu/~gohlke/pythonlibs (pandas-0.14.1.win-amd64-py2.7).
I tried the suggestion to convert the Series to a DataFrame; it did not work (same stack trace as above). Frame info below.
DataFrame info (trainData):
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 6000 entries, (550, 0, 495, 0, 0, 0, 0, 976, 0, 0, 0, 0, 983, 0, 995, 0, 983, 0, 0, 983, 0, 0, 0, 0, 0, 983, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 991, 983, 0, 0, 0, 0, 0, 0, 0, 0, 0, 808, 0, 778, 0, 983, 0, 0, 0, 0, 991, 0, 0, 0, 0, 0, 0, 0, 991, 983, 983, 0, 0, 0, 0, 0, 0, 0, 983, 735, 0, 0, 983, 983, 0, 0, 0, 0, 569, 0, 0, 0, 0, 713, 0, 0, 0, 0, 0, 983, 983, 0, ...) to (0, 0, 991, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 948, 995, 348, 0, 0, 0, 0, 0, 0, 0, 0, 0, 751, 0, 0, 0, 0, 0, 0, 0, 0, 804, 0, 0, 0, 862, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 991, 0, 0, 0, 0, 995, 0, 0, 0, 0, 0, 0, 840, 0, 0, 0, 976, 0, 0, 0, 0, 0, 0, 777, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...)
Columns: 500 entries, AA to TF
dtypes: int64(500)None
Series converted to DataFrame info (trainLabel):
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6000 entries, 0 to 5999
Data columns (total 1 columns):
label 6000 non-null int64
dtypes: int64(1)None
As joris pointed out (and as I had to figure out myself, because I did not read the comments first), the problem is your indexes: trainData has a MultiIndex while trainLabel has a plain Int64Index, so concat cannot align the two.
Change your code from
pd.concat(to_concat, axis=1)
to
pd.concat([s.reset_index(drop=True) for s in to_concat], axis=1)
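A minimal sketch of why this works, using toy data rather than the real GISETTE files (the tuple-valued MultiIndex stands in for the one shown in your trainData.info() output):

```python
import pandas as pd

# Toy stand-ins for the real objects: a frame whose index is a MultiIndex
# of tuples, and a label series with an ordinary 0..n-1 integer index.
trainData = pd.DataFrame(
    {'AA': [550, 0, 991]},
    index=pd.MultiIndex.from_tuples([(1, 2), (3, 4), (5, 6)]))
trainLabel = pd.Series([1, -1, 1], name='label')

# reset_index(drop=True) discards each object's index and replaces it with
# a fresh 0..n-1 range, so concat aligns the rows purely by position.
to_concat = [trainData, trainLabel]
readyToTrain = pd.concat([s.reset_index(drop=True) for s in to_concat], axis=1)
print(readyToTrain)
```

Note that drop=True matters: without it, reset_index would move the old index levels into new columns instead of throwing them away.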