I am trying to analyze the GISETTE dataset from a feature selection challenge.
When I try to concatenate the training DataFrame with the label Series, following the pandas examples, it
throws:
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'
Code:
import pandas as pd
trainData = pd.read_table(filepath_or_buffer='GISETTE/gisette_train.data'
,delim_whitespace=True
,header=None
,names=['AA','AB','AC','AD','AE','AF','AG','AH','AI','AJ','AK','AL','AM','AN','AO','AP','AQ','AR','AS','AT','AU','AV','AW','AX','AY','AZ','BA','BB','BC','BD','BE','BF','BG','BH','BI','BJ','BK','BL','BM','BN','BO','BP','BQ','BR','BS','BT','BU','BV','BW','BX','BY','BZ','CA','CB','CC','CD','CE','CF','CG','CH','CI','CJ','CK','CL','CM','CN','CO','CP','CQ','CR','CS','CT','CU','CV','CW','CX','CY','CZ','DA','DB','DC','DD','DE','DF','DG','DH','DI','DJ','DK','DL','DM','DN','DO','DP','DQ','DR','DS','DT','DU','DV','DW','DX','DY','DZ','EA','EB','EC','ED','EE','EF','EG','EH','EI','EJ','EK','EL','EM','EN','EO','EP','EQ','ER','ES','ET','EU','EV','EW','EX','EY','EZ','FA','FB','FC','FD','FE','FF','FG','FH','FI','FJ','FK','FL','FM','FN','FO','FP','FQ','FR','FS','FT','FU','FV','FW','FX','FY','FZ','GA','GB','GC','GD','GE','GF','GG','GH','GI','GJ','GK','GL','GM','GN','GO','GP','GQ','GR','GS','GT','GU','GV','GW','GX','GY','GZ','HA','HB','HC','HD','HE','HF','HG','HH','HI','HJ','HK','HL','HM','HN','HO','HP','HQ','HR','HS','HT','HU','HV','HW','HX','HY','HZ','IA','IB','IC','ID','IE','IF','IG','IH','II','IJ','IK','IL','IM','IN','IO','IP','IQ','IR','IS','IT','IU','IV','IW','IX','IY','IZ','JA','JB','JC','JD','JE','JF','JG','JH','JI','JJ','JK','JL','JM','JN','JO','JP','JQ','JR','JS','JT','JU','JV','JW','JX','JY','JZ','KA','KB','KC','KD','KE','KF','KG','KH','KI','KJ','KK','KL','KM','KN','KO','KP','KQ','KR','KS','KT','KU','KV','KW','KX','KY','KZ','LA','LB','LC','LD','LE','LF','LG','LH','LI','LJ','LK','LL','LM','LN','LO','LP','LQ','LR','LS','LT','LU','LV','LW','LX','LY','LZ','MA','MB','MC','MD','ME','MF','MG','MH','MI','MJ','MK','ML','MM','MN','MO','MP','MQ','MR','MS','MT','MU','MV','MW','MX','MY','MZ','NA','NB','NC','ND','NE','NF','NG','NH','NI','NJ','NK','NL','NM','NN','NO','NP','NQ','NR','NS','NT','NU','NV','NW','NX','NY','NZ','OA','OB','OC','OD','OE','OF','OG','OH','OI','OJ','OK','OL','OM','ON','OO','OP','OQ','OR','OS','OT','OU','OV','OW','OX','OY','OZ','PA','PB','PC','PD','PE','PF','PG','PH',
'PI','PJ','PK','PL','PM','PN','PO','PP','PQ','PR','PS','PT','PU','PV','PW','PX','PY','PZ','QA','QB','QC','QD','QE','QF','QG','QH','QI','QJ','QK','QL','QM','QN','QO','QP','QQ','QR','QS','QT','QU','QV','QW','QX','QY','QZ','RA','RB','RC','RD','RE','RF','RG','RH','RI','RJ','RK','RL','RM','RN','RO','RP','RQ','RR','RS','RT','RU','RV','RW','RX','RY','RZ','SA','SB','SC','SD','SE','SF','SG','SH','SI','SJ','SK','SL','SM','SN','SO','SP','SQ','SR','SS','ST','SU','SV','SW','SX','SY','SZ','TA','TB','TC','TD','TE','TF'])
# print 'finished with train data'
trainLabel = pd.read_table(filepath_or_buffer='GISETTE/gisette_train.labels'
,squeeze=True
,names=['label']
,delim_whitespace=True
,header=None)
trainData.info()
# outputs
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 6000 entries
Columns: 500 entries, AA to TF
dtypes: int64(500)None
trainLabel.describe()
# outputs
count 6000.000000
mean 0.000000
std 1.000083
min -1.000000
25% -1.000000
50% 0.000000
75% 1.000000
max 1.000000
dtype: float64
readyToTrain = pd.concat([trainData, trainLabel], axis=1)
Full stack trace:
File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 717, in concat
verify_integrity=verify_integrity)
File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 848, in __init__
self.new_axes = self._get_new_axes()
File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 898, in _get_new_axes
new_axes[i] = self._get_comb_axis(i)
File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 924, in _get_comb_axis
return _get_combined_index(all_indexes, intersect=self.intersect)
File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 3991, in _get_combined_index
union = _union_indexes(indexes)
File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 4017, in _union_indexes
result = result.union(other)
File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 3753, in union
uniq_tuples = lib.fast_unique_multiple([self.values, other.values])
File "lib.pyx", line 366, in pandas.lib.fast_unique_multiple (pandas\lib.c:8378)
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'
Edit: I installed pandas from the binary at lfd.uci.edu/~gohlke/pythonlibs (pandas-0.14.1.win-amd64-py2.7).
I tried the suggestion to convert the Series to a DataFrame; it did not work (same stack trace as above). Frame info below.
DataFrame info (trainData):
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 6000 entries, (550, 0, 495, 0, 0, 0, 0, 976, 0, 0, 0, 0, 983, 0, 995, 0, 983, 0, 0, 983, 0, 0, 0, 0, 0, 983, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 991, 983, 0, 0, 0, 0, 0, 0, 0, 0, 0, 808, 0, 778, 0, 983, 0, 0, 0, 0, 991, 0, 0, 0, 0, 0, 0, 0, 991, 983, 983, 0, 0, 0, 0, 0, 0, 0, 983, 735, 0, 0, 983, 983, 0, 0, 0, 0, 569, 0, 0, 0, 0, 713, 0, 0, 0, 0, 0, 983, 983, 0, ...) to (0, 0, 991, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 948, 995, 348, 0, 0, 0, 0, 0, 0, 0, 0, 0, 751, 0, 0, 0, 0, 0, 0, 0, 0, 804, 0, 0, 0, 862, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 991, 0, 0, 0, 0, 995, 0, 0, 0, 0, 0, 0, 840, 0, 0, 0, 976, 0, 0, 0, 0, 0, 0, 777, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...)
Columns: 500 entries, AA to TF
dtypes: int64(500)None
Series converted to DataFrame info (trainLabel):
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6000 entries, 0 to 5999
Data columns (total 1 columns):
label 6000 non-null int64
dtypes: int64(1)None
As joris pointed out (and as I had to figure out myself, because I did not read the comments first), the problem is your indexes: trainData has a MultiIndex while trainLabel has a plain Int64Index, so concat cannot align the two.
Change your code from
pd.concat(to_concat, axis=1)
to
pd.concat([s.reset_index(drop=True) for s in to_concat], axis=1)
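A minimal sketch of why this works, using toy data rather than the real GISETTE files (the tuple-valued MultiIndex stands in for the one shown in your trainData.info() output):

```python
import pandas as pd

# Toy stand-ins for the real objects: a frame whose index is a MultiIndex
# of tuples, and a label series with an ordinary 0..n-1 integer index.
trainData = pd.DataFrame(
    {'AA': [550, 0, 991]},
    index=pd.MultiIndex.from_tuples([(1, 2), (3, 4), (5, 6)]))
trainLabel = pd.Series([1, -1, 1], name='label')

# reset_index(drop=True) discards each object's index and replaces it with
# a fresh 0..n-1 range, so concat aligns the rows purely by position.
to_concat = [trainData, trainLabel]
readyToTrain = pd.concat([s.reset_index(drop=True) for s in to_concat], axis=1)
print(readyToTrain)
```

Note that drop=True matters: without it, reset_index would move the old index levels into new columns instead of throwing them away.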